Stats I

Spatial Analysis: Getting the Most Information from Data

I start with a map of Oakland and two point layers, Points1 and Points2. Points1 represents potential clients and Points2 represents, say, three social service providers. Here's the data.

map01.jpg

Table CLIENTS consists of 20 points, named A through T.  Each has two data fields, C_F1 and C_F2. Table SVC_CENTERS consists of 3 points, named "West," "East," and "North." Each has two data fields, also named generically: SC_F1 and SC_F2.

clients_atttbl.gif
svc_centers_atttbl.gif

Spatial Joins

A spatial join creates a new table by joining two tables on the basis of spatial relationships (e.g., is nearest to, is contained by, etc.).

Depending on what kind of feature class we have (point, line, or polygon), some relationships make sense, some don't; one point can't contain another, for example.

If we right-click a feature class in the table of contents, select Join on the context menu, and choose a spatial join (rather than an attribute join), we get the dialog to the right.

Pay attention to the language of the dialog.  It says "What do you want to join TO THIS LAYER?"  In other words, the layer we start with is the primary partner in this matching process.

The first thing I specify is what table I am going to join with. Once I specify this, the program takes note of what kind of geometry I am joining to what and presents me with the appropriate options.

Try selecting the Places layer (polygons). Each point record (row) will have appended to it all the attributes of either the polygon it is inside of or the polygon closest to it, because that's what makes sense when joining polygons to points. Can you predict what it would be if it were points to polygons? Right: either each polygon gets a summary of all the points inside it OR each polygon gets all the attributes of the point closest to its boundary (along with a distance field).

If we join points to points then either each point gets a summary of the numeric attributes of points that are closest to it OR each point gets THE attributes of THE point closest to it and a distance field.

join-dialog-01.gif

If I want to create a table of clients showing information about the nearest service center, then I start with CLIENTS table and join the SVC_CENTERS table to it using "option B" (that is, the lower of the two join options).

join-dialog-C+SC_B.gif join-output-C+SC_B.gif
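To make the logic of option B concrete, here is a minimal sketch in plain Python of what a nearest-point join does. This is not ArcMap's implementation; the coordinates are invented for illustration (they are not the Oakland data), `nearest_join` is a hypothetical helper name, and distance is plain straight-line distance in whatever units the coordinates use.

```python
import math

# Hypothetical coordinates, invented for illustration (not the Oakland data).
clients = {"A": (1.0, 1.0), "B": (4.0, 5.0), "C": (9.0, 2.0)}
centers = {"West": (0.0, 0.0), "East": (10.0, 0.0), "North": (5.0, 8.0)}


def nearest_join(targets, join_points):
    """For each target point, find the nearest join point and its distance."""
    joined = {}
    for name, (x, y) in targets.items():
        best_name, best_dist = None, math.inf
        for other, (ox, oy) in join_points.items():
            d = math.hypot(ox - x, oy - y)  # straight-line distance
            if d < best_dist:
                best_name, best_dist = other, d
        joined[name] = (best_name, best_dist)
    return joined


# Clients are the starting layer; service centers are joined TO them.
joined = nearest_join(clients, centers)
for client, (center, dist) in joined.items():
    print(client, center, round(dist, 2))
```

Each client row ends up with the name of one center plus a distance, which is the shape of the output table above. In the real join, all of the center's attribute fields would be appended, not just its name.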

For each client we now know how far s/he is from the nearest center.  We can use ArcMap to draw a histogram of the results (I'm confused about the distance units here — need to investigate —DJR).

join-histogram-C+SC_B.gif
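A histogram like this is just a count of distances falling into equal-width bins. The sketch below shows the idea with made-up distances and an arbitrary bin width of 500; both are assumptions for illustration, and the units are whatever units the distance field happens to be in.

```python
from collections import Counter

# Hypothetical distances (units unspecified), invented for illustration.
distances = [120.0, 340.0, 410.0, 560.0, 980.0, 1010.0, 1500.0]
bin_width = 500.0  # arbitrary choice


def histogram(values, width):
    """Count values per bin; each bin covers [lower, lower + width)."""
    counts = Counter(int(v // width) for v in values)
    return {(b * width, (b + 1) * width): counts[b] for b in sorted(counts)}


bins = histogram(distances, bin_width)
for (lower, upper), n in bins.items():
    print(f"{lower:>6.0f} to {upper:>6.0f}: {'#' * n}")
```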

OK, what NEXT? What if I were to choose "Option A" here? Let's look at what it says:

join-dialog-optionA-text.gif

Each client record would get a summary of the service center data for the service center(s) nearest to that client record (that is, the service centers that are closer to it than to any other client). This doesn't really make much sense. Thus, in this case, the "A" option is not something we want to do.

Let's go back and think about the join in the other direction — that is, starting with the service centers.  Here's the dialog box that I get:

join-dialog-SC+C_A.gif

Here our two options are these:

OPTION A says "Each SVC CTR point will be given a summary of the numeric attributes of the CLIENT points that are closest to it (i.e., closer to that service center than to one of the others)."  We can then choose which, if any, of the summary measures we will see. We'll always at least get a COUNT field that tells us how many clients are closer to this service center than to others.

OPTION B says "Each SVC CTR point will get all the attributes of the nearest client."  Like option A for joining centers to clients, this doesn't really make any sense.

When we run option A with boxes checked for sum, average, min, max, and standard deviation, we get the output shown below (I've hidden a bunch of the standard ID fields and such to make this fit on the page).

Study this a bit so that you understand how the checkboxes in the dialog correspond to the fields we see in the output.

If clients always go to the nearest center, how many go to each center?

Which center has the widest range of F2 (perhaps this is client age)?

Suppose client field F1 represents income. How do the centers compare in terms of the average income of their clients?

join-output-SC+C_A.gif
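The option A join from centers to clients can be sketched as a group-and-summarize step: assign each client to its nearest center, then compute the checked statistics per center. The code below is an illustration only, not ArcMap's implementation. The coordinates and field values are invented, the output keys (`count`, `sum`, and so on) are hypothetical names rather than the field names ArcMap writes, and I've assumed population standard deviation since the dialog doesn't say which kind it uses.

```python
import math
import statistics

# Hypothetical client records, invented for illustration (not the Oakland data).
clients = {
    "A": {"xy": (1.0, 1.0), "C_F1": 10, "C_F2": 30},
    "B": {"xy": (2.0, 0.0), "C_F1": 20, "C_F2": 50},
    "C": {"xy": (9.0, 2.0), "C_F1": 40, "C_F2": 25},
    "D": {"xy": (4.0, 5.0), "C_F1": 30, "C_F2": 60},
    "E": {"xy": (6.0, 7.0), "C_F1": 50, "C_F2": 40},
}
centers = {"West": (0.0, 0.0), "East": (10.0, 0.0), "North": (5.0, 8.0)}


def nearest_center(xy):
    """Name of the center closest to the point xy (straight-line distance)."""
    return min(centers, key=lambda name: math.dist(xy, centers[name]))


def summarize(field):
    """Per-center summary of one numeric client field."""
    groups = {name: [] for name in centers}
    for rec in clients.values():
        groups[nearest_center(rec["xy"])].append(rec[field])
    table = {}
    for name, vals in groups.items():
        row = {"count": len(vals)}  # always present, like the COUNT field
        if vals:  # a center with no clients gets a count of 0 and nothing else
            row.update(
                sum=sum(vals),
                avg=statistics.mean(vals),
                min=min(vals),
                max=max(vals),
                sd=statistics.pstdev(vals),  # assumption: population SD
            )
        table[name] = row
    return table


summary = summarize("C_F1")
for name, row in summary.items():
    print(name, row)
```

With a table like this, the study questions become lookups: the count answers "how many go to each center," `max` minus `min` gives the range of a field, and `avg` gives the comparison of average income.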