GEOCODE-LAB

Create a working directory for this lab.

Go to Zillow.com and look under Homes > Recent Home Sales. You should see something like this:

image001.jpg

Click on the first entry to familiarize yourself with the data.

Now select the data for copying, starting in the upper left of the data display area below the map and dragging down to the lower right. You may need to scroll down and then complete the selection by Shift-Clicking in the lower right corner.

Open Microsoft Word and paste the clipboard. What do you see?

The default paste is likely to be in HTML format — in other words, it looks just like the webpage. That's nice, but we just want the data.

Undo the paste with Cntrl-Z and then click the paste-pulldown up on the left side of the home ribbon. Select Paste Special > Unformatted Text.

You should see something like this:

image002.jpg

Here we see the contents of two data records (rows, recall, in a data table), but they are on separate lines with CRLF (end of line, "carriage return line feed," or "paragraph marks") at the end of each line. We need to do a little data cleaning.

We will not carry out every step here, just a few to give you an idea of what is involved.

What we want is a "tab delimited" file (tab characters between the data fields within each record). Our fields will be

Address AptNumber City State Neighborhood Price EstimatedValue Mortgate Beds Baths Lot DateSold YearBuilt Type PricePerSqft

Step 1. Do a search and replace that swaps ", Oakland,CA" for "^tOakland^tCA^t". This separates address, city, state, and neighborhood by tab characters.

Next, change "Beds: " and then "Baths: " to "^t". Tabs show up like arrows pointing right and the characters line up to tab stops (by default each half inch across the page).

image003.jpg

We would then proceed similarly for all the other fields. The last thing we will do is to get rid of the paragraph marks at the end of each line. But we cannot just eliminate all the paragraph marks because we need one at the end of each record.

To handle this we do a little three-step. First, search for all occurrences of two paragraph marks in a row (^p^p) and replace them with QQQ (an arbitrary string of text that is not likely to actually appear in your data). Second, change all the remaining paragraph marks to nothing — not a space, nothing! Third, change the QQQs back to a paragraph mark. At this point we'd have something like this (not quite like this since "in real life" we would have finished the clean up job including putting those field names we mentioned above on the first line — for the lab, you get to start over with a file that's already cleaned up in the next step):

image003a.jpg

We would then save this file as a "text only" file (.txt).

Importing data into Excel

Excel can read the tab delimited file we produced if we tell it to look for text files. Download copy of the data here (you might have to right click) — save it in your working directory.

Open Excel and ask it to open this tab-delimited text file. At the first step of the text import wizard select "delimited" and then at the second, indicate that it is tab delimited. At the third, just click Finish.
Save the file as a regular Excel file (.xlsx).

Can we open it with ArcGIS?

Start ArcGIS and use ArcCatalog to connect to your working directory and add the data file to an otherwise blank map.

If it is not already there, open the Geocoding Toolbar. Specify the 10.0 Northamerican Geocode Service and select the middle icon (Geocode addresses). Confirm the addresslocator and then fill in the appropriate information in the geocode dialog box — be sure to direct ArcGIS to put the geocode result where you want it and to name it something that makes sense for you.

image004.jpg

Export the geocode result as a shapefile with a reasonable name (to your working directory). When it asks whether to add it to the map, say yes. Then remove the geocode results and the data file. Add the OAK_road layer to the map and convince yourself that the geocoding worked by zooming in and comparing a few points with the road segment next to them. Use the info tool to look at the objects and the ruler tool to estimate how close these are.

Why might the results be as you found them?

After you have done this, close everything.

Working with Clean Data

WHAT????? Download this Excel file which contains the data with all the apartment numbers moved to another field and three extra fields added which code dummy variables for each type of structure (single family, condo, and multi — multifamily or multiunit). We'll use these to do some counting. Take a look at the attribute table to make sure you know what we mean.

Working with Geocoded data

Geocode the new Excel file (you'll have to use ArcCatalog to connect to the folder and then use Add Data to add it) following procedure as above.

Add the districts layer from BasicOaklandGIS.

Now do a join of the polygon layer and the point layer. The join will be ON the polygon layer. If all is well, ArcGIS will recognize the point layer, but if not, it should be the geocode result you just produced.

In the join dialog, specify that you would like Average, Sum, Minimum, and Maximum.

After the join is complete, take a look at the attribute table of the join result. Scan across at all the new fields. Don't proceed until you are sure you understand what happened.

Now let's produce some thematic layers. First, let's put graduated symbols on our district layer using the count of the total number of houses sold.

Next, let's create a pie chart that includes the number of single family houses, condos, and multi-family houses with the pie charts scaled by the total number of home sales.

Create an attractive map that shows the pie charts along with streets, district lines, and the actual houses. Use labels to make things intelligible according to your taste.

Get these into the layout window and print a nice copy of your map. If possible, make a pdf copy and save to your folder on the server. Alternatively, do a screen shot and put in a word document.