Lab Creating Your Own Data I

Create a folder called "GIS Data" and a subfolder called "raw"

Download this Excel file into the raw folder.

Data Cleaning

Open the file and take note of its three worksheets. The data is on the first sheet.

Look at the rows and columns. What's in the first few rows?

One row is labeled "check" — it's the sum of the cells in that column and we want to compare it to the total that we copied from the original source. Note that they do not all agree. This suggests that either the original data had errors or we made errors while transcribing. Someone should double check this. But in any case, be aware that as a researcher you need to know that data contains errors; sometimes you create them, sometimes you have to diagnose and fix them.

Although it is useful in a data table on paper to have totals, these are not useful in a database file (in part since they can always be calculated on the fly, but mainly because we want each row to correspond to a single object (here census tracts) in a set).

Delete the rows for totals and the blank row at the top so that the first row contains field names.

Scan over these and note that some are mysterious and some contain spaces or periods or slashes. All these are no-nos for field names. Let's eliminate the spaces and convert punctuation to blanks or underscore as appropriate.

Then, let's use "camelCase" on field names to make them a little more explicit (but let's keep them relatively short). Spend just a few minutes on this.

When you are done, save this file to a new folder called "clean".

Data importing

In ArcMap, add the layer Z:\DJJR\Oakland_Historical_Census_Data\1950 to your table of contents. Have a look at the attribute table and/or field list to familiarize yourself with this layer.
Use ArcCatalog to do a directory link to your H drive (or wherever you put the cleaned up Excel file) and add the Excel file (the Data worksheet — notice that all three worksheets are available here).
Now, right click on the tract layer in the table of contents and select Joins and Relates > Join. We'll try to join this geographic layer with the spreadsheet data we have just imported. And we'll do it by matching on the field Tract.
In the join dialog box we'll (1) "Joint attributes from a table, (2) "based on" the tract field in this layer, (3) we'll get data from the $data table, and (4) we'll match on field…. oh no, it's not working!


To diagnose the problem, let's back out. Cancel the current operation. Go back to the geographic layer, select show properties and examine its fields, specifically TRACT. Note the data type. Repeat for the $data layer.


We have a DATA TYPE MISMATCH. In a GIS two fields can look the same — a 1 looks like a 1 — but if one is the text string "1" and other is the number 1, they will not match.
To fix this, we'll go into our data file and tweak things. To be allowed to edit the file we'll have to remove the data file (and we might even have to close ArcMAP because it may keep the Excel file "locked").
Open the Excel file in your clean directory. There is no simple way to tell how Excel is storing numerical data. If you look at the Tract column you can't really tell. We CAN, however, FORCE ArcMap to treat the data as text when it opens the file. Here's how.
In the first cell in the column — the one containing 1 put a single apostrophe in front of the 1. This is a signal to Excel that the contents of the cell should be treated as text. Now just save the file, restart ArcMap, open the geographic layer and the Excel file and get back to trying to join the data.
Right click the tracts layer. Joins and Relates > Joins. Join to a data file, $data, on field Tract. OK.
Now look at the attribute table for the Tracts layer to verify that the columns have been added.
Next, right click on the layer and select Data > Export Data.
Click on the Browse button to the right of the "Output feature class:" dialog.
In the dialog box be sure to set Save as Type to Shapefile and point it toward your "clean" folder. Name the file OaklandTracts1950wData.


Repeat this process for 1960.

Repeat for 1970 using data downloaded from

SAVE your results to folder with your name on it under \\qnap01\GIS_Student_Projects\SOC128 F2011