We will download some data in PDF form, scrape and clean it, join it to state boundaries, and map it.
The data are state-level Gini coefficients compiled by the Census Bureau for 2010.
Step 1: Go to the data: page 11 of this census report.
Step 2: Copy the data from the PDF into a Word document. Get rid of the extraneous spaces, punctuation, etc. with a series of search-and-replace operations. Hint: take aim first at the <period><space><period> sequences. If you later change every <space> to <tab>, be sure to go back and restore the space between the pieces of state names like "New Mexico."
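If you would rather script the clean-up than do it by hand, here is a minimal Python sketch of the same search-and-replace idea. It assumes each pasted line looks like a state name, a run of dot leaders, then numbers; the function name clean_line and the sample values are made up for illustration, so check the result against the PDF.

```python
import re

# A token counts as numeric if it looks like 0.464, .464, 12 or -0.5.
NUMERIC = re.compile(r"^[+-]?\d*\.?\d+$")

def clean_line(raw):
    """Turn one pasted PDF line into tab-delimited text (illustrative helper)."""
    # Replace dot leaders such as ". . . . ." or "....." with a single space.
    # A lone decimal point inside a number is left alone.
    text = re.sub(r"(?:\.\s*){2,}", " ", raw)
    tokens = text.split()

    # Leading non-numeric tokens are the state name; keep the spaces inside it.
    first_number = len(tokens)
    for i, token in enumerate(tokens):
        if NUMERIC.match(token):
            first_number = i
            break
    name = " ".join(tokens[:first_number])
    values = tokens[first_number:]

    # Tabs go only between fields, so "New Mexico" stays one field.
    return "\t".join([name] + values)

# Made-up sample line, not real Census figures.
print(clean_line("New Mexico . . . . . 0.464 0.004"))
```

Because the name is rebuilt before the tabs are added, there is no need to go back and repair multi-word state names afterwards.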
Step 3: When the data is clean and tab-delimited, copy and paste it into an Excel worksheet (or save it as a text-only file and import it into Excel).
Step 4: Copy over the headers (this is a by-hand operation for the most part). Here are the ones I used:
State | GiniState | GiniStateMOE | GiniQ1 | GiniQ1MOE | GiniQ2 | GiniQ2MOE | GiniQ3 | GiniQ3MOE | GiniQ4 | GiniQ4MOE | GiniQ5 | GiniQ5MOE
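As a rough sketch, the header row and the cleaned rows can also be assembled into tab-delimited text with Python's csv module, which Excel can import directly. The function name to_tsv is hypothetical, and the check that every row has 13 fields is my own addition to catch a state name that got split by a stray tab.

```python
import csv
import io

HEADERS = [
    "State", "GiniState", "GiniStateMOE",
    "GiniQ1", "GiniQ1MOE", "GiniQ2", "GiniQ2MOE",
    "GiniQ3", "GiniQ3MOE", "GiniQ4", "GiniQ4MOE",
    "GiniQ5", "GiniQ5MOE",
]

def to_tsv(rows):
    """Return tab-delimited text: the header line, then one line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADERS)
    for row in rows:
        # A wrong field count usually means a split state name or a lost value.
        if len(row) != len(HEADERS):
            raise ValueError(f"expected {len(HEADERS)} fields, got {len(row)}: {row}")
        writer.writerow(row)
    return buffer.getvalue()
```

Write the returned text to a .txt file and import it into Excel as tab-delimited.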
Step 5: Save the Excel file.
Step 6: Get the boundary files for the states from http://census.gov. Click on the TIGER link, then 2010 TIGER Files > Tracts > All in state, and save the ZIP file to, say, your desktop. Then extract it to your working directory.
Step 7: Start a new map, add the states, and add the data file. Open the attribute tables to figure out the names of the fields you will join on. Do the join.
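To see what the join is doing under the hood, here is a small Python sketch of a join on state name. The field names NAME10 and State are assumptions (check your own attribute tables), and join_on_name is a made-up helper, not a GIS function. It also lists the boundary records that found no match, which is the usual sign of a name that did not survive the clean-up.

```python
def normalize(name):
    """Collapse extra whitespace and ignore case so near-identical names match."""
    return " ".join(name.split()).casefold()

def join_on_name(features, rows, feature_key="NAME10", row_key="State"):
    """Attach each data row to the boundary record with the same state name.

    features: list of dicts from the boundary attribute table (assumed layout)
    rows:     list of dicts from the data table (assumed layout)
    Returns (joined records, names of boundary records with no match).
    """
    lookup = {normalize(row[row_key]): row for row in rows}
    joined, unmatched = [], []
    for feature in features:
        match = lookup.get(normalize(feature[feature_key]))
        if match is None:
            unmatched.append(feature[feature_key])
        else:
            # Combine the boundary attributes with the data columns.
            joined.append({**feature, **match})
    return joined, unmatched
```

If the unmatched list is not empty after your own join, compare those names against the State column before mapping.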
Step 8: Make a few thematic maps using the Gini coefficient data.
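A thematic map usually groups the values into a handful of classes. As one illustration, the sketch below computes quantile class breaks (roughly equal numbers of states per class) in plain Python. Quantiles are only one of several classification schemes your GIS offers; the function names and the sample values are made up.

```python
import bisect

def quantile_breaks(values, n_classes=5):
    """Return the upper bounds of all classes except the last."""
    ordered = sorted(values)
    breaks = []
    for i in range(1, n_classes):
        # Index of the last value that belongs to class i.
        idx = (i * len(ordered)) // n_classes - 1
        breaks.append(ordered[max(idx, 0)])
    return breaks

def classify(value, breaks):
    """Return the class number (0 = lowest); a value equal to a break stays in the lower class."""
    return bisect.bisect_left(breaks, value)

# Made-up values, not real Gini coefficients.
sample = [0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50]
breaks = quantile_breaks(sample)
print(breaks)
print([classify(v, breaks) for v in sample])
```

Trying a different number of classes is a quick way to see how much the look of the map depends on the classification.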
PART TWO
Let's repeat for population change (see the bottom of this page).
PART THREE
This time, we want to get a state's county layer from the TIGER files — pick your favorite state.
Now go to the Census site American FactFinder and choose GEOGRAPHIES > COUNTY > ALL counties in X, where X is your state.