Everyone hates traffic, and for most of us it seems to get worse with every passing year. But how does traffic congestion in our hometown actually compare to other cities across the country, and what is the government doing about it? These are complicated questions, but we can begin to answer them by comparing data on commuting times with data on public spending on congestion relief projects. The resulting mashup lets us see the extent to which government spending through formula grants and related Department of Transportation programs aligns with high-congestion regions. Data on commute times are collected each year by the Census Bureau's American Community Survey, and we'll access these data via Data USA's convenient API. Spending data are provided by the Treasury Department on USASpending.gov. In this tutorial, we'll show you how to build a simple web visualization that mashes up these two datasets and allows users to start answering questions about congestion mitigation spending.
We'll walk you through all the steps of replicating our data visualization, which mashes up statistics on average commuting times with data on congestion relief projects funded by the Department of Transportation. You'll learn how to install the software required to build the mashup, how to use basic shell commands to make a directory structure where your visualization app can live, how to pull the code for the app from our open-source repository on GitHub, and how to run a simple web server. For those who want more information about using API calls to access data dynamically, we walk through the structure of those functions in an appendix.
To set up your environment to build this visualization, you'll first need to install some fundamental tools; at a minimum you'll need git (to pull the code) and Python (to run the data-conversion script and the local web server). For instructions on installing each of these tools, please refer to the software's own installation page. (We've included links to those for your convenience.)
Building this visualization requires executing simple Unix-style commands. If you're not comfortable doing so, you may want to pause here before proceeding and work through a simple online tutorial such as the following: a, b, and c.
With your newly minted Unix command-line skills, fire up your terminal and create a directory where the visualization software can live. It should be somewhere in the file system where you can find it again; on a Windows machine, somewhere under c:\Users\<your username> is a reasonable choice, and on macOS or Linux your home directory works well. We'll refer to this directory as PROJECT_ROOT throughout the tutorial.
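For example (PROJECT_ROOT is just a placeholder; name the directory whatever you like):

mkdir PROJECT_ROOT
cd PROJECT_ROOT

Then, from inside that directory, initialize a git repository and pull the tutorial code from our GitHub repository: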
git init
git remote add origin https://github.com/DataUSA/datausa-tutorials.git
git pull origin master
Here's a tutorial from GitHub that walks you through how to perform all these steps: https://try.github.io/levels/1/challenges/1.
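If you'd rather skip the three-step dance, a single git clone accomplishes the same thing, though it puts the code in a new subdirectory of its own (which you would then treat as your PROJECT_ROOT):

git clone https://github.com/DataUSA/datausa-tutorials.git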
Thanks to git, your new PROJECT_ROOT directory now contains a sub-directory called 'js' where we'll store JavaScript libraries and files. Download and unzip the required JavaScript libraries into that directory; in particular, the visualization depends on d3plus, which we'll use later to render the bar graph.
Your project root directory also contains a csv sub-directory, where we'll store flat files containing information about congestion relief spending from USASpending.gov.
In your web browser, visit the USASpending advanced download page: https://www.usaspending.gov/Pages/AdvancedSearch.aspx. Select the following options for your data request:
Award type: Grants; Other Financial Assistance
Fiscal year: 2016 (or your choice)
Location type: States/Territories
Agency: Department of Transportation, with the following sub-agencies:
Department of Transportation - [6900]
Federal Highway Administration - [6925]
Federal Motor Carrier Safety Administration - [6953]
National Highway Traffic Safety Administration - [6940]
Research and Innovative Technology Administration - [6943]
Surface Transportation Board - formerly ICC - [6959]
Download the resulting CSV into PROJECT_ROOT/csv and name it assistance.csv (the name the commands below assume). Next we'll convert the file from comma-separated-value (CSV) format to JSON (JavaScript Object Notation) to make it easier for our visualization to read. Change into the csv directory:

cd PROJECT_ROOT/csv

and run the conversion utility:

python ../utils/csv2json.py assistance.csv
The csv2json.py program converts a CSV download from USAspending into a JSON file. Below we cover how the code, which is adapted from http://stackoverflow.com/questions/19697846/python-csv-to-json, works. First we import the necessary packages: csv to read the USAspending file, json to export the desired JSON format, sys to read command-line arguments, and re to use regular expressions for splitting lines. We also define a usage string to print if the script is invoked incorrectly.

import csv   # CSV library
import json  # JSON library
import sys   # to read command-line args
import re    # regular expressions library, for splitting lines

usage = "Usage: python csv2json.py csvFilename"
Next we define a helper function, readFieldNames, which reads the first line of the file and splits it into field names using a regular expression.
#----------------------------------------------------------------------#
#                          Helper functions                             #
#----------------------------------------------------------------------#

def readFieldNames(filehandle):
    # Read the first line and strip the trailing newline
    firstLine = filehandle.readline().rstrip()
    # Split on commas (with optional surrounding whitespace) into a list
    fieldNames = re.split(r'\s*,\s*', firstLine)
    return fieldNames
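To see what the regular expression buys us, here's a quick illustrative check (the sample header line is hypothetical, using field names from the USAspending layout listed further below); the pattern tolerates stray whitespace around the commas:

>>> import re
>>> re.split(r'\s*,\s*', "recipient_name, fed_funding_amount ,fiscal_year")
['recipient_name', 'fed_funding_amount', 'fiscal_year']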
The main program does two things: it parses and validates the command-line arguments, and then converts the given CSV into JSON with the appropriate structure to mash up with Data USA.
#----------------------------------------------------------------------#
#                          Main functionality                           #
#----------------------------------------------------------------------#

# Check arguments for validity
if len(sys.argv) != 2:
    sys.exit(usage)

# CSV filename should be the first argument
csvFilename = sys.argv[1]
if not csvFilename.endswith('.csv'):
    sys.exit("Input filename should be in .csv format.")

# Formulate the name for the JSON file
jsonFilename = csvFilename.replace('.csv', '.json')

try:
    csvFile = open(csvFilename, 'r')
    fieldNames = readFieldNames(csvFile)
    csvFile.close()  # close() needs parentheses; a bare csvFile.close does nothing
except IOError:
    sys.exit("Error: couldn't open input file.")

# Open the JSON file for writing
jsonFile = open(jsonFilename, 'w')

# Reopen the CSV file, this time to read the values
csvFile = open(csvFilename, 'r')
reader = csv.DictReader(csvFile)

'''For reference, the fields in a USAspending assistance download:
unique_transaction_id, transaction_status, fyq, cfda_program_num,
sai_number, account_title, recipient_name, recipient_city_code,
recipient_city_name, recipient_county_code, recipient_county_name,
recipient_zip, recipient_type, action_type, agency_code,
federal_award_id, federal_award_mod, fed_funding_amount,
non_fed_funding_amount, total_funding_amount, obligation_action_date,
starting_date, ending_date, assistance_type, record_type,
correction_late_ind, fyq_correction, principal_place_code,
principal_place_state, principal_place_cc, principal_place_country_code,
principal_place_zip, principal_place_cd, cfda_program_title,
agency_name, project_description, duns_no, duns_conf_code,
progsrc_agen_code, progsrc_acnt_code, progsrc_subacnt_code,
receip_addr1, receip_addr2, receip_addr3, face_loan_guran,
orig_sub_guran, fiscal_year, principal_place_state_code,
recip_cat_type, asst_cat_type, recipient_cd, maj_agency_cat, rec_flag,
recipient_country_code, uri, recipient_state_code, exec1_fullname,
exec1_amount, exec2_fullname, exec2_amount, exec3_fullname,
exec3_amount, exec4_fullname, exec4_amount, exec5_fullname,
exec5_amount, last_modified_date
'''

# Empty list to store the rows
output = []

# Build an associative array of key/value pairs for each row,
# and append it to the output list
for each in reader:
    row = {}
    for field in fieldNames:
        row[field] = each[field]
    output.append(row)

json.dump(output, jsonFile, indent=2, sort_keys=True)
jsonFile.close()
csvFile.close()
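The result is a JSON array with one object per CSV row, keyed by the field names. A heavily truncated, hypothetical element (the values here are made up; the keys come from the field list above) looks something like this:

{
  "agency_name": "Federal Highway Administration",
  "fed_funding_amount": "1250000.00",
  "principal_place_state": "CALIFORNIA",
  "fiscal_year": "2016"
}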
Running this utility creates a large file called assistance.json. Move that file to the json directory:
mv assistance.json ../json/
For your browser to render the visualization, you need a running web server that can serve the files to the browser. There are various ways to run your own web server (you can download and run XAMPP or similar), but let's use a simpler solution and invoke a web server from the command line.
python -m SimpleHTTPServer 8000
This serves the files in the current directory over HTTP on port 8000, so run it from your PROJECT_ROOT directory and then point your browser at http://localhost:8000.
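Note that SimpleHTTPServer is the Python 2 name for this module; if your system has Python 3, the equivalent command is:

python -m http.server 8000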
Let's look under the hood a little to understand how this mashup pulls data on commuting times from Data USA's API. This functionality is found in the file ./js/build_commute_times_viz.js, which also calls supporting functions contained in ./js/barGraphHelperFunctions.js.
This set of functions calls the Data USA API three times: once to get the commute times for all metropolitan statistical areas (MSAs), a second time to get the name of each MSA to label the bars, and a third time to get the national average. (Future versions of Data USA will allow calls 1 and 2 to be accomplished with a single API call.) The results from the three calls are folded together and reformatted into an array of JavaScript objects that d3plus can render as a nice bar graph.
We use an asynchronous XMLHttpRequest to retrieve the information in each of the three API calls. Inspired by Marijn Haverbeke's Eloquent JavaScript (pp. 309ff), we wrap each request in a new promise, which lets us perform the calls in sequence as each one completes. The code snippet below shows how the XMLHttpRequest is wrapped in a function that creates a new Promise object and calls succeed once the HTTP GET request completes.
function get(url) {
  return new Promise(function(succeed, fail) {
    var req = new XMLHttpRequest();
    req.open("GET", url, true);
    req.addEventListener("load", function() {
      if (req.status < 400)
        succeed(req.responseText);
      else
        fail(new Error("Request failed: " + req.statusText));
    });
    req.addEventListener("error", function() {
      fail(new Error("Network error"));
    });
    req.send(null);
  });
}
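Because get returns a promise, the three API calls can be chained with then so each runs after the previous one completes. The sketch below is illustrative only: the URLs are fake placeholders rather than real Data USA endpoints, and the handling of the parsed response is simplified (addMSANames is the real merge helper, covered next).

// Illustrative sketch -- the endpoint URLs here are placeholders
var commuteTimesUrl = "https://example.com/commute-times";
var msaNamesUrl = "https://example.com/msa-names";

get(commuteTimesUrl)
  .then(function(text) {
    // Assuming the API wraps its rows in a "data" property
    var commuteData = JSON.parse(text).data;
    // Chain the second call after the first completes
    return get(msaNamesUrl).then(function(namesText) {
      return addMSANames(commuteData, JSON.parse(namesText).data);
    });
  })
  .then(function(dataWithNames) {
    // In the real app, this is where the d3plus rendering happens
    console.log(dataWithNames);
  })
  .catch(function(error) {
    console.error("Could not build visualization:", error);
  });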
As each promise resolves, we fold the resulting data together using utility functions. For example, to combine the average commuting time for each MSA with the name of that MSA, we use the function in the code snippet below:
function addMSANames(data, msaNames) {
  var dataWithNames = [];
  // Only need fields 8 and 9
  // Add field 7 to back end of data where field 8 matches geo
  for (var i = 0; i < data.length; i++) {
    var newDataRow = data[i];
    for (var j = 0; j < msaNames.length; j++) {
      if (msaNames[j][8] == data[i].geo) {
        newDataRow.msaName = msaNames[j][7];
        break;
      }
    }
    dataWithNames.push(newDataRow);
  }
  return dataWithNames;
}
This function first creates an empty array to hold the results of the merge. It then loops through each object in the commute-times data array; for each object, it scans the MSA-names data for the row whose geo identifier matches, appends that MSA's name to the object, and pushes the object into the results array. Finally, it returns the merged array of objects.
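To make the expected shapes concrete, here's a hypothetical call with one-row inputs. The geo code, field names, and MSA name are fabricated for illustration; what matters is that index 8 of each msaNames row holds the geo id and index 7 holds the display name, as the function assumes:

// Illustrative only: fabricated sample rows
var commuteData = [{ geo: "31000US31080", mean_commute: 31.2 }];
var msaNames = [
  ["", "", "", "", "", "", "", "Los Angeles-Long Beach-Anaheim, CA", "31000US31080"]
];
addMSANames(commuteData, msaNames);
// -> [{ geo: "31000US31080", mean_commute: 31.2,
//       msaName: "Los Angeles-Long Beach-Anaheim, CA" }]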