Everyone hates traffic, and for most of us it seems to get worse with every passing year. But how does traffic congestion in our hometown actually compare to other cities across the country, and what is the government doing about it? These are complicated questions, but we can begin to answer them by comparing data on commuting times with data on public spending on congestion relief projects. The resulting mashup lets us see the extent to which government spending through formula grants and related Department of Transportation programs aligns with high-congestion regions. Data on commute times are collected each year by the Census Bureau's American Community Survey, and we'll access these data via Data USA's convenient API. Spending data are provided by the Treasury Department on USASpending.gov. In this tutorial, we'll show you how to build a simple web visualization that mashes up these two datasets and allows users to start answering questions about congestion mitigation spending.
We'll walk you through all the steps of replicating our data visualization, which mashes up statistics on average commuting times with data on congestion relief projects funded by the Department of Transportation. You'll learn how to install the software required to build the mashup, how to use basic shell commands to make a directory structure where your visualization app can live, how to pull the code for the app from our open-source repository on GitHub, and how to run a simple web server. For those who want more information about using API calls to access data dynamically, we walk through the structure of those functions in an appendix.
To set up your environment to build this visualization, you'll first need to install some fundamental tools; at a minimum you'll need git (to pull the code) and Python (to run the data-conversion script and the local web server). For instructions on installing each of these tools, please refer to the software's own installation page. (We've included links to those for your convenience.)
Building this visualization requires executing simple Unix-style commands. If you're not comfortable doing so, you may want to pause here before proceeding and work through a simple online tutorial such as the following: a, b, and c.
With your newly minted Unix command-line skills, fire up your terminal and create a directory where the visualization software can live. It should be somewhere in the file system where you can find it again; on a Windows machine, somewhere under c:\Users\<your username> is a reasonable choice, and on macOS or Linux your home directory works well. We'll refer to this directory as PROJECT_ROOT throughout the tutorial.
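For example (PROJECT_ROOT is just a placeholder; name the directory whatever you like):

mkdir PROJECT_ROOT
cd PROJECT_ROOT

Then, from inside that directory, initialize a git repository and pull the tutorial code from our GitHub repository: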
git init
git remote add origin https://github.com/DataUSA/datausa-tutorials.git
git pull origin master
Here's a tutorial from GitHub that walks you through how to perform all these steps: https://try.github.io/levels/1/challenges/1.
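If you'd rather skip the three-step dance, a single git clone accomplishes the same thing, though it puts the code in a new subdirectory of its own (which you would then treat as your PROJECT_ROOT):

git clone https://github.com/DataUSA/datausa-tutorials.git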
Thanks to git, your new PROJECT_ROOT directory now contains a sub-directory called 'js' where we'll store JavaScript libraries and files. Download and unzip the required JavaScript libraries into that directory; in particular, the visualization depends on d3plus, which we'll use later to render the bar graph.
Your project root directory also contains a csv sub-directory, where we'll store flat files containing information about congestion relief spending from USASpending.gov.
In your web browser, visit the USASpending advanced download page: https://www.usaspending.gov/Pages/AdvancedSearch.aspx. Select the following options for your data request:
Award type: Grants; Other Financial Assistance
Fiscal year: 2016 (or your choice)
Location type: States/Territories
Agency: Department of Transportation, with the following sub-agencies:
Department of Transportation - [6900]
Federal Highway Administration - [6925]
Federal Motor Carrier Safety Administration - [6953]
National Highway Traffic Safety Administration - [6940]
Research and Innovative Technology Administration - [6943]
Surface Transportation Board - formerly ICC - [6959]
Download the resulting CSV into PROJECT_ROOT/csv and name it assistance.csv (the name the commands below assume). Next we'll convert the file from comma-separated-value (CSV) format to JSON (JavaScript Object Notation) to make it easier for our visualization to read. Change into the csv directory:

cd PROJECT_ROOT/csv

and run the conversion utility:

python ../utils/csv2json.py assistance.csv
The csv2json.py program converts a CSV download from USAspending into a JSON file. Below we cover how the code, which is adapted from http://stackoverflow.com/questions/19697846/python-csv-to-json, works. First we import the necessary packages: csv to read the USAspending file, json to export the desired JSON format, sys to read command-line arguments, and re to use regular expressions for splitting lines. We also define a usage string to print if the script is invoked incorrectly.

import csv   # CSV library
import json  # JSON library
import sys   # to read command-line args
import re    # regular expressions library, for splitting lines

usage = "Usage: python csv2json.py csvFilename"
Next we define a helper function, readFieldNames, which reads the first line of the file and splits it into field names using a regular expression.
#----------------------------------------------------------------------#
#                          Helper functions                             #
#----------------------------------------------------------------------#

def readFieldNames(filehandle):
    # Read the first line and strip the trailing newline
    firstLine = filehandle.readline().rstrip()
    # Split on commas (with optional surrounding whitespace) into a list
    fieldNames = re.split(r'\s*,\s*', firstLine)
    return fieldNames
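To see what the regular expression buys us, here's a quick illustrative check (the sample header line is hypothetical, using field names from the USAspending layout listed further below); the pattern tolerates stray whitespace around the commas:

>>> import re
>>> re.split(r'\s*,\s*', "recipient_name, fed_funding_amount ,fiscal_year")
['recipient_name', 'fed_funding_amount', 'fiscal_year']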
The main program does two things: it parses and validates the command-line arguments, and then converts the given CSV into JSON with the appropriate structure to mash up with Data USA.
#----------------------------------------------------------------------#
#                          Main functionality                           #
#----------------------------------------------------------------------#

# Check arguments for validity
if len(sys.argv) != 2:
    sys.exit(usage)

# CSV filename should be the first argument
csvFilename = sys.argv[1]
if not csvFilename.endswith('.csv'):
    sys.exit("Input filename should be in .csv format.")

# Formulate the name for the JSON file
jsonFilename = csvFilename.replace('.csv', '.json')

try:
    csvFile = open(csvFilename, 'r')
    fieldNames = readFieldNames(csvFile)
    csvFile.close()  # close() needs parentheses; a bare csvFile.close does nothing
except IOError:
    sys.exit("Error: couldn't open input file.")

# Open the JSON file for writing
jsonFile = open(jsonFilename, 'w')

# Reopen the CSV file, this time to read the values
csvFile = open(csvFilename, 'r')
reader = csv.DictReader(csvFile)

'''For reference, the fields in a USAspending assistance download:
unique_transaction_id, transaction_status, fyq, cfda_program_num,
sai_number, account_title, recipient_name, recipient_city_code,
recipient_city_name, recipient_county_code, recipient_county_name,
recipient_zip, recipient_type, action_type, agency_code,
federal_award_id, federal_award_mod, fed_funding_amount,
non_fed_funding_amount, total_funding_amount, obligation_action_date,
starting_date, ending_date, assistance_type, record_type,
correction_late_ind, fyq_correction, principal_place_code,
principal_place_state, principal_place_cc, principal_place_country_code,
principal_place_zip, principal_place_cd, cfda_program_title,
agency_name, project_description, duns_no, duns_conf_code,
progsrc_agen_code, progsrc_acnt_code, progsrc_subacnt_code,
receip_addr1, receip_addr2, receip_addr3, face_loan_guran,
orig_sub_guran, fiscal_year, principal_place_state_code,
recip_cat_type, asst_cat_type, recipient_cd, maj_agency_cat, rec_flag,
recipient_country_code, uri, recipient_state_code, exec1_fullname,
exec1_amount, exec2_fullname, exec2_amount, exec3_fullname,
exec3_amount, exec4_fullname, exec4_amount, exec5_fullname,
exec5_amount, last_modified_date
'''

# Empty list to store the rows
output = []

# Build an associative array of key/value pairs for each row,
# and append it to the output list
for each in reader:
    row = {}
    for field in fieldNames:
        row[field] = each[field]
    output.append(row)

json.dump(output, jsonFile, indent=2, sort_keys=True)
jsonFile.close()
csvFile.close()
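The result is a JSON array with one object per CSV row, keyed by the field names. A heavily truncated, hypothetical element (the values here are made up; the keys come from the field list above) looks something like this:

{
  "agency_name": "Federal Highway Administration",
  "fed_funding_amount": "1250000.00",
  "principal_place_state": "CALIFORNIA",
  "fiscal_year": "2016"
}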
Running this utility creates a large file called assistance.json. Move that file to the json directory:
mv assistance.json ../json/
For your browser to render the visualization, you need a running web server that can serve the files to the browser. There are various ways to run your own web server (you can download and run XAMPP or similar), but let's use a simpler solution and invoke a web server from the command line.
python -m SimpleHTTPServer 8000
This serves the files in the current directory over HTTP on port 8000, so run it from your PROJECT_ROOT directory and then point your browser at http://localhost:8000.
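Note that SimpleHTTPServer is the Python 2 name for this module; if your system has Python 3, the equivalent command is:

python -m http.server 8000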
Let's look under the hood a little to understand how this mashup pulls data on commuting times from Data USA's API. This functionality is found in the file ./js/build_commute_times_viz.js, which also calls supporting functions contained in ./js/barGraphHelperFunctions.js.
This set of functions calls the Data USA API three times: once to get the commute times for all metropolitan statistical areas (MSAs), a second time to get the name of each MSA to label the bars, and a third time to get the national average. (Future versions of Data USA will allow calls 1 and 2 to be accomplished with a single API call.) The results from the three calls are folded together and reformatted into an array of JavaScript objects that d3plus can render as a nice bar graph.
We use an asynchronous XMLHttpRequest to retrieve the information in each of the three API calls. Inspired by Marijn Haverbeke's Eloquent JavaScript (pp. 309ff), we wrap each request in a new promise, which lets us perform the calls in sequence as each one completes. The code snippet below shows how the XMLHttpRequest is wrapped in a function that creates a new Promise object and calls succeed once the HTTP GET request completes.
function get(url) {
  return new Promise(function(succeed, fail) {
    var req = new XMLHttpRequest();
    req.open("GET", url, true);
    req.addEventListener("load", function() {
      if (req.status < 400)
        succeed(req.responseText);
      else
        fail(new Error("Request failed: " + req.statusText));
    });
    req.addEventListener("error", function() {
      fail(new Error("Network error"));
    });
    req.send(null);
  });
}
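Because get returns a promise, the three API calls can be chained with then so each runs after the previous one completes. The sketch below is illustrative only: the URLs are fake placeholders rather than real Data USA endpoints, and the handling of the parsed response is simplified (addMSANames is the real merge helper, covered next).

// Illustrative sketch -- the endpoint URLs here are placeholders
var commuteTimesUrl = "https://example.com/commute-times";
var msaNamesUrl = "https://example.com/msa-names";

get(commuteTimesUrl)
  .then(function(text) {
    // Assuming the API wraps its rows in a "data" property
    var commuteData = JSON.parse(text).data;
    // Chain the second call after the first completes
    return get(msaNamesUrl).then(function(namesText) {
      return addMSANames(commuteData, JSON.parse(namesText).data);
    });
  })
  .then(function(dataWithNames) {
    // In the real app, this is where the d3plus rendering happens
    console.log(dataWithNames);
  })
  .catch(function(error) {
    console.error("Could not build visualization:", error);
  });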
As each promise resolves, we fold the resulting data together using utility functions. For example, to combine the average commuting time for each MSA with the name of that MSA, we use the function in the code snippet below:
function addMSANames(data, msaNames) {
  var dataWithNames = [];
  // Only need fields 8 and 9
  // Add field 7 to back end of data where field 8 matches geo
  for (var i = 0; i < data.length; i++) {
    var newDataRow = data[i];
    for (var j = 0; j < msaNames.length; j++) {
      if (msaNames[j][8] == data[i].geo) {
        newDataRow.msaName = msaNames[j][7];
        break;
      }
    }
    dataWithNames.push(newDataRow);
  }
  return dataWithNames;
}
This function first creates an empty array to hold the results of the merge. It then loops through each object in the commute-times data array; for each object, it scans the MSA-names data for the row whose geo identifier matches, appends that MSA's name to the object, and pushes the object into the results array. Finally, it returns the merged array of objects.
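To make the expected shapes concrete, here's a hypothetical call with one-row inputs. The geo code, field names, and MSA name are fabricated for illustration; what matters is that index 8 of each msaNames row holds the geo id and index 7 holds the display name, as the function assumes:

// Illustrative only: fabricated sample rows
var commuteData = [{ geo: "31000US31080", mean_commute: 31.2 }];
var msaNames = [
  ["", "", "", "", "", "", "", "Los Angeles-Long Beach-Anaheim, CA", "31000US31080"]
];
addMSANames(commuteData, msaNames);
// -> [{ geo: "31000US31080", mean_commute: 31.2,
//       msaName: "Los Angeles-Long Beach-Anaheim, CA" }]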