TOUGH CROWD

A DEEP DIVE INTO BUSINESS DYNAMICS

by GUO LUI + DAVE GOODSMITH, DATASCIENCE, INC
edited by JEFF CHEN, COMMERCE DATA SERVICE
SEPTEMBER 2016
As part of the Commerce Data Usability Project, DataScience, Inc., in collaboration with the Commerce Data Service, has created a tutorial that uses the Census Bureau's Business Dynamics API to understand business survival rates in U.S. cities. If you have questions, feel free to reach out to the Commerce Data Service at DataUsability@doc.gov.


INTRODUCTION:

Every year, thousands of entrepreneurs launch startups, aiming to make it big. This journey and the perils of failure have been interrogated from many angles, from making risky decisions to start the next iconic business to the demands of having your own startup. But while much has been written about startup survival, how do survival rates shake out when we look at empirical evidence? As it turns out, the U.S. Census Bureau collects data on business dynamics that can be used for survival analysis of firms and jobs.

In this tutorial, we build a series of functions in Python to better understand business survival across the United States. Kaplan-Meier curves (KM curves) are central to this tutorial: they are built on a product-limit estimator that calculates the survival of a defined cohort of businesses over time. By comparing survival rates in various Metropolitan Statistical Areas (MSAs), we find regions that may fare far better in business survival than others.
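To make the product-limit idea concrete, here is a minimal sketch of the estimator on its own. In each period, the share of the cohort that survives is 1 - d/n, where n is the number of businesses still at risk and d is the number that fail. The survival estimate is the running product of those shares. The function name kaplan_meier and the cohort counts below are hypothetical and purely illustrative. They are not drawn from the Census data or from the functions built later in this tutorial.

```python
def kaplan_meier(at_risk, failures):
    """Return the product-limit survival estimate after each period.

    at_risk[i]  -- number of businesses still operating at the start of period i
    failures[i] -- number of those businesses that close during period i
    """
    survival = []
    running = 1.0
    for n, d in zip(at_risk, failures):
        if n <= 0:
            raise ValueError("each period needs at least one business at risk")
        # Multiply by the share of at-risk businesses that survive this period.
        running *= 1 - d / n
        survival.append(running)
    return survival


# Hypothetical cohort of 1,000 businesses followed for three years.
at_risk = [1000, 800, 680]
failures = [200, 120, 68]
curve = kaplan_meier(at_risk, failures)

for year, share in enumerate(curve, start=1):
    print("Year {}: {:.1%} of the cohort survives".format(year, share))
```

Plotting the resulting values against time gives the familiar downward-stepping KM curve.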

GETTING STARTED:

To get started, we first load a series of Python packages that will allow us to build a survival analysis:

  • Basics
    • io. Provides the Python interfaces to stream handling.
    • requests. Allows Python to send 'organic, grass-fed' HTTP requests.
    • zipfile. Provides tools to create, read, write, append, and list a ZIP file.
  • Data
    • plotly. A data visualization library.
    • pandas. A library that allows for easy data manipulation and processing.

Loading up packages follows the usual routine. If you get an error for plotly.offline, make sure you have installed plotly with pip.

In [1]:
import io, requests, zipfile
import pandas as pd
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
import plotly.graph_objs as go

init_notebook_mode()
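As a rough illustration of how io and zipfile fit together, the sketch below builds a small ZIP archive in memory and reads a CSV back out of it without touching the disk. The file name example.csv and its contents are made up for this example. In practice, the bytes would come from an HTTP response (for instance, the content attribute of a requests response) rather than from an archive we create ourselves.

```python
import io
import zipfile

# Build a tiny ZIP archive in memory to stand in for a downloaded file.
# The file name and rows are hypothetical placeholders.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.csv", "year,firms\n2012,100\n2013,80\n")
payload = buffer.getvalue()  # raw bytes, as an HTTP response body would provide

# Wrap the bytes in a stream so zipfile can treat them like a file on disk.
with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    names = archive.namelist()          # list the files inside the archive
    with archive.open(names[0]) as handle:
        text = handle.read().decode("utf-8")

print(names)
print(text)
```

The handle returned by archive.open is a file-like object, so it could also be passed straight to pandas.read_csv instead of being decoded by hand.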