Every year, thousands of entrepreneurs launch startups, aiming to make it big. This journey and the perils of failure have been interrogated from many angles, from making risky decisions to start the next iconic business to the demands of running your own startup. However, while much has been written about startup survival, how do survival rates shake out when we look at the empirical evidence? As it turns out, the U.S. Census Bureau collects data on business dynamics that can be used for survival analysis of firms and jobs.
In this tutorial, we build a series of functions in Python to better understand business survival across the United States. Kaplan-Meier curves (KM curves) are central to this tutorial: they are based on a product limit estimator that lets us calculate the survival of a defined cohort of businesses over time. By comparing survival rates across Metropolitan Statistical Areas (MSAs), we find regions that may fare far better in business survival than others.
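To make the product limit idea concrete before touching real data, here is a minimal sketch of the estimator in plain Python. It assumes we already know, for each period, how many firms were still at risk and how many exited. The function name `km_survival` and the cohort numbers are hypothetical, invented purely for illustration; they are not Census figures and not the functions built later in this tutorial.

```python
def km_survival(at_risk, exits):
    """Product limit (Kaplan-Meier) estimate of survival after each period.

    at_risk[i] is the number of firms still operating at the start of
    period i, and exits[i] is the number that closed during period i.
    """
    survival = []
    s = 1.0
    for n, d in zip(at_risk, exits):
        if n > 0:
            # Multiply by the share of at-risk firms that made it through.
            s *= 1 - d / n
        survival.append(s)
    return survival


# Hypothetical cohort of 100 firms (made-up numbers, not real data).
at_risk = [100, 80, 70]
exits = [20, 10, 7]

curve = km_survival(at_risk, exits)
print([round(s, 2) for s in curve])  # [0.8, 0.7, 0.63]
```

Each step multiplies the running survival probability by the fraction of at-risk firms that survived the period, which is why the estimate is called a product limit.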
To get started, we first load a series of Python packages that will allow us to build out a survival analysis:
Loading up packages follows the usual routine. If you get an error for plotly.offline, make sure plotly is installed (pip install plotly).
import io, requests, zipfile
import pandas as pd
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
import plotly.graph_objs as go
init_notebook_mode()