To Map or Not to Map

to Map or not to Map?
the next time you look at one,
think about who doesn’t appear in it.

Use Python, PIL, and PyProj
to show latent bias in maps

Credits
Author

Joshua Tauberer, Ph.D.

Edited by the Commerce Data Service

Tyrone Grandison, Ph.D.
Star Ying

Foreword from the Commerce Data Service:
In today's world, data is king. We are surrounded by lots of it: different types, coming in at different rates, representing multiple facets of the things we are interested in. There is so much of it that we often find it difficult to make sense of it all. So we hunt for intuitive and convenient ways to represent the data. The most commonly used visualization today is the map.

Have you ever thought about the inherent bias that is in a map? Joshua Tauberer's recent post on Medium (available here) provides an interesting and compelling discussion of the topic. As data practitioners, we more easily question and discuss the appropriateness of using one visual tool or another. Unfortunately, this rigor is not as common when it comes to maps.

The Commerce Data Service was so impressed with Joshua's analysis and message that we collaborated with him to create a Commerce Data Usability Project tutorial from his Medium post.

Maps can lie
Maps tell a story. And maps can also be used to deceive. Think back to the famous episode of The West Wing (here) about the almost unconscious choice by most to accept and use a deceptive map projection — Greenland is not that big and Africa is not that small.

But there’s another problem: Because we don’t all live evenly spaced throughout the world, demographic maps favor populations that live in low-density areas. Here’s why that’s meaningful:

Although racial minorities make up 26% of the U.S. population, they account for just 16% of the space on typical demographic maps of the United States. This has real policy implications.
The 50% of the U.S. population in the most dense part of the country lives in just 1% of the nation’s land area. Who actually lives in the 99% of the country we see?
Maps of political sentiment normally give more space to a specific political group. For example, although Obama took just over 50% of the vote in the 2012 election, his voters account for only 38% of the space on a typical election map — making it look like his opponent should have won.

It’s not surprising that there are disparities in how populations are represented following the urban-rural divide. But it is alarming just how large those disparities are.

Just the Basics
Let’s start with the basics: Most Americans in the contiguous United States live in or near metropolitan areas.

All of those people — 50% of the U.S. population — inhabit just 1% of the land area:

And, the converse is also true: about 50% of the country’s land area is inhabited by just 1% of the population.
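A figure like this can be reproduced by ranking tracts from densest to sparsest and adding up land area until half of the population is covered. The sketch below assumes each tract is reduced to a (population, land area) pair; the function name and the toy numbers are hypothetical and are not taken from the ACS or TIGER data.

```python
def land_share_of_densest(tracts, population_share=0.5):
    """Fraction of total land area taken up by the densest tracts that
    together hold at least `population_share` of the population.

    tracts: list of (population, land_area) pairs with land_area > 0.
    Whole tracts are counted; no tract is split at the threshold.
    """
    total_pop = sum(pop for pop, _ in tracts)
    total_area = sum(area for _, area in tracts)
    # Densest tracts first (people per unit of land area).
    ordered = sorted(tracts, key=lambda t: t[0] / t[1], reverse=True)
    pop_so_far = 0.0
    area_so_far = 0.0
    for pop, area in ordered:
        if pop_so_far >= population_share * total_pop:
            break
        pop_so_far += pop
        area_so_far += area
    return area_so_far / total_area


# Made-up toy data: two small dense tracts and two large sparse ones.
toy_tracts = [(4000, 1.0), (4000, 2.0), (1000, 47.0), (1000, 50.0)]
share = land_share_of_densest(toy_tracts)
```

Reversing the sort order (sparsest first) and accumulating land area instead of population would give the converse figure.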

In a typical demographic map, counties (or equivalents) are colored according to some measurement. Grouping people into counties evens out the population density slightly. Even so, when you group individuals into counties, more than half of the map (57%) is inhabited by 5% of the population. Here’s how that looks, with most of the Midwest and West empty:

That’s a problem.

No one actually lives in most of the places demographic maps portray.

The same 5% of the population occupies most of these maps, so we end up looking at them over and over again.

And most of the other parts of the map are not representative of the population as a whole either, because urban areas are disproportionately squeezed.

How to know
To quantify how people show up on a map, Josh combined the U.S. Census’s 2010–2014 American Community Survey 5-Year Estimates (ACS) with the U.S. Census’s 2014 TIGER geospatial data, which says where those people are located. He made a separate image of the 71,954 Census tracts and the 3,108 counties in the contiguous United States and counted up how many pixels each tract and county took up in the image. (See the end of the post for further details.)

When he mentions a “typical map”, he is referring to county-by-county demographic maps, and when he mentions “land area” he is using the smaller Census tracts to see where exactly people live. When he says a population “accounts for” a proportion of the pixels on a map, he has apportioned parts of each pixel on the map to demographic groups based on who is living in each pixel.

This is a problem for anyone who lives in a city.
Although 5% of the U.S. population takes public transit to work, they account for only 1% of the pixels on a typical map. 38% of the U.S. population lives in multi-family housing structures (what the ACS calls single-family-attached and 2-or-more-unit housing structures), but those individuals account for just 29% of the pixels on a typical map.

When Josh previously wrote about this issue in 2013, he found that six congressional districts, all in New York City, are smaller than one pixel in a typically sized map. One-person-one-vote loses meaning if some votes are too small to see.

And it’s a problem for racial minorities
And it’s a problem for racial minorities (those who identified as a race other than “white”). These individuals are given half as much space in a typical map per capita as white Americans!

That’s because racial minority individuals are even more clustered in high-density areas than the population as a whole. Here’s where 95% of racial minority individuals live:

Here’s how that looks on a county-by-county map (if you want to compare that to the map earlier of the U.S. population as a whole):

If you apportion all of the pixels on a typical map by the demographic breakdown in the pixel, although racial minorities make up 26% of the U.S. population, they account for just 16% of the pixels. In other words, for every pixel that represents a white person, only 0.53 pixels represent a racial minority.
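The per-capita comparison can be checked with a little arithmetic: divide each group's share of the pixels by its share of the population, then take the ratio of the two results. The helper below is a hypothetical illustration and uses the rounded percentages quoted in the text, so its result can differ in the second decimal place from figures computed on unrounded data.

```python
def pixels_per_capita_ratio(population_share, pixel_share):
    """Pixels per person for a group, relative to everyone else.

    population_share: the group's fraction of the population (0 to 1).
    pixel_share: the group's fraction of the map's pixels (0 to 1).
    """
    group_rate = pixel_share / population_share
    rest_rate = (1.0 - pixel_share) / (1.0 - population_share)
    return group_rate / rest_rate


# Rounded inputs from the text: 26% of the population, 16% of the pixels.
minority_ratio = pixels_per_capita_ratio(0.26, 0.16)

# The same helper applied to the 2012 election figures (51% and 38%).
election_ratio = pixels_per_capita_ratio(0.51, 0.38)
```

With these rounded inputs the first ratio comes out near 0.54 and the second near 0.59, roughly half as much space per person in both cases.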

The next time you look at a map, think about who doesn’t appear in it.

The rural-urban divide
As mentioned in the introduction, the rural-urban divide also affects political maps. Using the 2012 presidential election results, while Democrats were 51% of the population they accounted for only 38% of the space on a typical election map. Or, for every pixel that represents a Republican, only 0.59 pixels represent a Democrat.

The last demographic measurements that Josh looked at were household income and poverty status. Fortunately, the distortion does not hurt these individuals and, in the case of income, works in their favor:

16% of the population lives with an income below the poverty line (as defined by ACS). They account for 17% — about the same proportion — of the space on a typical map.
The 50% of the population living in the tracts with the lowest median household income accounts for 79% of the space on a typical map — i.e. far more than their proportional share.

Poverty affects both rural and urban areas, of course, so it makes some sense that we wouldn’t see a distortion here. The distortion in income is probably explained just by the cost of living being lower in rural areas.

What is there to do about it?
For one, use a cartogram wherever possible. A cartogram allots equal area in a map to equally meaningful units. For a demographic map, a cartogram would ensure that every individual is represented by the same amount of space.

The New York Times often uses cartograms, though its use of them has been on the decline recently. The picture above sizes U.S. states by the number of health insurance plans available.

If this were about demographic statistics, the states would be sized so that their area is proportional to their population.
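As a rough illustration of that sizing rule, the sketch below computes how much each region would have to be stretched or shrunk so that its share of the map area equals its share of the population. It assumes the simplest kind of cartogram, a non-contiguous one in which each shape is rescaled in place, which is not necessarily how any of the maps mentioned here were built. The function name and the two-region data are made up.

```python
import math


def cartogram_scale_factors(regions):
    """Linear scale factor per region for a population cartogram.

    regions: dict mapping name -> (population, current_map_area).
    Scaling a shape by factor f multiplies its area by f squared, so
    the factor is the square root of target area over current area.
    The total map area is kept unchanged.
    """
    total_pop = sum(pop for pop, _ in regions.values())
    total_area = sum(area for _, area in regions.values())
    factors = {}
    for name, (pop, area) in regions.items():
        target_area = total_area * pop / total_pop
        factors[name] = math.sqrt(target_area / area)
    return factors


# Made-up example: a small crowded region and a large empty one.
factors = cartogram_scale_factors({"A": (900, 10.0), "B": (100, 90.0)})
```

In this toy case region A grows by a factor of 3 and region B shrinks to a third of its width, after which each person is represented by the same amount of space.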

Cartograms are often derided as ugly. And bad ones certainly are. But they can also be quite elegant, as in The Telegraph’s U.K. election results map in the form of a grid of hexagons (which I’m told is common for representing the U.K.). Each hex represents one seat in Parliament.

It’s hard to make cartograms, and even harder to make cartograms that are intuitive to understand. That’s in part because of readers’ lack of familiarity, which can be fixed by creating more cartograms.

Alternatively, don’t use a map.

No map at all is better than a map that perpetuates injustice.
So as Josh's friend Ben Klemens sums up:

.@joshdata asks why we keep drawing maps of land when it's the people we're interested in: https://t.co/9U0Cdq365I
— Ben Klemens (@b__k) February 13, 2016

This isn’t the huge-Greenland-tiny-Africa problem.
You’ve probably seen the West Wing episode in which cartographers explain that maps distort the relative size of countries. Greenland looks big but it isn’t. This is a well-known problem with the Mercator projection, which is fantastic if you are navigating the high seas but awful if you are coloring in the boundaries of countries to display things like population demographics.

With an equal-area map projection like Gall-Peters, every area on the Earth’s surface is represented by an appropriately sized area on the map, but at the cost of distorting shapes.

An equal-area projection was used for the maps in the post. The problem being discussed here isn’t the projection: it’s the distribution of people.
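To make "equal-area" concrete, here is a minimal pure-Python sketch of the spherical Albers Equal Area conic formulas (the Methodology section names Albers as the projection used). The standard parallels, central meridian, and origin latitude are assumed values commonly used for the contiguous United States; the source does not state its parameters, and a real workflow would more likely call PyProj than hand-roll the formulas. The check compares the projected area of a small longitude/latitude cell with its true area on a sphere.

```python
import math

R = 6371.0  # mean Earth radius in km (spherical approximation)


def albers_forward(lon_deg, lat_deg, lon0=-96.0, lat0=23.0, sp1=29.5, sp2=45.5):
    """Spherical Albers equal-area conic projection; returns (x, y) in km.

    The default parameters are assumptions, not values from the source.
    """
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    lam0, phi0 = math.radians(lon0), math.radians(lat0)
    p1, p2 = math.radians(sp1), math.radians(sp2)
    n = (math.sin(p1) + math.sin(p2)) / 2.0
    c = math.cos(p1) ** 2 + 2.0 * n * math.sin(p1)
    rho = R * math.sqrt(c - 2.0 * n * math.sin(phi)) / n
    rho0 = R * math.sqrt(c - 2.0 * n * math.sin(phi0)) / n
    theta = n * (lam - lam0)
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)


def polygon_area(points):
    """Planar polygon area via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def cell_areas(lon, lat, size=0.1):
    """(projected area, true spherical area) in km^2 of a small cell."""
    corners = [(lon, lat), (lon + size, lat),
               (lon + size, lat + size), (lon, lat + size)]
    projected = polygon_area([albers_forward(x, y) for x, y in corners])
    true_area = (R ** 2) * math.radians(size) * (
        math.sin(math.radians(lat + size)) - math.sin(math.radians(lat)))
    return projected, true_area


# Cells in the south and the north should both keep their true area.
south = cell_areas(-100.0, 30.0)
north = cell_areas(-75.0, 45.0)
```

In each pair the two numbers agree closely, which is the property that makes pixel counts on such a map proportional to land area.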

Methodology
Josh put each of the 71,954 Census tracts and 3,108 Census counties in the contiguous United States that have at least one resident into an 800-by-500 pixel map of the contiguous United States in an Albers Equal Area projection. Then he counted how many pixels each tract and county took up in the final map. Where these geographic units co-occurred at a pixel, he apportioned the pixel equally among the units. Tracts and counties were drawn no smaller than a single pixel.
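The pixel-counting step can be sketched as follows. This assumes the rasterization has already happened elsewhere (for example with PIL) and that each unit is available as a set of the (x, y) pixels it touches; that data layout and the function name are assumptions made for illustration, not a description of Josh's actual code.

```python
from collections import defaultdict


def fractional_pixel_counts(unit_pixels):
    """Count pixels per unit, splitting shared pixels equally.

    unit_pixels: dict mapping unit id -> set of (x, y) pixel coordinates
    covered by that unit. A pixel covered by k units contributes 1/k
    of a pixel to each of them.
    """
    owners = defaultdict(list)
    for unit, pixels in unit_pixels.items():
        for pixel in pixels:
            owners[pixel].append(unit)

    counts = defaultdict(float)
    for units in owners.values():
        share = 1.0 / len(units)
        for unit in units:
            counts[unit] += share
    return dict(counts)


# Toy example: two tracts that overlap at pixel (1, 0).
counts = fractional_pixel_counts({
    "tract_a": {(0, 0), (1, 0)},
    "tract_b": {(1, 0)},
})
```

Here tract_a ends up with 1.5 pixels and tract_b with 0.5, so the counts still add up to the two pixels that were drawn.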

He then apportioned the pixels to people according to population ratios within each tract or county. For instance, if a tract of 4,000 people is 75% right-handed and that tract is represented by 4.8 pixels on the map, he apportioned 3.6 pixels to right-handed individuals and 1.2 pixels to left-handed individuals. That’s what he means when he says a population “accounts for” a certain number of pixels.
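The right-handed example works out as below. The two helpers are hypothetical names introduced for illustration: one splits a single unit's pixels by its group ratios, and the other sums those splits over many units to get each group's share of the whole map.

```python
def apportion_pixels(pixels, group_shares):
    """Split one unit's pixel count among groups by population ratio.

    pixels: (possibly fractional) number of pixels the unit occupies.
    group_shares: dict mapping group name -> fraction of the unit's
    population in that group (fractions should sum to 1).
    """
    return {group: pixels * share for group, share in group_shares.items()}


def map_share_by_group(units):
    """Each group's share of all pixels, summed over many units.

    units: iterable of (pixels, group_shares) pairs.
    """
    totals = {}
    for pixels, group_shares in units:
        for group, amount in apportion_pixels(pixels, group_shares).items():
            totals[group] = totals.get(group, 0.0) + amount
    grand_total = sum(totals.values())
    return {group: amount / grand_total for group, amount in totals.items()}


# The worked example from the text: 4.8 pixels, 75% right-handed.
split = apportion_pixels(4.8, {"right-handed": 0.75, "left-handed": 0.25})
```

Up to floating-point rounding, split holds 3.6 pixels for right-handed individuals and 1.2 for left-handed individuals, matching the figures above.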

Josh's images, except where noted, plot tracts. That’s because tracts are a better representation of how much land area people are inhabiting, and they produce nicer images. But most demographic maps are drawn at the level of counties. So when reporting numbers, Josh used “land area” to refer to an analysis based on tracts and “typical map” to refer to an analysis based on counties. Most of the numbers come out around the same at both levels of detail.

Getting Started
In this tutorial, we'll go over how to collate the required data and then walk through creating the visualizations. Follow along in the Jupyter Notebook below or check out the code files on GitHub.

Download Jupyter Notebook Version