What is the point of all this beautiful data? December 9, 2009

From Gapminder.com

Click on the graph above to see a truly amazing collection of data. This graph plots the income per person of every country against the life expectancy of a person born in that country, with the size of each data point reflecting the size of the population. But wait, there’s more. The circles are color coded by region to make areas of the world easier to tell apart. But wait, there’s still more: you can adjust the year shown on the graph to view data from any of the last 207 years. Or hit “Play” and watch the circles grow and shrink and move across the graph on their own. If you select one or more countries on the list and hit the play button, you end up with a colored path across the grid charting the growth, income changes and life expectancy changes of just that country.

Of course, I checked out the path for the United States and noticed a dip in the life expectancy trend in the early 20th century. Pupfiction guessed that the seeming abnormality was the result of a disease epidemic that had a significant impact on the country that year. That made sense to me at first, but then I wondered whether widespread disease could really affect life expectancy so significantly. To put it in modern terms for comparison, say the life expectancy of an American born last year was 85. With technological and medical developments being made all the time, that number ought to keep rising, to maybe 86 this year (hypothetically). BUT the H1N1 virus (swine flu) is spreading quickly and fatalities are mounting. Perhaps the levels of swine flu do not yet compare with previous plagues, but could the potential for an epidemic be enough to lower the life expectancy of an American baby born today?

The part where I get stuck is that the actual average age of people who die in a given year ought not to be that strong a factor in determining the life span of people born in that year. I do not know how life expectancy is calculated, but I imagine there must be more to it than that.
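
For what it is worth, a figure like this is usually a "period" life expectancy. It takes the death rates seen at every age in a single year and asks how long a baby would live if it faced those same rates for its whole life. It is not the average age of the people who died that year. Here is a rough sketch of that calculation in Python. I am assuming this is the kind of measure the graph uses, and the mortality numbers are made up purely for illustration (they are not Gapminder data).

```python
import math


def period_life_expectancy(death_prob):
    """Life expectancy at birth from one year's age-specific death probabilities.

    death_prob[x] is the chance that someone aged x dies before turning x + 1.
    """
    alive = 1.0   # share of the imaginary cohort still alive at age x
    years = 0.0   # average years lived so far
    for q in death_prob:
        # Survivors live the full year; those who die are assumed
        # to live half of it on average.
        years += alive * (1.0 - q / 2.0)
        alive *= 1.0 - q
    return years


# Hypothetical mortality curve: low in youth, rising steeply with age.
baseline = [min(1.0, 0.0002 * math.exp(0.09 * age)) for age in range(110)]
baseline[-1] = 1.0  # close the table: nobody survives past the last age

# Hypothetical epidemic year: a small extra risk of death at every age.
epidemic = [min(1.0, q + 0.005) for q in baseline]

print(round(period_life_expectancy(baseline), 1))
print(round(period_life_expectancy(epidemic), 1))
```

The second number comes out lower than the first. On this kind of measure, one bad year can put a visible dip in the curve even though babies born that year will not actually face epidemic-level death rates for their whole lives.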

Well – clearly you see this graph has challenged me to consider things I do not normally think about. Do you spot anything interesting that you can’t quite wrap your head around? Or something enlightening that makes you wonder about the data or what it represents? Do you think I must be crazy for loving a graph I do not understand? No matter what, though, you must admit – this graph is an impressive piece of work.

Numbers Don’t Lie, or Do They? Simpson’s Paradox Explains December 2, 2009

The Numbers Guy over at The Wall Street Journal had a really interesting article today. He explains a concept called Simpson’s Paradox, which essentially says aggregated data is sometimes misleading. For example,

… in both 1995 and 1996, Derek Jeter of the New York Yankees had a lower batting average for each season than David Justice, then of the Atlanta Braves.

Combining the two years, however, Mr. Jeter had a better average. The paradox resulted from the fact that in 1995 Mr. Jeter had only 48 at-bats with a .250 average while Mr. Justice had more at-bats (411) with a .253 average. The following year, Mr. Jeter had 582 at-bats with a .314 average while Mr. Justice had only 140 at-bats with a higher average of .321, pushing the two-year average in Mr. Jeter’s favor.
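
To convince myself, I redid the arithmetic from the quote in a few lines of Python. The at-bats and averages are the ones quoted above. The hit counts are my own reconstruction (at-bats times average, rounded to the nearest whole hit), so treat them as an assumption rather than official statistics.

```python
# (at_bats, batting_average) per season, as quoted from the WSJ article.
seasons = {
    "Jeter": {1995: (48, 0.250), 1996: (582, 0.314)},
    "Justice": {1995: (411, 0.253), 1996: (140, 0.321)},
}


def combined_average(player_seasons):
    """Two-year average: total hits divided by total at-bats."""
    total_at_bats = 0
    total_hits = 0
    for at_bats, average in player_seasons.values():
        total_at_bats += at_bats
        # Assumption: recover whole hits by rounding at_bats * average.
        total_hits += round(at_bats * average)
    return total_hits / total_at_bats


for player, player_seasons in seasons.items():
    print(player, round(combined_average(player_seasons), 3))
```

Mr. Jeter comes out at about .310 and Mr. Justice at about .270. Each player's combined average is dominated by the season in which he had the most at-bats, which was a strong year for Mr. Jeter and a weaker one for Mr. Justice.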

Other examples of the paradox can be found in all types of data, from air travel delay statistics and medical procedure success statistics, to education and unemployment data.

In the graph below, you can see that the unemployment rate for each of the separate groups is higher now than it was in 1983. But because the group with the lower rate is so much bigger, the overall unemployment rate is lower than it was in 1983.

Confused? Don’t worry about it. The lesson here is to be wary of “hard data,” and remember that statistics can still be spun to fit any argument. This WSJ graph shows that unemployment is both better than in 1983, and worse. It only depends on which point you want to make.

The State of Print October 27, 2009

Here’s an interesting graph (best to zoom in) showing the subscription rates for newspapers from 1990 to the present. What’s most interesting to me is that the only paper to achieve a noticeable amount of growth (counting purely print sales; the WSJ figure includes online) is essentially a sensationalist tabloid!