what’s wrong with this picture?

Here’s a regression line from a study purporting to show a relationship between the Global Index on Legal Recognition of Homosexual Orientation (GILRHO), a multivariate index of gay rights, and GDP, relayed by Jay Michaelson of the Daily Beast.


(via The Dish)

Do you see a problem here? That is not a particularly strong relationship to begin with, and Taiwan appears to be dragging the line considerably. (Unhelpfully, the graph itself does not report the r-squared, which is buried in the report; if I’m reading the table correctly, it’s .19, meaning 19% of the variation in GDP is explained by GILRHO.) There’s no black-and-white definition of what constitutes an outlier in simple linear regression, although there are some conventional benchmarks. But if I saw that data point myself, I would blanch. Unfortunately, the study itself does not include the word “outlier,” and there is no information on the typical diagnostics used to assess outliers, such as studentized residuals or DFBetas. Rather than giving the necessary information about the plots themselves, there are lots of simple explanations of correlation and regression. That’s great! But give me plots and diagnostics too, please.

The report’s caveats do not fill me with more confidence. In particular:

A slight complication in measuring the relationship of GDP per capita and GILRHO comes when including a dummy variable for year fixed effects, as in row C of Table 3. Because both GDP per capita and GILRHO are trending upward for most countries, adding year dummies takes away some of the statistical effect of GILRHO (and vice versa). The correlation is still positive, with each point in the GILRHO adding approximately $320 to per capita GDP (about 3% of average GDP per capita in this sample), but that effect is only weakly statistically significant at the 15% level. That 15% level is slightly higher than normally considered a statistically significant effect, but it is still suggestive that we are not seeing this result simply by chance.

Oof. Oof oof oof. To be clear, this means that even if there were no real effect at all, we would expect to see a result at least this large merely by random chance about 15% of the time. An alpha of .15 isn’t “slightly higher” than normal. It’s three times the .05 threshold, which is itself the more forgiving, more subject-to-Type-I-error cutoff we tend to use in the social sciences. This is very weak tea, guys. In my own research I would be quite unlikely to draw strong conclusions from this relationship. I’m not a subject matter expert, and I leave the substantive interpretation to those who are. I’m not saying we should throw out this study. I am saying that we’re looking at a fairly weak relationship, built on a predictor assembled from many types of data that are themselves subject to questions of independence of observation, and we should proceed with caution.

If we’re all going to be running regressions all the time, we really need to open up our work: publish the important plots, such as residual and Q-Q plots, and provide access to the underlying data sets, so that others can try to replicate our findings and run the relevant diagnostics. (When people on sports websites run regressions of sports statistics, for example, I worry a great deal about constancy of variance.) I don’t expect every reader to be able to perform these kinds of analyses themselves, but the more people can peer into our data, the smarter we’ll all be in the long run. It’s great that we’re doing so much with data, but let’s be careful.