It’s really to the credit of FiveThirtyEight that they’re making some of their data publicly available. That open access comes in part thanks to this wonderfully smart post by Brian Keegan, a postdoc from Northwestern. In it, Keegan advocates for availability of data by replicating research by FiveThirtyEight’s Walt Hickey. Hickey uses data on the Bechdel test to argue that, contrary to conventional Hollywood wisdom, movies featuring prominent women characters do not suffer at the box office. Or, as Keegan points out, he demonstrates that, but then goes a little further than his data really shows.
Here’s the nut of Keegan’s piece, for me:
Both models (whatever their faults, and there are some as we will explore in the next section) apparently produce an estimate that the Bechdel test has no effect on a film’s financial performance. That is to say, the statistical test could not determine with a greater than 95% confidence that the correlation between these two variables was greater or less than 0. Because we cannot confidently rule out the possibility of there being zero effect, we cannot make any claims about its direction.
Hickey argues that passing the test “didn’t hurt its investors’ returns”, which is to say there was no significant negative relationship, but neither was there a significant positive relationship: The model provides no evidence of a positive correlation between Bechdel scores and financial performance. However, Hickey switches gears and in the conclusions, writes:
…our data demonstrates that films containing meaningful interactions between women do better at the box office than movies that don’t…
I don’t know what analysis supports this interpretation. The analysis Hickey just performed, again taking the findings at their face, concluded that “passing the Bechdel test did not have any effect on a film’s gross profits” not “passing the Bechdel test increased the film’s profits.” While Bayesians will cavil about frequentist assumptions — as they are wont to do — and the absence of evidence is not evidence of absence, the “Results Differ finding” is not empirically supported in any appropriate interpretation of the analysis. The appropriate conclusion from Hickey’s analysis is “there no relationship between the Bechdel test and financial performance,” which he makes… then ignores.
(You should read the whole thing.)
This is an important point. I have sympathy for Hickey; the culture in journalism is to push for the stronger and (crucially here) more easily interpretable conclusion. But as Keegan says, Hickey’s research shows the irrelevance of the Bechdel test to box office gross. That becomes a perfectly intuitive result when you survey the list of movies that satisfy the Bechdel test. The test is intentionally a low bar, and while it’s incredible how many movies don’t pass it, it’s so broad that there’s little reason to think that passing or failing it could affect the box office.
The beauty of it is that, with access to the data set, I don’t have to take Keegan’s word for it. Here’s my own simple regression analysis, which took all of about 5 minutes to conduct. First, a one-way ANOVA to demonstrate how influential the Bechdel test is on the international gross of movies in Hickey’s data set.
At first blush, the ANOVA looks like a meaningful result: with a p-value of .0014, it looks like the result of the Bechdel test has a highly significant impact on international gross. However, initial enthusiasm would probably be dampened by the r-squared of .005749. R-squared (which, in a model with a single predictor, is simply the Pearson r, the most commonly used coefficient of correlation for interval scale data, squared) describes the proportion of the variance in the response variable that is explained by the predictors. In other words, only slightly more than one half of one percent of the variation in gross is accounted for by whether the movie passes the Bechdel test. The strong significance represented here is largely a result of the large sample size. Significance depends on the size of the effect, the variability in the response, and the number of observations, so even a very small effect like the one here will be significant (that is, unlikely to have arisen from random error alone) with a sufficient number of observations.
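To make the sample-size point concrete, here is a minimal sketch in Python. It takes the r-squared reported above as given and asks what F statistic, and roughly what p-value, it would produce at a few different sample sizes. The sample sizes are hypothetical, chosen only for illustration, and the p-value relies on a large-sample normal approximation (for two groups, F is the square of a t statistic), so treat the numbers as rough.

```python
from statistics import NormalDist


def f_from_r2(r2, n, k=2):
    """F statistic for a one-way ANOVA with k groups, given R^2 and sample size n."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


def approx_p_two_groups(r2, n):
    """Approximate two-sided p-value for a two-group comparison.

    With two groups, F(1, n - 2) equals t squared; here t is treated as
    standard normal, which is a large-sample approximation and rough for small n.
    """
    z = f_from_r2(r2, n) ** 0.5
    return 2 * (1 - NormalDist().cdf(z))


R2 = 0.005749  # the r-squared reported in the text

# Hypothetical sample sizes, not the size of the actual data set.
for n in (100, 500, 1800):
    print(n, round(f_from_r2(R2, n), 2), round(approx_p_two_groups(R2, n), 4))
```

The same tiny effect is nowhere near significant with a hundred observations and clearly significant with a couple of thousand, which is the whole point: the p-value tells you about detectability, not about the size of the effect.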
Now let’s run a multiple linear regression, with both budget and the Bechdel test entered as predictors. We can enter the Bechdel test result directly as an indicator (0/1) variable in a regression analysis because it is binary; if there were three (or more) levels, we would have to treat it as a categorical variable and dummy-code it, since a regression would otherwise interpret the numeric codes for the levels as points on an interval scale, which wouldn’t be appropriate.
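For what the categorical case would look like, here is a small sketch of dummy coding in plain Python. The three-level rating and its labels are invented for illustration, and `dummy_code` is a hypothetical helper, not a function from any statistics package.

```python
def dummy_code(values, reference=None):
    """Turn a categorical variable into 0/1 indicator columns.

    One level (the reference) gets no column of its own; each remaining
    level gets an indicator that is 1 where the observation has that level.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    columns = {}
    for level in levels:
        if level != reference:
            columns[level] = [1 if v == level else 0 for v in values]
    return columns


# Invented three-level example; the labels are assumptions for illustration.
ratings = ["fails", "dubious", "passes", "fails"]
print(dummy_code(ratings, reference="fails"))
```

Each indicator's coefficient is then read as the difference from the reference level, so the model never has to pretend that the gap between "fails" and "dubious" is the same size as the gap between "dubious" and "passes".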
Now, once budget is in the model, the Bechdel test variable ceases to be significant at any reasonable alpha. Budget is simply soaking up too much of the sum of squares; it is both highly significant and produces a far higher r-squared, so that we’re now explaining just about 50% of the variation. Here’s a simple linear regression of gross on budget demonstrating that relationship.
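The way a second predictor can absorb an apparent effect is easy to reproduce on toy numbers. The sketch below uses eight made-up films (not Hickey's data) in which gross is driven entirely by budget and the passing films simply happen to have smaller budgets; the direction of that imbalance is an assumption for illustration. The least-squares solver is a bare-bones one written with the standard library, so this is a teaching aid rather than a substitute for a real statistics package.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the predictor columns."""
    X = [[1.0] + [float(c[i]) for c in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data: gross is twice the budget plus a little noise, and the
# indicator plays no role except that passing films have smaller budgets.
passes = [0, 0, 0, 0, 1, 1, 1, 1]
budget = [40, 60, 80, 100, 10, 20, 30, 40]
noise = [5, -5, -5, 5, 5, -5, -5, 5]
gross = [2 * b + e for b, e in zip(budget, noise)]

simple = ols([passes], gross)          # [intercept, indicator coefficient]
multi = ols([passes, budget], gross)   # [intercept, indicator, budget]
print("indicator alone:", simple[1])
print("indicator with budget in the model:", multi[1])
print("budget coefficient:", multi[2])
```

On its own the indicator appears to carry a large effect, because it is standing in for budget; once budget is in the model, the indicator's coefficient collapses to essentially zero.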
Now clearly, there are some potentially nasty outliers in the gross-versus-budget plot, nasty enough to make me think that they might be errors in data entry. (But maybe it’s just Avatar and Titanic?) If I were doing a serious analysis here I would definitely look at some diagnostic measures such as leverage, studentized residuals, or Cook’s distance. In any case, I think this general relationship likely holds. Budget explaining about half of the variation in gross makes some sense. The biggest movies are almost uniformly those with big budgets, but your hugely expensive Lone Ranger-style dud also drags on the correlation. Of course, none of this is meant to dispute the general point that we need more women in major movies who have concerns beyond getting the guy. As Hickey’s original analysis explained, in its more responsible form, there’s no relationship between the result of the Bechdel test and a movie’s gross, meaning there’s absolutely no reason executives should fear that movies with prominent female characters won’t sell tickets.
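For the curious, the diagnostics mentioned above (leverage, studentized residuals, Cook’s distance) have simple closed forms in a one-predictor regression. Here is a sketch using the textbook formulas, with internally studentized residuals, on ten invented points where the last one has a far larger "budget" than the rest. The numbers are made up and the `diagnostics` function is a hypothetical helper, not anything run on the actual data.

```python
def diagnostics(x, y):
    """Leverage, internally studentized residuals, and Cook's distance
    for a simple linear regression of y on x (intercept plus slope)."""
    n, p = len(x), 2  # p counts the estimated coefficients
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    student = [e / (s2 * (1 - h)) ** 0.5 for e, h in zip(resid, leverage)]
    cooks = [r * r * h / (p * (1 - h)) for r, h in zip(student, leverage)]
    return leverage, student, cooks


# Invented data: nine points on a clean line, plus one far-out budget
# whose gross falls well short of that line.
budget = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
gross = [2, 4, 6, 8, 10, 12, 14, 16, 18, 30]

leverage, student, cooks = diagnostics(budget, gross)
worst = cooks.index(max(cooks))
print("most influential point:", worst, (budget[worst], gross[worst]))
print("its leverage:", round(leverage[worst], 3))
```

A point like that last one has a modest raw residual precisely because it drags the fitted line toward itself, which is why leverage and Cook’s distance are worth checking rather than eyeballing residuals alone.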
None of this is remotely fancy– it may not be Stats 101 but it’s certainly Stats 501, say. And it doesn’t add anything to Keegan’s excellent analysis. The point is that I can do it, and you could, too. That’s what making data publicly available allows us to do. The other half of this equation is transparency in how research was conducted. I really believe that we could be on the verge of a very cool time in journalism and analysis, because this stuff isn’t just useful for checking each other’s work, it’s fun.