Yesterday the American Statistical Association (that’s right, a different ASA) released a “Statement on Statistical Significance and P-values.” It’s a thoughtful consideration of the issue, and it even includes 21 commentaries right off the bat, sort of a prepackaged colloquium to nerd out over. FiveThirtyEight did a nice little article on the statement, and I hope that sociologists will take a look and give some thought to what the statement says about how we go about doing our research. (And yes, the narrow issue is limited to quantitative analysis, but the real questions are much more fundamental than how to report statistics.)
I make no claims to being a methodologist. I’ve done a lot of data analysis over the years, but until recently most of it was not based on sample surveys, and I was therefore not reporting p-values. But when it came time recently to run some chi-squares and report the statistical significance of the differences between categories of respondents in survey results, I reached back to a phrase from my old-school grad stats professor. Before “p-hacking” was a thing, he warned us against reporting results indicating multiple levels of significance. “P-level envy” was what he called it, and that label helped his warning stick with me across the decades. (The stats sequence was taught in psychology. It’s a pun.) Anyway, in reporting said results, I stuck with good ol’ “p<.05” and either one asterisk or none. But no one does this anymore, right? Just to reassure myself I hadn’t been imagining all of those journal articles I’ve been reading over the years, I grabbed a couple of issues from the pile on the corner of my desk that never seems to get itself read and flipped through until I found what looked like regression results. Yup, in table after table, three asterisks for three different significance levels … oh, and even a brave plus-sign, because p=.09 should really count, too, shouldn’t it?
The AmStat statement calls out this obsession over p-size … er, level of statistical significance, in its principle 5: “A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.” But what other purpose could there be for flagging multiple significance levels in a table, if not to imply that some results are “more significant” than others? And don’t get me started on the difference between “significant” and “meaningful.” (Actually, it doesn’t take much to get me started on this; I’ve been wound up about it for years.) Even worse, in our quest to perfect the “fit” (theoretically informed and stepwise, of course) of our sociological models, I find myself asking again and again, “Does any of this actually involve people?”
But I digress. Or perhaps I don’t. Because for me the “P-Values Manifesto” (if I may be so bold) should be just the beginning of a fresh look at how we do sociological science. Thanks to the folks at that other ASA for giving me a reason to rant; perhaps others will care to join the fray.