Data Users Struggle to Prevent Suppression of Race, Ethnic,
and Gender Statistics
by Roberta Spalter-Roth, ASA Research and Development Department
The strength of the data user community made itself felt in an effort to prevent the Science Resource Statistics (SRS) branch of the National Science Foundation (NSF) from suppressing data on the race, ethnicity, and gender of new doctorates in science, technology, engineering, and mathematics (STEM) fields. The final outcome is still unclear, however.
Each year since 1957, the SRS has issued a report based on the Survey of Earned Doctorates (SED), a census of all doctorate recipients who earned their degree in a given year. This survey provides a wealth of information on new PhD recipients, including their educational history and their career plans. This information is available by race, ethnicity, and gender. The user community for these data includes many programs whose goal is to increase the participation of women and underrepresented minorities in the STEM workforce. The data serve as a benchmark for progress in these fields.
In 2007, in response to a new set of privacy rules for federal statistical agencies issued by the White House Office of Management and Budget (OMB), SRS decided to suppress (not publish) certain small data cells containing information on gender and racial and ethnic minorities in specific degree fields to maintain confidentiality in an era of data miners. Under its revised rules, no cell containing fewer than six individuals would be published. This meant that no data on women or underrepresented minorities would be published in small fields such as analytic chemistry, atmospheric science and meteorology, computer and information sciences, comparative psychology, statistics, and demography. In addition, all zeros (fields in which there were no women or minorities) were to be suppressed. Finally, if the size of a suppressed sub-field could be deduced through subtraction from a larger field, then the larger field was to be suppressed as well. The result of this scheme would be the loss of data that had been widely used for years.
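The suppression scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not SRS's actual procedure: the function names and the sample counts are hypothetical, and the subtraction rule is read in its simplest form (a field total is withheld only when exactly one of its sub-field cells is hidden, since that cell could then be recovered by subtraction).

```python
from typing import Dict, Optional, Tuple

# Threshold stated in the article: cells with fewer than six individuals are withheld.
MIN_CELL = 6


def suppress_small_cells(subfield_counts: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Replace every count below MIN_CELL (zeros included) with None."""
    return {
        name: (count if count >= MIN_CELL else None)
        for name, count in subfield_counts.items()
    }


def publish_table(
    subfield_counts: Dict[str, int],
) -> Tuple[Dict[str, Optional[int]], Optional[int]]:
    """Return (published sub-field cells, published field total).

    Assumption for illustration: the larger field's total is withheld when
    exactly one sub-field cell is hidden, because that single cell would
    equal the total minus the published cells.
    """
    published = suppress_small_cells(subfield_counts)
    hidden = [name for name, count in published.items() if count is None]
    total = sum(subfield_counts.values())
    published_total = None if len(hidden) == 1 else total
    return published, published_total


# Hypothetical counts for one demographic group within one broad field.
example_counts = {"subfield_a": 40, "subfield_b": 12, "subfield_c": 3}
cells, total = publish_table(example_counts)
```

In this hypothetical example a single small cell removes both itself and the field total from the published table, which shows how quickly a low threshold can erase data that users had relied on.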
SRS made this decision without the input of members of the user community. At first the response was slow, but over time both the number and the decibel level of the phone calls, memos, and letters to NSF, including those to Arden L. Bement, Jr., the Director of NSF, grew. In the course of these communications, users found that the leadership of NSF, as well as directors of programs that encouraged the use of these data, did not know of the SRS decision. Members of SRS’s Advisory Board (the Human Resources Experts Panel) also strongly voiced their concern about the value of the SED data.
No Longer Hush Hush
Declaring this non-transparent decision-making process a "grave mistake," Lynda Carlson, head of SRS, attempted to open up the process. SRS then issued the 2006 tables in their original format to those who requested them. It also developed a series of alternative tables that re-aggregated the data in several ways and asked users to discuss which alternatives they preferred. These methods included combining years of data, sub-fields, or minority groups. Simultaneously, Carlson asked several organizations, including the Commission on Professionals in Science and Technology (CPST), to conduct a series of meetings where users would be invited to discuss the various alternatives presented by SRS staff. The Quality Education for Minorities (QEM) Network agreed to conduct the meetings and write a report of its findings. (In the interest of full disclosure, I was chair of the SRS Human Resources Experts Panel and President of CPST’s Board, as well as the ASA representative to one of the QEM meetings, during this period.)
QEM set up a series of eight meetings across the country during fall 2008 and developed a report from the information gathered at these meetings. According to the report, users were dismayed both by the idea of suppressing data that had long been available and by the alternatives presented. Users argued that SRS had failed to explain how publishing the actual data could lead to the identification of individuals, had not provided any examples of the negative impact of the availability of small data cells, and did not understand the impact of its scheme on equal opportunity programs. More specifically, they agreed that small cells needed to be published, zeros needed to be displayed, aggregating separate race/ethnic categories into one "underrepresented minority" category was not useful, separate years needed to be published to understand trends, and field aggregation must be meaningful, not haphazard. See the QEM report at www.cpst.org/pastmeet.cfm.
New Decision Rules
In February 2009, QEM’s President, Shirley McBay, presented the results of the study to NSF’s Committee on Equal Opportunities in Science and Engineering, an advisory body for issues concerning underrepresented minority groups in STEM fields. At this meeting, Carlson surprised the audience and delighted data users by announcing that SRS would not use any of the schemes that it had proposed previously. Instead, SRS presented three new decision rules, which would suppress far less data than the previous strategies. These are:
- Establish a higher minimum criterion. SRS will publish all race/ethnic/gender degree counts for sub-fields of degrees if at least 25 PhDs were granted in the broad field. Numbers in sub-fields and zeros will be published. For example, the number of women and specific underrepresented minority groups in sociology, with 467 PhD recipients in 2006, would be published, as would similar data on criminology, with 88 PhDs in 2006.
- Aggregate some small degree fields. SRS would aggregate small fields of degrees into broader categories with at least 25 PhDs granted. The field of demography with 8 PhDs in 2006 would not be published separately, and would likely be folded into sociology. Choices about aggregation of small fields will be guided by the Classification of Instructional Programs (CIP) taxonomy. These aggregations could change over time as small fields become bigger and big fields become smaller.
- Report all minority groups separately in disaggregated form.
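The first two rules can be sketched as follows. This is only an illustration under stated assumptions, not SRS's implementation: the function name and the `parent_of` mapping are hypothetical (the article says only that demography "would likely be folded into sociology" and that CIP would guide such choices), and the field totals are treated as non-overlapping.

```python
from typing import Dict

# Threshold stated in the article: a field needs at least 25 PhDs to be reported.
MIN_FIELD_PHDS = 25


def aggregate_small_fields(
    field_totals: Dict[str, int], parent_of: Dict[str, str]
) -> Dict[str, int]:
    """Fold fields below MIN_FIELD_PHDS into a broader field.

    `parent_of` is a hypothetical stand-in for a CIP-guided mapping from a
    small field to the broader category that absorbs it. A small field with
    no entry in the mapping is left as-is, and a parent that is itself too
    small is not handled by this sketch.
    """
    merged: Dict[str, int] = {}
    for field, total in field_totals.items():
        if total >= MIN_FIELD_PHDS:
            target = field
        else:
            target = parent_of.get(field, field)
        merged[target] = merged.get(target, 0) + total
    return merged


# 2006 totals quoted in the article; the parent mapping is assumed.
totals_2006 = {"sociology": 467, "criminology": 88, "demography": 8}
reported = aggregate_small_fields(totals_2006, {"demography": "sociology"})
```

Under these assumptions criminology is reported on its own, while demography no longer appears as a separate field. Because the mapping depends on each year's totals, the reported categories could change over time, as the article notes.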
Users seem pleased with SRS’s changed strategy to protect confidentiality. McBay stated, "SRS heard the concerns expressed…and has reconsidered its approach." Members of the Human Resources Experts Panel still questioned why the new concern with confidentiality was necessary.
The story is not over. At the request of SRS, the Committee on National Statistics of the National Academies convened an expert panel to review the QEM report and the SRS confidentiality decision rules in order to provide advice on how SRS might proceed. Steve Cohen, the Chief Statistician for SRS, stated, "These decisions are not carved in stone." The tenor of the meeting appeared to slant toward concerns about confidentiality rather than concerns about providing information to increase the participation of women and underrepresented minorities in the science workforce.