Preserving At-Risk Public Data
An Interview with the Data Rescue Project Steering Committee
Federal data are an essential public good, enabling cutting-edge research and underpinning decision making by governments, businesses, and individuals. However, continued public access to these data is no longer assured as federal agencies have removed datasets from government websites. Librarians, archivists, and other information professionals dedicated to the preservation and accessibility of knowledge have responded by building an independent infrastructure to preserve at-risk federal data for continued public use.
The Data Rescue Project, a collaboration among IASSIST, RDAP, and the Data Curation Network, is among the best-organized and highest-profile efforts to safeguard vital federal data. Since February, the Data Rescue Project and over 500 volunteers have identified and preserved public access to over 1,200 vulnerable federal reports, datasets, and resources. We exchanged emails recently with members of the Data Rescue Project Steering Committee to learn more about their work, the value of federal data, preservation and access policies of previous administrations, and the types of data that are particularly vulnerable to being altered or removed from the public sphere.
We thank Lynda Kellam, Lena Bohman, Halle Burns, and Mikala Narlock for taking the time to share information about this critical undertaking.
Tell me about what the Data Rescue Project does, how it formed, and who is participating.
The Data Rescue Project (DRP) is a volunteer organization focused on rescuing at-risk federal data through volunteer coordination and communication in the broader data rescue ecosystem.
In late 2024, librarians across the profession were talking about potentially expanding threats to public data and ways to coordinate data preservation efforts. Many of us are connected to the 2017 data rescue efforts or to organizations such as the End of Term Web Archive or the Preservation of Electronic Government Information (PEGI) project.
In February 2025, after the dismantling of USAID and the temporary removal of CDC data, a group of data librarians and archivists formalized our work as the DRP, focusing on asynchronous data rescues and communication across rescue efforts. Our founding organizers came from three primary organizations for data professionals: IASSIST, RDAP, and the Data Curation Network, as well as the organization Saving Ukrainian Cultural Heritage Online (SUCHO). Many of our DRP members are librarians, but researchers, technologists, students, and the general public are also helping us save federal public data.
What types of data has the federal government produced in the past? How have these data been used by both the research community and the general public?
The US federal government is among the world’s largest producers of public data. The government and its agencies collect data on various topics, from economics to weather to museums. The decentralized federal statistical system is spread throughout the government, with 13 principal statistical agencies and many other offices. As such, federal public data touches on all aspects of our lives and is used for many purposes.
The public uses federal data to understand educational attainment, crime rates, demographic changes, community patterns, and more. Local governments use federal data to plan services, like deciding where to build a new road using traffic data from the Department of Transportation. Businesses use federal data to plan strategies. Some companies are also built on top of federal data collections. For example, if you see a rating for a school district in a Zillow listing, much of the data powering that system comes from the Department of Education. And, of course, researchers use federal datasets to complete research projects to understand various phenomena. For instance, researchers can use important datasets such as NIH’s All of Us to learn more about public health problems in the US.
How was government data preserved in the past? Was that a government function or did universities play a role?
As the Federal Statistical System is decentralized, each office and agency may follow different data maintenance and preservation practices based on its own interpretation of guidelines, such as the Federal Data Strategy. Giving one answer to the first question is therefore challenging.
Some have pointed to the Federal Depository Library Program (FDLP) as an answer, but the FDLP exists to disseminate and provide access to government information. Its members do not have a mandate to preserve. PEGI recently posted about the FDLP and the challenges for preservation. At the end of the post, they note that “the FDLP cannot ensure long-term preservation for all of the federal information and data that many rely on. We encourage all who care about this access to support and participate in collaborative efforts.” Long-term data preservation should be a government function, but the reality is more complex.
Therefore, yes, universities and other institutions can play a role in supporting the preservation of government information of all kinds (not just data). A recent publication by Jacobs and Jacobs called Preserving government information: Past, present, and future provides an excellent discussion of these challenges and the requirements for a federal digital preservation infrastructure.
How does the Data Rescue Project prioritize which data to preserve? Are certain types at particularly high risk of disappearing?
Initially, we prioritized datasets not being rescued by other organizations, especially social data or data about people. The Environmental Data and Governance Initiative (EDGI), an organization created in 2017, and its affiliate, Public Environmental Data Partners (PEDP), have expertise in environmental and climate data and have those areas covered. We focused on data that did not fit within their portfolio. As time passed, we realized that three major concerns could impact data access:
- The removal or modification of data. This area has been the most prominent in the news because of the CDC, but we at the DRP haven’t seen as many datasets in this category.
- The dismantling or reduction in force of particular agencies. This area has been a primary concern for DRP. The actions of DOGE were unlike anything that happened in 2017. We are concerned about the continuing stewardship of data in these agencies.
- The discontinuation of contracts. Some datasets are at risk because federal contracts with vendors who provide storage or other dissemination tools are ending. We have seen this particularly with climate data.
Are you able to track how the data you’re preserving is being used? Can you share some examples?
Because our volunteers add data to DataLumos, ICPSR’s crowd-sourced repository for public data, we have download statistics up to July 2025. The data DRP added to DataLumos had been downloaded almost 3,000 times as of July 10. The most downloaded dataset was USAID’s Demographic and Health Survey Indicators (https://doi.org/10.3886/E224462V1). We also created the Data Rescue Project Portal, where we track data rescues across efforts and organizations. Finally, we collect data user stories in which our volunteers talk about the importance of public data for their work or research. Our most recent data user story is on flood predictions.
The federal government has played a huge role in promoting open science and scholarship and has funded a network of subject repositories where researchers can share data collected with or without federal funds. Have you seen indications that data deposited in these repositories is at risk?
We have seen the addition of disclaimers to some NIH repositories that say, “This repository is under review for potential modification in compliance with Administration directives.” As of right now, it is unclear what will happen in the future with these datasets. A recent article in the Lancet outlined some changes that have already happened to datasets, but these were datasets published by the government itself rather than in NIH repositories.
Whether from active attempts to modify datasets or the discontinuation of data series, it seems likely that the repositories most at risk are those directly run by government agencies, rather than those run as public-private partnerships (e.g., those housed at universities). The most significant loss of data so far has come from staff layoffs and the ending of storage contracts rather than from concerted efforts to deface data.
How is your work supported?
Our work is supported by our volunteers and by our connections to the broader data rescue network. We depend on everyone’s effort and interest in what is happening. You can stay up to date on what is happening by subscribing to our newsletter or following us on Bluesky. And, if you work with researchers, let them know about our Data Rescue Project Portal as a place to find datasets that have been taken down (or add it to your LibGuides).