In the Eye of the Beholder
What’s a Digital Preservation System Anyway?
Today, in celebration of World Digital Preservation Day (WDPD), we would like to update you on an Ithaka S+R research initiative on the preservation front. Held on the first Thursday of every November, WDPD aims to promote greater awareness of the critical role preservation plays in providing enduring access to knowledge. Times like these further underscore the importance of preservation, given the imperative to archive diverse sources of information about the pandemic, not only for future generations but also for those who are already studying its various scientific, sociological, political, and cultural aspects. WDPD is an opportunity to celebrate the hard work of the community and the progress made on many technical, managerial, and policy aspects of preservation over the years.
Cultural heritage organizations increasingly depend on digital platforms to support the curation, discovery, and long-term management of digital content. Yet some of these systems and tools have been shown to have substantial sustainability challenges. The potential operational and financial implications of the COVID-19 pandemic for cultural heritage organizations further underscore the importance of deploying operationally and financially durable and effective systems. In August 2020, with generous funding from the Institute of Museum and Library Services (IMLS), we launched an 18-month research project to examine and assess, through a series of case studies, how digital preservation systems are developed, deployed, and sustained.
To date, we have conducted interviews with 15 colleagues with expertise in digital preservation. As one interviewee put it, “We’re all over the place with this as institutions of all sizes struggle with resourcing digital preservation.” We hope to contribute to the community’s efforts to expand our understanding of what constitutes a digital preservation and curation system (DPCS).
What is digital preservation?
Although digital preservation is a well-established concept, it continues to be a situated and interpretive process, highly variable across different institutional settings. As one interviewee stated, “institutional repositories, curation systems, and digital preservation services, are all used interchangeably, sometimes in a confusing way.” Key terms such as “archiving” and “preservation” mean different things to different communities. Storage management is a crucial preservation strategy; nevertheless, it does not equate to preservation. Depending on the institutional context, digital preservation can be seen as retrieving information from legacy media, implementing microservices such as file format transformation, or simply digitizing analog content for retention and online access. The emergence of web archiving and research data management programs has further blurred the boundaries.
The initial scope of our study is broad, comprising digital asset management software packages, long-term storage services, and software as a service (SaaS) products used by cultural heritage organizations to undertake digital preservation and curation work. Rather than trying to adjudicate what does and does not “count” as digital preservation, we are studying the systems and services that cultural heritage organizations might use toward meeting digital preservation goals.
What did we learn from our initial interviews?
Since the framing of digital preservation as a critical program area for long-term accessibility of the social, economic, cultural and intellectual heritage in the early 1990s, a considerable amount of progress has been made. The digital preservation community is getting larger, representing deeper expertise around a wide range of digital content types. There is also a growing appreciation of the need to engage beyond technological challenges with a range of organizational, business, and policy issues. Open source and commercial digital preservation systems and tools have proliferated, and several successful collaborative initiatives demonstrate the value and power of broad community efforts. In order to tap into this expertise, we initiated our project with a literature review and a series of interviews involving fifteen colleagues with expertise in digital preservation during September-October 2020. What did we learn?
- Policy is crucial to preservation. Many institutions lack consensus on what collections they should preserve and the level of preservation needed. One of our interviewees explained that at many institutions, 80 percent of the labor that goes into digital preservation “is not technology—it’s policies, workflows, [what you need] in order to use the systems effectively.”
- Sustainability requires a holistic understanding of costs. Cultural heritage institutions vary significantly in their ability to reserve staff resources and technical expertise for preservation. As a result, deciding what to build in-house and what to outsource is critical for creating sustainable digital preservation policies. One approach to controlling development and maintenance costs is for libraries, archives, and museums to pool resources on a regional basis toward distributed digital preservation (DDP) systems, such as in the Statewide & Regional Stepping Stones to the National Digital Platform Project. Another relevant initiative is the MetaArchive Cooperative’s Getting to the Bottom Line: 20 Cost Questions for Digital Preservation, a resource aimed at helping institutions compare the short- and long-term costs of digital preservation solutions. Going forward, we are interested in understanding whether and how such tools are being used in decision making by preserving institutions.
- Institutions struggle to integrate disparate tools and systems. One interviewee described an example: “Let’s say a research university decides they need Preservica, an all-in-one turnkey digital preservation system. They will still pay for and maintain an ILS for collection management, an institutional repository, possibly a separate institutional repository specifically designed to house research data, and Archive-It for web archiving. The challenge of integrating these separate systems—not to mention paying for them—is daunting for many institutions. This is compounded by the organizational structure of most libraries, which usually separates collections from technology and digital asset functions.”
- Understanding open source sustainability is an ongoing community effort. A number of groups both within and outside the cultural heritage space (including the OSS Health Index Project and CHAOSS) are working to develop objective and, when possible, quantifiable metrics for measuring the health of community-driven open source software projects. As we move forward, we are interested in assessing how these tools are being used within the digital preservation community and beyond.
- It’s not as simple as commercial vs. community. Some institutions may feel a values-driven allegiance to community-based systems, while others are wary of the potential “hidden costs” of implementing and managing systems that may not be as user-friendly or agile as commercial products. Both commercial and community systems also carry a risk of discontinuation. One interviewee cautioned against equating system provider motives with business models: not all nonprofit products are offered with the community’s best interests in mind, and not all for-profit products are shaped solely by the profit motive. Educopia’s framework for evaluating the adherence of publishing ecosystem players to community values and principles offers one way to approach the issue of community alignment with greater nuance.
- There are opportunities to broaden the conversation. Traditional digital preservation efforts have largely been siloed from rapidly growing efforts around data management and FAIR data sharing. There is potential for these two communities to share knowledge and build cross-system integrations by thinking about a spectrum, rather than a dichotomy, between digital preservation and data management.
Through a series of eight case studies, we will analyze the business approaches of community-based and commercial initiatives, offer lessons learned, and propose alternative sustainability models for long-term maintenance and development. To select the case studies, we are developing a working taxonomy of digital preservation systems. This work is informed by an environmental scan of scholarship, reports, and project websites; by our initial interviews with community leaders and the expertise of our Advisory Board; and by existing inventories. We particularly appreciated the 2013 Digital POWRR Tool Grid, which relates digital and preservation tools to the OAIS reference model, and its successor, the Community Owned Digital Preservation Tool Registry (COPTR), a database of preservation tools, indexed by file format and DCC Curation Lifecycle Model stage, that is still being updated.
We look forward to working with various stakeholders, knowing that we will deeply benefit from their perspectives and insights.
As we learn from previous studies, we greatly appreciate the work of our colleagues on initiatives such as Preserving Digital Objects With Restricted Resources; National Digital Platform at Three; Architecting Sustainable Futures: Exploring Funding Models in Community-Based Archives; It Takes a Village: Open Source Sustainability; Community Cultivation: A Field Guide; Community Health Analytics Open Source Software; Understanding What Constitutes a Vibrant Open Source Community; Why Are So Many Scholarly Communication Infrastructure Providers Running a Red Queen’s Race?; Invest in Open Infrastructure; Mapping the Scholarly Communication Landscape; Restructuring Library Collaboration; and Sustaining the Open Sector: A Brief Look Back.
We have immensely benefited from the generous advice of the following colleagues as we explored the findings and recommendations of previous related initiatives: Mercè Crosas, University Research Data Management Officer, Chief Data Science and Technology Officer, Institute for Quantitative Social Science, Harvard University; Megan Forbes, CollectionSpace Program Manager; Laurie Gemmill Arp, Director, DuraSpace, Community Supported Programs; R. F. (Chip) German Jr., Program Director, Academic Preservation Trust; Matt Germonprez, Associate Professor, Information Systems and Quantitative Analysis, University of Nebraska Omaha; Grant Hurley, Digital Preservation Librarian, Scholars Portal, Ontario Council of University Libraries; Georg J.P. Link, Co-founder, CHAOSS, Director of Sales, Bitergia; Kari May, Digital Archivist & Preservation Librarian, University of Pittsburgh Library System; Trevor Owens, Head of Digital Content Management, Library of Congress; Amy Rudersdorf, Senior Consultant, AVP; Ashley E. Sands, Senior Library Program Officer, Institute of Museum and Library Services; Jaime Schumacher, Senior Director, Digital Collections & Scholarship, University Libraries, Northern Illinois University; Katherine Skinner, Executive Director, Educopia; Matt Schultz, Director of Digital Curation and Preservation, Educopia; Ann Marie Willer, Director of Preservation Services, NEDCC Northeast Document Conservation Center.
The Institute of Museum and Library Services (IMLS) is celebrating its 20th Anniversary. The Institute of Museum and Library Services is the primary source of federal support for the nation’s approximately 123,000 libraries and 35,000 museums. Our mission is to inspire libraries and museums to advance innovation, lifelong learning, and cultural and civic engagement. Our grant making, policy development, and research help libraries and museums deliver valuable services that make it possible for communities and individuals to thrive. To learn more, visit www.imls.gov and follow us on Facebook, Twitter and Instagram.
This project was made possible in part by the Institute of Museum and Library Services (LG-246365-OLS-20).