Last updated on March 10, 2021

While scholars generally believe in the value of sharing and preserving research datasets, many do not believe it is worth their time to do so. And when they do invest time in data sharing and preservation, they tend to prefer doing so independently and self-reliantly. We have not only documented these issues through our long-standing national faculty survey but also contended with them in our own work as social science researchers conducting large-scale survey studies.

Data sharing can be valuable for many reasons. It permits others to replicate analyses and results, spurs additional research with existing datasets, improves methods of data collection through the scrutiny of others, and encourages alternative perspectives that can promote a diversity of analyses and conclusions. Sharing research data also contributes to societal knowledge and can keep other researchers from sinking resources into duplicate data collection by letting them build on existing data. During the COVID-19 pandemic, when faculty face challenges in conducting research with newly generated data, leveraging data that has already been collected and analyzed can be especially useful. Many scholars weigh these benefits against the challenges noted above, along with funder mandates, when deciding whether and how to deposit their data.

Because the landscape of research data sharing spaces is robust, we conducted exploratory, high-level research on a number of data repositories, primarily to inform our own data deposit protocols. We regularly deposit data from the US Faculty Survey, the Library Director Survey, and several other research projects with ICPSR. Recognizing that our research on the characteristics of data repositories may be useful to other researchers, today we are publishing a summary of our findings.

Below, seven repositories are compared side by side. We have highlighted the factors most relevant to decision-making: disciplinary scope, whether data curation is offered, typical timelines for curating datasets, the cost of depositing data, and who can access deposited data.

| Repository name | Disciplinary scope | Offers data curation? | Length of time to curate data | Cost of data deposit | Accessing data deposits |
|---|---|---|---|---|---|
| Dryad | General repository with a focus on scientific and medical datasets | Yes | Approximately one day | A variety of paid membership plans are available to institutions and publishers for depositing datasets; pricing is based on factors such as the level of research grant funding | No cost associated with accessing datasets |
| Figshare | General repository | No, but available for Figshare for Institutions | N/A | No cost associated with depositing datasets | No cost associated with accessing datasets |
| Harvard Dataverse | General repository | Yes | A free consultation and assessment takes 1-3 hours, but the actual length of time for curation varies depending on the complexity of the data | No cost associated with depositing datasets | No cost associated with accessing datasets |
| ICPSR | General repository with a focus on social science datasets | Yes | Once a study is assigned to a curator, curation of most studies takes 4-8 weeks, but can take up to several months depending on the complexity of the data and the level of curation needed | No cost associated with depositing datasets; there may be additional fees for particularly large datasets | Access requires paid membership through a member institution, though some datasets are open access |
| Mendeley Data | General repository | No | N/A | Free and paid memberships are available for storing and depositing datasets, with three paid monthly plans based on total storage space | No cost associated with accessing datasets |
| Roper Center for Public Opinion Research | Primarily public opinion survey datasets | Yes | Approximately one week | No cost associated with depositing datasets | Both members and non-members can access data; non-members pay a fee for the data |
| Zenodo | General repository | No | N/A | No cost associated with depositing datasets | No cost associated with accessing datasets |

Naturally, there are different tradeoffs associated with choosing one repository over another.

Reach and impact: Several of these repositories are general in disciplinary scope, whereas others primarily cater to the social sciences or the sciences. This may shape which repository researchers select, depending on the intended audience for reusing their data. Similarly, researchers should consider who can access datasets in each repository, and at what cost. If open access is a priority, it might make sense to select Mendeley Data, Zenodo, or Dryad, as datasets in these repositories are freely accessible to the public. Harvard Dataverse and Figshare let scholars choose whether datasets are freely accessible or restricted. At the other end of the spectrum, ICPSR and The Roper Center require payment or membership to access datasets.

Cost to deposit: Some of the repositories require institutional or individual membership or charge fees for depositing research data. If the cost of deposit is a concern, Figshare, Harvard Dataverse, The Roper Center, and Zenodo do not charge for depositing research data, and Mendeley Data offers a free membership option as well.

Data curation: Data curation services involve processes that validate data, such as ensuring that a research project's questionnaire, codebook, and dataset are aligned. Curated data may also be made available in multiple file formats, such as CSV, SAS, and SPSS files. Curation can also serve as an additional check before data is made available to others, and it is a feature that we highly value at Ithaka S+R. Dryad, Harvard Dataverse, ICPSR, and The Roper Center all offer data curation services; Figshare offers curation only through its Figshare for Institutions subscription; and Mendeley Data and Zenodo do not offer data curation.

It is important to note that curation can add to the time before a dataset becomes available in any given repository. For Dryad, curating and depositing data typically takes about one day; for The Roper Center, about one week; and for Harvard Dataverse, the time varies with the complexity of the data. If the time before a dataset becomes available is not a great concern, ICPSR takes approximately four to eight weeks to curate most datasets. Because this process can take several months for complex data, ICPSR has also developed openICPSR, a service without data curation through which data can be deposited quickly. If data curation is not important and speed is the priority, Figshare, Mendeley Data, or Zenodo may be good choices.
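To make the idea of checking alignment between a codebook and a dataset more concrete, here is a minimal, hypothetical sketch in Python of the kind of check a curator might automate. It is not the procedure used by any repository discussed here. The codebook structure, the variable names (`resp_id`, `q1_share_data`, `q2_discipline`), and the `check_alignment` function are all illustrative assumptions. The sketch flags codebook variables missing from the data, data columns the codebook does not document, and values outside the codes the codebook allows.

```python
import csv
import io

# Hypothetical codebook: each variable maps to the set of codes the
# questionnaire allows (stored as strings, since CSV values are read as text).
# None means any value is accepted (e.g., a free-form respondent ID).
CODEBOOK = {
    "resp_id": None,
    "q1_share_data": {"1", "2", "3", "4", "5"},  # assumed 5-point agreement scale
    "q2_discipline": {"1", "2", "3"},            # assumed discipline codes
}


def check_alignment(csv_text, codebook):
    """Compare a CSV dataset against a codebook.

    Returns a list of human-readable problems; an empty list means
    no misalignment was found by these (deliberately simple) checks.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []  # reads the header row

    # Variables documented in the codebook but absent from the data.
    for var in codebook:
        if var not in columns:
            problems.append(f"missing column: {var}")
    # Columns present in the data but not documented in the codebook.
    for col in columns:
        if col not in codebook:
            problems.append(f"undocumented column: {col}")

    # Values outside the codes the codebook permits.
    for row_num, row in enumerate(reader, start=2):  # row 1 is the header
        for var, allowed in codebook.items():
            if allowed is None or var not in row:
                continue
            value = row[var]
            if value not in allowed:
                problems.append(f"row {row_num}: {var}={value!r} not in codebook")
    return problems
```

For example, running `check_alignment` on a file with an extra `notes` column and a response coded `7` on a five-point scale would report both the undocumented column and the out-of-range value. Real curation involves much more than this (checking question wording, skip logic, and documentation), but automating simple checks like these can catch mechanical mismatches early.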

We hope that the 2020 snapshot summarized here can help other researchers, especially those in the social sciences, as they weigh the pros and cons of each repository. Of course, repository providers often change and adapt their services and offerings. As you consider preserving and sharing your research data, we would be happy to discuss these options with you. Please email me at nicole.betancourt@ithaka.org.

I thank Janan Shouhayib, a PhD student at The Graduate Center and an intern with the Ithaka S+R surveys and research team over the spring and summer of 2019, for her contributions to this exploratory research.