The Usability of Research Data: If We Curate, Will They Reuse It?
During the last seven years, Ithaka S+R has conducted in-depth qualitative analyses of the research practices of academics in several fields. While the studies have highlighted disciplinary differences in research data sharing and reuse decisions across academic communities, it is striking that most of the scholars described similar requirements and roadblocks when it comes to reusing data. For instance, in their recent report on the changing research practices of civil and environmental engineers, Danielle Cooper and Rebecca Springer highlight the importance of trust: researchers need to be able to assess the credibility of data and its source to ensure it is reasonably accurate and reliable. This includes having access to the contextual information needed to understand the research, information gathering, and data recording processes behind the final dataset. The data must also be available in a format that is compatible with a researcher's tools, techniques, and analysis applications. Because reformatting data takes time, one interviewee indicated that they would prefer to wait until someone asks for the data before investing that effort, rather than reformat data that might sit unused in a repository.
Funders and other stakeholders, such as data curators and archivists, encourage researchers to deposit their data in a relevant repository and to consistently cite the location of underlying resources in their final research papers and other outputs. The end goal is to encourage the reuse of data in various textual, numeric, visual, and multimedia formats to make the research enterprise more efficient, effective, and equitable. But while the value of research data sharing is widely acknowledged, the factors that promote or deter data reuse after deposit deserve more attention. Another important research area is the discovery of research data, which is a prerequisite for any reuse.
Despite the clear benefits, data reuse is complicated, and "the devil is in the details." Open research data are not inherently better or easier to use; their utility depends on the attributes that make them usable. Sharing research data in a way that supports future use and repurposing by others not involved in the original research can be tedious and time-consuming. For instance, comprehensive data documentation, including contextual information, is essential to the future usability of data. Without sufficient description of the research questions and methodologies, the data are unlikely to be easily discovered, understood, or effectively used. Also important are scientists' perceptions of and attitudes towards data reuse, as well as their actual reuse behavior. The following research findings on reuse and usability shed more light on the determinants of user experience:
- Perceived Efficacy and Efficiency: Repurposing already-collected data for new research questions is time-consuming. Renata Gonçalves Curty and colleagues state that the perceived efficacy and efficiency of data reuse are strong predictors of reuse behavior. Researchers often find it difficult to identify relevant data and assess their relevance and authority because of the lack of attributes indicating data quality.
- Confusion about Repository Landscape: The study of the changing research practices of civil and environmental engineers exemplifies the challenges associated with navigating the current landscape of general, subject-specific, institutional, and publisher data repositories. Given the interdisciplinary and cross-institutional nature of research, most scholars prefer domain-based repositories such as Dryad and ICPSR. General-purpose repositories such as FigShare and Zenodo are also gaining uptake.
- Upstream Research Lifecycle: There is growing evidence that an essential prerequisite for reuse is the use of appropriate data gathering and organization tools and workflows early in the research lifecycle to improve data quality. In a recent survey of researchers, more than 70% reported that they had tried and failed to reproduce another scientist's experiments, and more than half had failed to reproduce their own. The overwhelming conclusion was that there is a need for more robust experimental design, better statistics, and better mentorship.
- Social Networks: Domain-specific, institutional, and publisher repositories play an important role in curating, storing, and archiving research data and in ensuring long-term access. However, social interactions within communities of practice remain important for identifying and leveraging existing data held in personal collections.
The deposit of research data in public repositories is becoming common practice across the disciplinary spectrum, with many journals requiring the archiving of data associated with papers. Public access policies that mandate data sharing can only succeed, however, if the data are usable. Data curation professionals at libraries and archives see their primary roles and responsibilities as improving data management practices, assisting researchers in meeting funder requirements, and advocating for access through repositories, with the aim of a more efficient research process and better-quality data. We need more case studies highlighting how these professionals can better enable reuse, and overcome the impediments in its way, so that we can leverage our investment in data sharing for purposes including initiating new research projects, writing grants, teaching, considering alternative methodologies, or verifying information. There are certainly a range of technology- and policy-related impediments, but the thorniest questions seem to relate to sociocultural and pragmatic issues. It is therefore important to consider research data initiatives holistically by understanding the motivations, behavior, and preferences of potential users upstream. In their upcoming brief (look for it on Monday, May 13), my colleagues Rebecca Springer and Danielle Cooper will present an evidence-based conceptual model that describes scholarly activity surrounding research data and what we can learn from successful research data archives. Stay tuned!