Data communities provide social and practical incentives for scientists to voluntarily share and reuse data with colleagues. In order for data communities to emerge and grow, they need support. Information professionals, such as data librarians and research computing specialists, can advise data communities on best practices for data sharing and help them create or improve the required infrastructure, such as online repositories and metadata schemas. However, research scientists and information professionals rarely have structured opportunities to meet together, especially across institutional lines, for focused discussions about how they can collaborate to sustain data communities.

To address the need for collaboration among scientists and information professionals to understand, support, and promote the growth of data communities, Ithaka S+R and the Data Curation Network partnered to host Leveraging Data Communities to Advance Open Science. This NSF-sponsored workshop series provided a forum for scientists from a variety of institutions and fields who are already involved in data communities (or who would like to help start one) to collaborate with information professionals who are expert curators in their research area. Over a series of meetings that culminated in a two-day online workshop (February 28 to March 1, 2022), data communities and information professionals met online to discuss community-specific issues as well as broader strategies for moving forward.

To complement our final report on the workshop, we’ve invited several participants to reflect on what they’ve learned from the experience. Today’s blog post features an interview with Amanda Rinehart, a life sciences librarian at Ohio State University. The previous blog post interview, featuring Jordan Wrigley, a data librarian at the University of Colorado at Boulder, was published in July.

You come from a science background and now work as a data librarian. Can you tell us a bit about why you made that transition?

I worked for the USDA for more than a decade studying microbial communities in subtropical crop systems, so I have collected, organized, and analyzed a lot of research data related to those studies. As part of a committee assignment, I was put in charge of the “library” at my research facility. I put “library” in quotes because it was really a room with hundreds of books and journals in boxes, as well as whatever else people wanted to store. As I tried to make the books and journals available to the other researchers, I realized how much of the work of running a library is hidden from the casual library user. I talked to some librarians and took a cataloging class to learn how to do it properly, and a few years later I had a master’s degree in library and information science just as data librarianship was emerging as a job market. I took the leap and switched careers. Being a data librarian is a perfect combination of my love of science, my expertise with research data, and my interest in librarianship. It was one of the best choices I’ve ever made!

You’ve been involved in data management as a librarian and as a researcher. Based on your experiences in those roles, what are some of the key challenges to sharing research data that researchers struggle with? 

There are many: changing expectations, technological limitations, lack of time, lack of knowledge, and of course, increasing competition for grant funding. Among them, two overarching challenges really limit appropriate data sharing.

First is the urgency of the situation. Data sharing has been tied to grant funding, and because grant funding is increasingly hard to get and is directly related to researchers keeping their jobs, stress levels around research data sharing are elevated. This stress and urgency can result in poor practices and even a degree of hostility toward the topic. So it’s important for researchers to have access to trusted experts who can help them tailor realistic solutions on a timeline that works for the project. Data librarians and other information professionals fill this role admirably because they have the bandwidth to keep up with changing expectations, know about other available resources, and understand current standards and practices. Librarians are also champions at listening, reflecting back, and clarifying needs, which can reduce stress to the point where researchers can do more than the bare minimum.

The second key challenge is locating the right expertise. Because the nature of research is to grapple with problems that have yet to be solved, the data sets that research generates are custom-built collections of diverse materials. How do we tap into existing expertise, and develop new expertise, to curate it all? One solution is to form collaborations that swap or share expertise. One such group is the Data Curation Network; other collaborations around sharing research data include state and national repositories (such as the Digital Research Alliance of Canada and the Texas Data Repository). These distributed networks of expertise ensure that researchers have access to the best and brightest data curators, no matter where those curators are employed.

Can you tell us a bit about the research group you were paired with for the workshop? What kinds of challenges were they facing? How were you able to help them?

I had the pleasure of working with the Michigan Maple River Dam research project, a data community that charts changes to the river ecosystem both before and after the dam removal. A number of different science disciplines are involved, as well as a diverse array of stakeholders, including indigenous communities, homeowner associations, state natural resources departments, academic researchers, and private companies. Their data is collected and organized around specific questions, resulting in a custom-built database. This makes it hard to choose just one metadata schema or a single data repository. They also struggle with getting researchers to meet high metadata standards and with meeting the expectations of the many stakeholders, who may want the data presented in different ways.

Impressively, they have a data manager who had already consulted with a data librarian and started a database with metadata elements. I had a few specific suggestions, such as creating a memorandum of understanding with researchers that clarifies the expectations for metadata collection, and paying extra attention to stakeholders who may be wary about data sharing. For example, reaching out to indigenous communities for their input can enhance the quality of the shared data, and consulting with experts can help ensure appropriate communication with all of the stakeholders. My general suggestions included celebrating the good practices they already had in place and prioritizing which efforts should be done first.

What did you learn from participating in the workshop? How will this inform your approach to helping research communities on your campus share data?

For me, this workshop underscored the value of working in a team to share data. Data support in libraries is often considered informational or transactional, and this underestimates the group effort required, the necessity of dialogue, and the immeasurable value of trust. It was illuminating to see the workshop participants, from vastly different research backgrounds, start to self-navigate towards similar solutions for their problems by comparing and sharing what their efforts had cost and whether they were successful. This kind of rapid development of trust between research groups can lead to better real-world practices, something you don’t always see in a more transactional environment where the ‘sage on the stage’ just tells researchers what to do. I will definitely be connecting research groups that I hadn’t connected with in the past, as this workshop demonstrated how researchers are adept at self-navigating to solutions given the right information and environment.

Many research teams in our workshop cohort did not have established ties with data librarians. What kinds of steps can universities take to better support these kinds of collaborations? 

Helping researchers understand what resources are available (locally, regionally, nationally, and internationally) and how those resources relate to each other is imperative if researchers are to find solutions that work for them. Universities can provide information about new expectations, raise awareness of existing resources (including data librarians!), identify gaps in expertise and infrastructure, and provide the resources needed to scale existing support efforts to meet demand. One specific way that universities can capitalize on their current investments in expertise and infrastructure is to promote cross-unit teams that address research data challenges. Librarians, as key contributors of expertise in research data management, should be included on such a team, along with research administrators and other information and computing professionals. This holistic support saves researchers time and grant funds and leads to better data curation and sharing. Data curation and sharing is no small task, but the universities that succeed at it will enjoy the advantages of increased research funding and prestige.

How might data librarians and information professionals better define the value of the expertise they bring to these collaborations?

Value comes from more than just knowledge and experience; it’s also the result of effective processes that translate knowledge and experience into outcomes. Research in the sciences (and most social sciences) is a team effort, so the value of any one contributor to the outcomes can be difficult to measure. Instead, defining the value of the research support team to the larger organization may be more advantageous than focusing on just the data librarian or information professional component. What is the value of the research support team in helping researchers attain their goals? For most institutions, this work relates directly to securing grant funds and thus to the larger mission of doing excellent research. Do institutional leaders want the funding and prestige to employ the best and brightest? Do they want successful researchers who are solving real-world problems? If they do, then they should understand that data librarians and information professionals are integral to that goal.

What questions should researchers ask as they begin to establish a data community? Who should they be collaborating with? 

The first question researchers should ask as they begin to establish a data community is: what are our collective goals and priorities? Once those are defined, what are the barriers to achieving those goals? If these aren’t immediately apparent, or if there is uncertainty around expectations, using the FAIR (findable, accessible, interoperable, and reusable) principles can help. Achieving FAIR data often reveals competing priorities, such as wanting to share data while also needing to protect sensitive data. Overcoming barriers and resolving competing priorities are made easier by speaking with other researchers who share the same challenges, as well as with data librarians, IT personnel, and other research support staff. In the sensitive-data example, consulting a privacy officer or an Institutional Review Board representative would help define what options are available for sharing that data. Collaborating with these support entities early in the research process makes data curation and sharing better and easier. Forming data communities helps researchers find expertise and resources faster, improve practices based on real-world experience, and develop camaraderie with others facing the same challenges.

What developments related to data sharing on the horizon in the next five years or so are you most excited about?

I’m so excited about the development of large collaborations around data sharing and the renewed emphasis on open science! Part of open science is managing research data so that it is appropriately curated and shared, and to that end, data librarians and information professionals are forming multi-institutional collaborations to provide the required expertise and infrastructure. I recently had the pleasure of joining OCLC researchers Rebecca Bryant and Brian Lavoie in examining the costs and benefits of three multi-institutional collaborations, and I was both excited and re-energized to hear about their amazing accomplishments! This study is based on the idea that collaboration is a strategy for reducing individual institutions’ risks and costs while also building communities that can provide more efficient, higher levels of support.

However, opening science is not easy, and collaboration does have challenges. It requires that we rethink how credit is allocated, shift the expenses of curation and publication, and develop experts who can help researchers adopt new workflows and standards. But it’s worth it: open science and collaboration have clearly demonstrated that they can expand capacity and accelerate research, something that has become an existential necessity. This message is echoed in the recent White House Office of Science and Technology Policy blog post entitled “New Guidance to Ensure Federally Funded Research Data Equitably Benefits All of America,” which announced the new “Guidance on Desirable Characteristics of Data Repositories for Federally Funded Research.” It is clear that data communities, composed of researchers and those of us who support data sharing and curation, can transform the future of humanity.