Data communities provide social and practical incentives for scientists to voluntarily share and reuse data with colleagues. In order for data communities to emerge and grow, they need support. Information professionals, such as data librarians and research computing specialists, can advise data communities on best practices for data sharing and help them create or improve the required infrastructure, such as online repositories and metadata schemas. However, research scientists and information professionals rarely have structured opportunities to meet, especially across institutional lines, for focused discussions about how they can collaborate to sustain data communities.

To address the need for collaboration among scientists and information professionals to understand, support, and promote the growth of data communities, Ithaka S+R and the Data Curation Network partnered to host Leveraging Data Communities to Advance Open Science. This NSF-sponsored workshop series provided a forum for scientists from a variety of institutions and fields who are already involved in data communities, or who would like to be part of starting a data community, to collaborate with information professionals who are expert curators in their research area. Over a series of meetings that culminated in a two-day online workshop (Feb 28-March 1, 2022), data communities and information professionals met online to discuss community-specific issues as well as broader strategies for moving forward.

In advance of our final report on the workshop, which will be released later this summer, we’ve invited several participants to reflect on what they’ve learned from the experience. Today’s blog post features an interview with Jordan Wrigley, a data librarian at the University of Colorado at Boulder.  

You come from a science background and now work as a data librarian. Can you tell us a bit about why you made that transition?

My work brings my background in environmental and social science together with my librarianship. Having worked extensively with librarians as a nascent researcher, I quickly realized the value of professionals who understood the systemic flows and practicalities of data research in the information (and misinformation) age. These experiences led to a fascination with the nexus of research data, data communities, and data for societal good expressed through librarianship commitments to equity in knowledge access and education.

You’ve been involved in data management as a librarian and as a researcher. Based on your experiences in those roles, what are some of the key challenges to sharing research data that researchers struggle with? 

One overlooked struggle of research data sharing is its rapidly changing landscape including regulations, grant requirements, and societal expectations. There’s also the ever-growing mass of tools, file types, metadata schema, etc. The speed of change in data practices is comparable to the swiftness of today’s glacier melt in the face of climate change. Many struggles in data sharing are related to the challenge of simply keeping up with changes. A few examples might be concerns around “scooping” or stealing intellectual contributions from open data, the restructuring of traditional roles and associated credit-sharing in data and article publishing, and increasing emphasis on transparency in both public expectations and federal standards without an accompanying publicly-supported technical infrastructure. Whether socially or technologically-based, these changes make it difficult for data communities and information professionals alike to maximize the positive impacts of data sharing.

Can you tell us a bit about the research group you were paired with for the workshop? What kinds of challenges were they facing? How were you able to help them?

I had the pleasure of working with the Montana State Consortium for Research on Environmental Water Systems (CREWS), who had already benefited from the expertise of an integrated information professional savvy in their particular data needs. Largely, my help consisted of supplying suggestions based on “grapevine” discussion with other data professionals and communities. The information professional who was already part of the group was then able to evaluate the potential of my suggestions within the specific context of that group. The key challenges of this group were storage and publishing platforms as well as outreach and creating buy-in for long-term data sharing among broader community members. Because of my background as an environmental researcher and information professional, I was able to make some suggestions for engaging participation through incentives such as dual publication (both manuscript and data) for increased citation and streamlined data submission processes. CREWS is fortunate to have an integrated information professional on their team who may adapt my suggestions or adopt other novel approaches according to the shifting needs of the community.

What did you learn from participating in the workshop? How will this inform your approach to helping research communities on your campus share data?

Research communities are inherently ingenious and often include shrewd problem solvers. During the course of the workshop, the ingenuity of each group in identifying and generating tailored solutions to their data challenges within a range of resource contexts was illuminating. Equally illuminating was the subsequent challenge of interoperability between these data communities and others. The uniqueness of each set of solutions, while ingenious, was also inherently limited in its scope and ability to relate to outputs generated from other data communities. 

This observation impressed on me the importance of scanning for topically-similar or adjacent data communities when making recommendations or developing research data workflows and infrastructure. Universal interoperability may not be a realistic goal but finding a few to a handful of data communities with affinity for a consulting community may prove an effective approach to “bottom-up” community building to increase the applicable scope of a given data community.

Many research teams in our workshop cohort did not have established ties with data librarians. What kinds of steps can universities take to better support these kinds of collaborations? 

First and foremost, we need to fully fund data specialist positions. Data librarianship is a unique specialty that requires consistent engagement with new information and persistent evaluation of complex social and technological systems. The extreme pace of change in data landscapes makes past librarianship approaches such as resource or textual recommendations insufficient. Data librarians, much like programmers or data scientists, are now essential resources, because many text-based resources become obsolete within a very short time frame. Underfunding these roles impacts how researchers perceive their value: a half-funded data position, in addition to being burdensome to a professional, will have half the authority needed to effect change.

In practice, fully funding these positions is not plausible for all communities due to limited resources. Communities of practice such as the Data Curation Network (DCN) can help fill this gap by providing support for overburdened data professionals. A national-level network of established experts like DCN adds authority to recommendations as well as a peer network to crowdsource feedback and solutions. Recently, I have observed a growing interest in co-consulting between institutional data librarians and specialists. This presents a very interesting opportunity for interorganizational community development with data librarians and information professionals serving as catalyzing linchpins in novel networks.

How might data librarians and information professionals better define the value of the expertise they bring to these collaborations?

The value of data librarians is becoming more apparent to non-librarian researchers as they respond to cultural changes in their own intellectual communities. Data librarians (and all librarians) are irreplaceable and integral to the positive impacts of open data for societal good. Rather than expending energy seeking to define the value of our expertise (a potentially Sisyphean task land-mined with racism, misogyny, ableism, etc.), data librarians should identify the communities who are open to librarian and information professional expertise. 

On a more immediate level, data librarians will benefit from the cultivation of curiosity. When conducting reference interviews, the impact of a few well-articulated initial questions may be greater than a multipage defense of one’s expertise. Furthermore, curiosity demonstrates engagement and respect for the value of the research and its creating community. This affirmation can go a long way in developing trust between the data librarian and researchers.

What questions should researchers ask as they begin to establish a data community? Who should they be collaborating with? 

Reverse engineering may be the greatest tool in developing data and research communities. Many of us will have experienced that moment of intellectual euphoria at having discovered an ambiguously defined but intuitively existent research object. Mine was a deed for an archeology site missing from a geospatial dataset that took me almost two years of archival research to locate. My hands shook as I lifted it from its folder. As data creators, curators, and researchers, we need to try and imagine the audience and user group for this data one, two, five, ten, or more years in the future, as well as those who need this data right now. How will they find it? How will they use it? Could the data be used to help or improve conditions? How might it be used to harm? Domain researchers themselves may not always have the intersectional knowledge or identities to answer all these questions. These are the questions that may separate a transitory data output from an evergreen one, and they speak to the question of who data communities should collaborate with.  

What developments related to data sharing on the horizon in the next five years or so are you most excited about?

The future development of data librarians and information professionals as professions is by far the most interesting topic for the future. Much like the data landscape itself, our roles, skills, and intersections are stochastic and defined only to be redefined a short time later. Some of us are programmers and technically skilled in a range of particular data tools, such as GIS and visualization tools, while others could easily fit into research computing departments. Data librarians may be curation-, workflows-, dissemination-, or policy-savvy. Others may be the sole known expert in data practices for a particular domain, field, or topic. Across the board, we are diverse and, while we are often adaptable, we are by no means generic. More than anything, this illustrates the range of data communities we contribute to and our ability to enhance the positive impacts of data sharing via community participation.


The next post in this series, featuring Amanda Rinehart, a life sciences librarian at Ohio State University, will be published in July. 

This material is based upon work supported by the National Science Foundation under Grant No. 2103433.