Emergent Data Community Spotlight III

An Interview with Kitty Emery and Rob Guralnick on ZooArchNet

Rebecca Springer

Successful data sharing crosses disciplinary silos. As Danielle Cooper and I argued in a recent issue brief, “data communities” — formal or informal groups of scholars who share a certain type of data with each other — emerge both within and across disciplinary boundaries. In order to understand how these data communities emerge — and to understand how they can best be supported — I’ve been seeking out leaders who are at the forefront of efforts to grow and strengthen data communities. What does it take to make valuable, researcher-driven data reuse possible, even across disciplinary boundaries?

So far, I’ve interviewed leaders from two very different data communities: spinal cord injury research and literary sound recordings. The final installment in this blog post series highlights a data sharing project with interdisciplinarity at its core. I’m delighted to introduce Dr. Kitty Emery and Dr. Rob Guralnick, the principal investigators of ZooArchNet, a zooarchaeological data project that “aims to mobilize data about archaeological animal remains in a way that best supports open data and open science approaches for both biological and anthropological research.” ZooArchNet builds on existing data sharing infrastructures and communities to create new tools to facilitate zooarchaeological data sharing among both biologists and anthropologists. Dr. Emery is Associate Curator of Environmental Archaeology and Dr. Guralnick is Curator of Biodiversity Informatics, both at the Florida Museum of Natural History. The Florida Museum is part of the University of Florida and is Florida’s state natural history museum.

In this interview, Dr. Emery and Dr. Guralnick speak about changing the way interdisciplinary collaboration happens — by deploying technologies like linked open data, developing mutually comprehensible metadata standards, and facilitating communication — in order to enable research that answers critical questions about the human relationship with our environment.

ZooArchNet is a project to optimize and share data about archaeological animal remains for both biological and anthropological research. Let’s start with the obvious question: what kind of data is this exactly, and why is it useful for such different fields?

Kitty Emery: Zooarchaeology is the study of animal remains from archaeological sites, so these are data about animal specimens, usually represented by skeletal remains, but sometimes other tissues if preservation is good. In that way, these are biological data on things like taxon (what species or genus the animal was), portion (what body part or skeletal element or fragment thereof is represented), age/sex, and condition (e.g. pathologies). But, being from human-related contexts, these are also cultural data on things like how the animal was used (as food, as tool, as symbol), how it was valued (elite or slave food for example, or symbol of royalty or symbol of a witch as another example), how it was moved across the community through sharing or across the broader cultural unit through trade or taxes. Together these data give us a combination of data about human impact on animals and landscapes, but also environmental impact on people and their decision-making.

How was the project conceived?

Kitty Emery: I’ve been interested in open zooarch data and data compilations for a long time. My first NSF-funded project back in the early 2000s was on those topics, and I was elected to the first tDAR scientific advisory committee (now part of Digital Antiquity). Eric Kansa of Open Context was also on that committee and his archaeological data publishing initiative with his wife Sarah Whitcher Kansa (also a zooarchaeologist) was just starting up. I have been publishing my data with their group for some time now and my last edited book will include all the data used in all the 20-some-odd chapters as open access published data. So when Rob was hired as our new curator of informatics at the Museum, it was an incredible opportunity for me to work with someone from the biological side of the game, and to try to figure out how we might really move things forward by bridging the gap between biological and archaeological data. I’ll let him tell you all about his projects, but his innovations with VertNet and those of his close colleague John Wieczorek on the Darwin Core provided the foundation on which to build the idea. The University of Florida and our Florida Museum both believed in our idea and provided start-up funding so we could get the idea off the ground.

Rob Guralnick: I’ve been involved in mobilizing natural history collections data for more than two decades, starting waaaaay back in 1992-1993 when I was a graduate student at the University of California Museum of Paleontology. I remember struggling with early text parsing tools in order to make our database of fossil vertebrates searchable online and embedded in the UCMP website (which still exists today, almost thirty years later!). Those were early days in the Wild West of the Internet. My interests in Open Data and finding ways to move the digitization of natural history collections forward never waned and I have been lucky enough to continue efforts in that direction throughout my academic career.

As you might be able to guess from my background in Paleontology, I’ve also always been interested in questions that span multiple timescales, and to me, lacking ways to assemble zooarchaeological data resources leaves a missing piece of the puzzle when we think about human impacts through time. I was so excited to find a collaborator and kindred spirit in Kitty, who also saw the potential. One key reason the collaboration works is that we are both curious about the work we do together: not only the areas of overlap but also the places where our knowledge and skills don’t overlap. It makes it a very fun and enriching collaboration.

Prior to ZooArchNet, if an anthropologist wanted to use zooarchaeological data in their research, what would that entail? What about if a biologist wanted to use the same data?

Kitty Emery: In either case, since most zooarchaeological data is not available in openly accessible settings and in formats useful for integrated data analysis (for exceptions see Open Context and Neotoma), researchers who want to use zooarchaeological data have to comb the literature and hope that people have published actual specimen-level data with the detail needed to address their particular questions. The reality is that most researchers who want to use zooarchaeological data usually gather their own. And that precludes biologists who study modern or paleontological animals from gathering such data since they don’t typically do zooarchaeology or know where to look in the literature and gray literature of old reports.

How common is it for biologists and anthropologists to work together?

Kitty Emery: It is actually quite common for biologists and anthropologists to work together because really, we are all asking the same sorts of questions about animals — whether the animals are human or not. The difficulty has been that the way we have been able to work together has been limited to a discussion of interpretations after the research is done. What we want to change is the way we do collaborative research — to allow us to compare data and use combined datasets to reach conclusions.

Rob Guralnick: Kitty is absolutely right and I’d argue that we are starting to accelerate collaborations by not just talking at the end of the research enterprise, but right at the beginning, around data. This will make it easier for a common language to develop across our disciplines. This is not saying we are going to have “research Esperanto” — in the same way, diversity of languages is a benefit and enrichment. I just mean that we’ll have more chances to overlap our shared vocabulary and understanding earlier.

Tell me a little about how ZooArchNet is building on the data sharing achievements of both the biology and archaeology communities (VertNet, Darwin Core, GBIF, ADS, etc.).

Kitty Emery: The archaeological data community is growing rapidly and they have worked in different ways to address some of the important concerns inherent in sharing data about people and cultural heritage. Here is one example: the “social value” of human-related data is a key difference between the approaches that archaeologists and biologists have toward their data and public access to it. Our actions and those of our ancestors speak to our own cultural norms, defined by the history of our politics, economics, and social interactions. These are very “personal” data in some ways and their use throughout history to both valorize and sometimes denigrate cultural groups is a complicated part of our science. This reality has, in some ways, hampered the development of broad global initiatives to share archaeological data freely because even the way we each speak about our past and the terms we use to define our material remains, actions, and historical facts all also carries cultural impact. That makes it near impossible to standardize and link our data. But this very hurdle has led to fabulous initiatives throughout the digital humanities, and particularly in archaeoinformatics, in the use of linkages among data points, or Linked Open Data (LOD), as a way to connect across different datasets, terminologies, and systems. This practice has not been used as much in the biodiversity informatics world and so ZooArchNet is able to build on the archaeological innovations and incorporate them into the biodiversity data sharing systems.

Rob Guralnick: On the “neontology” side, summarizing thirty years of “learning by doing” with infrastructure is no easy task, but I would argue the most important single activity has been building standards that can support data integration without forcing communities to lose often very expressive data. Zooarchaeology is a great example: the data here are incredibly rich, much more so, in my view, than what is typical for a modern specimen. With modern specimens, much of the context can be supplied via associations with other datasets such as historical climate data or land use. For zooarchaeological specimens and data, there is dense spatiotemporal, site, and cultural information, along with its complex interpretations. Standards such as the Darwin Core provide critical community definitions that standardize how we report specimen data and metadata, but its greatest strength for our work is that it can be flexible enough to support some critical, zooarchaeology-specific content. It does this by providing a lot of space for adding additional content, which itself can be standardized enough to make it discoverable. As an example, the ZooArchNet project has created a Darwin Core “extension” to deal with the sticky issue of chronometric data (the “age” of the specimens as measured by carbon-14 or other means) which has not, until recently, been something that could be easily recorded in the Darwin Core. The ability to integrate this essential data makes the Darwin Core really ideal for adaptation to archaeological data. But as Kitty mentioned, there is more needed still for proper provisioning for such records, and it’s really how we can link data across infrastructures that is crucial here.

When information professionals talk about data sharing, one theme that commonly arises is how to balance interoperability with domain-specific reporting, formatting, and metadata practices. Tailor data sharing standards too narrowly to specific communities and you risk siloing information; impose universal standards and the data will lose value in many research contexts. How does ZooArchNet navigate that tension?

Kitty Emery: This is a great question!! And it is an issue that we have grappled with in many ways. As mentioned above, archaeoinformatics has come up with some stellar solutions like LOD, that allow us to link or map our data to other types of data without the requirement that anyone change their original data terminology. That ability to link also allows us to separate somewhat the detail that is offered in different data fora — so, for example, ZooArchNet publishes biological data in iDigBio, Global Biodiversity Information Facility (GBIF), and other biological data repositories in great detail and bundles limited cultural data in metadata-type fields, but it also publishes cultural data in Open Context or other archaeological data repositories in great detail. Both sides of the coin then have the ability to embed the published dataset into their larger datasets, while providing links between the platforms so that users can pull out whatever combination of details suit their research questions.
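To make the linking idea concrete, here is a minimal sketch in Python of mapping each dataset's own terms to a shared concept identifier without rewriting the original terminology. Everything in it is an assumption for illustration: the `example.org` URIs, the dataset names, the `TERM_LINKS` table, and the record fields are hypothetical and are not taken from ZooArchNet, Open Context, or GBIF.

```python
# Hypothetical shared-concept URIs; example.org is a placeholder, not a real vocabulary.
SHARED = "https://example.org/concept/"

# Each dataset keeps its own local terms; only this mapping layer is added.
TERM_LINKS = {
    ("dataset_a", "white-tailed deer"): SHARED + "odocoileus-virginianus",
    ("dataset_b", "O. virginianus"): SHARED + "odocoileus-virginianus",
    ("dataset_b", "Canis familiaris"): SHARED + "canis-familiaris",
}


def link_records(dataset, records):
    """Return copies of records with a 'concept_uri' added.

    The original 'taxon' text is left exactly as the contributor wrote it.
    Terms with no known link get a concept_uri of None.
    """
    linked = []
    for rec in records:
        uri = TERM_LINKS.get((dataset, rec["taxon"]))
        linked.append({**rec, "dataset": dataset, "concept_uri": uri})
    return linked


def group_by_concept(records):
    """Group linked records from any dataset by their shared concept URI."""
    groups = {}
    for rec in records:
        groups.setdefault(rec["concept_uri"], []).append(rec)
    return groups


a = link_records("dataset_a", [{"taxon": "white-tailed deer", "element": "femur"}])
b = link_records("dataset_b", [{"taxon": "O. virginianus", "element": "tibia"}])
combined = group_by_concept(a + b)
```

In this sketch the two records end up in the same group even though their taxon labels differ, and neither contributor had to change a word of their original data.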

Both archaeological and biodiversity informatics methods have recognized the need to allow for flexibility, too, in what is reported. Locations of sacred sites and endangered species cannot be openly reported to the general public, so site locations are “buffered” in ways that are agreed upon by the various data communities and are reported in both locations as metadata. And there are many other ways that we are working toward a good balance between “standards” and “domain detail.” It’ll take a while to get it all right, but that is the goal.
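As a rough illustration of what “buffering” a location can look like, the sketch below snaps a coordinate to the centre of a coarse grid cell and records the cell size alongside it as metadata. This is only one possible approach and is an assumption for illustration: the interview does not describe the method the data communities have actually agreed upon, and the function name, field names, and default cell size are hypothetical.

```python
import math


def buffer_location(lat, lon, cell_deg=0.1):
    """Generalize a point to the centre of its grid cell.

    cell_deg is the grid cell size in decimal degrees (a hypothetical default).
    The returned dict carries the cell size so users can see how much
    precision was deliberately withheld.
    """
    b_lat = (math.floor(lat / cell_deg) + 0.5) * cell_deg
    b_lon = (math.floor(lon / cell_deg) + 0.5) * cell_deg
    return {
        "lat": round(b_lat, 6),
        "lon": round(b_lon, 6),
        "generalized_to_deg": cell_deg,
    }


# Two nearby (made-up) points fall in the same cell and become indistinguishable.
site_1 = buffer_location(29.6436, -82.3549)
site_2 = buffer_location(29.61, -82.39)
```

Because the same generalized point and the same precision note can be published in both the biological and the archaeological repository, the two copies stay consistent with each other.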

Rob Guralnick: Wow. I just discussed this topic and how critical it is to “find balance” in both infrastructure and standards that are used at the base of that infrastructure. One reason why I think zooarchaeological specimen data mobilization is only happening now is that there have been maturing infrastructure(s) in different disciplines that are now ready to handle the challenge. Additionally, the people behind those infrastructures are starting to really embrace interdisciplinary efforts to link content together smartly. Those two things have powered ZooArchNet, and I think in a broader scope this is something of a revolution.

What are the most important supports biologists and anthropologists need in order to cultivate a thriving data community?

Kitty Emery: Money? A life-long sabbatical to get the work done? No really, we need open collaborative minds and the time and ability to get together and hash through our domain specifics so that we can understand how to link our data most effectively — and what aspects of our various study areas will result in limitations no matter what we do and so must be carefully explained as metadata so that the non-specialists from either side will understand the methods and assumptions of the data they will be using. As an example, paleontologists and zooarchaeologists use fossils and zooarch specimens as our basic counting unit — when I say I have one specimen, I mean one boney bit. Neontologists (biologists who study modern life forms) like Rob use individuals, so when he says he has one specimen, he means one individual. So if a neontologist combines my count of 100 deer bones, all from perhaps three individuals, with his count of 100 individuals, his conclusions about animal distributions will be wrong. It’s a simple explanation because we can define our units in the metadata, but there isn’t yet a way to embed that definition in the dataset in such a way that a quick glance will warn the user. And without a lot of chatting together, we aren’t aware of even these minor differences between our methods.
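The counting-unit problem described above can be sketched in a few lines of Python. This is a hedged illustration, not something ZooArchNet is stated to implement: the `CountRecord` class, the unit labels `"bone_fragment"` and `"individual"`, and the choice of deer species are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CountRecord:
    taxon: str
    count: int
    unit: str    # hypothetical labels: "bone_fragment" or "individual"
    source: str


def naive_total(records, taxon):
    """Add every count for a taxon, ignoring what was actually counted."""
    return sum(r.count for r in records if r.taxon == taxon)


def total_by_unit(records):
    """Sum counts per (taxon, unit) so different counting units are never mixed."""
    totals = defaultdict(int)
    for r in records:
        totals[(r.taxon, r.unit)] += r.count
    return dict(totals)


records = [
    # 100 bones, which might all come from only a few animals
    CountRecord("Odocoileus virginianus", 100, "bone_fragment", "zooarchaeology"),
    # 100 whole animals
    CountRecord("Odocoileus virginianus", 100, "individual", "neontology"),
]

misleading = naive_total(records, "Odocoileus virginianus")
totals = total_by_unit(records)
```

Here the naive sum reports 200 "specimens", while keeping the unit attached to every count leaves two separate totals of 100 that a user cannot accidentally add together.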

Rob Guralnick: Ha! More time and money! So true. But neither is sufficient to solve this, and I agree with Kitty. I’ll phrase it slightly differently, though: we need to foster communities of practice and we need to make sure the credit models are there so that people can feel rewarded, beyond a sense of pride (which also matters a lot). Kitty also touches on the shared language aspects I mentioned earlier. So true!

What would you say has been the greatest success of the emergent zooarchaeological data community so far?

Kitty Emery: Well first I want to emphasize that the zooarchaeological data community is by no means “emergent” because zooarchaeologists and other environmental archaeologists have been among the earliest proponents of the shared research use of archaeological data for answering questions about our past. And interestingly, when the “top questions” of archaeology are described (see, for example, Kintigh et al.), many of them point at the environmental archaeologies as a valuable source of data. What is really “emerging” out of ZooArchNet and other such initiatives (Neotoma for example) is the realization that we absolutely must cross the disciplinary boundary in a very detailed and functional way before we can answer the questions that we’ve been working on for so long. We need a lot more data from all sides of the equation to answer our fundamental questions, but we can’t enter the “big data” arena if we can’t combine (in the case of zooarchaeology) our archaeological and biological data. So ZooArchNet is really just providing a new avenue to the research questions that the zooarchaeology data community has been tackling for a long long time.

What’s on the horizon for zooarchaeological data sharing?

Kitty Emery: So far a lot of our focus has been on just getting the data to speak to each other and be mutually accessible in appropriate ways. Our next exciting adventure is going to be to build the community and the available data so that we can start answering the big questions we have been aiming at since the beginning. Once we have big enough datasets that span the paleo-, archaeo-, and neontological time periods, we can really get a handle on how animal populations have changed through time, how the human-animal-environment synergy has developed over time, and how we can learn from all that to improve the outlook for our next couple of centuries — or if we get it right, the next couple of billion years, although only if we act fast.

Interested in developing or improving research data services on your campus? Ithaka S+R is embarking on a collaborative research project on Supporting Big Data Research. Participants will work alongside Ithaka S+R to conduct a deep dive into their faculty’s needs and craft evidence-based recommendations. To find out more about having your library participate as a research site for this project or future projects, please email danielle.cooper@ithaka.org.