For policy makers, advocates, and businesses, data-driven decision making brings the promise of objectivity. In some cases, that promise is instead the illusion of infallibility: we don’t doubt our ability to make smart decisions with well-analyzed data, but what about the origins of the data itself?

Over the past year, Joseph Esposito, Roger Schonfeld, and I have been conducting a research project studying the acquisitions of academic libraries, with the aim of better understanding trends among vendors, publishers, disciplines, and formats. We started where we often start: talking with librarians to try to better understand how they work. Each librarian we spoke with knew how to run robust reports from their own databases, and most were more than happy to do so. But we noticed discrepancies among the group: each had their own definition of what a “book” was, or even of whether the library owned it.
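To make the problem concrete, here is a toy sketch (the records and classification rules are hypothetical, not drawn from any library we spoke with) of how two institutions applying different definitions of a “book” can report incomparable counts from similar holdings:

```python
# Hypothetical holdings records shared by two libraries.
holdings = [
    {"title": "Example Monograph", "format": "print"},
    {"title": "Example E-book", "format": "ebook"},
    {"title": "Example Journal", "format": "serial"},
]

# Library A counts only print monographs as books.
def is_book_library_a(item):
    return item["format"] == "print"

# Library B counts print and electronic monographs alike.
def is_book_library_b(item):
    return item["format"] in {"print", "ebook"}

print(sum(is_book_library_a(i) for i in holdings))  # 1 "book" under definition A
print(sum(is_book_library_b(i) for i in holdings))  # 2 "books" under definition B
```

Aggregate either column across institutions and the totals look authoritative, yet they measure different things.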

Our first priority was to understand existing workflows and standards. We needed to craft a data request that was sufficiently broad that multiple institutions could accommodate it, but also sufficiently granular to let us explore the data and discover key findings.

After a series of conversations, we felt that cloud-based Integrated Library Systems (ILS), which help libraries manage a slew of records, from MARC to acquisition to order records, might provide a solution. System designers are making different choices about how much flexibility to afford libraries and how much standardization to impose, for example by using preset vendor codes and publisher names. Libraries lose some flexibility with the latter approach, but standardized data fields lead to better data sharing among institutions.
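A small sketch (the names and codes here are hypothetical) shows why preset codes matter: free-text entry fragments one publisher across many spellings, while a controlled vocabulary lets records from different institutions be aggregated directly:

```python
from collections import Counter

# The same publisher as it might appear in free-text fields at four institutions.
free_text_entries = [
    "Oxford UP",
    "OUP",
    "Oxford University Press",
    "oxford univ. press",
]

# With free-text fields, one publisher looks like four different ones.
print(Counter(free_text_entries))

# A preset code table, as a system might impose it, collapses the variants.
PUBLISHER_CODES = {
    "Oxford UP": "OUP",
    "OUP": "OUP",
    "Oxford University Press": "OUP",
    "oxford univ. press": "OUP",
}
print(Counter(PUBLISHER_CODES[entry] for entry in free_text_entries))  # Counter({'OUP': 4})
```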

In collaboration with the Product Working Group for Alma (Ex Libris’s cloud-based ILS), we have constructed a report that can be made available to Alma customers with a single click. The report provides us with longitudinal data on materials purchased, at the item level. No post-processing is necessary, which minimizes the risk of human error.
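That uniformity is what makes the analysis straightforward. As a rough illustration (the column names below are hypothetical, not the report’s actual fields), a standardized item-level export can be summarized longitudinally in a few lines:

```python
import pandas as pd

# A toy stand-in for a standardized item-level acquisitions export.
report = pd.DataFrame({
    "fiscal_year": [2014, 2014, 2015, 2015],
    "publisher_code": ["OUP", "CUP", "OUP", "CUP"],
    "list_price": [95.00, 120.00, 99.00, 110.00],
})

# Year-over-year spend per publisher, straight from the export,
# with no manual post-processing.
print(report.groupby(["fiscal_year", "publisher_code"])["list_price"].sum())
```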

To me, this is a salient reminder of the value of protocol in data management. Strict protocols add value to information and can allow organizations to arrive at creative insights as a result. Whether designing systems for institutions or internal workflows, the values of flexibility and convenience must be balanced against adherence to pre-existing standards if the full richness of the data is to be extracted.

We are now moving ahead with designing reports for other cloud-based systems, as well as locally hosted ones. I expect to see quite a range of possibilities for data extraction and analysis. In addition to analyzing the data, we hope that sharing some of our data-gathering process will be helpful to the community.