Encouraging data and code sharing among scholars is an important part of fostering a culture of open research, but how can this work be done most effectively? At Ithaka S+R we are exploring the crucial contextual elements that help research data sharing succeed. We’ve found that data communities (formal or informal groups of scholars who share a certain type of data with each other regardless of disciplinary boundaries) provide important clues to understanding how research data sharing works.

Identifying and supporting scholarly communities that are just beginning to develop an active, sustainable data sharing culture is an important strategy for those who wish to support data sharing. In order to understand how data communities can be built from the ground up, we are interviewing experts who are at the forefront of growing emergent data communities in a variety of research areas. We’ve highlighted promising developments in the areas of spinal cord injuries, literary sound recordings, and zooarchaeology.

Today we share our conversation with the Open Energy Modelling Initiative (openmod), which has about 600 listed participants, most of whom are full-time researchers or analysts. Because openmod has a strongly collective ethos, this interview was conducted with Robbie Morrison serving as facilitator, with the entire community contributing to the responses through its public forum.

What does openmod’s energy modeling community look like? What kinds of researchers are involved in this work (e.g. disciplinary and organizational affiliations), how do they collaborate, and what kinds of formal structures have been established to organize them?

The openmod community arose in Berlin, Germany, in September 2014, and most of us engage in energy modeling to inform public policy for a rapid transition to net-zero carbon emissions. The statutory reporting processes for assembling and publishing the types of data we rely on (more on that below) are often archaic and error-prone, resulting in poor-quality disclosure. Projects within the openmod community assemble and curate this information so that modelers and analysts can use it more readily. One such project is the Open Power System Data (OPSD) portal.

Most people involved are completing their higher education or qualify as early-stage researchers. A few are mid-career researchers or more senior. And some work for consultancies, companies, start-ups, or government agencies.

Geographically, the community started in the German-speaking DACH region (Germany, Austria, and Switzerland), later spread to the United States, and is now making inroads into the United Kingdom. Other participants are scattered across the globe, including in the Russian Federation, India, and the Global South. Apart from the first workshop, the working language has always been English.

The community has no formal structures. Its ethos derives from open source software development. By common understanding, those who run the various online services or the twice-yearly in-person workshops have full control over them. The mailing list is the principal forum for making community decisions.

Note that much of the discussion below centers on European Union law, in part because Europe provides a more restrictive legal context for data than the United States does. But this focus equally reflects our roots.

What kinds of data are typically incorporated into energy modeling, and how?

Energy system models require general information about component technologies, such as wind farms, coal-fired electricity generation, and high-voltage transmission lines, and about their engineering and cost characteristics. In most cases cost information must be estimated, because actual figures are normally commercially sensitive. That said, the European Commission, as well as other governing agencies around the world, could collect cost and performance information under a public interest rationale and make key metrics available in generic form. Projections of future cost and performance, which may also reflect technological learning, are necessarily speculative.

Energy system models also require specific details about the system being modeled, including the location, age, and connectivity of all represented assets. That includes information about the networks under investigation: usually the electricity grid, but perhaps also gas and district heating infrastructure. Current and potential future demand profiles are needed, as are location-specific resource potentials, including solar and wind assessments and land availability. Depending on the scope of the model, information on the built environment and mobility may be needed too. Some models additionally require historical market-clearing data or information on how households and firms make short-run and long-run decisions.

Most models cover national and supranational systems, but some research groups investigate municipal systems, islanded microgrids, and standalone systems. Research questions usually provide natural system boundaries. Information on future climate patterns is sometimes required, but it can readily be sourced from the climate science literature and is not legally encumbered. Modelers do not generally handle personal information as defined under EU law. If such information is required for numerical models, it can normally be derived from real data and anonymized, generated from estimated statistics, or otherwise synthesized; what matters is that the information is representative, not that it is exact.

What infrastructure is currently available to facilitate the sharing of this data among researchers in your community?

Within the orbit of the openmod, the Open Energy Platform (OEP) is the primary resource. The platform is designed specifically for the needs of energy system modeling and, in particular, scenario analysis. Energy system modeling differs from other forms of computational science in that its outcomes cannot be tested against observation. Instead, a range of speculative scenarios, each with its own explicit objectives, constraints, and assumptions, must be analyzed and traded off against one another.

In addition, because each model has necessarily evolved its own data interface and internal semantics, there are initiatives specifically aimed at transferring data between modeling projects to facilitate cross-model comparisons.

Why is open data sharing important to energy modeling? What are the typical positions on this issue among stakeholders engaged with energy modeling?

We adopt the European Commission description for open data (EU Directive 2019/1024, recital 16): “Open data as a concept is generally understood to denote data in an open format that can be freely used, re‑used and shared by anyone for any purpose.”

Data sharing reduces duplicated work, improves data quality and coverage, and facilitates cross‑model comparisons—that last point being necessary for strengthening confidence in both the direct results and in subsequent interpretations.

Conversely, data without appropriate open licensing may well be legally encumbered, and this uncertainty makes it unsuitable for open modeling.

What challenges or barriers to widespread data sharing are unique to research involving energy modeling?

Our primary challenge is the lack of open licensing, particularly on public sector information and information published under statutory reporting. European Union legislation on the terms of use of public sector information is unclear and contradictory, and legislation on energy sector disclosure is silent on licensing. These defects need fixing at the level of the European Parliament. The best that researchers can do until then is to push relentlessly for Creative Commons CC‑BY‑4.0 licensing on all such information.

In short, suitable open licensing is key. In most cases, such licenses do not so much grant binding permissions as confer legal certainty. This matters particularly within the European Economic Area (EEA), where Directive 96/9/EC provides database protection and one cannot know whether an extraction of data from a public portal was insubstantial or not.

The power exchanges that run the wholesale electricity markets are particularly resistant to providing disclosed information in any usable form, and they use techniques such as serving data that cannot be selected and copied in order to hinder extraction. This is certainly against the spirit of the legislation, even if technically compliant.

Another emerging problem is the proliferation of national open data licenses, such as the German Government's recent dl-de/by-2-0. Such licenses could well lead to legally siloed data if they are not inbound compatible with CC-BY-4.0 (that is, if data released under them cannot be freely incorporated into CC-BY-4.0 works), even if the incompatibility turns on some trivial legal point.

Data lacking CC-BY-4.0 licensing (or a CC0-1.0 waiver or something inbound compatible) is particularly problematic in the United Kingdom, because the threshold for copyright there is effort-based and addressable collections of data may also attract database protection. The situation in the United States is considerably better, because datasets and databases are unlikely to qualify as intellectual property. The rest of Europe falls somewhere in between.

What are the most important supports needed in order to cultivate a thriving data community among energy modelers?

It would help if science funding organizations recognized several needs: first, to require suitable licenses on all appropriate outputs; second, to support ongoing maintenance once the underlying data projects have been completed; and third, to provide stable online archiving for non-deliverable artifacts such as project websites, wikis, public mailing lists, and code repositories.

But beyond that, most solutions have to come from within the modeling community.

How is openmod working to address the open data sharing needs of the energy modeling community? Who else is doing important work in this area and what else is on the horizon?

For the openmod, the concept of genuinely open data has been central from day one. But as the community has matured, two closely related and vitally important agendas have come to the fore:

  • a community ontology—a shared worldview
  • agreement on collection protocols and metadata—the latter being data about data

The two initiatives are interconnected, both require deep buy-in from within the community, and both will take significant effort to work through and bed in. The Open Energy Ontology is addressing the first, and the EERAdata initiative is pursuing the second. The European Energy Research Alliance (EERA) and openmod communities have begun to work together on the latter.

Open is not the only paradigm for energy system modeling. Another is the closed consortium, effectively accessible only to government ministries, multilateral agencies, and allied research teams. How that paradigm evolves in an increasingly open world remains to be seen. In any case, there is virtually no crossover between these two realms at present. A third paradigm is the closed single-institution project, one whose future looks doubtful.

An upcoming challenge is tracking both data provenance and data versioning at scale. Taken together, these represent active research questions in computer science and are certainly not unique to the domain of energy analysis.

The prospect of supporting and using linked open data (LOD) is now surfacing. Some in our community are working with the DBpedia Databus project to explore the possibilities that semi‑smart knowledge graph systems can offer.

Returning to the present, another issue is dataset forking and fragmentation. In this pattern, researchers take whatever data they need for the problem at hand, modify it to suit, and perhaps later publish it as a static archive to support transparency and reproducibility. But their corrections and improvements are not propagated back upstream, so others cannot take them up and benefit from them. LOD clearly has the potential to help here.

Finally, members of the openmod community make written and oral submissions to European Union public consultations on law reform and science policy. Making one’s voice heard in such processes is an important and necessary activity.

Where can those who are interested learn more?