In recent months, several publishers have announced that they are licensing their scholarly content for use as training data for LLMs. These deals illuminate how major publishers are grappling with their strategy amid uncertainty. To understand the dynamics of this fast-developing market, Ithaka S+R has launched a tracker of these licensing deals. In it, we catalog, when the information is available, the publisher, the purchaser, the deal type and size, and the impact of and strategy behind the deal. Roger Schonfeld provides more analysis of these deals in his Scholarly Kitchen piece, “Tracking the Licensing of Scholarly Content to LLMs.”

So far, several major publishers have announced deals. For them, there is a substantial near-term revenue upside. The basic idea behind these deals is that the publishing house generates revenue in exchange for giving the LLM easy, reliable, and legal access to the content. A number of companies are in the hunt for this content, including not only OpenAI and Google but also Apple and more specialized providers.

Thus far, a standard set of terms or overall model from which to build these deals has yet to emerge. Pricing, of course, is top of mind for everyone, but there are many other considerations as well. To take several straightforward examples, there are technical and reputational questions about how corrections or retractions will propagate through an LLM and whether an author can opt out. There are also business model issues, such as whether provenance will be tracked through the output of an LLM so that a citation or link can be provided back into the scholarly record.

We will update the tracker periodically. You may also access the tracker as a Google Sheet. If you are aware of other deals that we have not yet documented in this tracker, please contact us using the form below.


If you have questions, comments, or suggestions about the tracker, please contact us.
