Persistent Identifiers, or “PIDs”, have become a popular topic for anyone involved in the research communications world. PIDs play a crucial role in the scholarly ecosystem by providing long-lasting references to and between all kinds of digital resources. PIDs usually consist of a unique identifier and a service that resolves references to the resource over time. While various types of PIDs are available, the Digital Object Identifier (DOI) is perhaps the most recognized in scholarly communities and has a particular focus on the ‘P’: Persistence. As organisations explore the world of PIDs and try to navigate a landscape that is increasingly subjective and, often unintentionally, obscured, it becomes evident that the choice of identifier infrastructure can significantly affect global inclusivity and accessibility, especially for lower-income countries.
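To make the "identifier plus resolver" idea concrete, here is a minimal Python sketch. It assumes the standard shape of a DOI (a prefix beginning with `10.`, a `/`, then a suffix) and the public `https://doi.org/` resolver. The helper names `normalize_doi`, `split_doi`, and `resolver_url` are hypothetical and not part of any DataCite or Crossref tool, and the prefix pattern is a simplification of the full DOI syntax rules.

```python
import re
from urllib.parse import quote

# Simplified DOI syntax (an assumption, not the full specification):
# "10." + registrant digits (optionally dotted), "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^(10\.\d{4,9}(?:\.\d+)*)/(\S+)$")

# Common ways a DOI is written in the wild; stripped before parsing.
KNOWN_LABELS = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(text: str) -> str:
    """Strip whitespace and any resolver URL or 'doi:' label from a DOI."""
    text = text.strip()
    for label in KNOWN_LABELS:
        if text.lower().startswith(label):
            return text[len(label):]
    return text


def split_doi(text: str) -> tuple[str, str]:
    """Return the (prefix, suffix) parts of a DOI, or raise ValueError."""
    match = DOI_PATTERN.match(normalize_doi(text))
    if match is None:
        raise ValueError(f"not a DOI: {text!r}")
    return match.group(1), match.group(2)


def resolver_url(text: str) -> str:
    """Build the https://doi.org/ link that resolves the DOI."""
    prefix, suffix = split_doi(text)
    # Percent-encode URL-unsafe characters in the suffix, keeping "/".
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


if __name__ == "__main__":
    print(split_doi("doi:10.1000/182"))                  # ('10.1000', '182')
    print(resolver_url("https://doi.org/10.1000/182"))   # https://doi.org/10.1000/182
```

The identifier string itself never changes; when content moves, only the location the resolver redirects to is updated, which is what keeps the reference persistent.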
In this post, we present the facts about DataCite and Crossref DOIs and their infrastructures: their use, their growth, their governance, their transparency, and their cost. Our aim is to help organisations that care about the rigour of the scholarly record make more informed decisions, and to dispel some misconceptions that risk damaging a healthy open research ecosystem for future generations.
DataCite and Crossref DOIs Can Lower Financial Barriers
One of the key advantages of DataCite and Crossref DOIs is their cost-effectiveness. Unlike some identifier systems that require substantial development resources to implement and maintain, DOIs offer a more budget-friendly alternative. Hidden costs associated with implementing some other PIDs, such as development and maintenance expenses, can often outweigh the perceived upfront savings, posing financial barriers to their long-term sustainability. These lower costs can make a significant difference for many organizations, particularly those in lower-income countries. It's crucial to recognize that DataCite and Crossref DOIs are not merely identifiers but valuable metadata resources for downstream services, fostering economies of scale across the entire research communications ecosystem. This collective and reciprocal network ultimately supports broader dissemination, reuse, and recognition of outputs and resources on a global scale.
Together, the nine scholarly DOI Registration Agencies have enabled the community to create and manage 300 million DOI records. These DOIs are used approximately 1.3 billion times every month, and that number is growing. With an estimated 8.8 million researchers worldwide, that’s the equivalent of each researcher reading or using an article, dataset, preprint, report, sample, or other output about 150 times a month. The users are not just individuals but also the systems and tools that are built into research and academia. These open, free-to-use, mineable DOI records are incorporated into platforms that span academia, government, and industry.
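The figure of about 150 a month is a simple average: monthly uses divided by the number of researchers. As a back-of-the-envelope check, here is a short Python calculation that uses only the numbers quoted above; dividing by the number of DOI records instead gives a per-record rate. The function name `monthly_rate` is illustrative only.

```python
# Figures quoted in the paragraph (approximate, as stated there).
DOI_RECORDS = 300_000_000        # DOI records across the nine scholarly RAs
MONTHLY_USES = 1_300_000_000     # DOI uses per month
RESEARCHERS = 8_800_000          # estimated researchers worldwide


def monthly_rate(uses: int, population: int) -> float:
    """Average monthly uses per member of a population."""
    return uses / population


per_researcher = monthly_rate(MONTHLY_USES, RESEARCHERS)
per_record = monthly_rate(MONTHLY_USES, DOI_RECORDS)

print(f"~{per_researcher:.0f} uses per researcher per month")  # ~148
print(f"~{per_record:.1f} uses per DOI record per month")      # ~4.3
```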
Recognising the challenge of running a “global” system that meets the needs of the very wealthy as well as the least wealthy, both organisations have long-running programmes that encourage participation from all low-income countries, offering education, support, and outreach alongside fee waivers and collective-savings models such as consortia and sponsorship.
DataCite and Crossref Work Globally, Acknowledging There Is More To Be Done
DataCite is a worldwide community with members in 52 countries, as well as over 2,950 repositories in many other countries across the globe. It's important to note that repositories are not required to be members themselves; instead, they can be affiliated with a DataCite Consortium Organization or Member. Moreover, there are 20 national or regional consortia operating in emerging regions. DataCite's open infrastructure services are a valuable resource for repositories in emerging regions, including the National Academic Digital Repository (Ethiopia), International Institute of Tropical Agriculture, IITA (Nigeria), Université Gaston Berger (Senegal), Corporacion Ecuatoriana Para El Desarrollo De La Investigacion Y La Academia (CEDIA) (Ecuador), NRCT Data Center (Thailand), International Centre for Integrated Mountain Development Regional Database System (Nepal), and many others across the globe.
Crossref has members in 151 countries, creating and stewarding DOI records. Crossref’s analysis of ISSN data shows that it already includes over 50% of the journals produced in the following low-income countries: Bhutan, Central African Republic, Kyrgyzstan, Mali, Sudan, and Tajikistan. It is also actively working to fill the gaps in countries where it covers fewer than 49% of journals and where research output is growing: Bangladesh, Cote d'Ivoire, Ethiopia, Ghana, Madagascar, Nepal, Nicaragua, Senegal, and Tanzania, among others.
DataCite’s GAP Program is an ongoing initiative to improve access and enable communities in less-represented regions to benefit further from DataCite's open infrastructure services. Under Crossref’s GEM Program, membership from some of the lowest-income countries in the world tripled, from 100 to 305, between January and September 2023. The two organisations, with other partners, are co-creating a guide on how academic and government bodies might collaborate at a country level to get involved with foundational global open infrastructures in a sustainable way.
Of course, more can be done, and practical, informed ideas are always welcome.
DataCite and Crossref DOIs Are Community-Governed Infrastructure
Responsible open infrastructure organizations do not operate as monopolies. Instead, they emphasize collective community governance and ownership. Most DOI Registration Agencies (RAs) exemplify this approach: they are predominantly not-for-profit and offer participatory models that extend beyond membership structures. Organizations can join these agencies at no charge, work through consortia or sponsors, or opt for annual membership fees, ensuring that financial constraints do not hinder access. This community-driven governance model safeguards against monopolization and ensures that PIDs remain accessible to diverse stakeholders. Providing open data and open-source code bases further safeguards the community’s investment in open infrastructure services.
Both DataCite and Crossref have tools that allow each member to manage their resources in the way they want. Provenance metadata is a high priority so that everyone can see who is asserted as the steward of a research object, whether that has changed, who pays to maintain the record and how much, and any other contributors and acknowledgements.
Every member gets a say in governance, policies, fees, and key decisions. By voting in or standing for board elections, joining fee committees or consultations, and approving budgets through open governance, thousands of institutions around the world are making the whole system more broadly governed and more inclusive with every passing year.
DataCite and Crossref DOIs Are Persistent and Openly Available
Another advantage of Crossref and DataCite DOIs is their persistence. Once a DOI is registered, the associated metadata is openly available without the need for further financial or resource commitments. For a one-time registration fee, each DOI record is maintained for free, forever. No fees are charged for subsequent updates or additional metadata, which will continue to enrich the scholarly record for and with future generations. This is a crucial differentiator from some local identifiers, which may require ongoing investment to remain accessible. The stability and longevity of DOIs make them a reliable choice for organizations seeking to ensure the enduring accessibility of their digital resources.
DataCite and Crossref DOIs Come With Infrastructure Support That Isn’t Free
The DOI itself is only one component of our activities; what can be done over and above a mere PID tells a very different, and fuller, story. A DOI is more than an identifier: it is also a link and a locator. Even so, on its own it is little more useful than a URL. The real community value lies in the numerous community-led initiatives that have extended and built upon the identifier: the metadata, the relationships, and the connections with other parts of the digital infrastructure. These initiatives take time, resources, volunteers, tools, consultation, expertise, experience, and sometimes funding. Examples include:
- Building an open data citation corpus
- Co-creating and advising on FAIR projects
- Developing infrastructure to connect clinical trials to outputs
- Urgent flagging of ‘free-to-read’ COVID content
- Connecting funding and funders with outputs
- Founding and contributing to ORCID and ROR and countless other initiatives
- Text-based plagiarism checking and other research integrity tools
- Tracking retracted, withdrawn, or corrected outputs
- Co-developing and supporting open-source tools
- Creating a public resource by opening critical data about retracted papers
It’s surprising to hear the occasional argument that DOIs should be free to create, when the extensive support systems and initiatives developed by Crossref and DataCite are not free to run (nor are distributed systems, for that matter). The cost of supporting a functioning global infrastructure should not be underestimated. If the community wants a persistent and robust scholarly record, and if it wants a say in its governance, then it should be aware of the cost not just of technical components such as resolver systems, data storage, and APIs, but also of community engagement, collaboration, and technical support.
In 2024, Crossref will employ ten full-time staff and six contractors on membership support alone (not including proactive outreach or engagement activities), in addition to paying for the tools and systems needed to manage its 19,000 members and the 3,500+ support requests the team receives each month. Crossref is also projecting an outlay of up to 1 million USD in 2024 for data costs, covering physical and cloud storage and processing.
Similarly, DataCite has three full-time staff dedicated to supporting and partnering with emerging regions, and three full-time staff who provide technical community support and develop best practices. A particular emphasis is coordination with more than 50 national or regional consortia to cater to their unique needs. With over 2,950 repositories globally and hundreds of new repositories joining the collective community effort each year, our efforts are directed towards fostering an open, global, interconnected ecosystem.
DataCite and Crossref also support and develop other PID services: both ORCID iDs and ROR IDs have been, or are, supported operationally, financially, or both. These services evolved from community collaborations and continue to thrive as essential open infrastructures that interact with DOIs.
We know that collectively, we benefit from economies of scale and can reduce costs as a community when we cooperate globally. It would be far more costly for the community to replicate Crossref and DataCite infrastructures at the national level, let alone at the level of each research-performing or research-publishing organisation.
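The analogy of regional electrical outlets shows why interoperability matters: where regions adopted incompatible plug standards, every device and traveller crossing a border needs an adapter, and interoperability remains a significant, lasting challenge.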
In the context of persistent identifiers in research communications, adhering to universal design principles alone may, on the surface, seem to address interoperability. However, it risks increasing fragmentation within the ecosystem, thereby disadvantaging emerging regions. Instead, we should advocate for collaborative, global initiatives that preserve open infrastructure under responsible governance and avoid further fragmentation.
Collaboration and Gathering Context
In discussing the merits of various identifier systems, it is vital to gather insights and context from stakeholders with practical experience of implementing them. Hypothetical arguments help little here; DataCite and Crossref actively collaborate with communities to address real-life challenges and improve their services accordingly.
We recommend that people explore the facts. Ask questions such as “What resources will we need now and in the future?”, “Who is on the board, and how can I get on it?”, “How open is the metadata; that is, is it both available and accessible?”, “How likely is this to go away?”, and “What safeguards are there for long-term preservation?”. Find out what training and support are offered. Talk to your local community in your own language, and ask the members in 151 countries who use DataCite and/or Crossref, for example an Ambassador, Consortium Lead, or Sponsor.
In conclusion, DOIs through DataCite and Crossref represent a cost-effective, community-governed, and persistent solution for identifying and referencing digital resources. Embracing DataCite and Crossref DOIs empowers the global scholarly community, reducing financial barriers and fostering broad creation, dissemination, and recognition of research outputs and resources.
DataCite and Crossref are actively investing in community collaboration, and it is essential for stakeholders to engage in constructive dialogue that contributes to the ongoing improvement of research services. If you tell us more about your specific needs and experience, we’ll tell you more about the reality of running infrastructure, and we’ll figure out how to help you get involved. Together, we can build a more equitable and accessible scholarly landscape for all communities and all countries.
Thanks to the following people for reviewing and adding context to this post: Britta Dreyer, Ed Pentz, Helena Cousijn, and John Chodacki.
Copyright © 2023 Ginny Hendricks, Matt Buys. Distributed under the terms of the Creative Commons Attribution 4.0 License.