Desirable Characteristics of Persistent Identifiers

John Chodacki; Todd Carpenter; Maria Gould

DOI: https://doi.org/10.54900/c3hdq-0ev76

Considerations in the context of open scholarship and open infrastructure

Persistent identifiers (PIDs) in scholarly communications and research infrastructure have garnered growing attention over the last several years, especially from governments that recognize the vital role PIDs play in creating a more efficient and trustworthy research ecosystem. However, varying degrees of familiarity with the subject raise questions about which attributes constitute a good PID and which features are most desirable in PID infrastructure. In this post, we attempt to answer these questions by exploring various guiding principles for research infrastructure that have been articulated over recent years, examining their relevance and application to the domain of PIDs. With these principles as contextual resources and guidance, we propose a set of desirable characteristics.

Background

Several countries, including Australia, Canada, Finland, Korea, and the United Kingdom, are at various stages of developing national PID strategies to help guide best practices in support of open scholarship and innovation. In the US, a set of recommendations entitled “Developing a US National PID Strategy”, utilizing the framework created by the Research Data Alliance, was created in late 2023 and early 2024 in collaboration with members of the Higher Education Leadership Initiative for Open Scholarship (HELIOS) and the Community Effort on Research Output Tracking workstreams organized by the Open Research Funders Group (ORFG). The recommendations in “Developing a US National PID Strategy” outline guidance for national research stakeholders on the use and adoption of persistent identifiers. But what are the key factors stakeholders might need to be aware of when determining when and how to implement a persistent identifier strategy? Our aim in this post is to highlight some underlying principles that should be understood and considered.

Overarching and guiding principles

Inherent to the adoption and success of any PID is trust in the underlying infrastructure, in its data, and in the associated services that are available to the community. To develop that trust, the infrastructure upon which these PIDs rely is best tied to existing frameworks for open infrastructure and corresponding guiding principles. Below we outline some of these principles and their application as they relate to PIDs, building and expanding on community-developed resources and guidance.

FAIR (Findable, Accessible, Interoperable, and Reusable) Principles

The FAIR Principles are not just guidelines for research data management but also a pivotal framework for understanding the important function of PIDs as powerful enablers of open research, ensuring that scholarly outputs are not only identifiable but also contribute to the broader ecosystem of knowledge in a meaningful way.

Findability: PIDs inherently excel in making digital objects findable. Each PID is unique and persistent, offering a stable reference point irrespective of a resource’s current location or format. This unique identification, coupled with rich metadata, enhances the visibility and discoverability of research outputs across various platforms and services. To further align with FAIR principles, it is crucial that the metadata associated with PIDs is detailed, accurate, and openly available, thus facilitating efficient search and retrieval by both humans and machines.
Accessibility: While PIDs ensure that digital objects can be uniquely identified and found, the principle of accessibility mandates that these objects are readily obtainable. This doesn't imply unrestricted access to the content itself, which may be bound by legitimate considerations such as privacy or commercial constraints, but rather the accessibility of metadata. This metadata should be accessible even when the digital object is not, providing essential context and allowing stakeholders to understand the nature and terms of access to the data.
Interoperability: Interoperability is a core tenet of the FAIR principles, emphasizing the need for digital objects to play well within the larger ecosystem of data and services. PIDs contribute to this by adopting standardized identifier systems and metadata schemas, enabling seamless integration and interaction among different data systems and repositories. This interoperability is vital for collaborative research, allowing data from disparate sources to be combined and analyzed in a cohesive manner.
Reusability: The ultimate goal of the FAIR principles is to ensure that digital research outputs can be reused effectively, contributing to the cycle of knowledge creation. PIDs support this by providing a persistent link to the context, provenance, and conditions of reuse associated with the digital object. Clear, accessible metadata plays a critical role here, detailing the license under which the data is made available and any related restrictions or obligations. This clarity not only fosters trust but also encourages the integration and reuse of data in new research contexts.
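
As a rough illustration of the Accessibility and Reusability points above, the sketch below models a metadata record whose descriptive fields stay visible even when the object itself is restricted. The `PidRecord` class, its field names, and both helper functions are hypothetical and are not drawn from any particular PID provider's schema.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass(frozen=True)
class PidRecord:
    """Hypothetical minimal metadata record attached to a PID."""
    identifier: str              # the PID itself, e.g. a DOI URL
    title: str
    license: Optional[str]       # conditions of reuse (Reusability)
    access: str                  # "open" or "restricted" (assumed vocabulary)
    content_url: Optional[str]   # where the object itself can be obtained


def public_view(record: PidRecord) -> dict:
    """Return what any requester may see.

    Metadata is always returned; the content location is withheld
    when access to the object is restricted (Accessibility).
    """
    view = asdict(record)
    if record.access != "open":
        view["content_url"] = None
    return view


def missing_for_reuse(record: PidRecord) -> List[str]:
    """List the fields a prospective reuser would need that are absent."""
    required = ("identifier", "title", "license")
    return [name for name in required if not getattr(record, name)]
```

Under these assumptions, a restricted record still exposes its title and identifier through `public_view`, while `missing_for_reuse` flags a record that lacks a license statement.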

To fully leverage the potential of PIDs within the framework of open scholarship, it is essential that PID strategies are consciously designed with the FAIR principles in mind. This involves not just the technical implementation of PIDs but also the policies, governance, and community engagement that surround them. By doing so, we can ensure that these PIDs do more than just point to digital objects; they enhance the openness, efficiency, and collaborative potential of the research ecosystem.

The Principles of Open Scholarly Infrastructure (POSI)

POSI, originally developed in 2015 and updated most recently in 2023, “offers a set of guidelines by which open scholarly infrastructure organisations and initiatives that support the research community can be run and sustained”. Most relevant for our guidance here on PIDs are some of the considerations that POSI lists under governance, particularly that open infrastructure should have:

Coverage across the scholarly enterprise, being inclusive to a broad range of academic disciplines, geographies, institutions, and actors;
Stakeholder governed, since this “builds confidence that the organisation will take decisions driven by community consensus and a balance of interests”;
Transparent governance, since this helps build and maintain trust in the community; and
Incentives to fulfill mission, particularly that “organisations and services should regularly review community support and the need for their activities”.

All of the above, while certainly a non-exhaustive list, is crucial to ensuring that any PID and its supporting infrastructure are designed with community needs at the forefront, continue to prioritize these needs and respond as they evolve, and overall build the trust that is necessary for the PID to be adopted and integrated into research systems to its fullest extent. Additional important considerations taken from POSI are that open infrastructure should be open source and that data should be both easily and openly available. These factors will be key to ensuring that equitable access and use of PIDs or PID services are not restricted just to well-resourced actors in certain institutions, disciplines, or geographical regions.

Open Science Toolkit

The Open Science Toolkit, released in 2022 by UNESCO as part of the implementation phase of its Recommendation on Open Science, includes a guide on Bolstering Open Science Infrastructure for All, which further fleshes out what UNESCO considers to be “factors [that] need to be considered by those who develop, fund and/or use open science infrastructures”. These include:

● Transparency of costs and benefits;
● Interoperability to enhance re-use;
● Cooperative co-creation;
● Shared attention and benefits; and
● Harmonization with open science policy and monitoring.

It is important to note that under ‘Core infrastructure’, UNESCO writes, “Due attention should be given to unique persistent identifiers of digital objects”, making it clear they consider PIDs a crucial part of the larger open science infrastructure to which the above considerations apply.

Scholarly Communication Infrastructure Guide

The Scholarly Communication Infrastructure Guide, released by the HELIOS Shared Infrastructure Working Group in 2023, builds on UNESCO's vision of open science infrastructure. The guide emphasizes transparency, collaboration, technical capabilities, governance, public access compliance, and timeliness. While not pushing for particular providers or solutions, it walks institutional decision-makers through the important questions they should ask when deciding to buy, build, or partner on scholarly infrastructure. These same questions could be adapted and asked when deciding whether to adopt particular PIDs or related services. For example:

Are short-term pricing and long-term costs transparent and affordable?
Are there opportunities to partner with open-source or not-for-profit providers?
Does the infrastructure rely on standard, interoperable formats and protocols?
What is the sustainability plan for the governing organization(s) supporting the infrastructure?
Are users able to provide guidance on its direction, usability, and development?
Is the infrastructure owned and/or governed by members of the academic community?
Will the infrastructure enable meeting funding agency policy requirements?

Guidance for Implementing National Security Presidential Memorandum 33 (NSPM-33)

Guidance for Implementing National Security Presidential Memorandum 33 (NSPM-33) on National Security Strategy for United States Government-supported Research and Development was released in 2022 by the National Science and Technology Council. The report includes a section that outlines the new US federal requirements on PIDs. Under ‘Common/core standards that a [PID] service should meet’, it indicates that the service should be “Provided by an open, non-proprietary, researcher-driven platform”, and interoperable with international standards. These guidelines align well with the POSI, UNESCO, and HELIOS recommendations.

Overall, we believe that the community-developed infrastructure principles described above, especially when combined, provide a solid foundation to guide the adoption of PIDs and related services. The above factors (cost transparency, governance, interoperability, etc.) must be carefully considered in the context of PIDs. The desirable move towards ubiquitous uptake of PIDs must also be approached from a perspective of inclusion, where benefits translate across disciplines, geographies, and institution types. Beyond these general principles, we offer additional, more specific considerations for PIDs below.

PID basics and applications of principles

A number of organizations have described ideal principles for PID infrastructure, such as the European Open Science Cloud (EOSC) and its PID Policy and the ISO Subcommittee on Identification and Description (ISO TC 46/SC 9) and its Principles of Identification. Across these many efforts, PIDs are similarly defined as stable references to a digital object designed to address the two core functions of identification (by assigning a unique and unambiguous designation to a digital object or entity) and persistence (providing long-term and stable access to that thing regardless of where it might be located or how it might change over time).

Furthermore, as is defined on the Department of Energy (DOE) Office of Scientific and Technical Information (OSTI)'s website, “A persistent identifier (PID) is a digital identifier that is globally unique, persistent, machine resolvable, has an associated metadata schema, identifies an entity (e.g. person, researcher, publication, award, organization, or research output), and is frequently used to disambiguate between entities. PIDs are long-lasting, managed, and registered unique digital references (often in the form of a URL) to an object that can be represented or described online.”

This means that to be considered a “persistent identifier” or a “PID”, an identifier and its underlying infrastructure must capture robust metadata and make that metadata available in consistent and reliable ways. It also means that a PID must exhibit persistence over time, which is perhaps best achieved when there is community ownership and public governance to ensure sustainable infrastructure support and widespread adoption. When working correctly, PIDs interoperate and reference each other within research discovery and management systems, linking descriptive information to other objects, without restrictions.

Another guiding source for desirable characteristics and principles for PIDs is the foundational 2017 article “Identifiers for the 21st Century,” which emphasizes several key criteria for PID infrastructure:

● Built for connection and expansion
● Promoting interoperability
● Filling gaps specific to research communication
● Ensuring open availability of metadata
● Establishing persistent identifier trustworthiness
● Emphasizing community ownership and governance
● Fostering organizational sustainability

These characteristics, rooted in the principles of openness, transparency, and community involvement, align with the evolving landscape of research communication and support our vision of PIDs as essential tools in advancing open scholarship.

Inspired by these resources, we recommend focusing on the following characteristics for PIDs to maximize the promise of open scholarship.

Desirable Characteristics of PID Infrastructure

Using these definitions and criteria as background, we recommend the following as desirable characteristics of PID infrastructure in the US and beyond:

Open availability of core metadata: All PID systems should support the free and open exchange of basic metadata (through services such as data dumps, feeds, APIs, or other forms of machine access). More robust access, higher-throughput services, and real-time data access may be provided for a fee to support the infrastructure system, but basic services and openly licensed metadata should be available for integration and reuse in other systems.
Use of well-established resolver services: A core value derived from using PIDs is the ability to use the identifier to link various elements of the ecosystem together, by resolving the PID to more information about the object to which it refers. Connecting the ecosystem in this way reduces duplicate data entry, which can introduce errors, and allows information to be maintained easily across otherwise disconnected systems. Examples of such services include doi.org, hdl.handle.net, identifiers.org, and n2t.net. These resolver services help ensure that URIs remain functional even if underlying resources change or move.
Documentation of identifier policies: PID providers should document and publish their identifier policies alongside schema descriptions so that users and other actors can access and understand how identifiers are handled within their systems. This helps provide clarity and transparency about how identifiers are assigned, managed, and referenced. Policies should include information related to identifier management, versioning, handling changes, avoiding reassignment, and ensuring persistence. These policies are essential for ensuring consistency, transparency, and reliability in using PIDs in the scholarly ecosystem.
Monitoring and reporting services: PID providers should actively monitor assigned PIDs to ensure that they remain functional. If any of the referenced URIs become “dead” or nonfunctional, PID providers should have ways to report the issue to the original data provider and/or the community.
Ease of assignment/metadata creation and curation: The assignment of PIDs is a critical stage in the deployment of identification systems. While each PID system handles assignment and the record metadata creation process differently, this process should be as simple and as user-friendly as possible to facilitate the use of the system. PID assignment should be as closely associated with the creator and the creation event as practical. Responsibility for creating PIDs and maintaining metadata records after the PID is assigned should be managed using best practices for user-centered design. Users of PID systems should be engaged in the management process.
Standardized structures, metadata, and services that allow for community input: Consistency in how the community accesses the data, how data is structured, and what services are made available is driven by standardization of the PID system. These structures need to be driven, in part, by community consensus processes to ensure the robustness of the service is suitable for a diverse user base.
Extensibility: No system can be developed to serve every use case, nor can every implementation be projected. Therefore, PID systems should allow for extensibility and have a process in place to extend the system to adapt to new use cases and demands on the system.
Community governance: PID infrastructure systems should be accountable to the user community that adopts them. As such, a wider community should be involved in governance structures that manage the PID systems.
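
The resolver and monitoring characteristics above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how any provider operates: the `scheme:value` input format, the mapping of schemes to the doi.org and n2t.net resolvers, and the names `resolver_url` and `find_dead` are all choices made for this example. The status check is passed in as a function so that the sketch makes no network calls of its own.

```python
from typing import Callable, Dict, Iterable
from urllib.parse import quote


def resolver_url(pid: str) -> str:
    """Turn a compact PID such as 'doi:10.5281/zenodo.10811007' into a resolver URL.

    Only two schemes are handled here, as an assumption for the example.
    """
    scheme, sep, rest = pid.partition(":")
    if not sep or not rest:
        raise ValueError(f"expected 'scheme:value', got {pid!r}")
    scheme = scheme.lower()
    if scheme == "doi":
        return "https://doi.org/" + quote(rest, safe="/")
    if scheme == "ark":
        return "https://n2t.net/ark:" + quote(rest, safe="/")
    raise ValueError(f"no resolver configured for scheme {scheme!r}")


def find_dead(pids: Iterable[str],
              fetch_status: Callable[[str], int]) -> Dict[str, int]:
    """Return the PIDs whose resolver URL answers with an HTTP error status.

    `fetch_status` is a caller-supplied function mapping a URL to a status
    code; in practice it might wrap an HTTP HEAD request.
    """
    dead = {}
    for pid in pids:
        status = fetch_status(resolver_url(pid))
        if status >= 400:
            dead[pid] = status
    return dead
```

A provider following this pattern would run `find_dead` on a schedule and route the resulting report to the original data provider or the community, in line with the monitoring and reporting characteristic described above.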

Looking beyond the identifiers themselves, and in agreement with the desirable characteristics outlined above, PID systems should also draw fundamentally from the Principles of Open Scholarly Infrastructure (POSI), UNESCO, and other PID guidance in how they are developed and managed. It is important that the organizations and services responsible for providing PIDs follow a similar set of best practices when it comes to their operations and governance models.

Conclusion

In a rapidly evolving digital landscape, PIDs stand as crucial components in the advancement and dissemination of research, underpinning the efficiency, transparency, and interconnectivity of scholarly communication. Their strategic implementation, guided by internationally recognized principles and practices, is essential in fostering an open and collaborative research environment. By aligning with frameworks such as FAIR and POSI, and incorporating insights from global initiatives, we can ensure that PIDs not only facilitate access to and reuse of research outputs but also contribute to a more inclusive and equitable scholarly landscape. Embracing these guiding principles and actively participating in the collective endeavor to enhance PID infrastructures will empower us to unlock the full potential of open scholarship, making research more discoverable, attributable, and impactful. As we move forward, it is imperative that the scholarly community, funders, and policymakers collaborate to champion the widespread adoption of PIDs, thereby ensuring that the fruits of research are accessible to all and that knowledge continues to build upon itself in a transparent and cohesive manner.

This post was derived from Section 3.2 of Developing a US National PID Strategy. The full list of working group members appears in the final report.

For more information, please refer to:
Developing a US National PID Strategy. ORFG PID Strategy Working Group. 2024. Zenodo. https://doi.org/10.5281/zenodo.10811007