Abstract
Digital Object Identifiers (DOIs) have fundamentally transformed the landscape of scholarly communication by establishing a robust framework for persistent, unique identification of digital research outputs. This comprehensive report meticulously examines the intricate journey of DOIs, from their conceptual genesis and historical evolution to their sophisticated technical underpinnings, including the pivotal role of the Handle System. It delves into the multifaceted applications of DOIs across a diverse spectrum of research materials, encompassing traditional scholarly publications, burgeoning research datasets, critical software and codebases, and an array of other digital objects pertinent to academic discourse. The paper provides an in-depth exploration of the ecosystem of DOI registration agencies, elucidating their governance structures, operational mandates, and the distinct contributions of prominent entities such as CrossRef and DataCite. Furthermore, it articulates advanced best practices for the seamless integration and effective implementation of DOIs within institutional repositories, emphasizing the imperative of meticulous metadata management and sustained persistent linking strategies. A significant portion is dedicated to analyzing the profound impact of DOIs on enhancing research discoverability, ensuring accurate citation and attribution, fostering compliance with open science principles, and promoting interoperability within the global research infrastructure. Finally, the report critically addresses persistent challenges related to DOI maintenance, long-term persistence, the sustainability of the underlying infrastructure, and explores future trajectories for these indispensable identifiers in an increasingly digital and interconnected research environment.
Many thanks to our sponsor Esdebe who helped us prepare this research report.
1. Introduction
The digital revolution has profoundly reshaped the production, dissemination, and consumption of scholarly knowledge. The sheer volume and velocity of digital research outputs — ranging from journal articles and books to datasets, software, and multimedia — present unprecedented challenges for their reliable identification, stable access, and precise attribution. Traditional methods of locating and referencing digital content, primarily Uniform Resource Locators (URLs), proved inherently fragile. URLs are prone to ‘link rot,’ where the underlying content moves, disappears, or changes its address without warning, rendering previous citations obsolete and hindering the long-term accessibility of scholarly work. This pervasive issue threatened the very foundations of academic integrity, reproducibility, and the cumulative nature of scientific progress.
In response to these critical challenges, the concept of a persistent identifier emerged as a fundamental necessity. The Digital Object Identifier (DOI) system was conceived as a groundbreaking solution, offering a standardized, actionable, and persistent means to identify any intellectual property in the digital environment. Unlike URLs, which indicate the location of an object, a DOI identifies the object itself, irrespective of its current location. This distinction is crucial: if a digital object’s URL changes, its DOI remains constant, and the system is designed to update the link associated with that DOI, thereby ensuring enduring access. This paper undertakes an extensive investigation into the architectural, operational, and strategic significance of DOIs, dissecting their historical trajectory, technical mechanisms, diverse applications, and profound implications for the future of scholarly communication, data management, and the broader open science movement.
2. Evolution and Technical Mechanisms of DOIs
2.1 Historical Development
The genesis of the DOI system can be traced back to the mid-1990s, a period marked by the burgeoning internet and the rapid digitization of information. Publishers and information professionals recognized the urgent need for a robust system to manage and persistently identify digital content in a dynamic online environment. The Association of American Publishers (AAP) was instrumental in initiating discussions and exploring solutions, ultimately leading to a collaboration with the Corporation for National Research Initiatives (CNRI), an organization renowned for its work on persistent identifiers and the Handle System [1, 2].
The International DOI Foundation (IDF) was formally established in 1998 to manage and govern the DOI system. The IDF, a not-for-profit organization, was tasked with overseeing the technical infrastructure, promoting the adoption of DOIs, and developing policies and standards for their use. A pivotal milestone in the formalization of DOIs was their standardization by the International Organization for Standardization (ISO) under ISO 26324:2012, titled ‘Information and documentation – Digital object identifier system’ [1, 5]. This international standard provides a formal specification for the DOI system, defining its syntax, resolution mechanism, and functional requirements, thereby solidifying its status as a critical component of global information infrastructure.
Over the years, the DOI system has evolved from its initial focus on journal articles to encompass an ever-widening array of digital objects, adapting to the diverse needs of the scholarly community. Its development has been driven by a collaborative effort involving publishers, libraries, data centers, funding agencies, and researchers, all committed to enhancing the discoverability, citability, and long-term accessibility of intellectual outputs.
2.2 Structure and Resolution
A Digital Object Identifier is not merely a string of characters; it is a precisely structured identifier designed for machine readability and persistent resolution. Each DOI name is unique and consists of two main components: a prefix and a suffix, separated by a forward slash (/). This structure can be represented as doi:10.xxxx/yyyyy:
- Prefix: The prefix identifies the registrant (an organization, such as a publisher, data repository, or university) responsible for assigning the DOI. It always begins with ‘10.’, followed by a unique number assigned by a DOI Registration Agency (e.g., 10.1000 is the IDF’s own prefix, 10.1038 belongs to Nature Publishing Group, and 10.5281 belongs to Zenodo, which registers its DOIs through DataCite). This ensures global uniqueness at the registrant level.
- Suffix: The suffix is assigned by the registrant and uniquely identifies a specific digital object within that registrant’s domain. Registrants have considerable flexibility in how they construct suffixes, often incorporating internal identifiers, version numbers, or metadata elements. The key requirement is that the suffix, when combined with the prefix, creates an identifier that is globally unique for that specific object.
For example, in 10.1038/nature12345, 10.1038 is the prefix for Nature Publishing Group, and nature12345 is the suffix assigned by Nature to a particular article. The DOI itself is the immutable identifier; the associated URL can change, but the DOI remains the same.
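The prefix/suffix split described above can be illustrated with a short Python sketch. The function name `split_doi` and the regular expression are illustrative assumptions rather than part of any official DOI tooling; the pattern only encodes the rules stated in this section (a prefix beginning with ‘10.’, a slash, and a non-empty suffix).

```python
import re

# Assumption: a prefix is "10." plus a numeric registrant code (optionally with
# dotted sub-elements); the suffix is any non-empty run of non-space characters.
DOI_PATTERN = re.compile(r"^(10\.\d+(?:\.\d+)*)/(\S+)$")


def split_doi(raw: str) -> tuple[str, str]:
    """Return (prefix, suffix) for a DOI given bare, as doi:..., or as a doi.org URL."""
    text = raw.strip()
    # Strip the common presentation forms so only the DOI name remains.
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if text.lower().startswith(lead):
            text = text[len(lead):]
            break
    match = DOI_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a DOI name: {raw!r}")
    return match.group(1), match.group(2)


print(split_doi("10.1038/nature12345"))  # ('10.1038', 'nature12345')
```

Because a prefix cannot contain a slash, the split always happens at the first slash, so suffixes that themselves contain slashes are preserved intact.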
DOI persistence rests on its resolution mechanism, which is underpinned by the Handle System [1, 2]. Developed by CNRI, the Handle System is a distributed information system that provides a general-purpose service for assigning, managing, and resolving persistent identifiers, known as ‘handles’. The Handle System is not exclusive to DOIs but serves as the core technical infrastructure that enables their functionality.
When a user clicks on a DOI link (typically formatted as https://doi.org/10.xxxx/yyyyy or similar), the resolution process unfolds as follows:
- Request Initiation: The browser sends a request to a DOI resolver, often https://doi.org/. This global resolver acts as a central access point.
- Handle System Interaction: The DOI resolver interrogates the Handle System. The Handle System is a distributed network of servers that maintain a database of handles (DOIs in this case) and their associated data, which typically includes one or more URLs where the identified object can be found.
- Redirection: The Handle System retrieves the current URL(s) associated with that specific DOI and directs the user’s browser to the current location of the digital object. This redirection is typically an HTTP 302 or 303 redirect.
This architecture ensures that even if a publisher migrates its content to a new server, changes its domain name, or reorganizes its website, the underlying DOI remains valid. The registrant merely needs to update the URL associated with that DOI within the Handle System via their registration agency. This separation of identifier from location is the cornerstone of DOI’s persistent nature, guaranteeing that a citation to a DOI will reliably lead the user to the intended resource, regardless of how its digital location might evolve over time [3, 4]. The Handle System also supports the association of multiple URLs with a single DOI, providing redundancy and robustness in content delivery.
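The separation of identifier from location can be sketched with a toy, in-memory model. This is not the Handle System’s actual implementation or API: the class `ToyResolver`, its method names, and the example.org URLs are all hypothetical, chosen only to show that a registrant updates the stored URL while the DOI cited by readers never changes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResolver:
    """Hypothetical in-memory stand-in for the DOI-to-URL records a resolver consults."""

    records: dict[str, str] = field(default_factory=dict)

    def register(self, doi: str, url: str) -> None:
        # DOI names are case-insensitive, so normalise before storing.
        self.records[doi.lower()] = url

    def update_url(self, doi: str, new_url: str) -> None:
        key = doi.lower()
        if key not in self.records:
            raise KeyError(f"unregistered DOI: {doi}")
        self.records[key] = new_url

    def resolve(self, doi: str) -> tuple[int, str]:
        """Return an (HTTP status, Location) pair, mimicking a redirect response."""
        return 302, self.records[doi.lower()]


resolver = ToyResolver()
resolver.register("10.1038/nature12345", "https://old.example.org/articles/12345")
print(resolver.resolve("10.1038/nature12345"))

# The publisher migrates platforms: only the stored URL changes, not the DOI.
resolver.update_url("10.1038/nature12345", "https://new.example.org/a/12345")
print(resolver.resolve("10.1038/nature12345"))
```

A citation that recorded the DOI before the migration still reaches the object afterwards, which is the behaviour the resolution steps above are designed to guarantee.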
3. Applications of DOIs Across Research Outputs
DOIs were initially conceived to address the challenges of identifying journal articles, but their utility quickly expanded to encompass a vast and growing array of digital research outputs. This broader application reflects the increasing diversity of scholarly products and the imperative to ensure that all contributions to knowledge are properly identified, cited, and attributed.
3.1 Scholarly Publications
Traditional scholarly publications remain the primary domain for DOI application. This category includes:
- Journal Articles: The most common use case. DOIs provide a stable, unambiguous link to an article, simplifying citation practices and ensuring long-term access, even if a journal changes publishers or platforms. This is crucial for impact tracking and academic discovery [6].
- Books and Book Chapters: Publishers assign DOIs to entire monographs, edited volumes, and individual chapters within them. This allows for precise citation of specific sections of a book without relying on potentially unstable page numbers or complex chapter URLs.
- Conference Papers and Proceedings: As conferences increasingly publish their proceedings online, DOIs for individual papers or entire volumes ensure these valuable contributions are archived and discoverable. This is particularly important for fields where conference publications are a primary mode of knowledge dissemination.
- Preprints and Postprints: The rise of preprint servers (e.g., arXiv, bioRxiv, medRxiv) necessitates DOIs to identify early versions of research. CrossRef allows linking preprints to their eventual peer-reviewed published versions, creating a transparent publication history. Postprints, or accepted manuscripts, deposited in institutional repositories also benefit from DOIs to provide persistent links to these open access versions [10].
- Theses and Dissertations: Academic institutions frequently assign DOIs to electronic theses and dissertations (ETDs) deposited in their institutional repositories. This practice elevates the visibility and citability of graduate research, making it easier for future scholars to discover and build upon these foundational works.
For scholarly publications, DOIs streamline the citation process, facilitate interoperability between bibliographic databases and reference management software, and bolster the integrity of the scholarly record by minimizing broken links.
3.2 Research Data and Datasets
The increasing emphasis on open science, data sharing, and research reproducibility has propelled the application of DOIs to research data and datasets. Assigning DOIs to data significantly transforms how data is managed, shared, and recognized:
- Enhanced Visibility and Citability: A DOI makes a dataset a citable research object, just like a journal article. This encourages researchers to publish their data in reputable data repositories (e.g., Dryad, Figshare, Zenodo, institutional data repositories) and enables others to cite it accurately. This addresses the historical challenge of researchers receiving proper credit for their data generation efforts [3, 4].
- Adherence to FAIR Principles: DOIs are foundational to the FAIR (Findable, Accessible, Interoperable, Reusable) data principles. They make data Findable by providing a unique identifier, Accessible by enabling persistent linking to a landing page, contribute to Interoperability by facilitating machine-readable links, and support Reusability by ensuring persistent access to data and its associated metadata [14].
- Versioning and Granularity: Data repositories often implement versioning systems for datasets. DOIs can be assigned to specific versions of a dataset, allowing researchers to cite precisely the data version used in their analysis. This is critical for reproducibility, as analyses performed on different versions of data may yield different results. DOIs can also be assigned to granular components of a dataset, such as individual files or subsets, offering flexibility in data citation.
- Metadata Integration: When DOIs are assigned to data, they are inextricably linked to rich metadata that describes the dataset (e.g., creator, publication date, data type, spatial/temporal coverage, methodology). This metadata is crucial for understanding, reusing, and validating the data, further enhancing its value [8]. DataCite is a leading registration agency specifically dedicated to DOIs for datasets and other non-traditional research outputs.
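To make the idea of version-specific data citation concrete, the following Python sketch assembles a citation string from a handful of metadata fields. The element order (creators, year, title, version, publisher, identifier) loosely follows commonly recommended data citation patterns, but the exact format, the function name `format_data_citation`, and the sample values are assumptions for illustration only.

```python
from typing import Optional, Sequence


def format_data_citation(
    creators: Sequence[str],
    year: int,
    title: str,
    publisher: str,
    doi: str,
    version: Optional[str] = None,
) -> str:
    """Build a human-readable citation that pins the exact dataset version."""
    parts = [f"{'; '.join(creators)} ({year})", title]
    if version:
        # Including the version lets readers cite precisely the data they analysed.
        parts.append(f"Version {version}")
    parts.append(publisher)
    parts.append(f"https://doi.org/{doi}")
    return ". ".join(parts)


# All values below are hypothetical.
print(
    format_data_citation(
        creators=["Doe, J.", "Roe, R."],
        year=2023,
        title="Example Survey Data",
        publisher="Example Repository",
        doi="10.1234/example.v2",
        version="2.1",
    )
)
```

When a repository mints a separate DOI for each version, the identifier at the end of the citation already distinguishes versions; the explicit version label is a readability aid for human readers.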
3.3 Software and Code
Software and code are increasingly recognized as fundamental research outputs, particularly in computational sciences, digital humanities, and engineering. Historically, software citation has been problematic, leading to a lack of attribution for developers and difficulties in reproducing computational results. DOIs provide a robust solution:
- Formal Attribution and Credit: Assigning a DOI to research software or a specific version of a codebase ensures that the creators receive formal credit for their intellectual work. This incentivizes software development and maintenance within academia and contributes to career progression [12].
- Reproducibility and Transparency: A DOI for software provides a stable link to the exact version of the code used in a research project. This is vital for reproducibility, allowing other researchers to access, inspect, and run the same code, thereby verifying computational results. Repositories like Zenodo (which integrates with GitHub) allow researchers to easily archive and assign DOIs to their code repositories.
- Versioning Control: Software evolves rapidly. DOIs are instrumental in distinguishing between different versions of a software package, allowing precise citation of the specific version used in a given study. This prevents ambiguity and ensures that future researchers can trace the exact computational methods employed.
- Documentation and Licensing: When assigning a DOI to software, it is typically linked to a landing page containing comprehensive documentation, dependencies, installation instructions, and licensing information, all of which are essential for proper reuse.
3.4 Other Digital Objects
The versatility of the DOI system allows its application to an ever-expanding array of digital research outputs, reflecting a holistic view of scholarly contributions:
- Laboratory Protocols and Methods: Detailed experimental protocols are crucial for reproducibility. Assigning DOIs to these documents, especially those published on platforms like Protocols.io, ensures their discoverability and citability.
- Images and Multimedia: Scientific images, figures, illustrations, audio clips, and video recordings, particularly those forming part of a publication or dataset, can receive DOIs, allowing for their independent citation and reuse.
- Educational Resources: Open Educational Resources (OER) like lecture notes, course modules, simulations, and teaching materials can benefit from DOIs for discoverability and attribution, promoting their reuse in pedagogical contexts.
- Grants and Funding Information: Some initiatives explore assigning DOIs to grant proposals or funding acknowledgements, linking research outputs directly to their funding sources, enhancing transparency and accountability.
- Instruments and Equipment: While less common, DOIs could potentially be used to identify specific scientific instruments or equipment models, particularly those that are custom-built or have significant methodological implications.
This broad applicability underscores the DOI system’s critical role in creating a comprehensive and interconnected digital scholarly record, moving beyond the traditional article-centric view of research outputs.
4. DOI Registration Agencies
The Digital Object Identifier system operates not as a monolithic entity but as a federated system managed by a network of specialized registration agencies, all operating under the overarching governance of the International DOI Foundation (IDF). This distributed model ensures scalability, responsiveness to specific community needs, and specialized expertise [1, 7].
4.1 Role and Function
DOI Registration Agencies (RAs) are licensed by the IDF to provide services to registrants (publishers, data centers, institutions) who wish to assign DOIs to their digital content. Their primary roles and functions include:
- DOI Assignment and Prefix Management: RAs are responsible for assigning DOI prefixes to their registrants and for overseeing the minting (creation) of unique DOIs. They ensure that prefixes are unique and that the suffixing policies of their registrants adhere to the DOI standard.
- Metadata Management: A core function of RAs is to collect, validate, and maintain metadata associated with each registered DOI. This metadata is critical for discoverability, accurate resolution, and interoperability. Each RA defines its own metadata schema (e.g., CrossRef’s schema for publications, DataCite’s schema for data), which generally aligns with global standards like Dublin Core but is tailored to the content type.
- Resolution and Infrastructure Maintenance: RAs contribute to the maintenance of the global DOI resolution infrastructure, primarily by updating the Handle System with the current URLs associated with their registered DOIs. They ensure that their members’ DOIs reliably resolve to the correct landing pages.
- Community Engagement and Policy Development: RAs serve as interfaces between the IDF, their registrants, and the broader scholarly community. They provide technical support, training, and guidance on best practices. They also play a significant role in developing policies related to DOI usage, metadata standards, and ethical considerations within their respective domains.
- Interoperability and Linking Services: Many RAs provide additional services that leverage DOIs to enhance interoperability, such as linking services (e.g., CrossRef’s Reference Linking), content registration, and metadata distribution.
The distributed nature of the RA system allows for specialization while maintaining a unified global identifier system. Each RA typically focuses on specific types of content or communities, developing tailored services and metadata profiles.
4.2 Prominent Registration Agencies
While several RAs exist, two are particularly prominent in the scholarly communication landscape:
- CrossRef: Established in 1999, CrossRef is by far the largest and most widely recognized DOI Registration Agency, primarily serving scholarly publishers. Its original and core mission was to enable persistent cross-publisher reference linking, allowing users to navigate seamlessly from a reference in one publication to the full text of the cited work, regardless of the publisher [7, 8].
  - Services: Beyond basic DOI registration for journal articles and books, CrossRef offers a suite of value-added services:
    - Crossmark: A service that provides status information about a publication, indicating whether it has been updated, corrected, retracted, or reviewed, fostering trust in the scholarly record.
    - Funder Registry: A standardized taxonomy of grant-giving organizations, allowing publishers to register funding information associated with research outputs, facilitating tracking of research impact and compliance with funder mandates.
    - Similarity Check (powered by iThenticate): A plagiarism detection service that helps editors screen submitted manuscripts for originality.
    - Cited-by Linking: Allows publishers to show articles that have cited their content.
    - Content Registration: Handles not only journal articles and books but also conference proceedings, standards, reports, components of works, preprints, and more.
  - Impact: CrossRef has become indispensable for academic publishing, creating a vast interconnected web of scholarly literature that significantly enhances discoverability and navigability.
- DataCite: Founded in 2009 by a consortium of research libraries and data centers, DataCite’s explicit mission is to make research data findable, accessible, interoperable, and reusable by providing persistent identifiers (DOIs) for data and other research outputs [4, 8]. DataCite advocates for data citation as a first-class output of research.
  - Services: DataCite focuses on:
    - DOI Minting for Data: Facilitates the assignment of DOIs to diverse datasets, images, software, and other non-textual research objects deposited in institutional and domain-specific data repositories.
    - Metadata Schema: Maintains and evolves the DataCite Metadata Schema, a comprehensive standard for describing research datasets. This schema supports detailed information about creators, contributors, publication year, resource type, subjects, funders, and related identifiers, which is crucial for data discovery and reuse.
    - Repository Integration: Works closely with data repositories to integrate DOI assignment into their workflows, often through APIs, making it straightforward for researchers and data managers to get DOIs for their data.
    - Discovery Services: Provides tools like DataCite Search and the DataCite Commons platform to enable discovery of DOI-registered datasets, connecting them with publications, researchers (via ORCID), and organizations (via ROR).
    - Advocacy: Actively promotes data citation best practices and the broader adoption of FAIR data principles.
  - Impact: DataCite has been pivotal in elevating research data to a citable scholarly output, contributing significantly to data sharing, reproducibility, and the recognition of data creators.
- Other Agencies: While CrossRef and DataCite dominate, other RAs serve specialized niches. For instance, the Entertainment Identifier Registry (EIDR) assigns DOIs to commercial audio-visual works, while mEDRA (multilingual European DOI Registration Agency) focuses on content primarily in European languages. These agencies demonstrate the flexibility of the DOI system to adapt to various content types and industries [9].
5. Best Practices for DOI Implementation in Institutional Repositories
Institutional repositories (IRs) play a crucial role in collecting, preserving, and disseminating the intellectual output of universities and research organizations. The effective implementation of DOIs within IRs is paramount for enhancing the visibility, citability, and long-term impact of institutional research [5]. Adhering to best practices ensures that DOIs fulfill their promise of persistence and actionability.
5.1 Integration with Repository Systems
Seamless integration of DOI assignment into the repository workflow is fundamental. This moves DOI registration from a manual, error-prone task to an automated process:
- API-driven Workflows: Repository platforms (e.g., DSpace, EPrints, Fedora, InvenioRDM, Samvera-based systems) should integrate with DOI registration agencies (primarily DataCite for IRs) via Application Programming Interfaces (APIs). This allows for automated minting and updating of DOIs as content is deposited, published, or modified within the repository.
- Automated Assignment upon Deposit: Ideally, a DOI should be automatically assigned and registered immediately upon the successful deposit and publication of a new digital object in the repository. This ensures that the content is persistently identifiable from its inception.
- Batch Registration: For legacy content or large collections, batch registration capabilities are essential. This allows institutions to assign DOIs to existing digital objects efficiently, retroactively enhancing their discoverability and citability.
- Clear Policies and Guidelines: Institutions must develop clear policies regarding which types of content receive DOIs, who is responsible for their assignment, and the workflow for updating associated URLs. Staff involved in repository management and content deposit should be trained on these policies and the technical procedures.
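As a rough illustration of an API-driven minting workflow, the sketch below builds the kind of JSON body a repository might send to a registration agency when an item is published. The field names follow the author’s understanding of the JSON:API layout used by the DataCite REST API and should be checked against the current DataCite documentation before use. The helper name `build_doi_payload`, the prefix 10.1234, and the repository URL are hypothetical, and no network request is made.

```python
import json


def build_doi_payload(doi, landing_url, title, creators, publisher, year,
                      resource_type="Dataset"):
    """Assemble a registration request body (assumed DataCite-style JSON:API layout)."""
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": doi,
                "event": "publish",  # assumed: asks the agency to make the DOI findable
                "url": landing_url,  # the landing page the DOI should resolve to
                "titles": [{"title": title}],
                "creators": [{"name": name} for name in creators],
                "publisher": publisher,
                "publicationYear": year,
                "types": {"resourceTypeGeneral": resource_type},
            },
        }
    }


# Hypothetical deposit event from a repository platform.
payload = build_doi_payload(
    doi="10.1234/repo.0001",
    landing_url="https://repository.example.edu/items/0001",
    title="Example Thesis Dataset",
    creators=["Doe, Jane"],
    publisher="Example University",
    year=2024,
)
print(json.dumps(payload, indent=2))
```

Keeping payload construction separate from the HTTP call makes the same function usable for both single deposits and batch registration of legacy content.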
5.2 Metadata Quality
The usefulness of a DOI depends heavily on the quality and completeness of its associated metadata. Poor metadata can render even a perfectly resolved DOI less useful, as users may struggle to understand or reuse the underlying content [6, 8].
- Adherence to Standards: Repositories must enforce strict adherence to established metadata standards relevant to the content type. For publications, this might include Dublin Core or MARC; for data, the DataCite Metadata Schema is critical. These schemas provide structured fields for essential elements like creators, titles, publication dates, resource types, keywords, and funding information.
- Rich and Comprehensive Metadata: Beyond mandatory fields, repositories should encourage or mandate the provision of rich, descriptive metadata. This includes abstracts, methodologies, data collection instruments, software dependencies, and usage licenses. The more comprehensive the metadata, the greater the utility of the object and its DOI.
- Controlled Vocabularies and Persistent Identifiers for Metadata: To enhance interoperability and machine readability, metadata fields should, wherever possible, leverage controlled vocabularies (e.g., subject headings, thesauri) and other persistent identifiers for entities within the metadata itself. Examples include ORCID for authors, ROR (Research Organization Registry) for affiliations, and the Funder Registry for funding bodies [15, 16, 17].
- Metadata Validation and Curation: Implement automated validation checks during the deposit process to ensure metadata consistency, accuracy, and adherence to schema. Human curation by repository staff is also crucial to rectify errors, enrich sparse records, and maintain overall metadata quality over time.
- Version Control for Metadata: If the digital object is versioned, its metadata should also reflect the specific version it describes, ensuring that the metadata accurately represents the state of the object at the time of its DOI assignment.
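The automated validation checks mentioned above might look something like the following sketch. The list of required fields and the record layout are assumptions made for illustration and do not reproduce any particular schema. The ORCID check uses the ISO 7064 MOD 11-2 check-digit calculation that ORCID documents for its identifiers.

```python
import re

# Assumed minimal set of mandatory fields for a deposit record (illustrative only).
REQUIRED_FIELDS = ("identifier", "creators", "title", "publisher",
                   "publication_year", "resource_type")


def orcid_checksum_ok(orcid: str) -> bool:
    """Verify the final check character of an ORCID iD (ISO 7064 MOD 11-2)."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"\d{15}[\dX]", compact):
        return False
    total = 0
    for ch in compact[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return compact[15] == expected


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    for creator in record.get("creators") or []:
        orcid = creator.get("orcid")
        if orcid and not orcid_checksum_ok(orcid):
            problems.append(f"bad ORCID checksum: {orcid}")
    return problems


# Hypothetical record with a missing publisher and a mistyped ORCID iD.
record = {
    "identifier": "10.1234/repo.0001",
    "creators": [{"name": "Doe, Jane", "orcid": "0000-0002-1825-0098"}],
    "title": "Example Dataset",
    "publication_year": 2024,
    "resource_type": "Dataset",
}
print(validate_record(record))
```

Checks like these catch mechanical errors at deposit time; they complement rather than replace the human curation described above.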
5.3 Persistent Linking and Resolver Management
While the DOI system promises persistence, institutional repositories bear a significant responsibility for ensuring that the URLs (landing pages) associated with their DOIs remain stable and continuously accessible [5, 6].
- Stable Landing Pages: Every DOI must resolve to a stable, informative landing page within the repository. This page should contain:
- The complete metadata record for the digital object.
- A link to the full-text or full-data file(s).
- Information about the object’s license and reuse conditions.
- The DOI itself, prominently displayed.
- Version information (if applicable).
- Information about updates or retractions (e.g., via Crossmark functionality if applicable).
- Robust URL Management: Repositories must have robust internal URL management strategies. This means avoiding dynamic URLs that change frequently, using persistent URLs (PURLs) internally if the platform supports them, and having clear procedures for updating the record held by the DOI registration agency whenever an object’s URL legitimately changes (e.g., due to platform migration or domain change). Regular auditing of landing page functionality is essential.
- Long-term Preservation: The responsibility for persistence extends beyond just current access. Institutional repositories are often tasked with long-term digital preservation. This involves ensuring that the content itself, not just the link, remains accessible and renderable over decades. Strategies include format migration, dark archiving, and adherence to preservation standards (e.g., OAIS model).
- Redundancy and Backups: Implement robust backup and recovery strategies for both the repository system and its associated data. Redundant hosting environments can further enhance the resilience of DOI-linked content.
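The regular auditing of landing pages recommended above could be automated along the following lines. This is a minimal sketch: `fetch_status` is a hypothetical callable supplied by the caller (for example, a thin wrapper around an HTTP client that follows redirects and returns the final status code), which keeps the audit logic testable without network access. The DOIs and status codes in the usage example are invented.

```python
from typing import Callable, Iterable


def audit_landing_pages(
    dois: Iterable[str],
    fetch_status: Callable[[str], int],
) -> dict[str, int]:
    """Return {doi: status} for every DOI whose resolver URL did not end in HTTP 200."""
    failures: dict[str, int] = {}
    for doi in dois:
        status = fetch_status(f"https://doi.org/{doi}")
        if status != 200:
            failures[doi] = status
    return failures


# Stand-in fetcher with canned responses, used here instead of real HTTP calls.
canned = {
    "https://doi.org/10.1234/repo.0001": 200,
    "https://doi.org/10.1234/repo.0002": 404,
}
report = audit_landing_pages(
    ["10.1234/repo.0001", "10.1234/repo.0002"],
    fetch_status=lambda url: canned[url],
)
print(report)  # {'10.1234/repo.0002': 404}
```

Any DOI that appears in the report points to a landing page that needs repair or to a registration record whose URL needs updating.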
5.4 Policy and Governance
Beyond technical implementation, robust institutional policies and governance are critical for sustainable DOI use [10].
- Clear Ownership and Responsibility: Define clear roles and responsibilities within the institution for DOI management, including technical support, policy formulation, and outreach to researchers.
- Funder and Publisher Mandate Alignment: Ensure that institutional DOI policies align with funder requirements for data sharing and publisher policies regarding preprint/postprint DOIs, facilitating compliance for researchers.
- Researcher Education and Support: Provide ongoing education and support to researchers on the benefits of DOIs, how to correctly cite them, and how to obtain them for their diverse research outputs. This is vital for maximizing adoption and correct usage.
6. Impact on Scholarly Communication and Data Attribution
Digital Object Identifiers have permeated virtually every aspect of scholarly communication, fundamentally reshaping how research is discovered, cited, evaluated, and reused. Their impact extends far beyond mere linking, touching upon core principles of academic integrity, open science, and the evolving nature of the research ecosystem [1, 2, 5].
6.1 Enhanced Discoverability
DOIs are critical enablers of research discoverability in an increasingly information-saturated world:
- Search Engine Optimization (SEO) for Scholarship: While DOIs are not direct SEO elements, the stable links and rich metadata associated with them significantly improve how scholarly content is indexed by academic search engines and citation databases (e.g., Google Scholar, Scopus, Web of Science) and by general search engines. This makes it easier for researchers, practitioners, and the public to find relevant work.
- Interoperability with Scholarly Databases: Major bibliographic databases and citation indexes rely heavily on DOIs to uniquely identify and link records. This ensures that when a researcher searches these databases, they retrieve accurate results and can seamlessly navigate to the original source.
- Machine Readability and Automation: The structured nature of DOIs and their associated metadata makes them highly machine-readable. This facilitates automated harvesting of research outputs, populating institutional repositories, and building sophisticated research knowledge graphs [10, 11].
- Beyond Textual Search: By providing persistent identifiers for datasets, software, and other non-textual outputs, DOIs allow for the discoverability of these often-overlooked research components, opening new avenues for exploration and analysis.
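One common route to the machine-readable metadata described above is HTTP content negotiation against the DOI resolver, where a client asks for structured metadata instead of the landing page. The sketch below only constructs such a request with Python’s standard library and does not send it. Whether a given DOI answers with metadata depends on its registration agency, the helper name `build_metadata_request` is an assumption, and the DOI shown is the illustrative one used earlier in this report.

```python
import urllib.request

# Media type for CSL JSON, a widely used citation metadata format.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi: str) -> urllib.request.Request:
    """Prepare (but do not send) a request asking the resolver for CSL JSON metadata."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": CSL_JSON},
    )


request = build_metadata_request("10.1038/nature12345")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the lookup; a harvester would then parse the JSON body to populate repository records or a knowledge graph.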
6.2 Accurate Citation and Attribution
At its core, scholarly communication relies on accurate citation to acknowledge intellectual debt and trace the lineage of ideas. DOIs provide an unparalleled mechanism for achieving this:
- Formal Recognition for All Outputs: DOIs extend the concept of formal citation beyond traditional journal articles to datasets, software, preprints, and other digital objects. This means that creators of these diverse outputs can receive proper credit for their work, incentivizing the production and sharing of high-quality research materials.
- Reliable Referencing: Because a DOI stays the same when content moves, a citation can still lead the reader to the correct source years or decades later, provided the registrant keeps the registered URL current. This greatly reduces the frustration and inefficiency caused by broken links, upholding academic integrity and facilitating validation of research claims.
- Impact Metrics and Altmetrics: DOIs are instrumental in tracking the impact of research. Traditional citation counts, as well as emerging altmetrics (which measure online attention from social media, news outlets, policy documents, etc.), rely on persistent identifiers to aggregate data accurately. A DOI acts as the unique anchor for all these metrics [13].
- Mitigating Plagiarism and Misattribution: By unambiguously identifying a unique digital object, DOIs make plagiarism and misattribution easier to detect. They provide a clear, verifiable link to the original source, which simplifies originality checks and helps ensure that authors receive credit where due.
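The role of the DOI as a 'unique anchor' for metrics can be illustrated with a short Python sketch. It assumes mentions arrive as (source, DOI string) pairs in inconsistent forms; the prefix list, function names, and sample data are hypothetical, and real altmetrics providers apply far more elaborate matching.

```python
from collections import Counter

# Common ways the same DOI is written in the wild (illustrative, not exhaustive).
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)

def normalize_doi(raw: str) -> str:
    """Reduce a DOI string to a bare, lowercase form so variants compare equal."""
    value = raw.strip().lower()  # DOI names are case-insensitive
    for prefix in PREFIXES:
        if value.startswith(prefix):
            value = value[len(prefix):]
            break
    return value.strip()

def count_mentions(mentions) -> dict[str, Counter]:
    """Group (source, raw_doi) pairs into per-DOI counts of mentions by source."""
    totals: dict[str, Counter] = {}
    for source, raw in mentions:
        totals.setdefault(normalize_doi(raw), Counter())[source] += 1
    return totals

if __name__ == "__main__":
    sample = [
        ("news", "https://doi.org/10.5555/ABC"),
        ("social", "doi:10.5555/abc"),
        ("social", "10.5555/abc "),
    ]
    print(count_mentions(sample))
```

Without the normalization step, the three sample mentions would be counted as three different works, which is the aggregation error persistent identifiers are meant to prevent.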
6.3 Compliance with Open Science Principles
Open science is a movement advocating for research to be conducted and disseminated transparently, openly, and collaboratively. DOIs are a foundational technology for realizing many open science ideals:
- FAIR Data Principles: As discussed, DOIs are central to making data Findable, Accessible, Interoperable, and Reusable. By providing a persistent identifier linked to rich metadata and a stable landing page, DOIs enable researchers to discover, access, understand, and reuse data effectively, reducing data silos and promoting data sharing [14].
- Reproducibility and Transparency: Open science heavily emphasizes research reproducibility. DOIs for publications, data, and software collectively create a transparent chain of evidence, allowing others to verify and reproduce findings. This enhances the credibility and trustworthiness of scientific research.
- Funder and Publisher Mandates: Many research funders and publishers now mandate the use of DOIs for research data and other outputs as part of their open science policies. This ensures compliance with requirements for public access to publicly funded research and promotes broader dissemination.
- Accessibility and Inclusivity: By making research outputs more discoverable and persistently accessible, DOIs contribute to a more open and inclusive scholarly ecosystem, allowing researchers from diverse backgrounds and institutions to access and build upon global knowledge.
6.4 Interoperability and Linked Data
DOIs are more than just links; they are nodes in an ever-growing network of scholarly information, facilitating interoperability between disparate systems and contributing to the vision of a semantic web for research.
- Connecting Research Entities: DOIs can link publications to their underlying data, software, grant information, and even to the researchers (via ORCID) and organizations (via ROR) involved. This creates a rich, interconnected ‘research graph’ that illustrates the complex relationships between different scholarly entities.
- Machine-actionable Relationships: The ability to programmatically link and resolve DOIs enables automated workflows and data integration across various platforms, from reference managers to institutional dashboards and national research infrastructures.
- Semantic Web Potential: By serving as persistent identifiers in a linked data environment, DOIs contribute to the semantic web, where machines can not only read information but also understand its meaning and relationships, leading to more intelligent discovery and analysis tools.
7. Challenges in Maintenance and Long-Term Persistence
Despite their transformative impact, DOIs are not a ‘set it and forget it’ solution. Their long-term persistence and functionality rely on continuous maintenance, robust infrastructure, and the active participation of all stakeholders. Several challenges must be actively managed to ensure the enduring effectiveness of the DOI system [5, 6].
7.1 Link Rot and DOI Resolution
The most prominent challenge to DOI persistence is the phenomenon of ‘link rot’ or ‘URL rot.’ While DOIs themselves are persistent, the underlying URLs to which they resolve can and do change or become invalid over time. This can occur for several reasons:
- Organizational Changes: A publisher might be acquired, an institutional repository might migrate to a new platform, or a data center might change its domain name. Each of these events necessitates updating the URLs associated with the DOIs managed by that entity.
- Content Reorganization or Deletion: Content might be moved within a website, leading to a broken internal link, or, in some unfortunate cases, content might be intentionally or accidentally deleted, rendering the DOI non-resolvable to its original resource.
- Technical Failures: Server outages, database corruption, or misconfigurations can temporarily or permanently disrupt the resolution path.
Mitigation Strategies:
- Proactive Monitoring: Registrants and registration agencies must implement robust systems for proactively monitoring the resolution of their DOIs. Automated tools can periodically check landing page validity and report broken links.
- Responsive Updating: When a URL changes, the registrant must promptly update the corresponding URL in the Handle System via their registration agency. This process should be streamlined and well-documented.
- Landing Page Robustness: As discussed in best practices, the landing page itself must be stable and resilient. It should be designed to handle internal reorganizations without breaking the link to the full content.
- Dark Archives and Preservation: For truly long-term persistence, digital preservation strategies are paramount. This involves depositing content into trusted digital repositories (often distinct from the access repository) that actively manage formats, metadata, and bitstream integrity over time. In cases where a live link cannot be maintained, a DOI might eventually resolve to a static ‘tombstone’ page indicating that the content is archived or no longer available, preventing a completely dead link.
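The proactive monitoring strategy above might be sketched as follows in Python, using only the standard library. The function names are hypothetical, and the check is deliberately crude: it treats any final status outside the 2xx range as broken, whereas some landing pages reject automated or HEAD requests even though they work in a browser, so flagged DOIs should be reviewed by a person before any action is taken.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable

def http_status(url: str, timeout: float = 10.0) -> int:
    """Request url, following redirects, and return the final HTTP status (0 on network failure)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 from the resolver or the landing page
    except OSError:
        return 0  # DNS failure, refused connection, timeout, etc.

def find_broken(dois: Iterable[str], fetch: Callable[[str], int] = http_status) -> dict[str, int]:
    """Map each DOI that fails to resolve cleanly to the status code observed."""
    broken: dict[str, int] = {}
    for doi in dois:
        status = fetch(f"https://doi.org/{doi}")
        if not 200 <= status < 300:
            broken[doi] = status
    return broken

if __name__ == "__main__":
    # Stand-in fetcher so the example runs offline; swap in http_status for real checks.
    fake = {"https://doi.org/10.5555/ok": 200, "https://doi.org/10.5555/gone": 404}
    print(find_broken(["10.5555/ok", "10.5555/gone"], fetch=fake.__getitem__))
```

Passing the fetcher as a parameter keeps the reporting logic testable without network access; a scheduled job could run `find_broken` over a repository's full DOI list and route the result to whoever maintains the registered URLs.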
7.2 Sustainability of DOI Registration Agencies
The long-term viability of the DOI system hinges on the financial and operational sustainability of the DOI Registration Agencies (RAs) and the International DOI Foundation (IDF) [1, 7].
- Funding Models: RAs typically operate on membership fees, content registration fees, or service charges levied on their registrants. These models need to be sustainable to cover the costs of infrastructure, development, maintenance, and staff. Economic downturns or shifts in the scholarly communication landscape can impact these revenue streams.
- Governance and Collaboration: Effective governance structures for both the IDF and the RAs are essential to ensure strategic direction, sound financial management, and responsiveness to community needs. Collaboration between RAs and with other PID providers is also vital for an integrated scholarly ecosystem.
- Technological Evolution: The underlying Handle System and the broader internet infrastructure are constantly evolving. RAs must invest in ongoing technical development and adaptation to ensure their systems remain compatible, secure, and efficient.
- Community Support: The success of DOIs relies on broad community adoption and continued commitment from publishers, institutions, funders, and researchers. A decline in support or a fragmentation of the identifier landscape could threaten sustainability.
7.3 Technical Obsolescence and Evolution
The digital environment is characterized by rapid technological change. The DOI system, and especially its underlying Handle System, must continue to evolve to remain relevant and effective [1, 2].
- Handle System Development: The Handle System, while robust, requires continuous maintenance and development to address new security threats, scale to ever-increasing demands, and integrate with emerging technologies. Continued stewardship by the organizations responsible for it, including CNRI, which originally developed it, and the DONA Foundation, which now oversees the global handle registry, is critical.
- Metadata Standards Evolution: As new types of research outputs emerge and scholarly practices evolve, metadata schemas (e.g., DataCite Metadata Schema, CrossRef’s schema) must be updated to capture new information adequately. This requires careful versioning and migration strategies.
- API Development: APIs linking repositories to RAs need to be maintained and upgraded to support new features and ensure seamless integration across diverse platforms. Backward compatibility is a constant challenge.
7.4 User Adoption and Education
While DOIs are widely adopted for journal articles, their consistent use for other research outputs (data, software, protocols) is still developing. Challenges remain in ensuring widespread adoption and proper use by researchers [10].
- Awareness and Training: Many researchers, particularly those outside highly digital-intensive fields, may not fully understand the benefits of DOIs for their data or software. Comprehensive education and training programs from institutions, funders, and RAs are necessary.
- Workflow Integration: For DOIs to be consistently used, their assignment needs to be integrated seamlessly into researchers’ existing workflows, rather than being an additional burden. This is where repository integration and user-friendly platforms become crucial.
- Incentives for Use: Providing clear incentives for researchers to use DOIs (e.g., improved citation, fulfilling funder mandates, enhanced discoverability leading to greater impact) is vital for driving adoption.
8. Future Directions and Emerging Trends
The DOI system, while mature, continues to evolve in response to the dynamic scholarly communication landscape. Several trends suggest future directions and enhancements for persistent identifiers [10, 11, 14].
8.1 Deeper Integration with Other Persistent Identifiers
The power of DOIs is amplified when they are integrated with other types of persistent identifiers (PIDs), creating a richer, interconnected web of scholarly information:
- ORCID (Open Researcher and Contributor ID): Linking DOIs of publications and datasets to the ORCIDs of their creators provides a comprehensive, disambiguated record of a researcher’s output. This is crucial for attribution, impact assessment, and reducing administrative burden for researchers [16].
- ROR (Research Organization Registry): ROR IDs for institutional affiliations can be linked to DOIs, allowing for precise attribution of research outputs to organizations, facilitating institutional impact analysis and compliance reporting [17].
- IGSN (International Generic Sample Number, formerly International Geo Sample Number): For physical samples in scientific research (e.g., geological, biological specimens), IGSNs provide persistent identification. Linking IGSNs to DOIs of data derived from these samples ensures traceability from the physical world to the digital realm.
- ARKs (Archival Resource Keys): While DOIs are prominent, ARKs are another widely used type of persistent identifier, particularly in digital libraries and archives. Interoperability between DOI and ARK systems could further enhance the persistence landscape.
This integration fosters a holistic ‘PID Graph’ that can map relationships between people, organizations, grants, instruments, and research outputs, enabling new forms of discovery and analysis.
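Linking DOIs to other PIDs is only useful if the linked identifiers are well formed. As one small illustration, the Python sketch below validates the check character of an ORCID iD, which is computed with the ISO 7064 MOD 11-2 algorithm. The function names are hypothetical, and a passing checksum shows only that the string is syntactically valid, not that the iD is registered to anyone.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """Check the shape and check character of an iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact.isascii() or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]

if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))
```

A repository could run this kind of check at deposit time, before an ORCID iD is written into the metadata registered alongside a DOI, so that typos do not propagate into the PID graph.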
8.2 Machine-Actionable DOIs and the Semantic Web
The future of DOIs lies in becoming even more ‘machine-actionable,’ moving beyond simple redirection to enable automated understanding and processing of scholarly information [11, 14].
- Enhanced Resolution Services: DOI content negotiation already lets clients request structured metadata instead of a redirect for DOIs registered with agencies such as CrossRef and DataCite. Future resolvers might go further, offering choices for content access (e.g., full text, data, abstract, different versions) in a machine-readable format.
- Linked Open Data: DOIs are ideally positioned to act as nodes in the Linked Open Data (LOD) cloud, connecting scholarly resources to broader knowledge graphs. This enables semantic queries and inference, allowing machines to ‘understand’ relationships between research entities and build new knowledge connections automatically.
- AI and Research Automation: As artificial intelligence becomes more sophisticated in scientific discovery, machine-actionable DOIs will be critical for feeding AI systems with structured, reliable, and persistently identifiable research inputs, enabling automated literature review, hypothesis generation, and data synthesis.
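A step toward machine-actionable resolution is already available through HTTP content negotiation, where a client asks the DOI resolver for metadata rather than a landing page. The Python sketch below builds such a request with the standard library. The function names are hypothetical, the CSL JSON media type is one of several that registration agencies may support, and whether a particular DOI returns metadata depends on the agency that registered it.

```python
import json
import urllib.request

# Media type for Citation Style Language JSON; support is an assumption to
# verify for the registration agency behind each DOI.
CSL_JSON = "application/vnd.citationstyles.csl+json"

def metadata_request(doi: str, media_type: str = CSL_JSON) -> urllib.request.Request:
    """Build a request that asks the resolver for metadata instead of a redirect to a web page."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": media_type},
    )

def fetch_metadata(doi: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(metadata_request(doi), timeout=timeout) as response:
        return json.load(response)

if __name__ == "__main__":
    request = metadata_request("10.5555/12345678")
    print(request.full_url, request.get_header("Accept"))
```

Separating request construction from the network call keeps the first function easy to test and makes it straightforward to try other media types by passing a different `media_type` value.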
8.3 Blockchain for Persistence and Verification
While still largely speculative, some researchers and developers have explored the potential of blockchain technology to enhance aspects of PID systems, including DOIs.
- Immutable Records: Blockchain could potentially be used to create immutable records of DOI assignment and metadata, offering an additional layer of trust and auditability for the scholarly record.
- Decentralized Resolution: A truly decentralized resolution system built on blockchain could offer increased resilience and potentially reduce reliance on centralized infrastructure, though significant challenges exist regarding scalability and energy consumption.
- Attribution and Provenance: Blockchain’s ability to create verifiable chains of custody could be beneficial for tracking the provenance of research outputs and ensuring accurate attribution in complex collaborative projects.
Because the DOI system is already robust and well established, blockchain integration would likely be complementary rather than revolutionary, addressing specific challenges or offering incremental improvements.
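The 'immutable records' idea can be illustrated without any blockchain platform by a simple hash chain, in which each log entry's hash covers the previous entry's hash. The Python sketch below is only that: a single-party, tamper-evident log with hypothetical function names and record fields. It has no consensus, replication, or decentralization, and it is not how any DOI registration agency stores records.

```python
import hashlib
import json

def entry_hash(previous_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps({"prev": previous_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(log: list, record: dict) -> None:
    """Add a record to the log, chaining it to the last entry."""
    previous = log[-1]["hash"] if log else ""
    log.append({"record": record, "hash": entry_hash(previous, record)})

def verify(log: list) -> bool:
    """Recompute every hash; any edit to an earlier record breaks the chain."""
    previous = ""
    for entry in log:
        if entry["hash"] != entry_hash(previous, entry["record"]):
            return False
        previous = entry["hash"]
    return True

if __name__ == "__main__":
    log: list = []
    append(log, {"doi": "10.5555/a", "url": "https://example.org/a"})
    append(log, {"doi": "10.5555/b", "url": "https://example.org/b"})
    print(verify(log))
    log[0]["record"]["url"] = "https://example.org/changed"
    print(verify(log))
```

The example shows why such a structure offers auditability: a silent change to an earlier registration record is detectable, although legitimate URL updates would have to be modeled as new appended entries rather than edits.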
8.4 Support for New Research Modalities
The types of research outputs continue to diversify. Future DOI developments will need to accommodate these new modalities:
- Interactive Publications: DOIs for interactive figures, dynamic dashboards, and computational notebooks (e.g., Jupyter notebooks) that allow users to rerun code and manipulate data directly within a publication.
- Virtual and Augmented Reality Objects: As VR/AR become tools for research, DOIs could identify 3D models, virtual environments, and simulations.
- Research Services and Workflows: Potentially, DOIs could identify specific research services or reproducible computational workflows, treating them as citable research assets.
The ongoing evolution of the DOI system, driven by continuous technical development, strategic partnerships, and community feedback, ensures its continued relevance and indispensable role in shaping the future of global scholarly communication.
9. Conclusion
Digital Object Identifiers have unequivocally cemented their position as a cornerstone of modern scholarly communication. Their genesis in the late 1990s was a direct response to the inherent impermanence of digital content URLs, offering a groundbreaking solution for persistent, unique identification. This report has illuminated the intricate technical architecture underpinning DOIs, particularly the pivotal role of the Handle System in ensuring reliable resolution and enduring access to digital objects irrespective of their changing locations.
The widespread adoption of DOIs across an ever-expanding array of research outputs—from traditional journal articles and monographs to critical research datasets, software, codebases, and diverse multimedia—underscores their versatility and indispensable nature. The ecosystem of DOI Registration Agencies, governed by the International DOI Foundation, exemplifies a successful federated model, with entities like CrossRef and DataCite playing specialized yet complementary roles in fostering a comprehensive and interconnected scholarly record.
Effective DOI implementation within institutional repositories, guided by robust best practices for system integration, rigorous metadata quality control, and proactive persistent linking strategies, is paramount. These efforts directly translate into enhanced discoverability of research, accurate and formal attribution for all scholarly contributions, and steadfast compliance with the fundamental principles of open science, particularly the FAIR data tenets. DOIs are not merely identifiers; they are enablers of transparency, reproducibility, and interoperability, fostering a more robust and trustworthy research ecosystem.
Despite its profound success, the DOI system confronts ongoing challenges, notably the persistent threat of ‘link rot’ and the imperative for continuous maintenance of the resolution infrastructure. The long-term sustainability of Registration Agencies and the need for ongoing technical evolution and user education remain critical considerations. However, the future trajectory for DOIs appears promising, marked by deeper integration with other persistent identifiers, a move towards increasingly machine-actionable services, and the exploration of novel technologies like blockchain. These developments collectively aim to further solidify DOIs’ role in building a semantic web of scholarship and supporting emerging research modalities.
In summation, Digital Object Identifiers are far more than a technical convenience; they are fundamental to the integrity, accessibility, and enduring impact of global scholarly output. Sustained commitment to their robust implementation, maintenance, and continued evolution will be crucial in navigating the complexities of an ever-expanding digital research landscape, ensuring that the legacy of intellectual endeavor remains perpetually discoverable and attributable for generations to come.
References
- International DOI Foundation. (n.d.). Digital Object Identifier (DOI) System. Retrieved from https://www.doi.org/resources/DOI_article_ELIS3.pdf
- Indiana University Libraries. (n.d.). Digital Object Identifiers (DOIs). Retrieved from https://libraries.indiana.edu/digital-object-identifiers-dois
- U.S. Geological Survey. (n.d.). Digital Object Identifiers. Retrieved from https://www.usgs.gov/data-management/digital-object-identifiers
- American University Library. (n.d.). Digital Object Identifiers and their use at American U.: DOIs. Retrieved from https://subjectguides.library.american.edu/DOIs
- University of Idaho Library. (n.d.). Digital Object Identifiers (DOIs) | U of I Library Data Management Guide. Retrieved from https://www.lib.uidaho.edu/services/data/data-management/guide/dois/
- Digital Commons. (n.d.). Digital Object Identifiers – Digital Commons. Retrieved from https://digitalcommons.elsevier.com/managing-submissions-publishing/digital-object-identifiers
- WashU Libraries. (n.d.). Persistent Identifiers – WashU Libraries. Retrieved from https://library.wustl.edu/research-support/scholarly-and-digital-publishing/persistent-identifiers/
- DOI.org. (n.d.). Digital Object Identifier (doi) blogs. Retrieved from https://www.doie.org/digital_object_identifier_blogs
- Università di Firenze. (n.d.). Unique identifiers (DOI, ISBN, ISSN, ORCID). Retrieved from https://www.unifi.it/en/research-and-innovation/research/open-science/unique-identifiers-doi-isbn-issn-orcid
- Open Research Knowledge Graph. (n.d.). Open Research Knowledge Graph. Retrieved from https://en.wikipedia.org/wiki/Open_Research_Knowledge_Graph
- DataCite. (2023). DataCite Metadata Schema Documentation for the Publication and Citation of Research Data. Retrieved from https://schema.datacite.org/
- Smith, A. M. (2021). The Essential Role of DOIs in Research Software Citation. Journal of Open Source Software, 6(60), 3021.
- Haustein, S. (2016). Grand Challenges in Altmetrics: Heterogeneity, Data Quality, and Coverage. Frontiers in Research Metrics and Analytics, 1, Article 8.
- Wilkinson, M. D., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018.
- CrossRef. (n.d.). Funder Registry. Retrieved from https://www.crossref.org/services/funder-registry/
- ORCID. (n.d.). ORCID. Retrieved from https://orcid.org/
- Research Organization Registry (ROR). (n.d.). ROR. Retrieved from https://ror.org/
