The Five Safes Framework: A Comprehensive Analysis of Its Application, Effectiveness, and Broader Implications in Data Governance

Many thanks to our sponsor Esdebe who helped us prepare this research report.

Abstract

In an increasingly data-driven world, the imperative to balance data utility with stringent confidentiality requirements has become a cornerstone of ethical and effective research. The Five Safes framework, originating from the UK Office for National Statistics (ONS), stands as a globally recognized and widely adopted model for managing access to sensitive microdata within Trusted Research Environments (TREs). This comprehensive paper undertakes an in-depth examination of the Five Safes, dissecting its foundational principles, operational components, and diverse applications across various national and institutional contexts. It explores the framework’s historical evolution, its crucial role in fostering responsible data stewardship, and the complex interplay it navigates between legal compliance, technological advancements, and public trust. Furthermore, this study critically evaluates the framework’s effectiveness, addressing prominent critiques regarding its perceived limitations in a rapidly evolving data landscape, particularly concerning dynamic disclosure risks and the integration of cutting-edge privacy-enhancing technologies. By providing a nuanced understanding of its inherent strengths and areas for potential refinement, this analysis aims to contribute to ongoing discourse on robust data governance, privacy preservation, and secure data sharing practices for the advancement of public good.

1. Introduction: Navigating the Data Frontier with Confidentiality at its Core

The digital age has ushered in an unprecedented era of data generation, collection, and analysis. Data, now widely acknowledged as a pivotal asset, fuels advancements across science, policy-making, economic development, and social innovation [1]. From personalized medicine and urban planning to economic forecasting and social welfare programs, the transformative potential of data-driven insights is undeniable. However, alongside this immense potential lies a profound responsibility: to ensure that the collection, storage, access, and analysis of sensitive data are conducted ethically, securely, and in strict adherence to privacy principles [2]. The tension between maximizing the utility of data for public benefit and safeguarding individual confidentiality is a central challenge in contemporary data governance.

In response to this intricate challenge, the Five Safes framework emerged as a pragmatic and comprehensive approach to managing access to confidential data. Developed as a socio-technical governance model, it provides a structured methodology for assessing and mitigating risks associated with data sharing. Unlike purely technical security measures, the Five Safes emphasizes a holistic, multi-dimensional approach, recognizing that data security is not solely a technological problem but also one deeply intertwined with human behavior, organizational policy, and legal frameworks [3]. This paper delves into the genesis and evolution of the Five Safes framework, meticulously examining its five interdependent dimensions: Safe Projects, Safe People, Safe Settings, Safe Data, and Safe Outputs. It explores its widespread adoption, analyzes its impact on data governance, and critically assesses its efficacy in navigating the dynamic landscape of data privacy and security, particularly in the context of Trusted Research Environments.

2. Origins and Development of the Five Safes Framework: From Technical Model to Governance Paradigm

The conceptual foundations of the Five Safes framework can be traced back to the early 2000s within the UK Office for National Statistics (ONS). At this time, the ONS was grappling with increasing demand from academic researchers and policy analysts for access to granular microdata, which held immense potential for detailed analysis but also presented significant risks to individual privacy. Traditional methods of data access, often involving highly aggregated or heavily anonymized public-use files, limited the depth of possible research [4]. Allowing direct access to confidential microdata, however, raised serious concerns about re-identification and disclosure.

It was in this context that Felix Ritchie, then at the ONS, conceived a structured approach to facilitate secure remote access to confidential microdata. Initially, this concept was referred to as the ‘VML Security Model,’ where VML stood for the Virtual Microdata Laboratory, the secure environment being developed by the ONS for researchers. The VML was designed to provide a secure virtual workspace where researchers could access and analyze sensitive data without physically removing it from the ONS’s controlled environment. The underlying security model articulated a set of principles that would later formalize into the ‘Five Safes’ [5].

This early iteration recognized that data protection was not merely about technological safeguards but required a broader consideration of who was accessing the data, for what purpose, in what environment, with what data, and what could be released. The rebranding to ‘The Five Safes’ around 2009-2010 was a strategic move to broaden its appeal and applicability beyond a specific technical environment. It transformed from a security model implicitly tied to a particular system (VML) into a more abstract, overarching governance framework. This re-conceptualization emphasized its applicability across various organizations and data types, encapsulating five key dimensions that collectively address the multifaceted risks associated with sensitive data access and sharing [6]. The framework’s enduring strength lies in its intuitive nature and its ability to act as a common language for discussing and implementing data governance, making it accessible to both technical and non-technical stakeholders.

3. Components of the Five Safes Framework: A Multi-Dimensional Approach to Data Security

The Five Safes framework is a holistic, interdependent model, meaning that the strength of the entire system depends on the robustness of each individual ‘safe.’ A weakness in one dimension can compromise the effectiveness of the others. This integrated approach ensures that risks are managed throughout the entire data access lifecycle, from project inception to output dissemination.

3.1 Safe Projects: Purpose, Public Benefit, and Ethical Alignment

‘Safe Projects’ is the foundational dimension, assessing whether the proposed use of sensitive data is appropriate, lawful, ethical, and demonstrably beneficial to the public. This initial gatekeeping mechanism is crucial for establishing legitimate grounds for data access and preventing misuse [7]. It moves beyond mere technical compliance to consider the broader societal impact and ethical implications of data utilization.

  • Public Benefit Justification: A core tenet of Safe Projects is the requirement for a clear and compelling public benefit. This means the research or analysis must aim to generate insights that contribute to public good, inform policy, improve public services, or advance scientific understanding. Projects driven purely by commercial gain or individual curiosity, without a demonstrable public benefit, are typically deemed ‘unsafe.’ This criterion helps to build and maintain public trust, demonstrating that sensitive data is used for collective betterment rather than private advantage [8].
  • Ethical Review and Scrutiny: Proposed projects often undergo rigorous ethical review processes. This typically involves submission to independent ethics committees or specialized Data Access Committees (DACs), comprising experts in ethics, law, data science, and the relevant subject matter. These bodies scrutinize the project’s objectives, methodology, data requirements, and potential societal impacts. Key questions include: Is the research question well-defined? Is the chosen methodology sound and proportionate to the data requested? Are there any potential harms to individuals or groups that might arise from the research? What are the safeguards in place to mitigate such risks?
  • Legal and Regulatory Compliance: All projects must strictly adhere to relevant legal and regulatory frameworks, such as the General Data Protection Regulation (GDPR) in Europe, the Data Protection Act 2018 in the UK, HIPAA in the US for health data, and other specific national data protection laws [9]. This involves ensuring that the legal basis for processing the data is clear (e.g., explicit consent, public task, legitimate interest) and that the project respects data subjects’ rights, including their right to privacy and the right to object to processing where applicable.
  • Methodological Soundness and Proportionality: The project’s methodology must be robust and appropriate for the research question. Data access should be proportionate to the needs of the research – only the minimum necessary data elements for the shortest possible duration should be requested. Overly broad or vague data requests are often rejected, as they introduce unnecessary privacy risks. The project proposal must clearly articulate how the data will be used to achieve the stated objectives and what outputs are expected.

3.2 Safe People: Trust, Competence, and Accountability of Data Users

‘Safe People’ focuses on ensuring that individuals granted access to sensitive data are trustworthy, competent, and fully accountable for their actions. This dimension acknowledges that even the most secure technical environment can be compromised by human error or malicious intent [10]. It is about building a cadre of responsible data stewards.

  • Vetting and Accreditation: Researchers seeking access to sensitive data typically undergo a stringent vetting process. This may include identity verification, institutional affiliation checks, background checks (which may extend to criminal record checks depending on the data sensitivity), and verification of professional qualifications. Many organizations require researchers to be formally accredited or registered before granting access, often involving an application process that details their research experience and commitment to ethical data handling.
  • Mandatory Training and Certification: All authorized data users are required to complete mandatory training modules covering data protection principles, ethical data use, statistical disclosure control (SDC) techniques, and the specific rules and regulations of the secure data environment. This training aims to instill a deep understanding of their responsibilities and the potential consequences of misuse or accidental disclosure. Regular refresher training is often mandated to ensure ongoing compliance and awareness of evolving risks [11].
  • Legal Agreements and Undertakings: Researchers must typically sign legally binding data access agreements or confidentiality undertakings. These documents explicitly outline their responsibilities, the terms and conditions of data use, prohibitions (e.g., attempting re-identification, sharing data with unauthorized persons, introducing external data), and the severe penalties for breaches, which can include legal prosecution, substantial fines, and blacklisting from future data access [12].
  • Ongoing Monitoring and Auditing: The behavior of data users within secure environments is often subject to ongoing monitoring. This includes logging of all data access, commands executed, and files created or downloaded. These audit trails are regularly reviewed to detect anomalous activities or potential policy violations. Some environments may also employ more active supervision, such as through secure video feeds in physical data laboratories.
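
The audit-trail review described above can be partially automated. Below is a minimal sketch, assuming a hypothetical log format and an illustrative threshold; real TREs combine many such rules with human review of the full audit trail:

```python
from collections import Counter

def flag_anomalies(audit_log, max_exports=3):
    """Flag users whose export requests exceed a per-review-period
    threshold. Log format and threshold are illustrative only."""
    exports = Counter(entry["user"] for entry in audit_log
                      if entry["action"] == "export_request")
    return sorted(user for user, n in exports.items() if n > max_exports)

log = [
    {"user": "alice", "action": "query"},
    {"user": "bob", "action": "export_request"},
    {"user": "bob", "action": "export_request"},
    {"user": "bob", "action": "export_request"},
    {"user": "bob", "action": "export_request"},
]
print(flag_anomalies(log))  # ['bob']
```

A flagged user would typically be escalated to a human reviewer rather than blocked automatically, since unusual activity can be entirely legitimate.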

3.3 Safe Settings: The Secure Data Environment

‘Safe Settings’ pertains to the physical and technical infrastructure and environments where sensitive data is accessed, stored, and analyzed. This dimension ensures that the environment is robustly secure, preventing unauthorized access and minimizing the risk of accidental or malicious data disclosure [13]. The concept of Trusted Research Environments (TREs), also known as Secure Data Environments (SDEs) or Data Safe Havens, is central to this safe.

  • Physical Security: If a physical data laboratory is used, it will have stringent physical security measures. This includes controlled access (e.g., swipe cards, biometric scanners), surveillance (CCTV), secure storage for physical media, and sometimes even ‘air-gapped’ environments where physical connectivity to external networks is completely severed. Researchers are often prohibited from bringing personal devices (phones, USB drives) into these spaces.
  • Technical Security Infrastructure: The technical environment of a TRE is designed with multiple layers of security. Key features include:
    • Network Segregation: Data is typically accessed within an isolated network segment, physically or logically separated from public networks. Strong firewalls and intrusion detection/prevention systems are deployed.
    • Access Control: Multi-factor authentication (MFA) is standard, typically combining two or more of: something the user knows (a password), something they have (a token or phone), and something they are (biometrics). Access is granted on a ‘least privilege’ basis, meaning users only have access to the specific data and tools required for their approved project.
    • Secure Remote Access: For remote TREs, access is typically via highly secure virtual private networks (VPNs) or secure remote desktop protocols, encrypting all communication and ensuring no data leaves the secure environment to the user’s local machine [14].
    • Restricted Functionality: The computing environment within a TRE is deliberately constrained. Users cannot install arbitrary software, access the internet, send emails, or connect external storage devices (e.g., USB drives). Approved statistical software and necessary tools are pre-installed, and all software is regularly patched and updated.
    • Audit Trails and Logging: Comprehensive logs are maintained of all user activities, data accesses, system changes, and security events. These logs are crucial for accountability, incident response, and forensic analysis in case of a breach.
  • Environmental Controls: This also extends to environmental factors like power redundancy, climate control for servers, and physical safeguards against natural disasters, ensuring data availability and integrity.
  • Certification and Compliance: Many TREs seek independent security certifications, such as ISO 27001 (Information Security Management) or specific national government security accreditations, to demonstrate their adherence to international best practices in information security.
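
The ‘least privilege’ principle above can be sketched as a simple policy check (the grant structure and dataset names are hypothetical; in practice this is enforced by the TRE’s identity and access management layer):

```python
def can_access(user, dataset, project_grants):
    """Least-privilege check: a user may open a dataset only if one of
    their approved projects explicitly lists it. Illustrative only."""
    return any(dataset in datasets
               for grant_user, datasets in project_grants
               if grant_user == user)

# Hypothetical per-project grants issued after 'Safe Projects' approval.
grants = [
    ("alice", {"census_2021_deid"}),
    ("bob", {"labour_force_survey"}),
]
print(can_access("alice", "census_2021_deid", grants))     # True
print(can_access("alice", "labour_force_survey", grants))  # False
```

The default here is denial: any dataset not explicitly granted is unreachable, mirroring the ‘least privilege’ stance described above.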

3.4 Safe Data: Minimizing Disclosure Risk Through Data Transformation

‘Safe Data’ focuses on the inherent risks within the data itself and the techniques used to transform it to minimize the risk of re-identification while preserving its analytical utility. This dimension recognizes that even without explicit identifiers, sensitive information can be inferred or re-identified when combined with external datasets [15].

  • De-identification Techniques: The primary strategy is to remove or alter direct and indirect personal identifiers. This goes beyond simply removing names and addresses:
    • Pseudonymisation: Replacing direct identifiers with artificial identifiers or pseudonyms, allowing individuals to be tracked over time within a dataset without revealing their true identity. Re-identification is possible only with the ‘key’ linking pseudonyms to real identities, which is kept separate and highly secured.
    • Anonymisation: Techniques aimed at irreversibly altering data so that re-identification of individuals becomes practically impossible, such as aggregation, generalization, or perturbation. This is a higher standard than pseudonymisation.
    • Generalisation: Broadening the categories of sensitive attributes (e.g., exact age to age range, specific location to broader region).
    • Perturbation/Noise Addition: Introducing small, controlled amounts of noise to the data to obscure individual values without significantly altering statistical properties [16].
  • Statistical Disclosure Control (SDC): SDC encompasses a suite of methods applied to microdata or tabular data to reduce the risk of re-identification or attribute disclosure. For microdata, techniques include:
    • Top/Bottom Coding: Capping extreme values of continuous variables to prevent outliers from being unique identifiers.
    • Swapping: Exchanging values of selected variables between records with similar characteristics.
    • Microaggregation: Grouping records into small clusters and replacing individual values with cluster means.
  • Risk Assessment Methodologies: Before releasing data, a thorough risk assessment is conducted to quantify the re-identification risk. This often involves calculating metrics like ‘k-anonymity’ (ensuring each record is indistinguishable from at least k-1 other records) or ‘l-diversity’ (ensuring sufficient diversity of sensitive attributes within each k-anonymous group) [17]. The goal is to determine the appropriate level of de-identification for the specific data and its intended use.
  • Tiered Access Models: Organizations often implement tiered access models, where different versions of a dataset with varying levels of sensitivity are made available. For example, a heavily anonymized public-use file might be available without a TRE, while a more granular, pseudonymized version is only accessible within a highly secure TRE for approved researchers.
  • Synthetic Data: An advanced approach involves generating synthetic data, which statistically resembles the original data but contains no real individual records. This offers strong privacy protection but can sometimes impact the analytical utility and accuracy for complex analyses [18].
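
The generalisation and k-anonymity ideas above can be illustrated with a minimal sketch, assuming a toy dataset in which age and postcode are the quasi-identifiers (the generalisation rules and field names are illustrative, not any agency’s actual SDC policy):

```python
from collections import Counter

def generalise(record):
    """Generalise quasi-identifiers: exact age -> 10-year band,
    full postcode -> outward code only (illustrative rules)."""
    lo = (record["age"] // 10) * 10
    age_band = f"{lo}-{lo + 9}"
    region = record["postcode"].split(" ")[0]
    return (age_band, region)

def k_anonymity(records):
    """Smallest equivalence-class size over the generalised
    quasi-identifiers: the dataset is k-anonymous if every record
    shares its combination with at least k-1 others."""
    counts = Counter(generalise(r) for r in records)
    return min(counts.values())

records = [
    {"age": 34, "postcode": "BS16 1QY"},
    {"age": 37, "postcode": "BS16 4GH"},
    {"age": 35, "postcode": "BS16 2AB"},
    {"age": 62, "postcode": "SW1A 1AA"},
]
print(k_anonymity(records))  # the lone 60-69/SW1A record makes k = 1
```

If the computed k falls below the custodian’s threshold, further generalisation, suppression, or perturbation would be applied before release.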

3.5 Safe Outputs: Safeguarding Disseminated Information

The final and equally critical dimension, ‘Safe Outputs,’ ensures that the results derived from data analysis do not inadvertently disclose sensitive information. This involves a rigorous review process before any research findings or outputs are released from the secure environment [19]. The principle is that no output should allow for the re-identification of an individual or small group, nor should it allow for the deduction of sensitive attributes.

  • Output Scrutiny and Review: All outputs (e.g., statistical tables, graphs, models, regression coefficients) must undergo a comprehensive review by trained disclosure control specialists. This review process typically involves both automated checks and human expert judgment. It aims to identify and mitigate any potential for direct or indirect disclosure.
  • Statistical Disclosure Control (SDC) for Outputs: Similar to Safe Data, SDC techniques are applied to outputs to prevent disclosure. Common methods include:
    • Cell Suppression: Suppressing (not showing) counts in tables where the number of observations in a cell is below a predefined minimum threshold (e.g., n < 5). This prevents identification of individuals based on unique or very small groups.
    • Rounding: Rounding counts or values in tables to a specified base (e.g., rounding to the nearest 5 or 10) to obscure exact figures.
    • Perturbation/Noise: Adding small, random noise to numerical outputs like means or standard deviations.
    • Aggregation: Ensuring that results are aggregated to a sufficiently high level, preventing granular insights that could lead to re-identification [20].
  • Prohibition of Direct Identifiers: No direct personal identifiers are ever allowed in outputs. This is a fundamental rule, but the focus of output review extends to indirect identifiers.
  • Iterative Review Process: The output review process is often iterative. If potential disclosure risks are identified, the researcher is required to modify their outputs (e.g., aggregate further, apply more SDC, or re-run analyses with modified parameters) until they meet the disclosure control standards. This ensures a balance between releasing valuable research and protecting privacy.
  • Publication and Dissemination Rules: Organizations may also have rules around the publication and dissemination of findings, such as embargo periods or specific journals/platforms where results can be published, further controlling the information flow.
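
The cell suppression and rounding steps above can be sketched as follows (the threshold, rounding base, and table contents are hypothetical, not any organisation’s actual disclosure rules):

```python
def safe_table(cell_counts, threshold=5, base=5):
    """Apply primary cell suppression and rounding to a frequency
    table: suppress non-zero cells below the minimum count, round the
    remainder to the nearest base. Parameters are illustrative."""
    released = {}
    for cell, n in cell_counts.items():
        if 0 < n < threshold:
            released[cell] = None  # suppressed; published as e.g. '*'
        else:
            released[cell] = base * round(n / base)
    return released

raw = {
    ("Bristol", "asthma"): 3,     # below threshold -> suppressed
    ("Bristol", "diabetes"): 41,  # rounded to 40
    ("Leeds", "asthma"): 17,      # rounded to 15
}
print(safe_table(raw))
```

In practice, primary suppression is followed by secondary suppression, so that suppressed cells cannot be recovered by subtraction from row and column totals.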

4. Applications of the Five Safes Framework: Global Adoption and Adaptation

The Five Safes framework has transcended its origins at the UK ONS to become a globally recognized best practice for secure data access. Its principles have been adopted and adapted by various national statistical offices, government agencies, and research consortia worldwide, reflecting its robustness and flexibility in diverse contexts.

4.1 United Kingdom: A Pioneering and Evolving Landscape

The UK remains a global leader in the application and refinement of the Five Safes. Its widespread adoption across various public sector bodies underscores its foundational role in the nation’s data infrastructure.

  • Office for National Statistics (ONS): As the originator, the ONS has continuously evolved its implementation of the Five Safes. Its Secure Research Service (SRS), a direct successor to the Virtual Microdata Laboratory (VML), is a highly secure TRE based entirely on these principles. The SRS provides researchers with access to a vast array of de-identified government datasets, including census data, economic surveys, and administrative data, enabling high-impact research while maintaining stringent privacy [21]. The ONS also actively promotes and provides guidance on the Five Safes framework to other government departments and public bodies.
  • UK Data Service: As the largest national data archiving and dissemination service for social and economic data, the UK Data Service extensively uses the Five Safes. It operates a secure access service for highly sensitive datasets from major longitudinal studies (e.g., Understanding Society, 1970 British Cohort Study) and administrative sources. Their application process for researchers meticulously checks Safe Projects and Safe People, while data access is provided through their secure remote access environment (Safe Settings), with robust Safe Data preparation and strict Safe Outputs review [22].
  • NHS Digital/NHS England (Health Data): The Five Safes framework has become central to the governance of health data in the UK, particularly with the establishment of Secure Data Environments (SDEs) for health research. Following recommendations from the Goldacre Review (2022) on the trustworthy use of health data, the UK government is mandating that all future access to NHS data for research and planning must occur within SDEs built on the Five Safes principles [23]. NHS England’s SDEs ensure that sensitive patient data, often de-identified, is accessed by approved researchers (Safe People) for legitimate purposes (Safe Projects) in highly controlled environments (Safe Settings), with strict controls over the data itself (Safe Data) and the outputs generated (Safe Outputs). This shift aims to consolidate and standardize secure access mechanisms, enhancing public trust and accelerating health research.
  • Research Data Scotland (RDS): RDS provides a single point of access for de-identified public sector data in Scotland, enabling research for public good. Their operational model is explicitly built on the Five Safes framework, integrating data from various Scottish public bodies (e.g., NHS Scotland, Scottish Government) into a secure environment for approved researchers. This cross-sectoral application demonstrates the framework’s versatility in coordinating data sharing across multiple data custodians within a devolved administration [24].
  • Legal Underpinnings: The framework operates within the robust legal landscape of the UK, underpinned by the Data Protection Act 2018 (which implements GDPR) and the Digital Economy Act 2017. The latter provides specific legal gateways for sharing public sector data for research and statistics, often with explicit reference to safeguarding principles that align directly with the Five Safes.

4.2 Australia: Embracing a National Approach to Data Sharing

Australia has also significantly embraced the Five Safes, particularly in its journey towards more streamlined and trustworthy data sharing across government and research sectors.

  • Australian Bureau of Statistics (ABS): The ABS has been a prominent adopter, implementing the Five Safes as a core component of its data access policies. Its DataLab, a secure research environment, provides approved researchers with access to detailed microdata from various ABS surveys and administrative datasets. The ABS applies rigorous checks across all five dimensions, ensuring that researchers are vetted, projects are for public benefit, and all outputs undergo thorough disclosure control [25].
  • Office of the National Data Commissioner (ONDC): Established to implement the Data Availability and Transparency (DAT) Act 2022, the ONDC plays a crucial role in promoting and regulating data sharing across the Australian public sector. The DAT Act explicitly references the Five Safes as a guiding principle for data sharing, making it a statutory requirement for Australian government entities to consider these dimensions when sharing public sector data. The ONDC provides guidance and accreditation for data schemes, data service providers, and data projects based on the Five Safes, creating a consistent national framework for secure data sharing [26].
  • Australian National Data Service (ANDS): While primarily focused on research data infrastructure, ANDS also promotes secure data access practices aligned with the Five Safes for academic research data, often providing guidance and resources for Australian universities and research institutions.

4.3 New Zealand: Integrating Indigenous Data Sovereignty

New Zealand’s adoption of the Five Safes through Statistics New Zealand (Stats NZ) highlights its adaptability and the potential for integration with unique national contexts, specifically the principles of Māori Data Sovereignty.

  • Statistics New Zealand (Stats NZ): Stats NZ utilizes the Five Safes framework to govern access to its confidential microdata through its Data Lab. This secure environment enables researchers to access a wide range of official statistics for approved research projects. Stats NZ’s commitment to the framework ensures that data is used responsibly and securely for research and policy development [27].
  • Māori Data Sovereignty (Te Mana Raraunga): New Zealand’s experience also uniquely demonstrates how the Five Safes can intersect with and inform discussions around Indigenous Data Sovereignty. While the Five Safes provides a general framework for secure access, the principles of ‘Te Mana Raraunga’ (Māori Data Sovereignty) assert the rights of Māori to control their own data, including its collection, ownership, and use [28]. This adds an additional layer of ethical and governance consideration to ‘Safe Projects’ and ‘Safe People,’ emphasizing the need for data access projects to respect Indigenous self-determination and cultural protocols, potentially requiring specific Māori oversight or partnership for data pertaining to Māori communities. This dynamic interaction showcases the framework’s capacity for cultural sensitivity and adaptation within a national context.

4.4 International Adaptations and Emerging Trends

The principles of the Five Safes resonate beyond these core adopters, influencing data governance practices globally:

  • European Union: While not explicitly named ‘Five Safes,’ national statistical offices (NSOs) across Europe (e.g., Eurostat, Destatis in Germany, INSEE in France) implement similar multi-dimensional approaches for secure microdata access, often influenced by the ONS’s pioneering work. The European Data Strategy and proposals for a European Health Data Space envision secure data sharing mechanisms that align conceptually with the Five Safes’ focus on controlled environments and authorized access [29].
  • Canada: Statistics Canada operates various secure access models, including Research Data Centres (RDCs) and a secure remote access system, which incorporate similar principles of vetting researchers, approving projects, securing environments, transforming data, and reviewing outputs, reflecting the Five Safes philosophy [30].

This widespread adoption across diverse institutional and national landscapes underscores the Five Safes’ effectiveness as a robust, flexible, and intuitively understandable framework for managing the complex challenges of sensitive data access for public benefit.

5. Effectiveness and Critiques of the Five Safes Framework: Balancing Trust and Rigor

The Five Safes framework has been lauded as a pragmatic and effective model for enabling responsible data sharing while safeguarding privacy. Its strengths lie in its structured approach, clarity, and its ability to foster trust among data custodians, researchers, and the public. However, like any governance framework, it is not without its limitations and has faced significant academic and practical critiques, particularly concerning its adaptability to the rapidly evolving data landscape and its technical depth.

5.1 Strengths of the Five Safes Framework

  • Holistic and Multi-dimensional: The framework’s primary strength is its comprehensive, multi-faceted approach. By addressing five distinct but interconnected dimensions, it moves beyond purely technical security to include policy, legal, ethical, and human elements. This holistic view provides a more robust defense against disclosure risks than reliance on isolated controls [31].
  • Clarity and Intuition: The ‘Five Safes’ nomenclature is highly intuitive and easy to understand for both technical and non-technical stakeholders. This clarity facilitates communication about data governance principles and promotes consistent application across different organizations and sectors.
  • Promotes Trust and Social Licence: By clearly articulating the safeguards in place, the framework enhances transparency and builds trust among data subjects, the public, and data custodians. It demonstrates a commitment to responsible data stewardship, which is crucial for maintaining the ‘social licence’ to use sensitive data for research and public good [32].
  • Flexibility and Adaptability: While prescriptive in its dimensions, the framework is flexible enough to be adapted to various data types, organizational structures, and national legal contexts. It provides a high-level conceptual model that allows for diverse technical and procedural implementations.
  • Risk-Based Approach: The Five Safes inherently promotes a risk-based approach to data access. It encourages custodians to assess and mitigate risks at multiple points, tailoring safeguards to the specific sensitivity of the data and the nature of the research project.

5.2 Critiques of the Five Safes Framework

Despite its widespread acceptance and noted strengths, the Five Safes framework has attracted considerable scrutiny. One of the most prominent critiques comes from the 2020 study by Culnane, Rubinstein, and Watts, titled ‘Not fit for Purpose: A critical analysis of the “Five Safes”’ [33]. Their central arguments, along with other emerging concerns, highlight areas where the framework may require re-evaluation or enhancement.

5.2.1 ‘Disconnected from Existing Legal Protections’

Culnane, Rubinstein, and Watts argue that the Five Safes framework, as typically presented, can appear disconnected from the robust legal protections already in place, such as GDPR or national data protection acts. They contend that the framework might inadvertently encourage a perception that it replaces or supersedes legal obligations, rather than supplementing them. The authors suggest that this could lead to a focus on the framework’s principles at the expense of comprehensive legal compliance and accountability [33].

Counter-argument and Nuance: Proponents argue that the Five Safes is designed to operationalize and enforce legal requirements, not replace them. For instance, ‘Safe Projects’ directly addresses legal bases for processing, and ‘Safe People’ encompasses legal undertakings. The framework provides a pragmatic ‘how’ for implementing the ‘what’ mandated by law. However, the critique highlights the importance of explicitly linking each ‘Safe’ to the relevant legal provisions and ensuring that adherence to the framework does not lead to a relaxation of other legal due diligence.

5.2.2 ‘Lacks Strong Technical Measures to Ensure Data Safety’

Culnane, Rubinstein, and Watts further posit that the Five Safes, while touching upon ‘Safe Settings,’ provides insufficient detail or emphasis on robust technical security measures, potentially creating a false sense of security. They argue that the framework is more conceptual than technically prescriptive, which could allow for varied and potentially weak technical implementations [33].

Counter-argument and Nuance: While the Five Safes itself is a high-level policy framework, the ‘Safe Settings’ dimension implicitly demands state-of-the-art technical controls. As detailed in Section 3.3, modern TREs built on the Five Safes do incorporate strong technical measures such as multi-factor authentication, network segregation, intrusion detection, and comprehensive logging, often adhering to ISO 27001 or equivalent security standards. The framework’s role is to guide what needs to be secured, leaving the how to technical cybersecurity experts. However, the critique points to the need for continuous vigilance and investment in cutting-edge technical safeguards, as the threat landscape evolves rapidly.

5.2.3 ‘Static View of Disclosure Risk Does Not Account for Dynamic Nature of Data’

Perhaps the most significant criticism leveled by Culnane, Rubinstein, and Watts, as well as by other privacy researchers, is that the Five Safes framework tends to adopt a ‘static’ view of disclosure risk. This means it may not adequately account for the dynamic nature of data and the evolving sophistication of re-identification techniques [33]. Specific concerns include:

  • Longitudinal Re-identification: Data subjects can be re-identified over time as more data points become available across different datasets, forming a ‘mosaic’ of information. A dataset deemed ‘safe’ at one point might become unsafe when linked with future releases or external information.
  • Linkage Attacks: The increasing availability of open data, commercial datasets, and public records means that even highly de-identified sensitive datasets can be linked with external information to re-identify individuals [34]. For example, combining seemingly innocuous attributes like age, gender, and postcode with public voter rolls can pinpoint individuals.
  • Machine Learning and AI: Advanced machine learning algorithms can infer sensitive attributes or perform re-identification with high accuracy, even from highly obfuscated data, presenting a challenge that traditional SDC methods may not fully address [35].
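To make the mechanics of such a linkage attack concrete, the following minimal Python sketch joins a notionally de-identified dataset to a public register on exactly these quasi-identifiers. All records, names, and the `link` helper are invented for illustration; real attacks operate at far larger scale, but the principle is the same.

```python
# Hypothetical illustration of a linkage attack: a "de-identified" health
# dataset is joined to a public register on quasi-identifiers
# (age, sex, postcode). All records below are invented.

deidentified_health = [
    {"age": 34, "sex": "F", "postcode": "BS1 5TR", "diagnosis": "asthma"},
    {"age": 52, "sex": "M", "postcode": "BS1 5TR", "diagnosis": "diabetes"},
]

public_register = [
    {"name": "A. Example", "age": 34, "sex": "F", "postcode": "BS1 5TR"},
    {"name": "B. Example", "age": 52, "sex": "M", "postcode": "BS1 5TR"},
]

QUASI_IDENTIFIERS = ("age", "sex", "postcode")

def link(records, register, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs where the quasi-identifier combination
    matches exactly one register entry, i.e. a successful re-identification."""
    matches = []
    for rec in records:
        hits = [p for p in register if all(p[k] == rec[k] for k in keys)]
        if len(hits) == 1:  # unique match: the individual is re-identified
            matches.append((hits[0]["name"], rec["diagnosis"]))
    return matches

# Both invented individuals are re-identified despite the absence of
# direct identifiers in the health dataset.
print(link(deidentified_health, public_register))
```

The sketch shows why removing names alone is insufficient: uniqueness of the quasi-identifier combination, not the presence of a direct identifier, is what drives the risk.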

Response and Evolution: Data custodians are increasingly aware of these dynamic risks. Efforts are underway to integrate more advanced privacy-enhancing technologies (PETs) into the ‘Safe Data’ dimension, such as differential privacy, which offers mathematically provable limits on what any released output can reveal about an individual, regardless of the auxiliary data an attacker holds. Continuous risk assessment, rather than a one-time evaluation, is also being emphasized. The challenge remains to adapt a framework designed in an earlier era to the complexities of big data and AI-driven re-identification.
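As a sketch of the kind of guarantee differential privacy adds to ‘Safe Data’, the following illustrates the Laplace mechanism for a simple counting query. The `dp_count` helper and its parameters are hypothetical and not drawn from any production TRE; a real deployment would also manage a privacy budget across queries.

```python
import random

def dp_count(true_count, epsilon, sensitivity=1.0):
    """Release a count with epsilon-differential privacy via the Laplace
    mechanism: noise is drawn with scale = sensitivity / epsilon, so one
    individual's presence changes the output distribution only slightly."""
    scale = sensitivity / epsilon
    # The difference of two iid Exponential(1/scale) draws is Laplace(scale).
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_count + noise

# Smaller epsilon means more noise and stronger privacy; the noisy count
# remains useful for aggregate analysis while masking any single record.
print(dp_count(100, epsilon=1.0))
```

The design trade-off is explicit: epsilon quantifies the privacy loss, turning a qualitative ‘Safe Data’ judgement into a tunable, auditable parameter.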

5.2.4 Other Emerging Critiques and Challenges

  • Human Factor Risk: While ‘Safe People’ addresses vetting and training, human error (e.g., misinterpreting rules, accidental data mishandling) and insider threats (malicious individuals with authorized access) remain significant risks that no framework can entirely eliminate. The effectiveness of ‘Safe People’ relies heavily on continuous vigilance, strong ethical culture, and robust auditing [36].
  • Bureaucracy and Access Delays: Critics sometimes argue that the stringent requirements of the Five Safes can lead to significant administrative burden and lengthy delays in data access, potentially stifling timely research and innovation. Balancing robust security with agile access remains a perpetual challenge [37].
  • Scope Limitations: The framework is primarily designed for statistical microdata. Its direct applicability to other forms of sensitive data, such as unstructured text, images, or real-time sensor data, is less clear and may require significant adaptation or complementary frameworks.
  • Cost and Resource Intensity: Implementing and maintaining a Five Safes-compliant TRE is resource-intensive, requiring significant investment in infrastructure, specialized staff (security, disclosure control, legal), and ongoing maintenance. This can be a barrier for smaller organizations or those with limited budgets.

In sum, while the Five Safes framework provides a vital and foundational structure for secure data governance, continuous critical analysis and adaptation are essential. The dynamic nature of data, the increasing sophistication of re-identification techniques, and the evolving legal and ethical landscape necessitate ongoing refinement to ensure the framework remains truly ‘fit for purpose’ in the long term.


6. Broader Implications in Data Governance, Privacy, and Security: Beyond a Framework

The impact of the Five Safes framework extends far beyond mere technical data access procedures. It has profound implications for broader data governance paradigms, privacy policy, and the overarching landscape of data security, influencing public trust, ethical considerations, and international collaborations.

6.1 Fostering Public Trust and Social Licence to Operate

Perhaps one of the most significant implications of the Five Safes is its role in building and maintaining public trust in data sharing initiatives. By explicitly demonstrating a commitment to safeguarding sensitive information across multiple dimensions, the framework helps to assure individuals that their data, even when used for research or public policy, is handled responsibly and ethically [38]. This transparency and accountability are crucial for obtaining and sustaining the ‘social licence to operate’ – the public’s acceptance of government agencies and researchers using their data. Without such trust, public engagement with data collection (e.g., census participation, health surveys) could decline, severely impacting the quality and availability of data for crucial public services and research.

6.2 Driving Ethical Data Use and Responsible AI Development

The ‘Safe Projects’ dimension, in particular, emphasizes the ethical imperative of data use. By requiring a clear public benefit and ethical review, the framework guides researchers towards responsible applications of data. In the burgeoning field of Artificial Intelligence (AI), where vast datasets are used for training machine learning models, the Five Safes provides a crucial model for ethical data access [39]. Ensuring that data used for AI development is sourced and accessed under ‘Safe Projects’ and within ‘Safe Settings’ can help mitigate biases, prevent misuse, and enhance the trustworthiness of AI systems, aligning with broader principles of responsible AI.

6.3 Standardizing Secure Data Environments and Promoting Interoperability

The widespread adoption of the Five Safes has implicitly led to a degree of standardization in the design and operation of Trusted Research Environments (TREs) globally. While implementations vary, the core principles provide a common blueprint, fostering a shared understanding of what constitutes a ‘secure’ environment. This convergence can facilitate future interoperability between TREs, making it easier for researchers to access data from multiple sources across different jurisdictions, provided common legal and ethical frameworks are also in place [40]. This standardization also reduces the learning curve for researchers familiar with one Five Safes-compliant environment when moving to another.

6.4 Informing Policy and Legislation

The success and conceptual clarity of the Five Safes have influenced data governance policy and legislation in several countries. For instance, Australia’s Data Availability and Transparency Act 2022 explicitly references the Five Safes, embedding its principles into statutory requirements. This integration into legal frameworks demonstrates the framework’s practical utility in shaping robust regulatory environments for data sharing, providing a clear and actionable set of guidelines for data custodians and users alike [26].

6.5 Balancing Innovation with Protection

The Five Safes framework elegantly navigates the inherent tension between the desire for data-driven innovation and the necessity of protecting individual privacy. By establishing clear guardrails, it enables researchers to leverage granular data that might otherwise be inaccessible due to privacy concerns. This ‘controlled access’ model allows for deep analytical insights that are not possible with heavily aggregated or public-use data, thereby fostering innovation in research and evidence-based policy development, all while maintaining a high standard of confidentiality [41].

6.6 Future Directions: Integration with Privacy-Enhancing Technologies (PETs)

The evolving nature of data privacy and security means the Five Safes framework must continuously adapt. A significant future implication lies in its integration with emerging Privacy-Enhancing Technologies (PETs). While the framework primarily relies on de-identification and secure environments, PETs like homomorphic encryption (allowing computation on encrypted data), secure multi-party computation (enabling collaborative analysis without revealing individual inputs), and differential privacy (mathematically quantifiable privacy guarantees) offer new avenues for data protection [42]. The Five Safes can serve as a governance layer to guide the responsible deployment of these complex technologies, ensuring that the ‘Safes’ are maintained even as the underlying technical mechanisms become more sophisticated.
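To illustrate what secure multi-party computation looks like in its simplest form, the sketch below uses additive secret sharing to compute a sum without any party revealing its input. The `share` and `secure_sum` helpers and the modulus are illustrative assumptions, not a production protocol; a real deployment would add authenticated channels and protections against malicious parties.

```python
import random

PRIME = 2**61 - 1  # modulus for additive shares (illustrative choice)

def share(value, n_parties):
    """Split an integer into n additive shares mod PRIME.
    Any subset of n-1 shares is uniformly random and reveals nothing."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def secure_sum(party_values):
    """Each party splits its value into shares; parties add share-wise,
    and only the combined aggregate (the total) is ever reconstructed."""
    n = len(party_values)
    all_shares = [share(v, n) for v in party_values]
    aggregate = [sum(col) % PRIME for col in zip(*all_shares)]
    return sum(aggregate) % PRIME

print(secure_sum([12, 30, 8]))  # prints 50: only the total is revealed
```

Under a Five Safes lens, such a protocol shifts part of the ‘Safe Settings’ burden into mathematics: the environment no longer needs to see the raw inputs at all.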

Similarly, the rise of federated learning, where AI models are trained on decentralized datasets without the data ever leaving its source, presents an interesting challenge and opportunity. The Five Safes principles could be adapted to govern such distributed analysis, focusing on the ‘safety’ of algorithms and model outputs rather than direct data access [43].
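A minimal analogue of this federated pattern can be sketched as follows: each site computes only a local summary (here a sum and a count), and an aggregator combines the summaries into a global statistic without raw records leaving their source. The `local_mean` and `federated_mean` names are hypothetical simplifications of federated analysis, not any specific framework's API.

```python
def local_mean(dataset):
    """Computed at each site: only the (sum, count) summary is shared;
    the raw records never leave the site."""
    return sum(dataset), len(dataset)

def federated_mean(site_datasets):
    """Aggregator: combine per-site summaries into a global mean,
    a minimal analogue of federated analysis / model averaging."""
    summaries = [local_mean(d) for d in site_datasets]
    total_sum = sum(s for s, _ in summaries)
    total_n = sum(n for _, n in summaries)
    return total_sum / total_n

sites = [[1.0, 2.0, 3.0], [10.0, 20.0]]
print(federated_mean(sites))  # prints 7.2
```

Governance attention then shifts, as the text suggests, from controlling access to data towards assessing whether the shared summaries or model updates are themselves ‘safe outputs’.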

In essence, the Five Safes framework is more than just a set of rules; it represents a mature philosophy for managing sensitive data in a complex digital world. Its continued relevance hinges on its ability to evolve, incorporating new technologies and adapting to shifting societal expectations and privacy concerns, thereby solidifying its role as a cornerstone of responsible data stewardship.


7. Conclusion: The Enduring Value and Future Evolution of the Five Safes Framework

The Five Safes framework, initially conceived within the UK Office for National Statistics, has matured into a globally influential paradigm for secure data access and responsible data governance. Its multi-dimensional approach, encompassing Safe Projects, Safe People, Safe Settings, Safe Data, and Safe Outputs, provides a comprehensive and intuitively understandable methodology for balancing the critical imperative of data utility for public benefit with the non-negotiable requirement of individual confidentiality [44]. Its widespread adoption across national statistical offices, health data custodians, and academic research services in the UK, Australia, New Zealand, and beyond underscores its practical utility and conceptual robustness.

The framework’s enduring value lies in its holistic nature, recognizing that data security is a socio-technical challenge requiring coordinated controls across policy, legal, human, and technological domains. It has significantly contributed to building public trust, fostering ethical research practices, and enabling evidence-based policy-making by providing a trusted pathway for researchers to access granular, sensitive data that would otherwise remain siloed. By clearly articulating the safeguards in place, the Five Safes empowers organizations to demonstrate accountability and transparency, essential elements for sustaining the social licence to operate in an increasingly data-driven society.

However, as highlighted by critical analyses, particularly the ‘Not fit for Purpose’ study, the framework faces ongoing challenges in an ever-evolving digital landscape. The dynamic nature of re-identification risks, the rapid advancements in AI and machine learning, and the increasing complexity of data linkage attacks demand continuous scrutiny and adaptation. While the Five Safes provides a robust conceptual architecture, its technical implementation must remain agile, incorporating cutting-edge privacy-enhancing technologies and adopting a more dynamic view of risk assessment. Furthermore, addressing the administrative burden and ensuring proportionate access to data are ongoing practical considerations.

Looking ahead, the Five Safes framework is poised for continued evolution. Its principles can serve as a guiding light for the responsible deployment of emerging technologies like federated learning and homomorphic encryption, providing a governance layer for complex distributed data ecosystems. Integrating a deeper understanding of human factors, enhancing international interoperability, and embedding a more explicit connection to data sovereignty principles will further strengthen its applicability.

In conclusion, the Five Safes framework remains a cornerstone of responsible data stewardship. Its continued relevance hinges on a commitment to continuous adaptation, robust implementation, and ongoing critical evaluation. By embracing these challenges, the framework can continue to facilitate valuable research and innovation, ensuring that the transformative potential of data is harnessed in a manner that is both secure and ethical, safeguarding privacy for individuals while advancing the collective good.


References

[1] World Economic Forum. (2018). The Future of Jobs Report 2018. Retrieved from https://www.weforum.org/reports/the-future-of-jobs-report-2018
[2] European Commission. (2020). A European Strategy for Data. Retrieved from https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age/european-data-strategy_en
[3] Research Data Scotland. (n.d.). What is the Five Safes framework? Retrieved from https://www.researchdata.scot/engage-and-learn/data-explainers/what-is-the-five-safes-framework/
[4] Office for National Statistics. (2017). The ‘Five Safes’ – Data Privacy at ONS. Retrieved from https://blog.ons.gov.uk/2017/01/27/the-five-safes-data-privacy-at-ons/
[5] Ritchie, F., & Green, A. (2018). The ONS Secure Research Service and the Five Safes: Providing an integrated service for data access. IASSIST Quarterly, 42(3), 1–11. Retrieved from https://www.iassistquarterly.com/article/42-3-2
[6] UK Data Service. (n.d.). The Five Safes Framework. Retrieved from https://www.ukdataservice.ac.uk/manage-data/legal-ethical/access-control/five-safes
[7] Office for National Statistics. (2019). Applying for the Secure Research Service. Retrieved from https://www.ons.gov.uk/aboutus/whatwedo/ourstrategy/securedataservice/applyingforthesecuredataservice
[8] Nuffield Council on Bioethics. (2015). The collection, linking and use of data in biomedical research and health care: Ethical issues. Retrieved from https://www.nuffieldbioethics.org/assets/pdfs/Data-in-biomedical-research-and-health-care.pdf
[9] General Data Protection Regulation (GDPR). (2016). Regulation (EU) 2016/679.
[10] IBM. (n.d.). The Human Element of Cyber Security. Retrieved from https://www.ibm.com/topics/human-element-cyber-security
[11] NHS Digital. (n.d.). Five Safes Framework. Retrieved from https://digital.nhs.uk/services/secure-data-environment-service/introduction/five-safes-framework
[12] UK Data Service. (n.d.). Secure Access User Agreement. Retrieved from https://www.ukdataservice.ac.uk/get-data/access-data/secure-access/user-agreement
[13] National Cyber Security Centre (NCSC). (2020). Principles for the design of secure government digital services. Retrieved from https://www.ncsc.gov.uk/collection/cyber-security-design-principles/principles
[14] ISO/IEC 27001:2013. (2013). Information technology – Security techniques – Information security management systems – Requirements.
[15] Samarati, P., & Sweeney, L. (1998). Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. In IEEE Symposium on Security and Privacy (pp. 1-10).
[16] Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Nordholt, E. S., Spicer, K., & de Wolf, P.-P. (2012). Statistical Disclosure Control for Microdata and Tables: From Theory to Practice. Springer.
[17] Machanavajjhala, A., Kifer, D., Gehrke, J., & Venkitasubramaniam, M. (2007). l-diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1), 3-es.
[18] Rubinstein, B. I. P., & Watts, D. J. (2021). Differential Privacy: An Introduction for Statisticians and Social Scientists. Foundations and Trends in Econometrics, 12(1-2), 1-133.
[19] Office for National Statistics. (n.d.). Secure Research Service: Disclosure control policy. Retrieved from https://www.ons.gov.uk/aboutus/whatwedo/ourstrategy/securedataservice/securedisclosurecontrolpolicy
[20] Willenborg, L., & De Waal, T. (2012). Statistical Disclosure Control in Practice. Springer.
[21] Office for National Statistics. (n.d.). Secure Research Service. Retrieved from https://www.ons.gov.uk/aboutus/whatwedo/ourstrategy/securedataservice
[22] UK Data Service. (n.d.). Secure Access. Retrieved from https://www.ukdataservice.ac.uk/get-data/access-data/secure-access
[23] Department of Health and Social Care. (2022). Data saves lives: reshaping health and social care with data. Retrieved from https://www.gov.uk/government/publications/data-saves-lives-reshaping-health-and-social-care-with-data
[24] Research Data Scotland. (n.d.). Our Operating Model. Retrieved from https://www.researchdata.scot/about-us/our-operating-model/
[25] Australian Bureau of Statistics. (n.d.). About the DataLab. Retrieved from https://www.abs.gov.au/about-datalab
[26] Office of the National Data Commissioner. (2022). Data Availability and Transparency Act 2022. Retrieved from https://www.datacommissioner.gov.au/data-availability-and-transparency-act-2022
[27] Statistics New Zealand. (n.d.). Data Lab access for researchers. Retrieved from https://www.stats.govt.nz/data/access-data/data-lab
[28] Te Mana Raraunga. (n.d.). Māori Data Sovereignty Network. Retrieved from https://www.temanararaunga.maori.nz/
[29] European Commission. (2022). Proposal for a Regulation on a European Health Data Space. Retrieved from https://health.ec.europa.eu/ehealth-digital-health/european-health-data-space_en
[30] Statistics Canada. (2022). Research Data Centres. Retrieved from https://www.statcan.gc.ca/eng/rdc/aboutrdc
[31] Stodden, V. C., & Michener, W. K. (2017). Reproducible Research and Data Science. CRC Press.
[32] Snijkers, G., & Jones, J. A. (2018). The Social Licence of Official Statistics: A Framework for Assessment. Statistical Journal of the IAOS, 34(3), 441-452.
[33] Culnane, C., Rubinstein, B. I. P., & Watts, D. (2020). Not fit for Purpose: A critical analysis of the ‘Five Safes’. arXiv preprint arXiv:2011.02142. https://arxiv.org/abs/2011.02142
[34] Ohm, P. (2010). Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review, 57, 1701-1777.
[35] Rocher, L., Hendrickx, J. M., & de Montjoye, Y.-A. (2019). Estimating the success of re-identifications in incomplete datasets using generative models. Nature Communications, 10(1), 3069.
[36] Furnell, S. M. (2008). Human factors in information security. Information Security Technical Report, 13(4), 163-169.
[37] Goldacre, B., & Morley, J. (2022). Better, Broader, Safer: Using Health Data for Research and Analysis (the Goldacre review). Department of Health and Social Care.
[38] O’Hara, K., & Hall, W. (2018). Four Internets: Data, Geopolitics, and the Future of Digital Governance. Centre for International Governance Innovation.
[39] Floridi, L., Cowls, J., Beltrametti, M., Chatila, R., Chazerand, P., Dignum, V., … & Vayena, E. (2019). AI4People—Ethical guidelines for trustworthy AI: A European perspective. Minds and Machines, 29(4), 689-707.
[40] European Commission. (2022). Proposal for a Data Act. Retrieved from https://digital-strategy.ec.europa.eu/en/policies/data-act
[41] The Royal Society. (2017). Data management and use: Governance in the 21st century. Retrieved from https://royalsociety.org/-/media/policy/projects/data-governance/data-management-use-report.pdf
[42] NIST. (2021). NIST Privacy-Enhancing Technologies (PETs) Portfolio. Retrieved from https://www.nist.gov/privacy-framework/privacy-enhancing-technologies-pets
[43] Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Canziani, M., Charles, Z., … & Yu, H. (2021). Advances and Open Problems in Federated Learning. Foundations and Trends in Machine Learning, 14(1–2), 1-210.
[44] Ritchie, F. (2020). The five safes: the foundations of good data management practice. In Data Management and Analytics (pp. 11-23). Palgrave Macmillan, Cham.
