Digital Curation Centre: Data Storage Case Studies

Navigating the Digital Deluge: A Deep Dive into Data Curation with the DCC

In our increasingly data-driven world, the sheer volume of digital information generated daily is staggering. Every email, every research paper, every scientific observation, every customer transaction adds up, creating a vast, swirling ocean of data. But here’s the kicker: data, like any other valuable asset, isn’t truly useful unless it’s properly managed, preserved, and made accessible for the long haul. This is precisely where the Digital Curation Centre (DCC) steps in, acting as a crucial compass and guide through what can often feel like an overwhelming digital sea. Established in 2004, the DCC has stayed at the cutting edge of the field, grappling with the intricate challenges of digital data storage and long-term preservation and partnering with institutions large and small to forge and implement data curation strategies robust enough to stand the test of time.

It’s not just about ‘saving’ files; it’s about making sure those files remain discoverable, interpretable, and usable far into the future, a concept far more complex than it might initially appear. We’re talking about ensuring the integrity of our collective intellectual heritage, really. If you’ve ever tried to open an old file format from twenty years ago and found yourself hitting a brick wall, you’ll immediately grasp the critical importance of what the DCC does. It’s a bit like meticulously curating a museum collection, only instead of artifacts, we’re preserving digital bits and bytes, ensuring they don’t crumble into digital dust.

The Urgency of Digital Curation: Why It Matters Now More Than Ever

Before we delve into some compelling real-world examples, let’s briefly touch on why digital curation isn’t just a nice-to-have, but an absolute imperative in today’s research and institutional landscape. Every organization, especially those involved in research or maintaining significant historical archives, faces a perfect storm of challenges:

  • Exploding Data Volumes: The sheer scale of data is growing exponentially. Traditional storage and management methods simply can’t keep pace.
  • Technological Obsolescence: Hardware fails, software becomes outdated, and file formats evolve or disappear entirely. Without active management, data can become unreadable faster than you might think.
  • Funding Requirements: Research funders increasingly mandate robust data management plans, recognizing that the data generated by publicly funded projects must be preserved and shared.
  • Ethical and Legal Obligations: Privacy concerns, data security regulations (like GDPR), and ethical responsibilities demand meticulous handling of sensitive information.
  • Reproducibility and Open Science: The push for transparent and reproducible research means data must be openly accessible and understandable, allowing others to validate findings and build upon them.

Ignoring these challenges isn’t really an option; it risks losing valuable intellectual capital, compromising research integrity, and failing to meet critical compliance standards. That’s why the DCC’s proactive approach, offering a clear framework and expert guidance, proves invaluable.

DCC’s Guiding Principles and Practical Approach

At its core, the DCC advocates for a holistic approach to data curation, one that considers the entire data lifecycle. This isn’t just about what happens at the end of a project, but how data is managed from its inception. They champion best practices often aligned with the FAIR principles – ensuring data is Findable, Accessible, Interoperable, and Reusable. They also draw heavily from models like the Open Archival Information System (OAIS) Reference Model, which provides a conceptual framework for digital preservation.

When an institution partners with the DCC, it’s not a one-size-fits-all consultation. Rather, it’s a deep dive, a collaborative journey to understand the unique institutional context, the specific types of data, the existing infrastructure, and the long-term strategic goals. This involves:

  • Needs Assessment: What are the current pain points? What data do they have? What’s at risk?
  • Strategy Development: Crafting bespoke data management plans (DMPs) that cover everything from data capture and documentation to storage, access, and eventual preservation or disposal.
  • Tooling and Infrastructure Recommendations: Advising on appropriate technologies, whether it’s repository software, storage solutions, or metadata management tools.
  • Training and Capacity Building: Empowering staff with the knowledge and skills needed to sustain effective curation practices internally.
  • Policy Formulation: Helping institutions develop clear, actionable policies that govern data management across the board.

This methodical approach is truly what sets them apart, moving beyond generic advice to provide tangible, implementable solutions. Let’s look at how this has played out in practice across a diverse range of institutions.

Case Study 1: Cornell Institute for Social and Economic Research (CISER)

Imagine the challenge of managing an immense and ever-growing collection of social and economic datasets, each with its own nuances, confidentiality requirements, and user base. This was the exact predicament facing the Cornell Institute for Social and Economic Research (CISER), a vital hub within Cornell University. Their existing setup, while functional, was beginning to groan under the weight of an expanding digital archive, threatening to become a bottleneck for both research and public access. The team at CISER recognized that they couldn’t just keep piling data onto existing servers; they needed a genuinely robust, scalable, and secure infrastructure that would not only house their valuable data but also make it readily available to researchers and the public, all while adhering to stringent quality control specifications and security regulations.

Their partnership with the DCC wasn’t a quick fix; it was a comprehensive overhaul. The collaboration began with a detailed assessment of CISER’s existing data holdings, usage patterns, and future growth projections. This meant really getting into the weeds, understanding the different types of data—survey results, economic indicators, demographic statistics—and how researchers interacted with them. The DCC brought their expertise in designing resilient digital archives, recommending a strategic shift towards network-attached storage (NAS) systems. This wasn’t just about buying new hardware; it involved designing a system architecture that prioritized redundancy, data integrity, and ease of management. They meticulously planned out the directory structures, metadata schema, and backup routines to ensure every bit of data was accounted for and protected.
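
The case study doesn’t reproduce CISER’s actual routines, but the kind of fixity check such an archive typically runs looks like the following Python sketch, which walks an archive tree and records a SHA-256 checksum for every file. The mount points are hypothetical stand-ins for the real NAS paths.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 digest, reading in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(archive_root: Path, manifest: Path) -> None:
    """Record a checksum line for every file under the archive root."""
    with manifest.open("w", encoding="utf-8") as out:
        for item in sorted(archive_root.rglob("*")):
            if item.is_file():
                out.write(f"{sha256_of(item)}  {item.relative_to(archive_root)}\n")

# Hypothetical NAS mount point; substitute the archive's real path.
write_manifest(Path("/mnt/nas/archive"), Path("/mnt/nas/archive.sha256"))
```

Re-running a routine like this on a schedule and diffing successive manifests surfaces silent corruption before it can propagate into backups.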

The resulting infrastructure was a marvel of practicality and foresight. The compressed, publicly accessible datasets, often cleaned and anonymized for broader use, found a secure home in a dedicated archive residing on these NAS systems. This setup provides not only tremendous scalability – they can add more storage as needed without disrupting service – but also excellent reliability. Furthermore, the commitment to sustainability was key; the chosen technologies and workflows were designed to be maintainable over the long term, preventing future cycles of data migration headaches. For the broader research community, this meant frictionless access to valuable resources via the CISER data catalogue. Think about the impact of that: researchers worldwide can simply download a compressed file and immediately begin their analysis, greatly accelerating the pace of discovery.
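
To make that “download and begin analysis” claim concrete, here is a minimal sketch of a researcher’s first few minutes with a catalogued dataset. The URL and file layout are invented placeholders, not a real catalogue entry.

```python
import io
import urllib.request
import zipfile

import pandas as pd  # third-party; pip install pandas

# Hypothetical catalogue entry; real downloads go through the CISER data catalogue.
DATASET_URL = "https://example.edu/ciser/datasets/labor-survey-2020.zip"

with urllib.request.urlopen(DATASET_URL) as response:
    archive = zipfile.ZipFile(io.BytesIO(response.read()))

# Load the first CSV in the archive straight into a DataFrame and summarize it.
csv_name = next(name for name in archive.namelist() if name.endswith(".csv"))
df = pd.read_csv(archive.open(csv_name))
print(df.describe())
```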

However, it’s not just about public access. For CISER account holders, those directly affiliated with the institution and working on more sensitive or in-progress projects, a separate, more immediate tier of data access was crucial. Uncompressed data files, alongside all the vital documentation, codebooks, and ancillary files, found their home on dedicated research-computing servers. This tiered approach allowed CISER account holders to locally prepare, analyze, and manage data using industry-standard statistical software packages, fostering an environment of active research and data manipulation. This duality—public access for broad use and controlled access for in-depth research—demonstrates a nuanced understanding of varying data needs. It’s a testament to how careful planning and expert guidance can transform a challenging data management situation into a highly efficient and future-proof operation, directly empowering a vibrant research community.

Case Study 2: University of Bristol

The academic world thrives on recognition and impact, and nothing communicates the significance of research output quite like proper citation. Traditionally, this primarily applied to journal articles, books, and conference papers. But what about the underlying data? The raw numbers, the survey responses, the experimental results – these are often just as, if not more, valuable than the final published paper. The University of Bristol recognized this gap, understanding that their rich trove of research data wasn’t receiving the visibility or citation credit it deserved, limiting its potential impact and reuse. How could they ensure that when a researcher spent months collecting data on, say, climate change indicators or public health trends, that foundational work could be easily found, referenced, and built upon by others? This was the pivotal question that led them to the DCC.

The solution, championed by the DCC, revolved around the implementation of Digital Object Identifiers (DOIs). For those unfamiliar, a DOI is essentially a persistent, unique identifier for an object – in this case, a dataset. It’s like a permanent address on the internet; even if the data moves servers, the DOI remains constant, always pointing to its current location. The collaboration focused intensely on the practicalities of assigning these DOIs and seamlessly integrating them into the university’s existing data storage and dissemination infrastructure. This wasn’t just a technical exercise; it involved developing new workflows, educating researchers on the importance of DOIs, and configuring their repositories to mint and manage these identifiers effectively. They had to consider metadata standards to ensure that each DOI was linked to rich, descriptive information about the dataset, making it genuinely discoverable.
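
The case study doesn’t document Bristol’s workflow at code level, but DataCite’s REST API is one common route for minting DOIs programmatically, so a call along these lines is a reasonable sketch of the mechanics. The credentials, prefix (10.5072 is DataCite’s reserved test prefix), and metadata below are all placeholders.

```python
import requests  # third-party; pip install requests

REPOSITORY_ID = "XXXX.YYYY"  # placeholder DataCite repository account
PASSWORD = "********"

payload = {
    "data": {
        "type": "dois",
        "attributes": {
            "event": "publish",                    # register and make findable in one step
            "doi": "10.5072/example-dataset-001",  # 10.5072 is the test prefix
            "creators": [{"name": "Doe, Jane"}],
            "titles": [{"title": "Climate Indicator Observations, 2015-2020"}],
            "publisher": "University of Bristol",
            "publicationYear": 2021,
            "types": {"resourceTypeGeneral": "Dataset"},
            "url": "https://data.example.ac.uk/datasets/example-dataset-001",
        },
    }
}

resp = requests.post(
    "https://api.datacite.org/dois",
    json=payload,
    headers={"Content-Type": "application/vnd.api+json"},
    auth=(REPOSITORY_ID, PASSWORD),
)
resp.raise_for_status()
print(resp.json()["data"]["id"])  # the newly minted DOI
```

In practice, repositories often mint DOIs in a draft state first and publish only once a deposit passes review; the rich metadata in the payload is precisely what makes the identifier worth having.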

By systematically assigning DOIs to their research data, the University of Bristol achieved several critical objectives. Firstly, it dramatically enhanced the discoverability of their datasets. Researchers, peer reviewers, and even policymakers could now easily locate specific datasets through standard search engines and academic databases, simply by searching for the DOI. Secondly, and equally importantly, it boosted the citability of their data. Researchers could now properly acknowledge the original creators of datasets in their publications, just as they would cite a journal article, fostering a more equitable and transparent research ecosystem. This direct link between data and publication creates a powerful incentive for researchers to share their data, knowing they’ll receive credit for it.
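
A pleasant side effect of DOI infrastructure is that citations can be generated mechanically: DOI resolvers support content negotiation, so requesting a bibliography-formatted response from doi.org returns a ready-made reference. A minimal sketch, reusing the placeholder DOI from above:

```python
import requests  # third-party; pip install requests

doi = "10.5072/example-dataset-001"  # placeholder; only a real DOI resolves

# Ask the DOI resolver for an APA-formatted citation via content negotiation.
resp = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "text/x-bibliography; style=apa"},
)
print(resp.text)
```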

I remember speaking to a data librarian from another institution who recounted how, before DOIs, researchers would often feel their data was ‘invisible’ – a true frustration. The University of Bristol’s initiative, therefore, wasn’t just a technical upgrade; it represented a fundamental shift towards valuing data as a first-class research output. It significantly increased the impact of their institutional data and cemented long-term access, creating a persistent, reliable pathway for future generations of scholars to access and reuse their valuable intellectual assets. This commitment to data citation not only benefits the individual researcher but truly elevates the entire institution’s contribution to global knowledge.

Case Study 3: Monash University

Monash University, a research powerhouse, found itself in a situation common to many large academic institutions: a burgeoning volume of research data, generated across myriad disciplines, often managed in siloed and inconsistent ways. Researchers, brilliant in their respective fields, weren’t always data management experts. This led to potential inefficiencies, difficulties in data sharing, and even risks of data loss or non-compliance. They knew they needed a more unified, systematic approach to research data management (RDM) that would not only support individual researchers but also foster a broader culture of open science and data sharing across the entire university. The challenge was immense: how do you bring coherence to such a diverse and decentralized environment?

The DCC’s involvement with Monash University was less about a specific technological fix and more about a strategic, overarching framework. The collaboration focused on developing a comprehensive data management plan (DMP) framework that wasn’t just theoretical but practical and adaptable. This framework addressed the entire data lifecycle, from the moment data is conceived or collected all the way through its active use, preservation, and eventual archiving or destruction. It covered critical aspects such as the following (a machine-readable sketch of such a plan appears after this list):

  • Data Description and Metadata: Ensuring data is well-documented and understandable, even years down the line, by someone who wasn’t involved in its original creation.
  • Storage and Backup Strategies: Implementing secure, redundant storage solutions and clear backup protocols.
  • Access and Security: Defining who can access what data, under what conditions, and how sensitive data is protected.
  • Ethical and Legal Considerations: Guidance on consent, anonymization, and adherence to relevant regulations.
  • Long-term Preservation and Archiving: Planning for the sustainability of valuable datasets beyond the life of the initial project.
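
To give a flavor of what “practical and adaptable” can mean in code, here is a skeletal machine-actionable DMP, loosely modeled on the RDA DMP Common Standard with field names simplified for illustration; the project and all of its values are invented.

```python
import json

# A skeletal machine-actionable DMP; field names are simplified for illustration.
dmp = {
    "dmp": {
        "title": "Household Energy Use Study: Data Management Plan",
        "created": "2021-03-01",
        "dataset": [
            {
                "title": "Smart-meter readings, 2019-2021",
                "description": "Half-hourly readings from 500 consenting households.",
                "personal_data": "yes",      # triggers the ethics and consent workflow
                "metadata_standard": "DDI",  # documented so future users can decode it
                "distribution": [
                    {
                        "host": "Institutional research data repository",
                        "access": "restricted",  # who may see it, and on what terms
                        "license": "CC-BY-4.0",
                    }
                ],
                "preservation": {
                    "retention_years": 10,
                    "formats": ["CSV", "PDF/A"],  # open, migration-friendly formats
                },
            }
        ],
    }
}

print(json.dumps(dmp, indent=2))
```

Because the plan is structured data rather than free text, repository and compliance tooling can validate it automatically rather than relying on a human reading a PDF.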

Working closely with Monash’s research support teams, the DCC helped embed these principles into institutional policy and practical guidance. This wasn’t just a top-down mandate; it involved extensive outreach and training for researchers and support staff to build capacity and demonstrate the tangible benefits of good RDM. They provided templates for DMPs, guidance on selecting appropriate data repositories, and advice on navigating complex data sharing agreements. It truly changed how researchers thought about their data, shifting from ‘my data’ to ‘our data,’ a shared institutional asset.

By implementing these comprehensive strategies, Monash University achieved significant improvements. They drastically enhanced the accessibility of their research data, making it easier for internal and external collaborators to find and use relevant datasets. This, in turn, boosted the usability of the data, as consistent metadata and clear documentation meant less time was spent deciphering raw files. Ultimately, this partnership played a pivotal role in fostering a robust culture of open science and data sharing across Monash, positioning the university as a leader in responsible and impactful research. It’s an excellent example of how institutional change, driven by expert guidance, can have a profound and lasting effect on an entire research ecosystem.

Case Study 4: University of East London (UEL)

The University of East London (UEL) faced a common dilemma for many institutions looking to get serious about research data management: they needed a central, reliable place to store, manage, and share their research data, but they also needed a solution that was cost-effective, sustainable, and flexible enough to meet their evolving needs. Building a brand-new, bespoke data repository from scratch can be incredibly expensive and resource-intensive, often beyond the scope of many university budgets. So, the question became, how do you create a robust, professional data infrastructure without breaking the bank or getting bogged down in proprietary software limitations?

Their collaboration with the DCC provided a clear pathway: leveraging open-source solutions. Specifically, the project centered around building a research data repository using EPrints, an incredibly versatile and widely adopted open-source software platform. This choice was deliberate and strategic. EPrints is renowned for its flexibility, allowing institutions to customize it to their specific branding and workflow requirements, yet it also benefits from a large, active development community that continually improves and supports the software. The DCC’s role here was multifaceted, guiding UEL through the entire implementation process.

This involved several key stages:

  • Requirements Gathering: Understanding UEL’s specific needs in terms of data types, user roles, metadata standards, and integration with existing systems.
  • Software Configuration and Customization: Tailoring EPrints to align perfectly with UEL’s institutional policies and research data management strategies. This wasn’t just installing software; it was about shaping it to fit UEL like a glove.
  • Metadata Schema Development: Crucially, ensuring that the repository could capture rich, descriptive metadata for each dataset, making it truly discoverable and reusable.
  • Workflow Design: Establishing clear processes for data deposit, review, publication, and access.
  • Training and Support: Equipping UEL staff with the skills to manage and maintain the repository independently.

By choosing EPrints and working closely with the DCC, UEL was able to construct a highly functional and sustainable data repository. This project vividly underscored the critical importance of selecting appropriate software solutions – ones that genuinely align with an institution’s unique needs, available technical resources, and long-term financial constraints. Open-source solutions, when chosen wisely and implemented with expert guidance, offer an attractive alternative to proprietary systems, ensuring sustainability by avoiding vendor lock-in and allowing for greater community support.
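
That interoperability is tangible. EPrints repositories conventionally expose an OAI-PMH endpoint (by default at /cgi/oai2), so any standards-aware harvester can pull the repository’s metadata without bespoke integration. A minimal Python sketch against a hypothetical host:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical host; /cgi/oai2 is the conventional EPrints OAI-PMH endpoint.
BASE = "https://repository.example.ac.uk/cgi/oai2"
url = f"{BASE}?verb=ListRecords&metadataPrefix=oai_dc"

with urllib.request.urlopen(url) as resp:
    tree = ET.parse(resp)

# Dublin Core titles live in the dc namespace inside each OAI record.
ns = {"dc": "http://purl.org/dc/elements/1.1/"}
for record in tree.iter("{http://www.openarchives.org/OAI/2.0/}record"):
    title = record.find(".//dc:title", ns)
    if title is not None:
        print(title.text)
```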

The resulting repository transformed UEL’s approach to research data. It provided a centralized, accessible platform for researchers to deposit their data, fulfilling funder mandates and promoting open science. It facilitated easier data sharing and collaboration, both internally and externally. Most importantly, it ensured the long-term preservation and discoverability of UEL’s valuable research outputs, making a clear statement about their commitment to responsible data stewardship. It’s a powerful demonstration of how strategic choices in technology, coupled with expert guidance, can yield significant returns without demanding an exorbitant investment.

Case Study 5: University of Glasgow

Securing research funding is a highly competitive endeavor, and increasingly, funding bodies aren’t just looking at the proposed research methodology and anticipated outputs; they’re scrutinizing how the data generated by the project will be managed and preserved. This is particularly true for prestigious funders like the Arts and Humanities Research Council (AHRC), which places a strong emphasis on the long-term impact and accessibility of research outcomes, including digital outputs. The University of Glasgow found itself in this precise scenario: embarking on a significant AHRC-funded project, they needed to develop a robust data management plan (DMP) that would satisfy the funder’s rigorous requirements and guarantee the longevity and utility of their digital assets. This wasn’t a task to be taken lightly; a poorly conceived DMP could jeopardize funding or, worse, lead to the eventual loss of invaluable research material.

The DCC’s collaboration with the University of Glasgow was focused intently on crafting this critical DMP. This wasn’t just about filling out a form; it was an in-depth exercise in foresight and strategic planning. Together, they meticulously outlined every aspect of the project’s data lifecycle, ensuring nothing was left to chance. The plan, developed with DCC’s expert input, included:

  • Technical Methodology for Data Capture and Creation: How would the data be collected? What formats would be used? What naming conventions would be enforced to ensure consistency?
  • Expected Technical Support: Identifying the necessary IT infrastructure, software, and personnel resources required throughout the project’s duration and beyond.
  • Metadata Strategy: How would the data be described so that it remained intelligible to others, potentially decades later, without direct input from the original researchers?
  • Preservation Strategies: Detailing the specific methods and repositories chosen for long-term archiving, considering format migration, bit-level preservation, and redundancy (a minimal bit-level packaging sketch follows this list).
  • Sustainability and Future Use: Beyond the project’s immediate goals, how would the data be made available for future research? What licenses would apply? How would intellectual property be managed?
  • Costs and Resources: A realistic assessment of the financial and human resources needed to execute the DMP effectively.
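
The case study doesn’t prescribe a packaging tool, but bit-level preservation of the kind the plan describes is commonly implemented with the BagIt specification. A minimal sketch, assuming the Library of Congress bagit library (pip install bagit) and a hypothetical project directory:

```python
import bagit  # third-party; pip install bagit

# Restructure a hypothetical project directory, in place, into a BagIt bag
# with SHA-256 payload manifests.
bag = bagit.make_bag(
    "oral-histories-2021",
    bag_info={"Source-Organization": "University of Glasgow"},
    checksums=["sha256"],
)

# Later, or on another machine, confirm every bit is still intact.
bag.validate()  # raises bagit.BagValidationError on any mismatch
print("Bag is valid; payload checksums verified.")
```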

This level of proactive planning is paramount, especially in humanities projects where data can take incredibly diverse forms, from digitized manuscripts and oral histories to complex digital models and virtual environments. The DCC brought invaluable experience in navigating these complexities, ensuring the DMP addressed not only technical requirements but also ethical considerations and long-term impact.

This case study brilliantly highlights a fundamental truth: robust data storage and curation aren’t afterthoughts; they are integral components of successful research. By embedding comprehensive planning right from the grant application stage, institutions can not only meet stringent funder requirements but also ensure the enduring value and accessibility of their research data. It mitigates future risks, maximizes the return on research investment, and ultimately contributes to a richer, more reliable body of knowledge. It’s a proactive rather than reactive stance, one that every institution aiming for sustained research excellence ought to embrace.

Beyond the Case Studies: Broader Implications and Best Practices

The experiences of Cornell, Bristol, Monash, UEL, and Glasgow, while diverse, collectively underscore a vital message: effective digital curation is a cornerstone of modern research and institutional integrity. The DCC’s work isn’t just about solving isolated technical problems; it’s about embedding a culture of responsible data stewardship. But what does this mean for your institution or project? Here are some broader takeaways and best practices that emerge from these collaborations:

1. Plan Early, Plan Often: The Power of a Proactive Approach

You know the saying, ‘Fail to plan, plan to fail,’ right? Nowhere is this truer than in data management. As seen with the University of Glasgow, developing a comprehensive Data Management Plan (DMP) before a project even fully kicks off is absolutely crucial. It’s not just a box-ticking exercise for funders; it’s your roadmap for ensuring data integrity, accessibility, and long-term value. This includes thinking about:

  • Data Capture and Format Choices: What file formats are you using? Are they open and widely supported, or proprietary and prone to obsolescence? Think about long-term stability here.
  • Metadata Standards: How will you describe your data? Rich, consistent metadata is the key to discoverability and understanding; without it, your data is essentially a black box. (A minimal completeness check is sketched after this list.)
  • Storage and Backup Strategies: Where will the data live? How will it be backed up? Consider the ‘3-2-1 rule’: three copies of your data, on two different media, with one copy offsite. It’s really non-negotiable.
  • Roles and Responsibilities: Who is responsible for what aspect of data management? Clear roles prevent crucial tasks from falling through the cracks.
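
To make the metadata point concrete, a deposit workflow can simply refuse records with gaps. This sketch checks a record against a required-field list loosely based on Dublin Core; the field names and the record itself are illustrative.

```python
# Required fields loosely based on Dublin Core; adapt to your repository's schema.
REQUIRED_FIELDS = ("title", "creator", "date", "description", "identifier", "rights")

def missing_metadata(record: dict) -> list[str]:
    """Return the names of required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]

record = {
    "title": "Survey of Commuting Patterns, 2022",
    "creator": "Doe, Jane",
    "date": "2022-11-04",
    "description": "",  # empty, so it will be flagged
    "identifier": "10.5072/example-survey-2022",
}

problems = missing_metadata(record)
if problems:
    print("Deposit blocked; missing metadata:", ", ".join(problems))
```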

Ignoring these questions early on can lead to significant headaches, even catastrophic data loss, down the line. Trust me, retroactively fixing a decade of inconsistent data naming conventions is not where you want to spend your time.

2. Embrace Open Standards and Sustainable Solutions

The UEL case study perfectly illustrates the power of open-source solutions like EPrints. While proprietary software can offer slick interfaces and dedicated support, it often comes with significant ongoing costs and the risk of vendor lock-in. Open standards and open-source tools foster greater interoperability, flexibility, and community support, which can be invaluable for long-term sustainability. When evaluating any data management solution, ask yourself:

  • Is this solution based on open standards, or will it lock me into a specific vendor’s ecosystem?
  • Does it have an active user community or long-term support model?
  • Can it integrate with other tools and systems I already use?

Choosing wisely here can save your institution a considerable amount of resources and heartache over the years.

3. Data is a First-Class Research Output: Give it a DOI!

The University of Bristol’s initiative with DOIs is a game-changer. By treating research data as a primary output, deserving of citation and persistent identification, we elevate its status and incentivize researchers to share it. Assigning DOIs isn’t just a technical detail; it’s a philosophical shift that recognizes the immense value of raw data. It directly contributes to:

  • Increased Discoverability: Making your data easier for others to find and use.
  • Enhanced Impact and Credit: Ensuring researchers receive proper attribution for their data collection efforts.
  • Improved Reproducibility: Providing clear, persistent links to the foundational evidence of research.

If you’re generating research data, seriously consider how you can assign persistent identifiers. It’s a relatively simple step with profound implications for the reach and recognition of your work.

4. Foster an Institutional Culture of Data Stewardship

Monash University’s success hinged on creating a comprehensive framework that permeated the entire institution. Digital curation isn’t just the job of IT or a single librarian; it requires a collective effort. This means:

  • Leadership Buy-in: Senior management must understand and champion the importance of RDM.
  • Training and Education: Providing researchers and support staff with the skills and knowledge they need.
  • Clear Policies and Guidelines: Establishing institutional policies that make good RDM practices the norm, not the exception.
  • Dedicated Support: Ensuring there are experts available to guide researchers through the complexities of data management.

Without this cultural shift, even the best technological solutions will struggle to gain traction. It’s about changing mindsets as much as it is about changing systems.

5. Don’t Be Afraid to Seek Expert Guidance

These case studies unequivocally demonstrate the pivotal role played by the DCC’s expertise. Digital curation is a complex field, constantly evolving with new technologies and challenges. You don’t have to navigate it alone. Organizations like the DCC bring years of experience, a deep understanding of best practices, and a neutral, objective perspective. They can help you:

  • Assess Your Current Situation: Identify vulnerabilities and areas for improvement.
  • Develop Tailored Strategies: Create solutions that fit your unique institutional context.
  • Navigate Complexities: Understand funder mandates, legal requirements, and technical nuances.

Sometimes, an external perspective is exactly what’s needed to cut through the internal complexities and find the most effective path forward. It’s an investment that almost always pays dividends, helping you avoid costly mistakes and ensuring your valuable data is truly future-proofed.

Key Takeaways: Securing Our Digital Future

These fascinating case studies paint a vivid picture of the Digital Curation Centre’s absolutely pivotal role in assisting institutions with the often-daunting complexities of digital data storage and curation. It’s clear that the DCC isn’t just offering generic advice; they’re rolling up their sleeves and working hand-in-hand with organizations to develop tailored solutions that directly address specific, unique challenges. This collaborative approach consistently leads to tangible improvements, namely enhanced data accessibility, usability, and long-term preservation.

Moreover, the DCC’s deep-seated expertise in the multifaceted domain of data curation has been truly instrumental in driving a broader, more impactful shift. They’ve played a crucial role in fostering a robust culture of open science and data sharing, a movement that benefits not only individual researchers but the entire, interconnected global research community. When data is findable, accessible, interoperable, and reusable, the potential for new discoveries and breakthroughs skyrockets. It truly democratizes knowledge and accelerates progress. So, whether you’re grappling with mountains of social science data like CISER, striving for proper attribution like Bristol, orchestrating institutional-wide change like Monash, building foundational repositories like UEL, or meticulously planning for major grants like Glasgow, the underlying message is clear: proactive, expert-guided data curation is no longer optional; it’s the bedrock of credible, impactful, and sustainable research in the digital age. It’s about securing our collective intellectual legacy for generations to come, and frankly, that’s a mission worth investing in.

References

  • Digital Curation Centre. (n.d.). History of the DCC. Retrieved from https://www.dcc.ac.uk/
  • Digital Curation Centre. (n.d.). CISER Case Study. Retrieved from https://www.dcc.ac.uk/
  • Digital Curation Centre. (n.d.). Assigning Digital Object Identifiers to Research Data at the University of Bristol. Retrieved from https://www.dcc.ac.uk/
  • Digital Curation Centre. (n.d.). Bringing it all together: a case study on the improvement of research data management at Monash University. Retrieved from https://www.dcc.ac.uk/
  • Digital Curation Centre. (n.d.). Using EPrints to Build Research Data Repository for UEL. Retrieved from https://www.dcc.ac.uk/
  • Digital Curation Centre. (n.d.). DMPs in the Arts and Humanities. Retrieved from https://www.dcc.ac.uk/