Navigating the Complex World of Data Management: Insights from Dr. Emily Harper

In the ever-evolving landscape of scientific research, the management of data is as crucial as the research itself. During a recent conversation with Dr. Emily Harper, a seasoned data scientist, I gained invaluable insights into the complexities and necessities of data management in the scientific community. Dr. Harper, who has spent over a decade in data management for biological sciences, shared her experiences and offered practical advice on navigating this intricate field.

Dr. Harper began by emphasising the importance of understanding the nature of the data one is working with. “The first step is always to clearly define what kind of data you’re handling,” she explained. “Is it numerical, image-based, or text data? Each type has its own set of challenges and requirements when it comes to storage and backup.”

For instance, image data typically requires significant storage space. Dr. Harper recounted an experience from early in her career when a lack of foresight regarding data storage led to a near loss of critical image data. “We underestimated the volume of data our imaging equipment would generate, and it pushed our storage limits,” she recalled. “From that experience, I learned the importance of planning for data growth. You must always think a year or two ahead.”
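
Her point about planning ahead translates readily into simple arithmetic. As a purely illustrative sketch (the acquisition rate, file size, and headroom factor below are invented figures, not numbers from Dr. Harper’s lab), a few lines of Python turn expected throughput into a yearly storage forecast:

    # Hypothetical storage forecast; every figure here is an illustrative assumption.
    images_per_day = 200      # expected acquisitions per working day
    mb_per_image = 50         # average file size in megabytes
    acquisition_days = 250    # working days per year
    growth_factor = 1.5       # headroom for derived files and unexpected growth

    yearly_gb = images_per_day * mb_per_image * acquisition_days / 1024
    planned_gb = yearly_gb * growth_factor
    print(f"Raw data per year: {yearly_gb:,.0f} GB")   # ~2,441 GB with these figures
    print(f"Plan for at least: {planned_gb:,.0f} GB")  # ~3,662 GB with headroom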

Dr. Harper highlighted the necessity of a robust data description and organisation strategy. “Organising your data effectively is not just about tidiness; it’s about future-proofing your research,” she said. She offered practical tips, such as adopting consistent naming conventions and keeping thorough documentation, both of which are vital for maintaining data integrity and accessibility. “When you describe and organise your data well, it becomes much easier to share and collaborate effectively,” she noted.
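
To make the naming-convention advice concrete, here is a minimal sketch that checks files against an agreed pattern before they enter a shared folder. The convention itself (project_YYYYMMDD_sample_condition.ext) is a made-up example, not one Dr. Harper prescribed:

    import re
    from pathlib import Path

    # Hypothetical convention: project_YYYYMMDD_sample_condition.ext
    # e.g. liverimg_20240312_S07_control.tif
    NAME_PATTERN = re.compile(
        r"^[a-z0-9]+_\d{8}_[A-Za-z0-9]+_[A-Za-z0-9-]+\.[a-z0-9]+$"
    )

    def check_names(folder):
        """Report files that do not follow the agreed naming convention."""
        for path in Path(folder).iterdir():
            if path.is_file() and not NAME_PATTERN.match(path.name):
                print(f"Non-conforming name: {path.name}")

    check_names(".")  # check the current directory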

Another vital aspect of data management Dr. Harper discussed was the need for regular backups and security measures. “Data is the backbone of any research project, and losing it can be catastrophic,” she asserted. While many institutions provide resources for data backup, Dr. Harper advised researchers to also explore external options. “Always have a secondary backup plan. Relying solely on institution-provided storage can be risky, especially if you’re dealing with large datasets.”
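
A secondary backup need not be elaborate. The sketch below is purely illustrative and uses only the Python standard library; the paths and layout are assumptions, not Dr. Harper’s setup. It copies new or changed files to a second location and uses checksums to skip files that are already safely duplicated:

    import hashlib
    import shutil
    from pathlib import Path

    def sha256(path):
        """Checksum used to decide whether a file has already been backed up."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def backup(source, destination):
        """Copy files from source to destination, skipping unchanged ones."""
        src, dst = Path(source), Path(destination)
        for file in src.rglob("*"):
            if not file.is_file():
                continue
            target = dst / file.relative_to(src)
            if target.exists() and sha256(target) == sha256(file):
                continue  # already backed up and identical
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(file, target)
            print(f"Backed up {file} -> {target}")

    # Example with hypothetical paths: backup("data/raw", "/mnt/external/raw-backup")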

Sharing data is a critical requirement for many researchers, driven by funding agencies and scientific community standards. Dr. Harper explained the importance of understanding funder data sharing policies and selecting appropriate repositories for data deposit. “The repositories you choose must align with your data’s nature and the sharing requirements set by your funders,” she suggested. “Also, bear in mind that many repositories only accept finalised datasets, so plan your data lifecycle accordingly.”

Version control emerged as another significant topic during our conversation. “In fields where data changes frequently, managing versions can quickly become a nightmare,” Dr. Harper remarked. She stressed the importance of implementing a version control system from the outset of any project. “Having a structured versioning strategy helps in tracking changes and ensures that everyone involved in the project is on the same page.”
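
Code is usually versioned with a tool such as git, but large datasets often need a lighter-weight record of what changed and when. One simple approach, sketched below as an illustration rather than as Dr. Harper’s own system (the paths and version labels are hypothetical), is a manifest that stores a checksum for every file in each released version of a dataset:

    import hashlib
    import json
    from datetime import date
    from pathlib import Path

    def file_hash(path):
        """Fingerprint a file so any later change to it is detectable."""
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()

    def record_version(data_dir, manifest_path, label):
        """Append a dated, checksummed snapshot of data_dir to a JSON manifest."""
        entry = {
            "version": label,
            "date": date.today().isoformat(),
            "files": {
                str(p.relative_to(data_dir)): file_hash(p)
                for p in sorted(Path(data_dir).rglob("*")) if p.is_file()
            },
        }
        manifest = Path(manifest_path)
        history = json.loads(manifest.read_text()) if manifest.exists() else []
        history.append(entry)
        manifest.write_text(json.dumps(history, indent=2))

    # Example with hypothetical arguments:
    # record_version("data/processed", "versions.json", "v1.2")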

Dr. Harper also touched on the often-overlooked aspect of data management plans (DMPs), which are increasingly required by funding agencies. “Crafting a comprehensive DMP is essential, and it’s not just about ticking a box for funders,” she explained. “A well-prepared DMP guides your entire project’s data handling process, from collection to storage, and eventually sharing.”

In discussing data documentation, Dr. Harper stressed the need for meticulous record-keeping and the use of persistent identifiers. “Good documentation practices ensure that your data remains usable and understandable, not just for you, but for anyone else who might need to access it in the future,” she elaborated. Persistent identifiers, according to Dr. Harper, play a crucial role in maintaining the traceability and discoverability of datasets.
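
What good documentation looks like varies by field, but a small data dictionary that travels with the dataset captures much of what she describes. The record below is entirely made up: the dataset name, columns, and persistent-identifier field are placeholders showing the shape of such a file, not details from any real project:

    import json

    # Illustrative data dictionary for a hypothetical measurements table.
    data_dictionary = {
        "dataset": "cell_counts_2024",            # hypothetical dataset name
        "identifier": "doi:10.xxxx/placeholder",  # persistent identifier, once minted
        "columns": {
            "sample_id": {"type": "string", "description": "Unique sample label"},
            "treatment": {"type": "string", "description": "Experimental condition"},
            "cell_count": {"type": "integer", "unit": "cells/mL",
                           "description": "Count from automated imaging"},
            "measured_on": {"type": "date", "description": "Acquisition date, ISO 8601"},
        },
    }

    with open("data_dictionary.json", "w") as f:
        json.dump(data_dictionary, f, indent=2)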

Lastly, Dr. Harper highlighted the role of community standards and ontologies in data sharing. “Adhering to established standards ensures that your data can be universally understood and utilised,” she said. She encouraged researchers to familiarise themselves with the metadata standards relevant to their field, as this enhances the data’s interoperability.
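
As a final illustration of what adhering to a standard can look like in practice, the snippet below assembles a minimal Dublin Core-style metadata record. Dublin Core is a widely used general-purpose standard; whether it, or a field-specific schema, is the right choice depends on the discipline and the repository, and every value shown here is invented:

    # Minimal Dublin Core-style metadata record; all field values are invented.
    metadata = {
        "dc:title": "Hypothetical imaging dataset, mouse liver sections",
        "dc:creator": "Example Lab",
        "dc:date": "2024-06-01",
        "dc:description": "Illustrative record only; not a real dataset.",
        "dc:format": "image/tiff",
        "dc:rights": "CC BY 4.0",
        "dc:subject": ["liver", "microscopy"],  # ideally terms from a domain ontology
    }

    for field, value in metadata.items():
        print(f"{field}: {value}")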

Throughout our conversation, Dr. Harper’s insights underscored the multifaceted nature of data management in the sciences. Her experiences serve as a guide for researchers navigating the vast seas of data, reminding them of the importance of foresight, organisation, and compliance with established standards. In a world where data is becoming increasingly central to scientific advancement, her advice is both timely and invaluable.

Koda Siebert