Open Science and Research Data Management

"The term Open Science bundles strategies and procedures that all aim to consistently use the opportunities of digitisation to make all components of the scientific process openly accessible and reusable via the internet. This is intended to open up new possibilities for science, society and the economy in dealing with scientific findings. Open Science is based on six principles." Translated from the Mission statement of the German-language Open Science working group, 2014 www.ag-openscience.de/mission-statement

The six principles of Open Science

Open science is based on six principles to open up sub-steps and results of a scientific process. The first four principles are based on the paper "The Case for an Open Science in Technology Enhanced Learning" (Kraker 2011). Open Peer Review and Open Educational Resources are two other important aspects of science. The "Open Definition" explains what "open" means in this context.

  • Open Access: "Publishing in an open way and making it usable and accessible to everyone free of charge."
  • Open Data: "Making created data freely available."
  • Open Methodology: "Documenting the use of methods and the entire process behind them as far as practicable and relevant."
  • Open Source: "Use open source technology (software and hardware) and open own technologies."
  • Open Peer Review: "Make quality assurance transparent and comprehensible through open peer review procedures."
  • Open Educational Resources: "Use free and open materials for education and in university teaching."

Translated from http://openscienceasap.org/open-science/

Open Data

"Open data is data that can be freely used, reused and shared by anyone. The only restriction concerns the obligation to cite the author." http://opendatahandbook.org/guide/en/what-is-open-data/

For this purpose, the data must meet the following criteria:

  • Availability and open access: "The data must be available in its entirety and at no more than reasonable reproduction cost, ideally as a download from the internet. The data must also be available in a format that is fit for purpose and editable."
  • Reuse and sharing: "The data must be made available under conditions that allow reuse and sharing, including use of the data together with datasets from other sources."
  • Universal participation: "Everyone must be able to use, process and redistribute the data - there must be no discrimination against individuals, groups, or purposes of use."

Research data as a component of Open Data

What exactly is research data?

"Research data are not only the (final) results of research. Rather, it is any data generated in the course of scientific work, for example through observations, experiments, simulations, surveys, interviews, source research, recordings, digitisation, evaluations." (Key Terms > Research Data, Rat für Informationsinfrastrukturen, 2017)

As research data are necessary to verify the research results based on them, their preservation is a recognised part of good scientific practice (see for example Code of Conduct DFG (PDF-File) and forschungsdaten.org (German)).

What exactly does research data management mean?

"Research data management encompasses all measures - beyond researcher action in the narrower sense and also organisation-related - that must be taken to obtain quality data, to comply with good scientific practice in the data life cycle, to make results reproducible and to take account of any documentation obligations that may exist (e.g. in the health sector). Also, the availability of data (possibly across domains) for subsequent use is an important issue." (Key Terms > Research Data Management, Rat für Informationsinfrastrukturen, 2017)

Why research data management?

There are many good reasons to deal with the topic of research data management:

  • Increasing the visibility of research results: A data publication counts as an independent publication and is indexed by databases. This increases the visibility of the research and, in turn, its citation frequency and reputation.
  • Good scientific practice: The DFG has included the handling of research data in its guidelines for ensuring good research practice.
  • Research funding: A good research data management plan increases the chances of being funded. Many funding bodies expect a statement on the planned handling of data when the application is submitted (for example Guidelines on FAIR Data Management in Horizon 2020). This is to ensure that the data acquired during the project is available in a suitable form for subsequent use, if possible, long after the project has ended.

  • Transparency of research output: Publishers and journals are also increasingly demanding that the data on which a publication is based be made available in order to ensure the transparency of the research output. This quality assurance measure enhances the reputation of the journals, from which the authors also benefit. In some cases, cooperation between publishers, journals and data repositories already exists to facilitate the linking of research data with the associated research articles.

  • Reusability of data: Through efficient research data management, data is preserved in the long term and can be reused long after the original research, for example for comparative research.

  • Research data are valuable: Collecting research data is time-consuming and cost-intensive, and in many cases not repeatable. This makes research data valuable - not only for the verification of research results, but especially for further research questions.

  • Minimising data loss: Scientists can work with the data without restriction over a long period of time, because the risk of data loss is minimised and the data are stored in a future-proof file format.

What does the FAIR data principle mean?

According to the GO-FAIR initiative, research data should be findable, accessible, interoperable and reusable for humans and machines as far as possible.

The "FAIR Data Principles" are supported by many science policy actors, including research funders. These principles should be taken into account when setting up services or research data infrastructures. They form a requirements profile for handling research data on which data management can readily be built:

  • Findable: "Data and their metadata should be easily found by humans and machines through a clear and rich description".
  • Accessible: "Information on access conditions of research objects should be unambiguous for both humans and machines".
  • Interoperable: "Computers should be able to interpret data and their metadata in an automated way. This ensures interchange with other applications."
  • Reusable: "The data and metadata should be documented according to clear standards so that they can be re-used for further research."

For more information on the FAIR Data Principles, see: publisso/fair-principles and FORCE 11 community.
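
As a rough illustration of what a machine-checkable description can look like, the following Python sketch tests a metadata record for a few fields that are often associated with each of the four principles. The field names (identifier, title, description, access_url, format, license) and their assignment to the principles are assumptions made for this example. They are not taken from the GO-FAIR initiative or from the schema of any particular repository.

```python
# Hypothetical mapping of FAIR principles to metadata fields (illustrative only).
FAIR_FIELDS = {
    "Findable": ["identifier", "title", "description"],
    "Accessible": ["access_url"],
    "Interoperable": ["format"],
    "Reusable": ["license"],
}


def missing_fair_fields(record):
    """Return, per principle, the fields that are absent or empty in the record."""
    gaps = {}
    for principle, fields in FAIR_FIELDS.items():
        missing = []
        for field in fields:
            value = record.get(field)
            # Treat missing keys, None and blank strings as "not provided".
            if value is None or not str(value).strip():
                missing.append(field)
        if missing:
            gaps[principle] = missing
    return gaps


example_record = {
    "identifier": "doi:10.1234/example",  # placeholder, not a real DOI
    "title": "Example survey data",
    "description": "Anonymised responses from a hypothetical survey.",
    "access_url": "https://repository.example.org/dataset/1",  # placeholder URL
    "format": "text/csv",
    "license": "",  # left empty on purpose
}

print(missing_fair_fields(example_record))  # {'Reusable': ['license']}
```

A check of this kind only verifies that fields are present. Whether a description is actually "clear and rich" still has to be judged by a person.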

Research Data Unit (RDU) of Heidelberg University

The Research Data Unit (RDU) offers researchers at Heidelberg University central services for archiving and publishing research data in accordance with the university's Research Data Policy (only available in German) and manages heiDATA. The Research Data Unit is a joint service facility of the University Library and the Heidelberg University Computing Centre.

The RDU provides support at all stages of the research process:

Project planning and funding application: Data management plans and expert advice

What is a data management plan?

Even before the start of a research project, it is important and helpful to plan the handling of data and results for the coming work phase. A suitable tool for this is the data management plan (DMP), in which framework conditions and concrete strategies for the creation and processing, scope and securing as well as, if necessary, later publication of the accruing research data are defined and documented.

Heidelberg University has issued guidelines for the management of research data in its Research Data Policy (only available in German) adopted in July 2014. In this policy, all project leaders are encouraged to create data management plans for their research projects that ensure access to and use of research data while adhering to ethical and open access principles with appropriate security measures.

In accordance with the policy, the Research Data Unit supports project leaders in the creation and implementation of their data management plans.

Note: A DMP can be a prerequisite for the granting of third-party funding by some funding bodies and is reviewed together with the project application.

Further helpful information is provided by the Practical Guide to the International Alignment of Research Data Management - Extended Edition from Science Europe.

Data management tools

There are some web-based tools that can help build a data management plan. The Research Data Unit recommends:

With the help of the tools, researchers can gradually create a data management plan tailored to their needs and the requirements of different funding agencies. The use of the tools is free of charge. The plans created can be exported in various formats, for example to integrate them into project applications.

During your project: Data editing

The RDU offers a variety of software tools and services to support you with data processing and data management:

heiBOX is a data storage and sharing service provided by the University Computing Centre. It can be used to store research data in a central location at the Computing Centre, to synchronize the data with several devices like workstations, desktop computers and laptops, and to share the data with other users.

heiCLOUD is an infrastructure-as-a-service (IaaS) offering that provides virtual resources in the form of servers, storage and networks. Users can administer virtualized servers on their own, which offers flexibility comparable to running a local server in an institute. heiCLOUD is available to all employees of Heidelberg University.

SDS@hd is a central service for secure storage of scientific data (Scientific Data Storage). The service is available to researchers of the Baden-Württemberg universities. The focus is on data that is frequently accessed ("hot data").

The University Computing Centre provides employees and students of Heidelberg University with access to various computing clusters in Heidelberg and Baden-Württemberg.

The F*EX service (only available in German) is an HTTP-based service provided by BelWü, the Baden-Württemberg state university network, for sending large files.

End of your project: Sharing, publication and cataloguing of data

With heiDATA, the RDU offers all scientists the possibility to permanently archive and publish their data. If you are interested in using heiDATA, please see our author instructions.

In the comprehensive Research Data Catalogue of Heidelberg University we register published research data independently of whether they are available from our services or from external repositories.

heiARCHIVE (in German) is an institutional service for long-term data preservation at Heidelberg University. The service is currently still in pilot operation, so not all functionalities can be used yet. Details can be found in the information on pilot operation (only in German). If you are interested in using it, you can find the relevant information on participation in the pilot operation (only in German).

Information on the requirements of the funding institutions with regard to research data management

Most research funders have introduced guidelines for research data management. The general expectation is that research data from publicly funded projects is a public good that should be made openly available with as few restrictions as possible. Here is some important information on this:

Bundesministerium für Bildung und Forschung (BMBF)

Deutsche Forschungsgemeinschaft (DFG)

Horizon Europe

European Research Council (ERC)

Save and archive research data

Observance of good scientific practice is the basis of all research. This also includes the storage and retention of research data. Regular backup and storage should take place right at the beginning of the research work. Ideally, this is done not only on local storage media, but also on internal university servers, so that the data can be backed up regularly and restored in the event of loss.

Important note: The DFG considers the retention of primary data for at least 10 years to be part of good scientific practice. It has anchored this in a code on scientific integrity, which also incorporates the Guidelines for Safeguarding Good Research Practice. The aim of the Code is to establish a culture of scientific integrity in the German scientific landscape. It states, for example, in Recommendation 7: "Primary data as the basis for publications should be stored on durable and secure media in the institution where they originated for ten years." This means that the data should be physically preserved and accessible for at least this period. The Statutes for Safeguarding Good Scientific Practice at Heidelberg University (PDF file, as of 1998) also require that research data be kept for at least 10 years. Archiving or preservation for this period is not the same as long-term archiving, which requires more extensive measures.

All guidelines and rules of Heidelberg University can be found here.

Long-term archiving

Long-term archiving means more than just physically securing data. Because of constant technical development, software and hardware quickly become obsolete. As a result, suitable interfaces may no longer be available for data carriers, or data can no longer be correctly displayed or interpreted. File formats and software can likewise become technically obsolete over time and fall into disuse. Long-term archiving must therefore ensure that file contents can still be read in the distant future. This can be achieved by regularly converting files to current file formats or by using file formats that are open and well documented from the outset.

In addition to physical availability, a central goal of long-term archiving is to preserve interpretability. This requires well-documented metadata. Metadata describe the content of the research data and provide information about the collection methods used, software and hardware, coding, etc. The metadata are stored together with the actual research data.
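
Storing metadata together with the research data, as described above, can be as simple as writing a small "sidecar" file next to each data file. The following Python sketch shows one possible way to do this with JSON, an open and well-documented text format. The function name, the file naming scheme and the metadata keys are assumptions made for this example and do not describe how any of the services mentioned on this page store metadata.

```python
import json
import tempfile
from pathlib import Path


def write_metadata_sidecar(data_file, metadata):
    """Write metadata as UTF-8 JSON next to the data file and return its path."""
    data_path = Path(data_file)
    # Hypothetical naming scheme: "<data file name>.metadata.json"
    sidecar = data_path.with_name(data_path.name + ".metadata.json")
    sidecar.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return sidecar


# Demonstration in a temporary directory, so no real data is touched.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "measurements.csv"
    data_file.write_text("sample,value\nA,1.5\n", encoding="utf-8")

    sidecar = write_metadata_sidecar(
        data_file,
        {
            "title": "Example measurements",          # illustrative values only
            "collection_method": "hypothetical assay",
            "software": "hypothetical-tool 1.0",
            "encoding": "UTF-8",
        },
    )
    print(sidecar.name)  # measurements.csv.metadata.json
```

In practice, the keys would follow the metadata standard of the relevant discipline or repository rather than an ad hoc list like the one above.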

Another important task of long-term archiving is to ensure that data is not altered (either accidentally or intentionally). It also fulfils the requirement of third-party funders for secure data storage of research data.
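
One widely used way to detect whether a file has been altered is to record a cryptographic checksum when the file is archived and to recompute it later. The following Python sketch illustrates this general technique with SHA-256 from the standard library. It is an example only: the function names are hypothetical, and it does not describe how heiARCHIVE or any other archive mentioned here verifies its holdings.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum):
    """Compare the current checksum of a file with a previously recorded one."""
    return sha256_of_file(path) == recorded_checksum.lower()


# Demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "results.csv"
    data_file.write_text("sample,value\nA,1.5\n", encoding="utf-8")

    recorded = sha256_of_file(data_file)      # stored at archiving time
    print(is_unchanged(data_file, recorded))  # True

    data_file.write_text("sample,value\nA,9.9\n", encoding="utf-8")
    print(is_unchanged(data_file, recorded))  # False
```

A checksum only reveals that something has changed. Restoring the original content still requires an intact copy, which is why checksums are usually combined with redundant storage.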

Long-term archiving of research data at the institutional level is often difficult because different subject areas are usually represented within the institution with correspondingly different types of data. However, it is possible to publish data in subject-specific archives (also called repositories). These ensure that the data are properly backed up and, in case of doubt, transferred to a new format. In addition, they offer appropriate rights and licence management, which enables different usage scenarios. For detailed information, see https://www.forschungsdaten.info/themen/veroeffentlichen-und-archivieren/ (in German) and https://www.publisso.de/en/research-data-management/rd-archiving/

Where can I find research data?

The best sources for published research data are usually data centres and data repositories. Repositories are storage locations for digital objects. Since repositories are mostly open to the public or to a restricted group of users, this term is closely linked to Open Access. Further information on repositories is available at open-access.network/repositories.

The selection of a suitable repository should be based on the practices of the respective discipline or the requirements of funding institutions or publishers. If there are no requirements, subject or institutional repositories should be considered first as storage locations. There are several directories that facilitate the search for a suitable repository.

heiDATA is the university’s research data repository. All researchers affiliated with the university can use this service for archiving and publishing their data.

The Research Data Catalogue lists published research data by researchers of Heidelberg University - independently of whether they are available from university-based or external repositories. The Research Data Catalogue is part of heiBIB.

re3data.org - Registry of Research Data Repositories is a directory for repositories that make scientific research data freely available.

DataCite Search is an international search engine for research outputs, where it is possible to search for registered datasets. The DataCite portal allows researchers to share their data and store all their research outputs, including publications, software and funding data.

Google Dataset Search is Google's data search engine. It searches the internet for data sets that are described as such on publicly accessible websites using the schema.org standard.

List of German repositories is a list of repositories with identification of those with a DINI certificate (only available in German).
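
Google Dataset Search, mentioned above, relies on datasets being described with the schema.org vocabulary on the web page that presents them. The following Python sketch generates such a description as JSON-LD, one of the usual ways to embed schema.org markup in a page. The properties used (name, description, url, license, creator) belong to the schema.org Dataset type. The function names and all example values are hypothetical, and repositories typically generate this markup automatically.

```python
import json


def dataset_jsonld(name, description, url, license_url, creator_name):
    """Build a minimal schema.org Dataset description as a Python dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "creator": {"@type": "Person", "name": creator_name},
    }


def as_script_tag(markup):
    """Wrap the JSON-LD in the script element that is placed in the HTML page."""
    body = json.dumps(markup, indent=2, ensure_ascii=False)
    # Escape "</" so that text inside the JSON cannot close the script element.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


example = dataset_jsonld(
    name="Example survey data",
    description="Anonymised responses from a hypothetical survey.",
    url="https://repository.example.org/dataset/1",  # placeholder URL
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creator_name="Jane Doe",  # placeholder name
)
print(as_script_tag(example))
```

Researchers who publish in an established repository usually do not need to write this markup themselves. The sketch is only meant to show what a search engine reads when it indexes a dataset landing page.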

Ethics, data protection, other legal provisions

Legal and ethical issues directly influence work with research data in the life sciences, even if these issues are not the subject of the research. Research data are subject to personality rights, data protection law and copyright law, as well as to ethical principles. The guiding principle in research data management is: "as open as possible, as closed as necessary". If you have any legal questions, it is best to contact your colleagues at the Competence Centre for Research Data at the University Library.

Further information on Open Science and research data

Comprehensive information on the topic of Open Science and research data management can be found at:

UNESCO Recommendation on Open Science

Helmholtz Open Science

Federal Funding

Forschungsdaten-Info Information portal on research data management (English)

NFDI consortia in medicine:

  • NFDI4Health: NFDI personal health data
  • GHGA: German Human Genome–Phenome Archive

DataCite Search Research data search engine

nestor Handbook Digital Curation of Research Data: Experiences of a Baseline Study in Germany
