This is the active version of the national maDMP reference model, prepared in the EOSC-INFRA OSTrails project and open for comments from national stakeholders in Finland.
Towards structural maDMP template - national metadata application profile
Target & Focus:
- Our focus in the OSTrails FI pilot has been to prepare a reference data model that research organisations can implement further, not a DMP template.
- Thus we focus on the information required, not on the questions, i.e. how the information should be asked.
- The core of the reference data model is presented in columns A-D.
- Additional information on the purpose of the fields, and especially the justification for national additions, is given in columns F-K.
- The target is that the Finnish research ecosystem will find the model useful in pursuing the development of machine-actionability.
- The suggested model has been designed in cooperation with Finnish universities, research organisations and CSC.
- While respecting the RDA maDMP standard, national suggestions for Finland have been made, e.g. to meet the requirements of the Research Council of Finland for DMPs link:
National consultation:
- To ensure the usability of the model, Finnish research organisations, service providers and funders are consulted for their comments, further suggestions and use cases.
Please indicate what information would still be needed to enable or launch machine-actionability, and which fields could be filled automatically via a digital object or with AI, e.g. by extracting information from an existing source.
- We are thankful for comments noting any concerns, typos or logical problems.
Link to ontologies, data spaces, repositories and other relevant sources of information when you notice gaps in the use of available auxiliary information.
- Comment on the purpose and user of the data elements.
Does your organization agree on which national additions should be mandatory, in addition to those that are mandatory in the RDA standard?
Documentation:
- All notes from the work are documented in the table below.
- For the consultation, an additional questionnaire for comments will be published at the webinar on 12 May.
Guidelines to the reference maDMP data model table:
Below is the structure of the RDA standard. The elements from the RDA standard are marked in GREEN in the table below, and the column "Is this in the RDA template?" indicates whether a data element is in the RDA standard.
The elements derived from the DMP questions discussed in our national workshop are marked in BLUE.
The sections are grouped according to the RDA standard:
DMP, Project, Contact, Contributor, Funding, Cost, Dataset, Distribution, Host, License, Security and Privacy, Technical Resource, Metadata.
In addition, we have suggested further sections to highlight their importance in the data model: Data lifecycle, DPIA and Ethics. Data lifecycle in particular contains elements that were regarded as important in our workshops.
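As a rough illustration of how the DMP-level elements might look in machine-readable form, the following Python sketch builds a small DMP record and checks a few of the constraints given in the table (mandatory elements, the 100-character title limit, ISO 8601 dates). The element names come from the table; the values, the function name `check_dmp` and the selection of checks are assumptions made for illustration, not part of the reference model or of the RDA standard.

```python
from datetime import date, datetime

# Illustrative values only; the element names are taken from the DMP rows of the table.
example_dmp = {
    "dmp_id": {"identifier": "https://doi.org/10.0000/example", "type": "doi"},
    "title": "Urban biodiversity data management plan",
    "created": "2025-05-28T09:00:00+00:00",
    "modified": "2025-05-28T09:00:00+00:00",
    "nextreview": "2026-05-28",
    "dataset": [{"title": "Fast car images"}],
}


def check_dmp(dmp):
    """Return a list of problems found in a DMP dict (an empty list means none were found)."""
    problems = []
    # Elements with cardinality 1 or 1..n in the DMP section (a subset, for brevity).
    for field in ("dmp_id", "title", "created", "modified", "dataset"):
        if field not in dmp:
            problems.append(f"missing mandatory field: {field}")
    if len(dmp.get("title", "")) > 100:
        problems.append("title longer than 100 characters")
    for field in ("created", "modified"):
        try:
            datetime.fromisoformat(dmp.get(field, ""))
        except ValueError:
            problems.append(f"{field} is not an ISO 8601 date-time")
    if "nextreview" in dmp:
        try:
            date.fromisoformat(dmp["nextreview"])
        except ValueError:
            problems.append("nextreview is not an ISO 8601 date")
    if not dmp.get("dataset"):
        problems.append("at least one dataset should be defined")
    return problems


print(check_dmp(example_dmp))
```

A real implementation would more likely validate against a shared schema than against hand-written checks; the sketch only shows that the constraints in the table are concrete enough to be tested by a machine.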
Version 2025-05-12 in PDF.
Link for Questionnaire for further comments
Reference:
You can refer back to the RDA maDMP data model, which is the core, but we can make suggestions for developing its machine-actionability. In addition, we can add DMP fields relevant to the national context. Note, however, that this is a general data model and does not contain information specific to scientific disciplines.
https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard
OSTrails Plan-Track-Assess Pathways: https://zenodo.org/records/13145788
For use cases you can refer to:
Marttila, J., Manninen, S., Ahokas, M., Hindersson-Söderholm, T., Keckman-Koivuniemi, H. (2022). Dynaamiset DMP:t -työryhmän loppuraportti. https://zenodo.org/records/6601258
Marttila, J. & Manninen, S. (2022). Dynaamiset DMP:t -työryhmän toivekartoitus. https://zenodo.org/records/6594597
How to read the colour codes in the table?
National suggested fields are in BLUE; RDA Standard fields in GREEN.
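The table gives a cardinality for each element using the values 1, 0..1, 0..n and 1..n. A minimal sketch of how a tool could interpret these values follows; the function names `parse_cardinality` and `satisfies` are hypothetical, and this is only one possible reading of the notation.

```python
import re


def parse_cardinality(text):
    """Turn '1', '0..1', '0..n' or '1..n' into (minimum, maximum); a maximum of None means unbounded."""
    match = re.fullmatch(r"(\d+)(?:\.\.(\d+|n))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised cardinality: {text!r}")
    minimum = int(match.group(1))
    upper = match.group(2)
    if upper is None:
        # A single number such as '1' means exactly that many values.
        return minimum, minimum
    return minimum, None if upper == "n" else int(upper)


def satisfies(value, cardinality):
    """Check whether a value (None, a single value, or a list) fits the given cardinality."""
    minimum, maximum = parse_cardinality(cardinality)
    if value is None:
        count = 0
    elif isinstance(value, list):
        count = len(value)
    else:
        count = 1
    return count >= minimum and (maximum is None or count <= maximum)
```

For example, `satisfies([], "1..n")` is false because at least one dataset is expected, while `satisfies(None, "0..1")` is true because the element is optional.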
A. | B. | C. | D. | E. | H. | I. | J. | K. | L. Phase | M. | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|
DMP | ||||||||||||
dmp_id | Identifier for the DMP itself | Nested Data Structure | 1 | 1 | 1 | general | Public | System for interoperability | 3 | Request id for DMP Where does this originate from, especially if using different tools/systems for DMPs? | ||
title | Title of a DMP | String | 1 | 1 | 3 | general | Public | User | 3 | Max 100 char | ||
cost | To list costs related to data management. Providing multiple instances of a 'Cost' allows to break down costs into details. Providing one 'Cost' instance allows to provide one aggregated sum. | Nested Data Structure | 0..n | 1 | 2 | general | Closed (see comment) | Funder | 2 | → further development: In DMP template this could be linked to the budget with grant id. What costs included? Needs clear guidelines. Closed early in the process, but depends if the DMP will be actively made public / "published" at some point during the data life-cycle. | ||
created | Date and time of first version of a DMP Encoded using the relevant ISO 8601 Date and Time compliant string | DateTime | 1 | 1 | 1 (system) | general | Public | Organization | 3 | 2025-05-28 System recorded | ||
modified | Must be set each time DMP is modified. Indicates DMP version. Encoded using the relevant ISO 8601 Date and Time compliant string | DateTime | 1 | 1 | 1 (system) | general | Public | Organization | 3 | 2025-05-28 | ||
dataset | To describe data on a general level. Describe how datasets used can be categorised. | Nested Data Structure | 1..n | 1 | 3 | general | Public | Organization, PI | 3 | At least one dataset should be defined. See "Dataset" in the table. | ||
project_id | Unique project identifier related to DMP | Nested Data Structure | 0..n | 2 | 3 | general | Public | Everyone | 3 | |||
dataset_id | Unique dataset identifier related to DMP | Nested Data Structure | 1..n | |||||||||
data_lifecycle | Describe the data lifecycle at a general level, and how open science criteria will be applied. | String | 0..1 | 2 | 3 | general | Public | Organization, PI | 1 | Data will be shared in the active phase using Allas; after the project, data will be shared via Fairdata IDA and a data paper will be published. The aim is to support reuse of the data. | ||
type | A description on what kind of DMP to do | Term from Controlled vocabulary | 1 | 2 | 3 | Organisational / national / international | Public | System for appropriate DMP template | 1 | Type of DMP: Student, Academic organization own template, Academic national template, National generic, EU Horizon, RDA / International, Input formula should be later updated or extended to a richer format. Input profiles: for example: (Define national typology for recommended use of DMPs (light, detailed), key issues personal data, confidentiality of information, resource intensity, number of actors (outsiders)) | ||
nextreview | Next review date to update DMP Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 0..1 | 2 | 2 / 3 | Organisational / Funder specific | Public | Organization, PI | 3 | Research project benefits of timing the update of DMP, and Data Support can better plan the assistance. Suggested to be added for making dmp alive and updated e.g. for reporting purposes | ||
Rights & ethics - In the RDA standard, ethical issues are part of the DMP domain. The FI pilot considers grounds for a separate category, or for merging with license and adding user rights: Rights, ethics & license | ||||||||||||
ethical_issues_exist | To indicate whether there are ethical issues related to data that this DMP describes. Allowed Values:
| Term from Controlled vocabulary | 1 | 1 | 3 | general | Closed | PI, Organization, Service provider | 1 | This is an important trigger because then the DMP must be very good Allowed Values:
| ||
ethical_issues_report | To indicate where to find a report/document that details all identified ethical issues (for example, one resulting from a meeting with an ethical committee) | URL | 0..1 | 1 | 3 | general | Closed | PI, Organization, Service provider | 1 | Add link Comment: Date when the decision was made | ||
ethical_issues_description | To describe considerations that require compliance with laws and regulations (e.g. GDPR, animal welfare) due to the involvement of humans, animals, or sensitive information. This includes ensuring informed consent from participants, protecting privacy and confidentiality, and adhering to applicable legal and ethical standards throughout the research. | String | 0..1 | 1 | 3 | national | Closed | PI, Organization, Service provider | 1 | |||
research_permit | Rights related to data: Whether permission is required to collect data in research data set | Term from Controlled vocabulary | 1 | 2 | 3 | national | Closed | PI, Organization | 1 | Actual research permit | ||
ownership_data_right_person | Who owns the data/rights related to the data? Give ORCID if available; otherwise give the name (surname, first name) | String | 0..1 | 2 | 3 | national | Closed | PI, Organization | 3 | Person or organization? Dataset-specific? The organisation can be a research organisation, a customer organisation or an organisation that otherwise only owns the data (e.g. an archive) | ||
ownership_data_right_organization | Which organization owns the data/rights related to the data? Give ROR if available, otherwise the official name of the organization as given on its website | String | 1 | 2 | 3 | national | Public | PI, Organization, Reuse | 1 | ROR - add source list here | ||
ipr_copyright | Are there IPR or copyright issues in the research described in the DMP? | Term from Controlled Vocabulary | 0..1 | 2 | 3 | Organizational | Closed | PI, Organization | 3 | yes, no, unknown | ||
agreements_data_right | What agreements are needed with other organisations and people related to the rights to the material? Give both the type and name of the agreement. (DMP) | String | 0..n | 2 | 3 | Organizational | Closed | PI, Organization | 3 | Data right agreement with data provider, e.g. with Findata. Agreement for utilising technical devices, and external research laboratory. | ||
agreements | What other agreements are needed? (DMP) | String | 0..n | 2 | 3 | Organizational | Closed | PI, Organization | 3 | Disclosure agreement with project partners | ||
properties of dmp_id | ||||||||||||
identifier | Identifier for a DMP | String | 1 | 1 | 3 | general | see comments | Everyone | 3 | For some research DMP may have to be closed by a justified reason, otherwise public | ||
type | Identifier type | Term from Controlled Vocabulary | 1 | 1 | 3 | general | see comments | Everyone | 3 | doi Allowed Values:
| ||
Project | ||||||||||||
id | Project identifier | Nested Data Structure | 1 | 1 | 2 | general | Public | Everyone | 4 | Compare also with RAiD: https://raid.org/ | ||
title | Name/Title of the project | String | 1 | 1 | 3 | general | Public | Everyone | 1 | If project information is not yet available anywhere, how much should be produced here? Is it possible to have multiple DMPs for one project or a maDMP without a funder or project? | ||
start | Project start date Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 1 | 1 | 3 (Can trigger update process e.g. after 3-6 months after start) | general | Public | Everyone | 1 | 2026-01-01 Encoded using relevant ISO Date and time compliant string | ||
end | Project end date Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 0..1 | 1 | 3 (Can trigger update process & reporting stage) | general | Public | Everyone | 1 | 2028-12-31 Encoded using relevant ISO Date and time compliant string If DMP is used for continuous process no end date is required, but this needs to be specified in description. Alternatively end date can be used to the end of funding period of long-term-plans. | ||
description | Project short description | String | 1 | 1 | 1 (project_id links to long description) otherwise 3 | general | Public | Everyone | 1 | Short description e.g. max char 2000; include link to project plan if needing (project id field links to the longer description to project master data) Example: This project aims to analyze the impact of urbanization on local biodiversity by collecting and assessing environmental data from multiple urban centers. Using remote sensing, field observations, and statistical modeling, the study will identify key factors influencing species diversity and habitat loss. The findings will support sustainable urban planning initiatives and inform conservation strategies. | ||
funding | Funding related with a project | Nested Data Structure | 0..n | 1 | 2 (Derived from Funding status & Grant_id) | general | Public | Everyone | 3 | Public after publishing the grant. | ||
discipline | Scientific discipline of project | Term from Controlled Vocabulary | 0..n | 2 | 2 / 3 | general | Public | Everyone | 1 | 3 if need to be added by researcher UNESCO science classification pore-in via main categories | ||
Contact | ||||||||||||
contact_id | Identifier for contact | String | 1 | 1 | 1 | general | Public | Everyone | 1 | ORCID of Contact person for a DMP / Principal (responsible) researcher | ||
mbox | E-mail address | String | 1 | 1 | 3 | general | Public | Everyone | 1 | from orcid, if possible or manual | ||
firstnames | First names of the contact person / principal researcher; | String | 1 | 1 (single field name) | 2 (from ORCID) / 3 | general | Public | Everyone | 1 | from orcid or manual Note: In RDA this is not separated into first name and last name; In Finnish data model this is separated | ||
lastname | Last name of the contact person / principal researcher; | String | 1 | 1 (single field name) | 2 (from ORCID) / 3 | general | Public | Everyone | 1 | from orcid or manual Note: In RDA this is not separated into first name and last name; In Finnish data model this is separated | ||
organization | Organization of contact | String | 1 | 2 | 2 (from ORCID/ROR) / or 3 | general | Public | Everyone | 2 | If ROR exists this can be derived from ROR | ||
ROR | ROR of organization of contact | String ROR | 1 | 2 | 3 | general | Public | Everyone | 1 | This has its own attributes (ROR) | ||
properties in contact_id | ||||||||||||
identifier | To indicate the specific value of an identifier for a contact | String | 1 | 1 | ||||||||
type | Identifier type
| Term from Controlled Vocabulary | 1 | |||||||||
Contributor | ||||||||||||
#_Nested Data Structure used if there are many contributors (and data controllers) this information will requested from all of them | ||||||||||||
contributor_id | Contributor id e.g. ORCID | Nested data structure | 1..n | 1 | 2: Digital authentication e.g. by e-mail Contributor will add their ORCID or from Funding application 3: Has risk of errors for ORCID | general | Public | Everyone | 2 | Needs to be defined - or where could be derived? From funding decision? | ||
mbox | E-mail address | String | 0..n | 1 | 2 / 3 (depending if person has allowed sharing) | general | Public | Everyone | 2 | |||
firstname | First name of the contact person / principal researcher; | String | 1 | 1 (single field name) | 2 (from ORCID) / 3 | general | Public | Everyone | 1 | from orcid or manual In RDA this is not separated into first name and last name - Do we need the separate fields in Finland? | ||
lastname | Last name of the contact person / principal researcher; | String | 1 | 1 (single field name) | 2 (from ORCID) / 3 | general | Public | Everyone | 1 | from orcid or manual In RDA this is not separated into first name and last name - Do we need the separate fields in Finland? | ||
role | Role of the contributor:
| Term from Controlled Vocabulary | 1..n | 1 | 2 / 3 | general | Public | Everyone | 2 | Data controller is required for research data services Use case for AI search from funding proposal by roles | ||
organization | Organization of contributing researcher | String | 0..1 | 2 | 2 (from ROR) / or 3 | general | Public | Everyone | 2 | If ROR exists this can be derived from ROR | ||
ROR | ROR of organization of contributing researcher | String ROR | 0..1 | 2 | 3 | general | Public | Everyone | 1 | This has its own attributes (ROR) | ||
properties in contributor_id | ||||||||||||
identifier | Term from Controlled Vocabulary | String | 1 | 1 | 3 | general | Public | Everyone | 2 | orcid | ||
type | Identifier type
| Term from Controlled Vocabulary | 1 | 3 | general | Public | Everyone | 2 | ||||
Cost | ||||||||||||
# list all cost object categories | ||||||||||||
currency_code | Currency of costs Allowed values defined by ISO 4217. | Term from Controlled Vocabulary | 0..1 | 1 | 3 / 2 (from grant_id) | general | Closed/Public | Organization | 2 | "978" for eur | ||
description_cost | Description of costs Note: Could this be linked to Grant ID for description of applied/granted budget? | String | 0..1 | 1 | 3 / 2 (from grant_id / application) | general | Closed/Public | Organization | 2 | from Grant id when funded | ||
title_cost | Title of costs Note: Could this be linked to Grant ID for title of applied/granted budget? | String | 1 | 1 | 3 / 2 (from grant_id / application) | general | Closed/Public | Organization | 2 | from Grant id when funded | ||
value_cost | Value of costs Note1: Could this be linked to Grant ID for applied/granted budget? Note2: Link with DMP / cost_dmp | Number | 0..1 | 1 | 3 / 2 (from grant_id / application) | general | Closed/Public | Organization | 2 | from Grant id when funded | ||
Funding | ||||||||||||
#_Nested Data Structure if many funding sources for a large research program unless defined that DMP relates to single grant decision | ||||||||||||
funder_id | Funder ID of the associated project, ROR if available | String | 1 | 1 | 2: ROR API via search option 3 | general | Public | System | 1 | Registry number of associated project Y-tunnus / Business ID Nested structure used if there are many of these. Field is empty if none | ||
funding_status | To express different phases of project lifecycle.
| Term from Controlled Vocabulary | 0..1 | 1 | 3 | general | Public | Everyone | 1-5 | from Funding id maDMP use case: automatically derived information from grant ID the project is applied/granted | ||
grant_id | Grant ID of the associated project | Nested data structure | 0..1 | 1 | 2 if DOI (not currently) 3 | general | Public | Everyone | 3 | 654321 | ||
properties of funder_id | ||||||||||||
identifier | Funder ID, recommended to use CrossRef Funder Registry. See: https://www.crossref.org/services/funder-registry/ | String | 1 | 1 | ||||||||
type | Identifier type Allowed Values:
| Term from Controlled Vocabulary | 1 | 1 | ||||||||
funder | Name of the funding organization, official name of the funder as given in their registry or their website | String | 1 | 2 / 1 | 2 | general | Closed until Funded | Everyone | 1 | Research Council of Finland | ||
submission_dl | Deadline for funding submission Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 1 | 2 | 2: select funding 3 | Closed until Funded | PI, Research group | 1 | 2026-08-31 | |||
decision_expected | Expected date for funding decision Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 1 | 2 | 2: select funding 3 | Closed until Funded | PI, Research group | 1 | 2026-06-12 | |||
properties of grant_id | ||||||||||||
identifier | Grant ID | String | 1 | 1 | ||||||||
type | Identifier type Allowed values:
| Term from Controlled Vocabulary | 1 | 1 | ||||||||
start | Funding (Project) start Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 1 | 2 | 2 | general | Public | Everyone | 3 | 2027-01-01 Used if funding period is different from project.start date | ||
end | Funding (Project) end Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 1 | 2 | 2 | general | Public | Everyone | 3 | 2028-12-31 Used if funding period is different from project.end date | ||
Dataset | ||||||||||||
#_Nested Data Structure if many datasets are used. Relationships to 1..* datasets are defined at DMP level. DMP has "dataset" association that can relate to many datasets. Each data set can have multiple files/distributions. | ||||||||||||
id | Dataset ID Preferred values: DOI, PID, URN, URL, handle, ark, other digital ID | String | 1 | 1 | 2 | general | Public | RPO (research performing organization), Repositories, Data catalogues | 3 | Dataset may not exist when DMP is defined. DMP tool should provide temporary ID before dataset gets PID by some way. | ||
title | Data set title / name Title is a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file. | String | 1 | 1 | 1/3 | general | Public | Same as above, but humans need this instead of machines | 3 | There can be many data sets, the information is related to one entity. A so-called metax entity, i.e. one must be able to express a wide variety of entities that then have attributes. Example "Fast car images" | ||
type | If appropriate, type according to: DataCite and/or COAR dictionary. Otherwise use the common name for the type, e.g. raw data, software, survey, etc. https://schema.datacite.org/meta/kernel-4.1/doc/DataCite-MetadataKernel_v4.1.pdf http://vocabularies.coar-repositories.org/pubby/resource_type.html Data set type (indication interview, questionnaire, photos, video, measurement, samples, simulation, code) | Controlled vocabulary | 0..1 | 1 | 2 / 3 | general | Public | Data Catalogues, Repositories, RPOs | 3 | Associated with a single dataset, can include ready-made options, but also an open text field. What is the correct granularity level here? Resource intensity can affect the needs of the description. In general, it is instructed to describe so that the attribute applies to the entire dataset. By describing just one data set, it would be possible to create a so-called light DMP. This is an important option to keep. Need here some sort of defined and shared vocabulary on "data set types". RDA Commons points to DataCite and COAR, but neither feels sufficient by itself. Should do national type list based on those, but enhanced to give perhaps subtypes. String or Partly (Controlled vocabulary and "Other" option) | ||
personal_data | Whether the dataset contains personal data Allowed Values:
| Term from Controlled Vocabulary | 1 | 1 | 3 | general | Closed | Data protection officers, RPOs (data protection/management experts), repository | 2 | Associated with a single dataset, is this personal data the data of the data providers or of the target data? What is the role of individuals? Yes or No / Yes or No. FI restriction: It is assumed that "Unknown" is not an option here after submission to Funder, and researcher must be able to judge whether data contains personal data or consult about it. Type of personal data will be in its own section. Can trigger automatic data protection processes. | ||
sensitive_data | Whether there are legal restrictions that apply to using this data, e.g. military use, commercial restrictions, endangered species Allowed Values:
| Term from Controlled Vocabulary | 1 | 1 | 3 | general | Closed | Data protection officers, RPOs (data protection/management experts), repository | 2 | Related to the dataset, how can we ensure that this is not asked except when it is likely? Dual use and import controls? FI restriction: This should be yes/no after submission to Funder. In dataset we need to know if there is sensitive/confidential information or not. That triggers then more questions in security & privacy section. | ||
description | Description of dataset Description is a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file. | String | 1 | 1 | 3 | general | Public | Repository, Data catalogues, CRIS | 3 | Needs some kind of guidance on what level of description is needed. Need for space limitation? We already have name for the dataset. How much more description we want/need at this point? We should not ask these in DMP. They are about publishing and metadata. If somebody wants to combine DMP and CRIS, this information needs to be interoperable, but this is NOT part of DMP. Same holds true for all red rows. | ||
distribution | Technical information on a specific instance of dataset | Nested Data Structure | 0..n | 1 | 3 | general | Public | Repository, Data catalogues, CRIS | 3 | This might need more clarification, as it relates to resources/infra needed. | ||
issued | Date of dataset been issued Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 0..1 | 1 | 1 / 3 | general | Public | Everyone | 3 | |||
keyword | Keyword | String | 0..n | 1 | 1 / 3 | general | Public | Everyone | 3 | Should be asked only when data is opened/catalogued. Terms from controlled vocabulary | ||
language | Language of the dataset expressed using ISO 639-3 | Term from Controlled Vocabulary | 0..n | 1 | 1 / 3 | general | Public | Everyone | 3 | |||
metadata | To describe metadata standards used | Nested Data Structure | 0..n | 1 | 1 / 3 | general | Public | Everyone | 3 | |||
data_quality_assurance | To describe any quality assurance processes applied to a dataset, such as, to ensure its accuracy, reliability, consistency, and usability for its intended purposes. This includes systematic practices, procedures, and policies designed to maintain high data quality throughout its lifecycle. | String | 0..n | 1 | 3 | general | Public | Funder | 3 | We calibrate measuring equipment daily, run repeat samples to monitor consistency in measurements and results, and cross-check collected data with at least two colleagues for accuracy. | ||
method_quality_assurance | Method describing how the quality assurance has been conducted | Term from Controlled Vocabulary | 1 | 2 | 3 | general | Public | Funder | 3 | Example: TAU list as an example. There is a need to develop a list related to disciplines | ||
category (CHECK) | Describe categories of datasets if multiple and of different types | Term from Controlled Vocabulary | 0..n | 2 | 3 | general | Public | Organization, PI | 3 | Categories need to be defined Controlled vocabulary by Scientific field | ||
format | Description of used dataset formats during the active research. For example database, csv, xml, json. (Format of the dataset to be used. - Format of the datasets to be published / distributed after project is different) | Term from Controlled Vocabulary | 0..1 | 2 | 2 | General | Public | Organization, PI, Research group | 3 | Relates to one data set How does this relate to other outputs than datasets like code? Or code that is closely related to data usability, e.g. link or PID? Format vs. Type? What is the difference? File format should be in distribution, not here. | ||
data_sharing_issues | How legal and ethical issues related to the sharing of data (e.g. ownership, copyright, sensitivity) will be resolved | String | 1 | 2 | 3 | National | Public | Organization, PI | 3 | |||
data_sharing_contracts | Are contracts needed prior to sharing data? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 | Organizational | Public | Organization, PI | 3 | |||
data_sharing_ownership | Is the ownership of data clear for data sharing? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 | Organizational | Public | Organization, PI | 3 | |||
data_sharing_copyright | Are the copyright issues clear related to data sharing?
| Term from Controlled Vocabulary | 1 | 2 | 3 | Organizational | Public | Organization, PI | 3 | |||
data_sharing_sensitivity | Are possible issues related to sharing sensitive data cleared? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 | Organizational | Public | Organization, PI | 3 | |||
data_sharing_other | Describe any emerging other issues of data sharing | String | 0..1 | 2 | 3 | Organizational | Public | Organization, PI | 3 | |||
data_landing_page | Give the link / PID to landing page of data | link / PID | 0..1 | 2 | 3 | General | Public | Organization, PI | 3 | |||
properties of dataset_id | ||||||||||||
identifier | Identifier for a dataset | String | 1 | 1 | 3 | General | Public | https://hdl.handle.net/11353/10.923628 | ||||
type | Identifier type Allowed Values:
| Term from Controlled Vocabulary | 1 | 1 | 3 | General | Public | pid | ||||
Dataset life cycle - (this is suggested extension to Dataset - requires re-specification for dataset level) | ||||||||||||
description | Summarise description of all datasets created in project if many, and after the project at general level, and how they are managed. Description is also a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file. | String | 0..1 | 2 | 3 | general | Public | PI, Organization, Technical service provider | 3 | Funder and CSC needs this information | ||
data_collected | Summarise data collected for this project | String | 0..n | 2 | 3 | general | Public | Funder | 5 | |||
data_produced | Summarise data produced as an outcome of the project | String | 0..n | 2 | 3 | general | Public | Funder | 5 | |||
data_users | With whom will the data be shared during the project? Allowed values:
| Term from Controlled Vocabulary | 0..n | 2 | 3 | National | Closed | PI, Organization, Technical service provider | 3 | Refers to the technical solutions, will a DPA be needed? Is joint controller agreement, NDAs etc. already elsewhere? Or does this refer to the consortium projects? | ||
shareage_solution | How the data will be shared during the project? Define technical solutions planned to be used? | Term from Controlled Vocabulary Choose from Service catalog | 1..n | 2 | 3 | National | Closed | PI, Organization, Technical service provider | 3 | |||
version_mgmt | How the data versions are managed? | String | 1 | 2 | 3 | National | Closed | PI, Organization, Technical service provider | 3 | Mandatory for large data intensive projects (At CSC >50 TB) | ||
data_retention | How data retentions are managed? | String | 1 | 2 | 3 | National | Closed | PI, Organization, Technical service provider | 3 | Mandatory for large data intensive projects (At CSC >50 TB) Data retention plan is needed for managing the size of the project | ||
exit_plan | What is the exit plan from computational and storage services in the end of the project? | String | 1 | 2 | 3 | National | Closed | PI, Organization, Technical service provider | 3 | Exit plan is needed to ensure that research data with value for re-use is saved within the available resources | ||
backup | How data will be backed up during the project? To be planned by the researcher or organization specific solutions? | String | 1 | 2 | 3 | Organizational | Closed | PI, Organization, Technical service provider | 3 | Utilisation of prefilled information derived from backup of services used | ||
closure_justification | If the project does not collect or produce any data fully or partially suitable for reuse, justify why the data cannot be made available even partially. | String | 1 | 2 | 3 | National | Closed | Funder, PI, Organization, | 3 | This is mandatory if data is closed. Should there be dataset level field for dataset publication (open / closed) ? | ||
open_location | Where will the data be opened? | String | 1 | 2 | Special requirements for data repositories for preliminary data? | National | Public | Funder, PI, Organization, | 3 | FSD comments: It is essential for the repository/archive to know (in the case of research projects that have received a positive funding decision) what kind of data are planned to be opened in the repository/archive and by whom. Covered under distribution maybe? This field responds also to requirement of National DMP template on: where the data or a publishable portion of them will be made available after the end of the project | ||
storage_location | Where will the data be stored during the project? | URN from CSC Service Catalogue & list presented by organization, if something else, what? | 1..n | 2 | 3 | National | Public/Closed | PI, Organization, Technical service provider | 3 | Relates to a dataset; especially important if data is subject to the Act on the Secondary Use of Data. Add to general data life-cycle. Specify by data set if needed | ||
storage_length | How long is the data stored for the original research purpose? Give the time estimate in years. | Number | 1 | 2 | 3 | National | Public | PI, Organization, Funder, Technical service provider | 3 | Example: "5 years" Relates to dataset, original purpose | ||
deletion | How is data deleted/destroyed? | String | 1 | 2 | 3 | National | Public | Funder, Technical service provider | 3 | Could be specified that this relates to unpublished data. Or data that are mentioned to be shared e.g. for 5 or 10 years, etc. | ||
deletion_no | If data will not be deleted from active storage at the end of the project, explain why. | String | 0..1 | 2 | 3 | Organizational | Public | PI, Funder, Technical service provider | ||||
deletion_date | When is data deleted/destroyed? Encoded using the relevant ISO 8601 Date and Time compliant string | DateTime | 0..1 | 2 | 3 | Organizational | Public | PI, Funder, Technical service provider | 3 | Could be specified that this relates to unpublished data. | ||
deletion_plannedtiming | If a date cannot be given, describe the planned deletion stage and approximate timing | String | 0..1 | 2 | 3 | Organizational | Public | PI, IT Services, Technical service provider | 3 | |||
archiving_services | Are archiving services or long term preservation for data needed? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 | National | Closed | PI, Organization, Technical service provider | 4 | Relates to data set, and how to determine the value of data? Is this long-term storage, e.g. 20 in Zenodo, archiving in institutional archive or something else? | ||
archiving_date | When to archive? Encoded using the relevant ISO 8601 Date and Time compliant string | DateTime | 0..1 | 2 | 3 | National | Public | IT Services, Technical service provider | 4 | Active data can be deleted and archived at the same time | ||
archiving_location | Where to archive? Allowed values from: CSC Service Catalogue & organization's own archiving services | Term from Controlled Vocabulary | 0..1 | 2 | 3 | National | Public | IT Services, Technical service provider | 4 | |||
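The rows above say that deletion_date is an ISO 8601 string and that deletion_plannedtiming is the fallback when no date can be given. A minimal Python sketch of how a consuming service might read the two fields follows. The function name deletion_timing and the dictionary layout are assumptions made for illustration and are not part of the reference model.

```python
from datetime import datetime


def deletion_timing(lifecycle):
    """Return ("date", datetime), ("planned", text) or ("missing", None).

    `lifecycle` is a hypothetical dict holding the life-cycle fields of one
    dataset. A malformed deletion_date raises ValueError from fromisoformat.
    """
    raw_date = lifecycle.get("deletion_date")
    if raw_date:
        # Accepts ISO 8601 strings such as "2030-12-31" or "2030-12-31T00:00:00".
        return ("date", datetime.fromisoformat(raw_date))
    planned = lifecycle.get("deletion_plannedtiming")
    if planned:
        # Free-text fallback when an exact date is not known.
        return ("planned", planned)
    return ("missing", None)
```

A caller could treat the "missing" result as a prompt to ask the PI for one of the two fields.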
Technical resource | ||||||||||||
#_Nested Data Structure if many technical resources are used from different providers. IDs relate to user id of technical service providers. | ||||||||||||
name | Name a resource applied to a dataset | String | 1 | 1 | 3 | general | closed | PI & IT service | 5 | |||
description | To list all technical resources needed. Describe a technical resource (e.g. tools or software) required for any stage of a dataset lifecycle (e.g. microscopes, sensors, Jupyter Notebook, Galaxy workflows, measuring devices) | String | 0..1 | 1 | 3 / 2 (from organisational or national list) | general | closed | PI & IT service | 5 | These sound like reports compiled based on DMP. So if you have confidential data, DMP compiles list of local services that CAN be used. | ||
user_id | User id of technical resource | String | 0..n | 2 | 3 | general | closed | PI & Technical resource provider | 5 | MyCSC user id | ||
reuse | Is previously collected data reused in this project? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 | general | Public | Funder | 5 | Relates to one data set (does it? Or does it relate to whole research in this meaning) Reuse of data is also information funders require. Also important here is the terms of use to the data. If you're using data that is already published and is based on other data? So is this property of dataset or is this property of research? | ||
source | Data source | String | 1 | 2 | 2 / 3 | general | Public | Funder | 5 | Example "pid" Relates to one data set, can include ready-made options, but also an open text field Referencing can be really confusing. You can use data obtained from Twitter. Or dataset that somebody else compiled from Twitter... What do you reference here? Or do you make derivate dataset based on already existing dataset that is compiled from twitter? | ||
estimate_datasize | Give a rough estimate of the size of the data produced/collected in TBs | Number | 1 | 2 | 3 | general | Public | Technical resource provider | 5 | Estimate for the resources applied for the project | ||
data_resource_estimate | Project data magnitude for resources required to analyse and store the data | Number | 1 | 2 | 3 | general | Public | Technical resource provider | 5 | Estimate for the resources applied for the project | ||
application_process | What applications are used to process data? Allowed values from: Controlled list CSC Service Catalogue & organization services | Term from Controlled Vocabulary | 1..n | 2 | 3 | General, National & Organizational | Closed | PI, Organization, Technical service provider | 3 | Affects the choice of storage environment (e.g. whether the video is only available for viewing or whether it needs to be available at the file level in an analysis program) | ||
computing_environments | Which computing environments are needed for research? Allowed values from: Controlled list CSC Service Catalogue & organization services | Term from Controlled Vocabulary | 1..n | 2 | 3 | National & Organizational | Closed | PI, Organization, Technical service provider | 3 | Relates to data set | ||
computing_capacity_CPU | How many CPU core hours of computing capacity are required? | Number | 1 | 2 | 3 | general | Closed | PI, Organization, Technical service provider | 4 | Estimated value | ||
computing_capacity_GPU | How many GPU core hours of computing capacity are required? | Number | 1 | 2 | 3 | general | Closed | PI, Organization, Technical service provider | 4 | Estimated value | ||
properties of user_id | 2 | |||||||||||
identifier | Identifier for a user of technical resources | String | 1 | 2 | 3 | general | closed | PI & Technical resource provider | CSC project | |||
type | Identifier type defined by technical resource provider | String | 0..1 | 2 | 3 | general | closed | PI & Technical resource provider | CSC project | |||
project_id | ||||||||||||
identifier | Unique project established for use of technical resource | String | 0..n | 2 | 3 | general | closed | PI & Technical resource provider | CSC project | |||
type | Type defined by technical resource provider for project granted resources | String | 0..n | 2 | 3 | general | closed | PI & Technical resource provider | CSC project | |||
Distribution | ||||||||||||
access_url | A URL of the resource that gives access to a distribution of the dataset. e.g. landing page. | URL | 0..1 | 1 | 3 | general | Public | PI | In case of DMP you should use these to describe active use of the data. Others should be in life-cycle. | |||
title_dataset | Title is a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file. | String | 1 | 1 | 3 | general | Public | PI | ||||
available_until_url | Indicates how long this distribution will be/ should be available. Encoded using the relevant ISO 8601 Date and Time compliant string | DateTime | 0..1 | 1 | 3 | general | Public | PI | ||||
byte_size | Estimated byte size :
| Term from Controlled Vocabulary | 1 | 1 | 3 | general | Public | PI, Data controller, Technical resource provider | Important as it affects which tools are available. Number or Size Category: S, M, L, XL, XXL | |||
data_access | Indicates access mode for data and data sharing. Allowed Values:
| Term from Controlled Vocabulary | 1 | 1 | 3 | general | Public | PI, Data controller, Technical resource provider | Example: "Open" This can change during the study. First I use it 3 years as closed, then I open it. Should here be what I want to do after the active use or what happens right now? → Should be the current publication status of the distribution. Dataset lifecycle documents the plan for the dataset. | |||
description | Description is a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file. | String | 1 | 1 | 3 | general | Public | PI, Data controller, Technical resource provider | ||||
download_url | The URL of the downloadable file in a given format. E.g. CSV file or RDF file. | URL | 0..1 | 1 | 3 | general | Public | PI, Data controller | ||||
format | Format according to: https://www.iana.org/assignments/media-types/media-types.xhtml if appropriate, otherwise use the common name for this format | String | 0..n | 1 | 3 | general | Public | PI, Data controller | ||||
preservation_statement | Preservation Statement | String | 1 | 3 | general | Public | PI, Organization | 3 | ||||
license | List all licenses applied to a specific distribution of data. | Nested data structure | 0..n | 1 | 3 | general | Public | PI, Data controller | Comment: This could be at dataset level → If not distribution specific | |||
Host | ||||||||||||
host | To provide information on quality of service provided by infrastructure (e.g. repository) where data is stored. Service URN | Nested data structure | 0..1 | 1 | 3 | General | Public | Everyone | Question: Outside the own organization? Same as location_open_data in lifecycle | |||
availability | Availability | String | 0..1 | 1 | 3 | General | Public | Everyone | 99,5 | |||
backup_frequency | Backup Frequency | String | 0..1 | 1 | 3 | General | Public | Everyone | weekly | |||
backup_type | Backup Type | String | 0..1 | 1 | 3 | General | Public | Everyone | tapes | |||
certified_with | Repository certified to a recognised standard Allowed Values:
| Term from Controlled Vocabulary | 0..1 | 1 | 3 | General | Public | Everyone | coretrustseal | |||
description | Description | String | 0..1 | 1 | 3 | General | Public | Everyone | Repository hosted by... | |||
geo_location | Physical location of the data expressed using ISO 3166-1 country code. | Term from Controlled Vocabulary | 0..1 | 1 | 3 | General | Public | Everyone | AT | |||
pid_system | PID System Allowed Values:
| Term from Controlled Vocabulary | 0..n | 1 | 3 | General | Public | Everyone | doi | |||
storage_type | The type of storage required | String | 0..1 | 1 | 3 | General | Public | Everyone | LTO-8 tape | |||
support_versioning | Allowed Values:
| Term from Controlled Vocabulary | 0..1 | 1 | 3 | General | Public | Everyone | yes | |||
title | Title | String | 1 | 1 | 3 | General | Public | Everyone | Super Repository | |||
url | The URL of the system hosting a distribution of a dataset | URI | 1 | 1 | 3 | General | Public | Everyone | https://www.fairdata.fi/en/ida/ | |||
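As an illustration of how the Host rows above could be used by a machine, the following Python sketch builds one host record from the example values in the table and checks it against the listed cardinalities. The checker function and the dictionary representation are assumptions for illustration only. The field names, cardinalities and example values are taken from the rows above.

```python
# Cardinalities copied from the Host rows above; the checker itself is illustrative.
HOST_CARDINALITY = {
    "availability": "0..1",
    "backup_frequency": "0..1",
    "backup_type": "0..1",
    "certified_with": "0..1",
    "description": "0..1",
    "geo_location": "0..1",
    "pid_system": "0..n",
    "storage_type": "0..1",
    "support_versioning": "0..1",
    "title": "1",
    "url": "1",
}


def check_cardinality(record, cardinality):
    """Return a list of cardinality violations (empty list if none)."""
    problems = []
    for field, card in cardinality.items():
        value = record.get(field)
        if value is None:
            count = 0
        elif isinstance(value, list):
            count = len(value)
        else:
            count = 1
        lower = 1 if card.startswith("1") else 0   # "1" and "1..n" are mandatory
        upper = None if card.endswith("n") else 1  # "..n" means unbounded
        if count < lower:
            problems.append(f"{field}: required ({card}) but missing")
        if upper is not None and count > upper:
            problems.append(f"{field}: at most {upper} value allowed, got {count}")
    return problems


# Example values taken from the last column of the Host rows.
example_host = {
    "title": "Super Repository",
    "url": "https://www.fairdata.fi/en/ida/",
    "availability": "99,5",
    "backup_frequency": "weekly",
    "backup_type": "tapes",
    "certified_with": "coretrustseal",
    "geo_location": "AT",
    "pid_system": ["doi"],
    "storage_type": "LTO-8 tape",
    "support_versioning": "yes",
}
```

Calling check_cardinality(example_host, HOST_CARDINALITY) returns an empty list, while a record without url is reported as missing a required field.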
Security and privacy | ||||||||||||
id | ID of risk assessment | URI/PID | 0..1 | 2 | 2 | EU/national security levels (restricted, confidential, secret, top secret) | Closed | Organization & PI | 3 | |||
title | Title of security measures | String | 1 | 1 | 3 | general | Closed | Organization & PI | 3 | Example: "Physical access control" | ||
description | Description of security and privacy measures | String | 0..1 | 1 | 3 | general | Closed | Organization & PI | 3 | Example: "Server with data must be kept in a locked room" | ||
security_privacy | To list all issues and requirements related to security and privacy | String | 0..1 | 1 | 3 (from organisational list) | EU/national security levels (restricted, confidential, secret, top secret) | Closed | Organization & PI | 3 | These sound like reports compiled based on DMP | ||
protection_level_data | What is the required level of data protection? | Term from Controlled Vocabulary | 0..1 | 2 | 3 | EU/national security levels (restricted, confidential, secret, top secret) | Closed | Organization & PI, Technical resource provider | 3 | Relates to Data Protection Level of the data set: Controlled List (from Service catalog) | ||
confidentiality | Does the data contain confidential information Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | DMP, 3 | EU/national security levels (restricted, confidential, secret, top secret) | Closed | Organisation, & PI, Open science services, Technical resource provider | 3 | May contain a "yes" condition, after which it is indicated which datasets this relates to. Confidential, business secrets, sensitive geospatial data, sensitive biodiversity data, national security, trade secrets. Dataset-specific. Comment: a joined classification for security levels. EU security levels here as an example. | ||
personal_data | Does the research handle personal data for research purposes (in any of the datasets used)?
Documented | Term from Controlled Vocabulary | 1 | 2 | 1 (Derived from Dataset section) | EU/national security levels (restricted, confidential, secret, top secret) | Closed | Organization & PI | 3 | This is in RDA standard dataset specific attribute. This could be derived variable from dataset specific questions to project level. If information is needed as well at project level. | ||
personal_data_list | What personal data do you process | String | 1 | 2 | 3 | EU/national security levels (restricted, confidential, secret, top secret) | Closed | Organization & PI, Data protection team | 3 | Controlled list, e.g. MyCSC or TAU example | ||
privacy-notice | Is there a need for a privacy notice? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 | EU/national security levels (restricted, confidential, secret, top secret) | Closed | Data protection team | 3 | Date & title for privacy notice, data transfer agreements | ||
dpia | Should DPIA be done? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | Requirement comes from the law Voluntary question | EU/national security levels (restricted, confidential, secret, top secret) | Closed | Organization | 3 | Good question, should this be in the DMP at all. In this context, it is also possible to make a real assessment of whether it is being done. However, privacy information should be structured and compatible so that you can ask for it here if you wish. Comment: pre-DPIA usually executed to see if a full DPIA is necessary. | ||
toms | Links to Technical & organisational measures | URI | 0..1 | 2 | 1/3 | EU / national | Public | Everyone | 4 |
| ||
toms_description | Describe project specific toms measures in the project | String / Term from Controlled Vocabulary | 0..1 | 2 | 3 | EU / national | Public | Everyone | 4 |
| ||
data_nnnn | Do you plan any data transfers or access outside the EEA? | Value | 0..1 | 2 |
| |||||||
DPIA process | ||||||||||||
dpia_id | If a DPIA exists, give its URI / DOI | URI | 0..1 | 2 | 3 | EU/national security levels (restricted, confidential, secret, top secret) | Closed | Organization | 1 | |||
privacy_notice_id | If a privacy notice exists, give its link / archive number | String | 0..1 | 2 | 3 | EU/national security levels (restricted, confidential, secret, top secret) | Closed | PI, Organization | 1 | Use case: maDMP could transfer the number / link of the privacy notice to the data protection team once it has been done, to indicate the status. | ||
pre_dpia | Has risk assessment been filled in? (risk assessment/pre-dpia, selftest if DPIA is needed)
| Term from Controlled Vocabulary | 0..1 | 2 | 3 | EU/national security levels (restricted, confidential, secret, top secret ) | Closed | Organization | 1 | |||
data_use_region | Will data be managed
| |||||||||||
IF data privacy notice exists / risk assessment these fields are not asked, but could be filled in automatically from privacy notice & risk assessment | ||||||||||||
personal_data_sp_category | What special categories of personal data do you process | String | 1 | 2 | Requirement comes from the law | Organizational/national security levels (restricted, confidential, secret, top secret ) | Closed | PI, Organization | 1 | Categories of special categories of personal data | ||
ethnic_origin | Do you process data of ethnic origin?
| Term from Controlled Vocabulary | 1 | 2 | 3 / 1 | Organizational/national security levels (restricted, confidential, secret, top secret ) | Closed | PI, Organization | 1 | |||
political_opinions | Do you process data of political opinions? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 / 1 | Organizational/national security levels (restricted, confidential, secret, top secret ) | Closed | PI, Organization | 1 | |||
religion_philosophical beliefs | Do you process data of religion or philosophical beliefs? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 / 1 | Organizational/national security levels (restricted, confidential, secret, top secret ) | Closed | PI, Organization | 1 | |||
trade_union_membership | Do you process data of trade union membership? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 / 1 | Organizational/national security levels (restricted, confidential, secret, top secret ) | Closed | PI, Organization | 1 | |||
data_concerning_health | Do you process data concerning the health of individuals? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 / 1 | Organizational/national security levels (restricted, confidential, secret, top secret ) | Closed | PI, Organization | 1 | |||
sexual_orientation_or_activity | Do you process data of sexual orientation or activity? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 / 1 | Organizational/national security levels (restricted, confidential, secret, top secret ) | Closed | PI, Organization | 1 | |||
genetic_or_biometric_data | Do you process genetic or biometric data for identifying the persons? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 / 1 | Organizational/national security levels (restricted, confidential, secret, top secret ) | Closed | PI, Organization | 1 | |||
other_sp_category | Describe the other special categories of data that you process in the research? | String | 0..1 | 2 | 3 / 1 | Organizational/national security levels (restricted, confidential, secret, top secret ) | Closed | PI, Organization | 1 | |||
data_prosessing_basis | Basis for data processing | String | 1 | 2 | 3 / 1 | National | Closed | PI, Organization | 1 | |||
data_prosessing_sp_category | Basis for processing special categories of personal data | String | 1 | 2 | 3 / 1 | National - (for enabling to provide optimal services) | Closed | PI, Organization, Service provider | 1 | |||
data_transfer_outside_EU | Whether personal data is transferred outside the EU Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 / 1 | National - (for enabling to provide optimal services) | Closed | PI, Organization, Service provider | 1 | |||
data_transfer_country | To which countries personal data is transferred | String | 0..n | 2 | 3 / 1 | National - (for enabling to provide optimal services) | Closed | PI, Organization, Service provider | 1 | |||
data_external_processors | Are there external processors Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 / 1 | Local (Responsibility for organization) | Closed | PI, Organization, Service provider | 1 | |||
personal_data_minimized | How is the processing of personal data minimized? | String | 1 | 2 | 3 / 1 | Local (Responsibility for organization) | Closed | PI, Organization, Service provider | 1 | Anonymization, pseudonymization, removal of direct identifiers..., dataset-specific? | ||
Note: Security and privacy have needed special attention so that the model fits the Finnish national context and can benefit from machine-actionable features | ||||||||||||
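The table notes that the special category fields could be filled in automatically from a privacy notice or risk assessment. As a rough Python sketch of the reverse direction, the individual category fields could be summarised into a single list, for example to populate personal_data_sp_category. The "yes"/"no" answer values and the function name are assumptions, since the allowed values are not listed in this part of the table.

```python
# Field names are copied from the rows above. The "yes"/"no" answers are an
# assumption, because the allowed values are not shown in this part of the table.
SPECIAL_CATEGORY_FIELDS = [
    "ethnic_origin",
    "political_opinions",
    "religion_philosophical beliefs",
    "trade_union_membership",
    "data_concerning_health",
    "sexual_orientation_or_activity",
    "genetic_or_biometric_data",
]


def derive_special_categories(answers):
    """List the special categories answered "yes", plus any free-text other category."""
    found = [f for f in SPECIAL_CATEGORY_FIELDS if answers.get(f) == "yes"]
    other = answers.get("other_sp_category")
    if other:
        found.append(f"other: {other}")
    return found
```

An empty result would indicate that no special categories were reported, which a data protection team could use as a quick screening signal.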
License | ||||||||||||
license_ref | Link to license document. | URI | 1 | 1 | 3 | general | Closed | PI, Organization | 4 | Dataset-specific - What kind of license is granted for the use of data https://creativecommons.org/licenses/by/4.0/ | ||
start_date | If date is set in the future, it indicates embargo period. Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 1 | 1 | 3 | general | Closed | PI, Organization | 4 | |||
Metadata | ||||||||||||
| Metadata Standard ID | Nested data structure | 1 | 1 | 2 | general | Public | PI, Research team, Reuse | http://www.dublincore.org/specifications/dublin-core/dcmi-terms/ | |||
description | Description | String | 0..1 | 1 | 1 / 3 | general | Public | PI, Research team, Reuse | provides taxonomy for... | |||
language | Language of the metadata expressed using ISO 639-3 | Term from Controlled Vocabulary | 1 | 1 | 1 / 3 | general | Public | PI, Research team, Reuse | ||||
schema | Is the data built according to a specific schema? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 | general | Public | PI, Research team, Reuse | Relates to dataset. Ideally, metadata from existing datasets could be imported directly from e.g. Zenodo. Also, metadata could be brought in for any datasets published in the project. Infoflow might be easiest this way around rather than from DMP API to repository. | |||
vocabulary_link | Which vocabularies are used? | Term from Controlled Vocabulary | 1 | 2 | 3 | general | Public | PI, Research team, Reuse | ||||
Term from Controlled Vocabulary | 0..n | 2 | 3 | general | Public | PI, Research team, Reuse | ||||||
location_doc | Where is the documentation? | String/URL | 1 | 2 | 3 | general | Public | PI, Research team, Reuse | ||||
generated | Is documentation generated automatically? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 | general | Public | PI, Research team, Reuse | ||||
access | Can the documentation be accessed? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 | general | Public | PI, Research team, Reuse | ||||
publish_methodology | Where the methodology/workflow has been published | String /URL | 0..1 | 2 | 3 | general | Public | PI, Research team, Reuse | registration of research? | |||
workflow | Is the workflow described? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 | general | Public | PI, Research team, Reuse | Especially important in the case of large datasets where the data itself cannot be preserved but is produced again if necessary | |||
documentation | What does the documentation consist of? | String | 1 | 2 | 3 | general | Public | PI, Research team, Reuse | Workflow, variable description, … ? | |||
purpose | What is the basic purpose of metadata? | String | 0..1 | 2 | 3 | general | Public | PI, Research team, Reuse | Qvain, own CRIS, something else? | |||
open | Is the discovery metadata open? Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 3 | general | Public | PI, Research team, Reuse | ||||
location_metadata | Landing page of metadata | PID | 0..1 | |||||||||
properties of metadata_standard_id | ||||||||||||
identifier | Identifier for the metadata standard used. | String | 1 | 2 | 2 | http://www.dublincore.org/specifications/dublin-core/dcmi-terms/ | ||||||
type | Identifier type Allowed Values:
| Term from Controlled Vocabulary | 1 | 2 | 1 / 3 | general | Public | PI, Research team, Reuse |