This is the active version of the national maDMP reference model, prepared in the EOSC-INFRA OSTrails project and open for comments from national stakeholders in Finland.


Towards structural maDMP template - national metadata application profile


Target & Focus:

  • Our focus in the OSTrails FI pilot has been to prepare a reference data model to be implemented further by research organisations, not a DMP template.
  • Thus we focus on the information required, not on the questions, i.e. how the information should be asked.
  • The core of the reference data model is presented in columns A-D.
  • Additional information on the purpose of the fields, and especially the justification of national additions, is given in columns F-K.
  • The aim is that the Finnish research ecosystem will find the model useful in pursuing the development of machine-actionability.
  • The suggested model has been designed in cooperation with Finnish universities, research organisations and CSC.
  • While respecting the RDA maDMP standard, national suggestions for Finland have been made, e.g. to meet the DMP requirements of the Research Council of Finland link:

National consultation:

  • To ensure the usability of the model, Finnish research organisations, service providers and funders are consulted for their comments, further suggestions and use cases.
  • Please indicate what information would still be needed to provide or launch machine-actionability, and which fields could be filled automatically via a digital object or with AI, e.g. by extracting information from an existing source.

  • We are thankful for comments noting any concerns, typos or logical problems.
  • Link to ontologies, data spaces, repositories and other relevant sources of information when you notice gaps in the use of available auxiliary information.

  • Comment on the purpose and users of the data elements.
  • Does your organization agree on which national additions should be mandatory in addition to those that are mandatory in the RDA standard?

Documentation:

  • All notes from the work are documented in the table below.
  • For the consultation, an additional questionnaire for comments will be published in the webinar on 12 May.


Guidelines to the reference maDMP data model table:

The table below follows the structure of the RDA standard. Elements from the RDA standard are marked in GREEN, and the column "Is this in the RDA template?" also indicates that a data element is in the RDA standard.

The elements derived from our national workshop, which consisted of questions for the DMP, are marked in BLUE.

The sections are grouped according to the RDA standard:
DMP, Project, Contact, Contributor, Funding, Cost, Dataset, Distribution, Host, License, Security and Privacy, Technical Resource, Metadata

In addition, we have suggested additional elements to highlight their importance in the data model: Data lifecycle, DPIA and Ethics. In particular, Data lifecycle contains elements that have been regarded as important in our workshops.

Version 2025-05-12 in PDF.


Link for Questionnaire for further comments


Reference:
You can refer back to the RDA maDMP data model, which is the core, but we can make suggestions for developing its machine-actionability. In addition, we can add DMP fields relevant to the national context. Note that this is a general data model that does not contain information specific to scientific disciplines.
https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard

OSTrails Plan-Track-Assess Pathways: https://zenodo.org/records/13145788

For use cases you can refer to:
Marttila, J., Manninen, S., Ahokas, M., Hindersson-Söderholm, T., Keckman-Koivuniemi, H. (2022). Dynaamiset DMP:t -työryhmän loppuraportti. https://zenodo.org/records/6601258
Marttila, J. & Manninen, S. (2022). Dynaamiset DMP:t -työryhmän toivekartoitus. https://zenodo.org/records/6594597


How to read the colour codes in the table?
National suggested fields are in BLUE; RDA Standard fields in GREEN.


A. Name
B. Description
C. Type
D. Cardinality
E. RDA maDMP standard: 1 = yes; 2 = FI national data model
H. Interoperability from data source: 1 = automate; 2 = DO; 3 = manual
I. General for EU, EOSC, International / National / Organizational / Funder specific
J. Public / Closed
K. Who needs this field? (To be updated & completed by all national fields)
L. Phase: 1 = Planning; 2 = Applied; 3 = Funding; 4 = Permission; 5 = Active; 6 = Update; 7 = Final
M. Example value, Comments
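The cardinality values in column D (1, 0..1, 0..n, 1..n) can be checked mechanically. The following is a minimal sketch, assuming the notation is exactly as used in this table; the function names `parse_cardinality` and `satisfies` are illustrative and not part of the RDA standard or the national model.

```python
from typing import Optional, Tuple


def parse_cardinality(spec: str) -> Tuple[int, Optional[int]]:
    """Parse '1', '0..1', '0..n' or '1..n' into (minimum, maximum).

    A maximum of None means that the number of values is unbounded.
    """
    spec = spec.strip()
    if ".." in spec:
        low, high = spec.split("..", 1)
    else:
        low = high = spec
    maximum = None if high == "n" else int(high)
    return int(low), maximum


def satisfies(spec: str, count: int) -> bool:
    """Return True if `count` values are allowed by the cardinality `spec`."""
    minimum, maximum = parse_cardinality(spec)
    return count >= minimum and (maximum is None or count <= maximum)
```

For example, `satisfies("1..n", 0)` is false because at least one value is required, while `satisfies("0..1", 1)` is true.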

DMP













dmp_id | Identifier for the DMP itself | Nested Data Structure | 1 | 1 | 1 | general | Public | System for interoperability | 3 | Request id for DMP; Where does this originate from, especially if using different tools/systems for DMPs?

title | Title of a DMP | String | 1 | 1 | 3 | general | Public | User | 3 | Max 100 char
cost | To list costs related to data management. Providing multiple instances of 'Cost' allows costs to be broken down into details. Providing one 'Cost' instance allows one aggregated sum to be given. (Sum from costs given in cost section) | Nested Data Structure | 0..n | 1 | 2 | general | Closed (see comment) | Funder | 2 | → Further development: in a DMP template this could be linked to the budget with the grant id. What costs are included? Needs clear guidelines. Closed early in the process, but depends on whether the DMP will be actively made public / "published" at some point during the data life-cycle.
created | Date and time of first version of a DMP. Encoded using the relevant ISO 8601 Date and Time compliant string | DateTime | 1 | 1 | 1 (system) | general | Public | Organization | | 2025-05-28; System recorded

modified | Must be set each time DMP is modified. Indicates DMP version. Encoded using the relevant ISO 8601 Date and Time compliant string | DateTime | 1 | 1 | 1 (system) | general | Public | Organization | | 2025-05-28; System recorded
dataset | To describe data on a general level. Describe how the datasets used can be categorised. | Nested Data Structure | 1..n | 1 | 3 | general | Public | Organization, PI | 3 | At least one dataset should be defined. See "Dataset" in the table.
project_id | Unique project identifier related to DMP | Nested Data Structure | 0..n | 2 | 3 | general | Public | Everyone | 3 |
dataset_id | Unique dataset identifier related to DMP | Nested Data Structure | 1..n








data_lifecycle | Describe the data lifecycle at a general level, and how open science criteria will be applied. | String | 0..1 | 2 | 3 | general | Public | Organization, PI | 1 | Data will be shared in the active phase using Allas; after the project, data will be shared via Fairdata IDA and a data paper will be published. The aim is to support reuse of the data.
type | A description of what kind of DMP to do | Term from Controlled vocabulary | 1 | 2 | 3 | Organisational / national / international | Public | System for appropriate DMP template | 1 | Type of DMP: Student; Academic organization own template; Academic national template; National generic; EU Horizon; RDA / International. Input formula should later be updated or extended to a richer format. Input profiles, for example: define a national typology for recommended use of DMPs (light, detailed), key issues: personal data, confidentiality of information, resource intensity, number of actors (outsiders).

nextreview | Next review date to update DMP. Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 0..1 | 2 | 2 / 3 | Organisational / Funder specific | Public | Organization, PI | 3 | A research project benefits from timing the update of the DMP, and Data Support can better plan its assistance. Suggested addition to keep the DMP alive and updated, e.g. for reporting purposes.

Rights & ethics - in the RDA standard, ethical issues are part of the DMP domain - the FI pilot considers grounds for a separate category, or a merge with license & added user rights: Rights, ethics & license

ethical_issues_exist | To indicate whether there are ethical issues related to data that this DMP describes. Allowed values: yes, no, unknown | Term from Controlled vocabulary | 1 | 1 | 3 | general | Closed | PI, Organization, Service provider | 1 | This is an important trigger because then the DMP must be very good
ethical_issues_report | To indicate where a report/document that details all identified ethical issues can be found (it might, for example, result from a meeting with an ethical committee) | URL | 0..1 | 1 | 3 | general | Closed | PI, Organization, Service provider | 1 | Add link; Comment: Date when the decision was made

ethical_issues_description | To describe considerations that require compliance with laws and regulations (e.g. GDPR, animal welfare) due to the involvement of humans, animals, or sensitive information. This includes ensuring informed consent from participants, protecting privacy and confidentiality, and adhering to applicable legal and ethical standards throughout the research. | String | 0..1 | 1 | 3 | national | Closed | PI, Organization, Service provider | 1 |
research_permit | Rights related to data: whether permission is required to collect data in the research data set | Term from Controlled vocabulary | 1 | 2 | 3 | national | Closed | PI, Organization | 1 | Actual research permit


ownership_data_right_person | Who owns the data / the rights related to the data? Give ORCID if available; otherwise give the name (surname, first name) | String | 0..1 | 2 | 3 | national | Closed | PI, Organization | 3 | Person or organization? Dataset-specific? The organisation can be a research organisation, a customer organisation or an organisation that otherwise only owns the data (e.g. an archive)
ownership_data_right_organization | Which organization owns the data / the rights related to the data? Give ROR if available; otherwise give the official name of the organization as given on its website | String | 1 | 2 | 3 | national | Public | PI, Organization, Reuse | 1 | ROR - add source list here
ipr_copyright | Are there IPR or copyright issues in the research described in the DMP? | Term from Controlled Vocabulary | 0..1 | 2 | 3 | Organizational | Closed | PI, Organization | 3 | yes, no, unknown
agreements_data_right | What agreements are needed with other organisations and people related to the rights to the material? Give both the type and name of the agreement. (DMP) | String | 0..n | 2 | 3 | Organizational | Closed | PI, Organization | 3 | Data right agreement with data provider, e.g. with Findata. Agreement for utilising technical devices and an external research laboratory.
agreements | What other agreements are needed? (DMP) | String | 0..n | 2 | 3 | Organizational | Closed | PI, Organization | 3 | Disclosure agreement with project partners

properties of dmp_id













identifier | Identifier for a DMP | String | 1 | 1 | 3 | general | see comments | Everyone | 3 | For some research the DMP may have to be closed for a justified reason; otherwise public
type | Identifier type. Allowed values: handle, doi, ark, url, other | Term from Controlled Vocabulary | 1 | 1 | 3 | general | see comments | Everyone | 3 | doi; For some research the DMP may have to be closed for a justified reason; otherwise public
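The mandatory DMP-level fields above can be illustrated with a small record and a check over it. This is a minimal sketch, assuming a plain dictionary layout with the field names from the table; the function `check_dmp_core`, the example DOI and the example handle are hypothetical, and the check covers only a few of the rules listed.

```python
from datetime import datetime

# Allowed values for dmp_id.type, as listed in the table.
ALLOWED_ID_TYPES = {"handle", "doi", "ark", "url", "other"}


def check_dmp_core(dmp: dict) -> list:
    """Return a list of problems found in the mandatory DMP-level fields."""
    problems = []
    dmp_id = dmp.get("dmp_id") or {}
    if not dmp_id.get("identifier"):
        problems.append("dmp_id.identifier is missing")
    if dmp_id.get("type") not in ALLOWED_ID_TYPES:
        problems.append("dmp_id.type is not an allowed value")
    if not dmp.get("title"):
        problems.append("title is missing")
    for field in ("created", "modified"):
        try:
            # Accepts ISO 8601 strings such as 2025-05-28T09:30:00+03:00.
            datetime.fromisoformat(dmp.get(field, ""))
        except (TypeError, ValueError):
            problems.append(f"{field} is not an ISO 8601 date-time")
    if not dmp.get("dataset"):
        problems.append("at least one dataset is required")
    return problems


# Hypothetical example record; the identifiers are placeholders.
example_dmp = {
    "dmp_id": {"identifier": "https://doi.org/10.1234/example-dmp", "type": "doi"},
    "title": "Example DMP",
    "created": "2025-05-28T09:30:00+03:00",
    "modified": "2025-05-28T09:30:00+03:00",
    "dataset": [{"title": "Fast car images"}],
}
```

Calling `check_dmp_core(example_dmp)` returns an empty list; removing the `dataset` entry adds the problem "at least one dataset is required".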

Project













id | Project identifier | Nested Data Structure | 1 | 1 | 2 | general | Public | Everyone | 4 | Compare also with RAiD: https://raid.org/
title | Name/Title of the project | String | 1 | 1 | 3 | general | Public | Everyone | 1 | If project information is not yet available anywhere, how much should be produced here? Is it possible to have multiple DMPs for one project or a maDMP without a funder or project?
start | Project start date. Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 1 | 1 | 3 (can trigger update process, e.g. 3-6 months after start) | general | Public | Everyone | 1 | 2026-01-01; Encoded using the relevant ISO date and time compliant string

end | Project end date. Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 0..1 | 1 | 3 (can trigger update process & reporting stage) | general | Public | Everyone | 1 | 2028-12-31; Encoded using the relevant ISO date and time compliant string. If the DMP is used for a continuous process, no end date is required, but this needs to be specified in the description. Alternatively, the end date can be set to the end of the funding period for long-term plans.

description | Project short description | String | 1 | 1 | 1 (project_id links to long description), otherwise 3 | general | Public | Everyone | 1 | Short description, e.g. max 2000 characters; include a link to the project plan if needed (the project id field links to the longer description in the project master data). Example: This project aims to analyze the impact of urbanization on local biodiversity by collecting and assessing environmental data from multiple urban centers. Using remote sensing, field observations, and statistical modeling, the study will identify key factors influencing species diversity and habitat loss. The findings will support sustainable urban planning initiatives and inform conservation strategies.

funding | Funding related to a project | Nested Data Structure | 0..n | 1 | 2 (derived from Funding status & Grant_id) | general | Public | Everyone | 3 | Public after publishing the grant.
discipline | Scientific discipline of project | Term from Controlled Vocabulary | 0..n | 2 | 2 / 3 | general | Public | Everyone | 1 | 3 if it needs to be added by the researcher; 2 if analytics / AI can be used to suggest it based on ORCID, Project_ID or Description to identify the UNESCO science classification. Keywords and free words allow mapping to ontologies and hence smart searches (whereas controlled vocabularies and taxonomies tend to force users to use whatever is close if no appropriate term is available). UNESCO science classification; pore-in via main categories
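The project rows above state that start is mandatory, end is optional, and both are ISO 8601 dates. A minimal sketch of that rule follows; the function name `check_project_dates` is illustrative, and the ordering check (end not before start) is an assumption not stated in the table.

```python
from datetime import date
from typing import Optional


def check_project_dates(start: str, end: Optional[str] = None) -> bool:
    """Return True if start is a valid ISO date and end, if given, is not earlier.

    Raises ValueError when either string is not an ISO 8601 calendar date.
    """
    start_date = date.fromisoformat(start)
    if end is None:
        # A continuous process may have no end date.
        return True
    return date.fromisoformat(end) >= start_date
```

With the example values from the table, `check_project_dates("2026-01-01", "2028-12-31")` returns `True`.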

Contact













contact_id | Identifier for contact | String | 1 | 1 | 1 | general | Public | Everyone | 1 | ORCID of the contact person for a DMP / principal (responsible) researcher
mbox | E-mail address | String | 1 | 1 | | general | Public | Everyone | 1 | from ORCID if possible, otherwise manual
firstnames | First names of the contact person / principal researcher | String | 1 | 1 (single field name) | 2 (from ORCID) / 3 | general | Public | Everyone | 1 | from ORCID or manual. Note: In RDA this is not separated into first name and last name; in the Finnish data model it is separated
lastname | Last name of the contact person / principal researcher | String | 1 | 1 (single field name) | 2 (from ORCID) / 3 | general | Public | Everyone | 1 | from ORCID or manual. Note: In RDA this is not separated into first name and last name; in the Finnish data model it is separated

organization | Organization of contact | String | 1 | 2 | 2 (from ORCID/ROR) / or 3 | general | Public | Everyone | 2 | If ROR exists this can be derived from ROR
ROR | ROR of organization of contact | String; ROR | 1 | 2 | 3 | general | Public | Everyone | 1 | This has its own attributes (ROR)

properties in contact_id













identifier | To indicate the specific value of an identifier for a contact | String | 1 | 1
type | Identifier type. Allowed values: orcid, isni, openid, other | Term from Controlled Vocabulary | 1







Contributor 













#_Nested Data Structure used if there are many contributors (and data controllers); this information will be requested from all of them











contributor_id | Contributor id, e.g. ORCID | Nested data structure | 1..n | 1 | 2: Digital authentication, e.g. by e-mail the contributor will add their ORCID, or from the funding application; 3: Has a risk of errors for ORCID | general | Public | Everyone | 2 | Needs to be defined - or where could it be derived from? From the funding decision?
mbox | E-mail address | String | 0..n | 1 | 2 / 3 (depending on whether the person has allowed sharing) | general | Public | Everyone | 2 |
firstname | First name of the contact person / principal researcher | String | 1 | 1 (single field name) | 2 (from ORCID) / 3 | general | Public | Everyone | 1 | from ORCID or manual. In RDA this is not separated into first name and last name - do we need the separate fields in Finland?
lastname | Last name of the contact person / principal researcher | String | 1 | 1 (single field name) | 2 (from ORCID) / 3 | general | Public | Everyone | 1 | from ORCID or manual. In RDA this is not separated into first name and last name - do we need the separate fields in Finland?

role | Role of the contributor. Allowed values: Access controller, Data controller, Principal investigator, Work package leader, Author of data set, Other | Term from Controlled Vocabulary | 1..n | 1 | 2 / 3 | general | Public | Everyone | | Data controller is required for research data services; Use case for AI search from funding proposal by roles

organization | Organization of contributing researcher | String | 0..1 | 2 | 2 (from ROR) / or 3 | general | Public | Everyone | 2 | If ROR exists this can be derived from ROR
ROR | ROR of organization of contributing researcher | String; ROR | 0..1 | 2 | 3 | general | Public | Everyone | 1 | This has its own attributes (ROR)

properties in contributor_id













identifier | Term from Controlled Vocabulary | String | 1 | 1 | 3 | general | Public | Everyone | 2 | orcid
type | Identifier type. Allowed values: orcid, isni, openid, other | Term from Controlled Vocabulary | 1 | | 3 | general | Public | Everyone | 2 |
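The contributor rows note that manual entry "has risk of errors for ORCID". One way to reduce that risk is to verify the ORCID check character (ISO 7064 MOD 11-2) before accepting a value. The sketch below is illustrative; the function names are assumptions, and it checks only the identifier's form, not whether the ORCID record exists.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the value has 16 characters and a correct check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base = compact[:15]
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == compact[15]
```

A mistyped final digit, as in `is_valid_orcid("0000-0002-1825-0098")`, is rejected, while the well-formed `0000-0002-1825-0097` passes.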

Cost













 # list all cost object categories











currency_code | Currency of costs. Allowed values defined by ISO 4217. Note: Default is EUR, or could this be linked to Funder_Id? | Term from Controlled Vocabulary | 0..1 | 1 | 3 / 2 (from grant_id) | general | Closed/Public | Organization | 2 | "978" for EUR
description_cost | Description of costs. Note: Could this be linked to Grant ID for description of applied/granted budget? | String | 0..1 | 1 | 3 / 2 (from grant_id / application) | general | Closed/Public | Organization | 2 | from Grant id when funded
title_cost | Title of costs. Note: Could this be linked to Grant ID for title of applied/granted budget? | String | 1 | 1 | 3 / 2 (from grant_id / application) | general | Closed/Public | Organization | 2 | from Grant id when funded
value_cost | Value of costs. Note 1: Could this be linked to Grant ID for applied/granted budget? Note 2: Link with DMP / cost_dmp | Number | 0..1 | 1 | 3 / 2 (from grant_id / application) | general | Closed/Public | Organization | 2 | from Grant id when funded
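The DMP-level cost field is described as the sum of the costs given in this section. A minimal sketch of that aggregation follows, assuming each cost is a dictionary with the field names from the table and that EUR is the default currency, as suggested in the note on currency_code. The function name `total_cost` and the sample cost items are hypothetical.

```python
from decimal import Decimal


def total_cost(costs: list, currency_code: str = "EUR") -> Decimal:
    """Sum value_cost over all cost items that share one currency.

    Items without currency_code are treated as being in the default currency.
    Raises ValueError when currencies are mixed, since they cannot be added.
    """
    total = Decimal("0")
    for cost in costs:
        if cost.get("currency_code", currency_code) != currency_code:
            raise ValueError("mixed currencies cannot be summed")
        # Convert via str to avoid binary floating point artefacts.
        total += Decimal(str(cost.get("value_cost", 0)))
    return total


# Hypothetical cost items for illustration only.
example_costs = [
    {"title_cost": "Storage", "value_cost": 1200.50},
    {"title_cost": "Data curation", "value_cost": 800, "currency_code": "EUR"},
]
```

Using `Decimal` rather than `float` keeps monetary sums exact; whether the model should carry amounts as numbers or strings is left open here.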

Funding













#_Nested Data Structure if many funding sources for a large research program unless defined that DMP relates to single grant decision
funder_id | Funder ID of the associated project, ROR if available | String | 1 | 1 | 2: ROR API via search option; 3 | general | Public | System | 1 | Registry number of associated project, Y-tunnus / Business ID. Nested structure used if there are many of these. Field is empty if none
funding_status | To express different phases of project lifecycle. Allowed values: planned, applied, granted, rejected | Term from Controlled Vocabulary | 0..1 | 1 | 3 | general | Public | Everyone | 1-5 | from Funding id; maDMP use case: information on whether the project is applied/granted is derived automatically from the grant ID
grant_id | Grant ID of the associated project | Nested data structure | 0..1 | 1 | 2 if DOI (not currently); 3 | general | Public | Everyone | 3 | 654321
properties of funding_id











identifier | Funder ID, recommended to use CrossRef Funder Registry. See: https://www.crossref.org/services/funder-registry/ | String | 1 | 1
type | Identifier type. Allowed values: fundref, url, other | Term from Controlled Vocabulary | 1 | 1







funder | Name of the funding organization: the official name of the funder as given in their registry or on their website | String | 1 | 2 / 1 | 2 | general | Closed until Funded | Everyone | 1 | Research Council of Finland
submission_dl | Deadline for funding submission. Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 1 | 2 | 2: select funding; 3 | | Closed until Funded | PI, Research group | 1 | 2026-08-31
decision_expected | Expected date for funding decision. Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 1 | 2 | 2: select funding; 3 | | Closed until Funded | PI, Research group | 1 | 2026-06-12
properties of grant_id











identifier | Grant ID | String | 1 | 1
type | Identifier type. Allowed values: url, other | Term from Controlled Vocabulary | 1 | 1







start | Funding (Project) start. Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 1 | 2 | 2 | general | Public | Everyone | 3 | 2027-01-01; Used if funding period is different from project.start date
end | Funding (Project) end. Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 1 | 2 | 2 | general | Public | Everyone | 3 | 2028-12-31; Used if funding period is different from project.end date

Dataset













#_Nested Data Structure if many datasets are used. Relationships to 1..* datasets are defined at DMP level. DMP has "dataset" association that can relate to many datasets. Each data set can have multiple files/distributions.
id | Dataset ID. Preferred values: DOI, PID, URN, URL, handle, ark, other digital ID | String | 1 | 1 | 2 | general | Public | RPO (research performing organization), Repositories, Data catalogues | 3 | Dataset may not exist when the DMP is defined. The DMP tool should provide a temporary ID before the dataset gets a PID by some way.
title | Data set title / name. Title is a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file. | String | 1 | 1 | 1/3 | general | Public | Same as above, but humans need this instead of machines | 3 | There can be many data sets; the information is related to one entity. A so-called metax entity, i.e. one must be able to express a wide variety of entities that then have attributes. Example: "Fast car images"

type | If appropriate, type according to the DataCite and/or COAR dictionary. Otherwise use the common name for the type, e.g. raw data, software, survey, etc. https://schema.datacite.org/meta/kernel-4.1/doc/DataCite-MetadataKernel_v4.1.pdf http://vocabularies.coar-repositories.org/pubby/resource_type.html Data set type (indication: interview, questionnaire, photos, video, measurement, samples, simulation, code) | Controlled vocabulary | 0..1 | 1 | 2 / 3 | general | Public | Data Catalogues, Repositories, RPOs | 3 | Associated with a single dataset; can include ready-made options, but also an open text field. What is the correct granularity level here? Resource intensity can affect the needs of the description. In general, it is instructed to describe so that the attribute applies to the entire dataset. By describing just one data set, it would be possible to create a so-called light DMP. This is an important option to keep. Some sort of defined and shared vocabulary on "data set types" is needed here. RDA Commons points to DataCite and COAR, but neither feels sufficient by itself. A national type list should be based on those, but enhanced, perhaps with subtypes. String or partly controlled (controlled vocabulary and "Other" option)

personal_data | Whether the dataset contains personal data. Allowed values: yes, no, unknown | Term from Controlled Vocabulary | 1 | 1 | 3 | general | Closed | Data protection officers, RPOs (data protection/management experts), repository | 2 | Associated with a single dataset; is this personal data the data of the data providers or of the target data? What is the role of individuals? Yes or No / Yes or No. FI restriction: It is assumed that "Unknown" is not an option here after submission to the funder; the researcher must be able to judge whether the data contains personal data or consult about it. Type of personal data will be in its own section. Can trigger automatic data protection processes.

sensitive_data | Whether there are legal restrictions that apply to using this data, e.g. military use, commercial restrictions, endangered species. Allowed values: yes, no, unknown | Term from Controlled Vocabulary | 1 | 1 | 3 | general | Closed | Data protection officers, RPOs (data protection/management experts), repository | 2 | Related to the dataset; how can we ensure that this is not asked except when it is likely? Yes or No / Yes or No or Unknown. Dual use and import controls? FI restriction: This should be yes/no after submission to the funder. In dataset we need to know whether there is sensitive/confidential information or not. That then triggers more questions in the security & privacy section.

description | Description of dataset. Description is a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file. | String | 1 | 1 | 3 | general | Public | Repository, Data catalogues, CRIS | 3 | Needs some kind of guidance on what level of description is needed. Need for a space limitation? We already have a name for the dataset. How much more description do we want/need at this point? We should not ask these in the DMP. They are about publishing and metadata. If somebody wants to combine DMP and CRIS, this information needs to be interoperable, but this is NOT part of the DMP. Same holds true for all red rows.

distribution | Technical information on a specific instance of dataset | Nested Data Structure | 0..n | 1 | 3 | general | Public | Repository, Data catalogues, CRIS | 3 | This might need more clarification, as it relates to resources/infra needed.

issued | Date the dataset was issued. Encoded using the relevant ISO 8601 Date and Time compliant string | Date | 0..1 | 1 | 1 / 3 | general | Public | Everyone | 3 |
keyword | Keyword | String | 0..n | 1 | 1 / 3 | general | Public | Everyone | 3 | Should be asked only when data is opened/catalogued. Terms from controlled vocabulary
language | Language of the dataset expressed using ISO 639-3 | Term from Controlled Vocabulary | 0..n | 1 | 1 / 3 | general | Public | Everyone | 3 |
metadata | To describe metadata standards used | Nested Data Structure | 0..n | 1 | 1 / 3 | general | Public | Everyone | 3 |
data_quality_assurance | To describe any quality assurance processes applied to a dataset, such as those that ensure its accuracy, reliability, consistency, and usability for its intended purposes. This includes systematic practices, procedures, and policies designed to maintain high data quality throughout its lifecycle. | String | 0..n | 1 | 3 | general | Public | Funder | 3 | We calibrate measuring equipment daily, run repeat samples to monitor consistency in measurements and results, and cross-check collected data with at least two colleagues for accuracy.
method_quality_assurance | Method describing how the quality assurance has been conducted | Term from Controlled Vocabulary | 1 | 2 | 3 | general | Public | Funder | 3 | Example: TAU list as an example. There is a need to develop a list related to disciplines

category (CHECK) | Describe categories of datasets if there are multiple datasets of different types | Term from Controlled Vocabulary | 0..n | 2 | 3 | general | Public | Organization, PI | 3 | Categories need to be defined; Controlled vocabulary by scientific field

format | Description of the dataset formats used during active research, for example database, csv, xml, json. (Format of the dataset to be used - the format of the datasets to be published / distributed after the project is different) | Term from Controlled Vocabulary | 0..1 | 2 | 2 | General | Public | Organization, PI, Research group | 3 | Relates to one data set. How does this relate to outputs other than datasets, like code? Or code that is closely related to data usability, e.g. link or PID? Format vs. Type: what is the difference? File format should be in distribution, not here.

data_sharing_issues | How legal and ethical issues related to the sharing of data (e.g. ownership, copyright, sensitivity) will be resolved | String | 1 | 2 | 3 | National | Public | Organization, PI | 3 |
data_sharing_contracts | Are contracts needed prior to sharing data? Allowed values: yes, no, unknown | Term from Controlled Vocabulary | 1 | 2 | 3 | Organizational | Public | Organization, PI | 3 |
data_sharing_ownership | Is the ownership of data clear for data sharing? Allowed values: yes, no, unknown | Term from Controlled Vocabulary | 1 | 2 | 3 | Organizational | Public | Organization, PI | 3 |
data_sharing_copyright | Are the copyright issues related to data sharing clear? Allowed values: yes, no, unknown | Term from Controlled Vocabulary | 1 | 2 | 3 | Organizational | Public | Organization, PI | 3 |
data_sharing_sensitivity | Are possible issues related to sharing sensitive data cleared? Allowed values: yes, no, unknown | Term from Controlled Vocabulary | 1 | 2 | 3 | Organizational | Public | Organization, PI | 3 |
data_sharing_other | Describe any other emerging issues of data sharing | String | 0..1 | 2 | 3 | Organizational | Public | Organization, PI | 3 |
data_landing_page | Give the link / PID to the landing page of the data | link / PID | 0..1 | 2 | 3 | General | Public | Organization, PI | 3 |
properties of dataset_id











identifier | Identifier for a dataset | String | 1 | 1 | 3 | General | Public | | | https://hdl.handle.net/11353/10.923628
type | Identifier type. Allowed values: pid, handle, doi, ark, url, other | Term from Controlled Vocabulary | 1 | 1 | 3 | General | Public | | | pid
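The FI restriction on personal_data and sensitive_data (no "unknown" after submission to the funder) lends itself to an automatic check. The following is a minimal sketch under that reading of the comments; the function name `check_sensitivity_flags` and the `submitted_to_funder` flag are assumptions, not fields of the model.

```python
# Allowed values for both flags, as listed in the table.
ALLOWED_FLAG_VALUES = {"yes", "no", "unknown"}


def check_sensitivity_flags(dataset: dict, submitted_to_funder: bool) -> list:
    """Return problems with personal_data and sensitive_data for one dataset."""
    problems = []
    for field in ("personal_data", "sensitive_data"):
        value = dataset.get(field)
        if value not in ALLOWED_FLAG_VALUES:
            problems.append(f"{field}: expected yes, no or unknown")
        elif submitted_to_funder and value == "unknown":
            # FI restriction: the value must be decided once the DMP is submitted.
            problems.append(f"{field}: 'unknown' is not accepted after submission")
    return problems
```

A "yes" on either flag could then be used to trigger the follow-up questions mentioned in the comments, which this sketch does not cover.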

Dataset life cycle - (this is suggested extension to Dataset - requires re-specification for dataset level)

 

 

 

 

 

 

 

 

 


 

description | Summarise the description of all datasets created in the project, if many, and after the project, at a general level, and how they are managed. Description is also a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file. | String | 0..1 | 2 | 3 | general | Public | PI, Organization, Technical service provider | 3 | Funder and CSC need this information
data_collected | Summarise data collected for this project | String | 0..n | 2 | 3 | general | Public | Funder | 5 |
data_produced | Summarise data produced as an outcome of the project | String | 0..n | 2 | 3 | general | Public | Funder | 5 |


data_users | With whom will the data be shared during the project? Allowed values: Open; In DMP defined research consortium; In home organization; To specified people; To other projects; To service providers; Complex structure | Term from Controlled Vocabulary | 0..n | 2 | 3 | National | Closed | PI, Organization, Technical service provider | 3 | Refers to the technical solutions; will a DPA be needed? Are the joint controller agreement, NDAs etc. already elsewhere? Or does this refer to the consortium projects?
shareage_solution | How will the data be shared during the project? Define the technical solutions planned to be used. | Term from Controlled Vocabulary (choose from Service catalog) | 1..n | 2 | 3 | National | Closed | PI, Organization, Technical service provider | 3 |
version_mgmt | How are the data versions managed? | String | 1 | 2 | 3 | National | Closed | PI, Organization, Technical service provider | 3 | Mandatory for large data intensive projects (At CSC >50 TB)
data_retention | How is data retention managed? | String | 1 | 2 | 3 | National | Closed | PI, Organization, Technical service provider | 3 | Mandatory for large data intensive projects (At CSC >50 TB). Data retention plan is needed for managing the size of the project
exit_plan | What is the exit plan from computational and storage services at the end of the project? | String | 1 | 2 | 3 | National | Closed | PI, Organization, Technical service provider | 3 | Exit plan is needed to ensure that research data with value for re-use is saved within the available resources
backup | How will data be backed up during the project? To be planned by the researcher, or organization specific solutions? | String | 1 | 2 | 3 | Organizational | Closed | PI, Organization, Technical service provider | 3 | Utilisation of prefilled information derived from backup of services used
closure_justification | If the project does not collect or produce any data fully or partially suitable for reuse, justify why the data cannot be made available even partially. | String | 1 | 2 | 3 | National | Closed | Funder, PI, Organization | 3 | This is mandatory if data is closed. Should there be a dataset level field for dataset publication (open / closed)?
open_location | Where will the data be opened? | String | 1 | 2 | Special requirements for data repositories for preliminary data? | National | Public | Funder, PI, Organization | 3 | FSD comments: It is essential for the repository/archive to know (in the case of research projects that have received a positive funding decision) what kind of data are planned to be opened in the repository/archive and by whom. Covered under distribution maybe? This field also responds to the requirement of the National DMP template on where the data or a publishable portion of them will be made available after the end of the project


storage_location | Where will the data be stored during the project? | URN from CSC Service Catalogue & list presented by organization; if something else, what? | 1..n | 2 | 3 | National | Public/Closed | PI, Organization, Technical service provider | 3 | Relates to a dataset; extra important if data is subject to the Act on the Secondary Use of Data. Add to general data life-cycle. Specify by data set if needed
storage_length | How long the data is stored for the original research purpose. Give the time estimate in years | Number | 1 | 2 | 3 | National | Public | PI, Organization, Funder, Technical service provider | 3 | Example: "5 years"; Relates to dataset, original purpose

deletionHow is data deleted/destroyed?String 12

3NationalPublicFunder, Technical service provider3Could be specified that this relates to unpublished data. Or data that are mentioned to be shared e.g. for 5 or 10 years, etc.
deletion_noIf data will not be deleted from active storage at the end of the project, explain why.String 0..12  3OrganizationalPublicPI, Funder, Technical service provider

deletion_dateWhen is data deleted/destroyed? 
Encoded using the relevant ISO 8601 Date and Time compliant string

DateTime


0..12  OrganizationalPublicPI, Funder, Technical service provider3Could be specified that this relates to unpublished data.
deletion_plannedtimingIf date cannot be given, then description of the planned deletion stage and approximate timing

String

0..12

3OrganizationalPublicPI, IT Services, Technical service provider3 
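
The relation between deletion_date and deletion_plannedtiming (an exact date if known, otherwise a free-text description) could be resolved as in the sketch below. The record layout as a plain dictionary and the function name are assumptions for illustration. Note that datetime.fromisoformat accepts only a subset of ISO 8601 in Python versions before 3.11.

```python
from datetime import datetime


def deletion_timing(record: dict) -> str:
    """Prefer the exact deletion date; fall back to the planned timing text."""
    date_text = record.get("deletion_date")
    if date_text:
        # Raises ValueError if the string is not a supported ISO 8601 form.
        parsed = datetime.fromisoformat(date_text)
        return parsed.date().isoformat()
    planned = record.get("deletion_plannedtiming")
    if planned:
        return planned
    # Neither field is filled in; both have cardinality 0..1 in the model.
    return "unspecified"
```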
archiving_services

Are archiving services or long term preservation for data needed?

Allowed Values:

  • yes
  • no
  • unknown
Term from Controlled Vocabulary12

3NationalClosedPI, Organization, Technical service provider

4

Relates to data set, and how to determine the value of data?

Is this long-term storage, e.g. 20 years in Zenodo, archiving in an institutional archive, or something else?
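
Several fields in the model share the controlled vocabulary yes / no / unknown. A small validator such as the sketch below could be reused for all of them. The function name and the choice to normalise case and whitespace are assumptions, not part of the model.

```python
YES_NO_UNKNOWN = ("yes", "no", "unknown")


def normalise_term(value: str, allowed=YES_NO_UNKNOWN) -> str:
    """Lower-case and trim a term, then check it against the allowed values."""
    term = value.strip().lower()
    if term not in allowed:
        raise ValueError(f"{value!r} is not one of {', '.join(allowed)}")
    return term
```

Fields that allow only yes / no can pass allowed=("yes", "no").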


archiving_dateWhen to archive?
Encoded using the relevant ISO 8601 Date and Time compliant string

DateTime


0..12

3NationalPublicIT Services, Technical service provider4Active data can be deleted and archived at the same time
archiving_location

Where to archive?

Allowed values from: 

CSC Service Catalogue &  organization's own archiving services

Term from Controlled Vocabulary

 
0..12

3NationalPublicIT Services, Technical service provider4

Technical resource













#_Nested Data Structure if many technical resources are used from different providers. IDs relate to user id of technical service providers.







nameName a resource applied to a datasetString11

generalclosedPI & IT service
descriptionTo list all technical resources needed. Describe a technical resource (e.g. tools or software) required for any stage of a dataset lifecycle (e.g. microscopes, sensors, Jupyter Notebook, Galaxy workflows, measuring devices) String0..11

3 / 2  (from organisational or national list)generalclosedPI & IT service5These sound like reports compiled based on DMP. So if you have confidential data, DMP compiles list of local services that CAN be used.
user_idUser id of technical resourceString0..n2

3generalclosedPI & Technical resource provider5

MyCSC user id

reuse

Is previously collected data reused in this project
(Whether the data is collected, created or comes from elsewhere)

Allowed Values:

  • yes
  • no
  • unknown
Term from Controlled Vocabulary12

3generalPublicFunder5

Relates to one data set (does it? Or does it relate to whole research in this meaning)

Reuse of data is also information that funders require. The terms of use of the data are also important here.

If you're using data that is already published and is based on other data? So is this property of dataset or is this property of research?

sourceData sourceString12

2 / 3generalPublicFunder5

Example "pid"

Relates to one data set, can include ready-made options, but also an open text field

Referencing can be really confusing. You can use data obtained from Twitter, or a dataset that somebody else compiled from Twitter. What do you reference here? Or do you make a derivative dataset based on an existing dataset that was compiled from Twitter?

estimate_datasize

Give a rough estimate of the size of the data produced/collected in TBs

Number12

3generalPublicTechnical resource provider5

Estimate for the resources applied for the project

data_resource_estimateProject data magnitude for resources required to analyse and store the dataNumber12

3generalPublicTechnical resource provider5Estimate for the resources applied for the project
application_process

What applications are used to process data?

Allowed values from:

Controlled list CSC Service Catalogue &  organization services

Term from Controlled Vocabulary




1..n2

3General, National & OrganizationalClosedPI, Organization, Technical service provider3Affects the choice of storage environment (e.g. whether the video is only available for viewing or whether it needs to be available at the file level in an analysis program)
computing_environments

Which computing environments are needed for research?

Allowed values from:

Controlled list CSC Service Catalogue &  organization services

Term from Controlled Vocabulary



1..n2

3National & OrganizationalClosedPI, Organization, Technical service provider3Relates to data set
computing_capacity_CPUHow many core hours of computing capacity are required on CPUs?Number12

3generalClosedPI, Organization, Technical service provider4Estimated value
computing_capacity_GPUHow many core hours of computing capacity are required on GPUs?Number12

3generalClosedPI, Organization, Technical service provider4Estimated value
properties of user_id


2







identifierIdentifier for a user of technical resourcesString12

3generalclosedPI & Technical resource provider

CSC project

typeIdentifier type defined by technical resource provider
String0..12

3generalclosedPI & Technical resource provider

CSC project

project_id











identifierUnique project established for use of technical resourceString0..n2

3generalclosedPI & Technical resource provider

CSC project

typeType defined by technical resource provider for project granted resourcesString0..n2

3generalclosedPI & Technical resource provider

CSC project
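
The nested structure of a technical resource (name, description, and the nested user_id and project_id entries with identifier and type) could look like the sketch below. All values are placeholders, and the reading of the cardinalities (name is required, identifier is required inside each user_id entry) is an interpretation of the table above, not a confirmed serialisation.

```python
# Placeholder values; only the field names come from the model.
technical_resource = {
    "name": "Example compute cluster",
    "description": "Placeholder description of the resource",
    "user_id": [
        {"identifier": "example-user", "type": "MyCSC user id"},
    ],
    "project_id": [
        {"identifier": "example-project", "type": "CSC project"},
    ],
}


def missing_required(resource: dict) -> list:
    """List required fields that are absent or empty in a technical resource."""
    problems = []
    if not resource.get("name"):
        problems.append("name")
    # user_id is 0..n, but each entry needs an identifier (cardinality 1).
    for index, user in enumerate(resource.get("user_id", [])):
        if not user.get("identifier"):
            problems.append(f"user_id[{index}].identifier")
    return problems
```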

Distribution

 

 

 

 

 

 

 

 

 

 


 

access_urlA URL of the resource that gives access to a distribution of the dataset. e.g. landing page.URL0..11

3generalPublicPI

In case of DMP you should use these to describe active use of the data. Others should be in life-cycle.


title_datasetTitle is a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file.String11

3generalPublicPI

available_until_url

Indicates how long this distribution will be/ should be available.
Encoded using the relevant ISO 8601 Date and Time compliant string

DateTime0..11

3generalPublicPI

byte_size

Estimated byte size:

  • S: < 10 TB
  • M: 10-50 TB
  • L: 50-100 TB
  • XL: 100-200 TB
  • XXL: > 200 TB

Term From Controlled Vocabulary




11

3generalPublicPI, Data controller, Technical resource provider

E.g. 
S < 10 TB, 10 <= M <= 50 TB, 50 < L <= 100 TB, 
100 < XL <= 200 TB, 200 TB < XXL

Important, as it affects which tools are available.

Number or  Size Category: S, M, L, XL, XXL
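
If a number is given instead of a size category, the category can be derived from it. The sketch below assumes the size is expressed in terabytes and that the boundary values 50, 100 and 200 TB fall into the lower category. The model does not settle the boundaries, so this is one possible reading.

```python
def size_category(terabytes: float) -> str:
    """Map an estimated size in TB to the categories S, M, L, XL, XXL."""
    if terabytes < 0:
        raise ValueError("size cannot be negative")
    if terabytes < 10:
        return "S"
    if terabytes <= 50:   # assumption: 50 TB still counts as M
        return "M"
    if terabytes <= 100:  # assumption: 100 TB still counts as L
        return "L"
    if terabytes <= 200:  # assumption: 200 TB still counts as XL
        return "XL"
    return "XXL"
```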

data_accessIndicates access mode for data and data sharing.
Allowed Values:
  • open
  • shared
  • closed
Term from Controlled Vocabulary11

3generalPublicPI, Data controller, Technical resource provider

Example: "Open"

This can change during the study. First I use it 3 years as closed, then I open it. Should here be what I want to do after the active use or what happens right now? → Should be the current publication status of the distribution. Dataset lifecycle documents the plan for the dataset.

descriptionDescription is a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file.String11

3generalPublicPI, Data controller, Technical resource provider

download_urlThe URL of the downloadable file in a given format. E.g. CSV file or RDF file.URL0..11

3generalPublicPI, Data controller

formatFormat according to: https://www.iana.org/assignments/media-types/media-types.xhtml if appropriate, otherwise use the common name for this formatString0..n1

3generalPublicPI, Data controller

preservation_statementPreservation StatementString
1

3generalPublicPI, Organization3
licenseList all licenses applied to a specific distribution of data.Nested data structure0..n1

3generalPublicPI, Data controller
Comment: This could be at dataset level → If not distribution specific

Host

 










hostTo provide information on quality of service provided by infrastructure (e.g. repository) where data is stored. Service URNNested data structure0..11

3GeneralPublicEveryone
Question: Outside the own organization? Same as location_open_data in lifecycle
availabilityAvailabilityString0..11

3GeneralPublicEveryone
99,5
backup_frequencyBackup FrequencyString0..11

3GeneralPublicEveryone
weekly
backup_typeBackup TypeString0..11

3GeneralPublicEveryone
tapes
certified_withRepository certified to a recognised standard
Allowed Values:
  • din31644
  • dini-zertifikat
  • dsa
  • iso16363
  • iso16919
  • trac
  • wds
  • coretrustseal
Term from Controlled Vocabulary0..11

3GeneralPublicEveryone
coretrustseal
description DescriptionString0..11

3GeneralPublicEveryone
Repository hosted by...
geo_locationPhysical location of the data expressed using ISO 3166-1 country code.Term from Controlled Vocabulary0..11

3GeneralPublicEveryone
AT
pid_systemPID System
Allowed Values:
  • ark
  • arxiv
  • bibcode
  • doi
  • ean13
  • eissn
  • handle
  • igsn
  • isbn
  • issn
  • istc
  • lissn
  • lsid
  • pmid
  • purl
  • upc
  • url
  • urn
  • other
 
Term from Controlled Vocabulary0..n1

3GeneralPublicEveryone
doi
storage_type The type of storage requiredString0..11

3GeneralPublicEveryone
LTO-8 tape
support_versioning Allowed Values:
  • yes
  • no
  • unknown
Term from Controlled Vocabulary0..11

3GeneralPublicEveryone
yes
title TitleString11

3GeneralPublicEveryone
Super Repository
url The URL of the system hosting a distribution of a datasetURI11

3GeneralPublicEveryone
https://www.fairdata.fi/en/ida/
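
The example values in the host rows above can be combined into one record and checked against the controlled vocabularies. This is a sketch only: the dictionary layout and the function name are assumptions, and the country code check assumes the two-letter (alpha-2) form of ISO 3166-1, as in the example "AT".

```python
CERTIFIED_WITH = {"din31644", "dini-zertifikat", "dsa", "iso16363",
                  "iso16919", "trac", "wds", "coretrustseal"}
PID_SYSTEMS = {"ark", "arxiv", "bibcode", "doi", "ean13", "eissn", "handle",
               "igsn", "isbn", "issn", "istc", "lissn", "lsid", "pmid",
               "purl", "upc", "url", "urn", "other"}

# Values copied from the example column of the host rows.
example_host = {
    "title": "Super Repository",
    "url": "https://www.fairdata.fi/en/ida/",
    "availability": "99,5",
    "backup_frequency": "weekly",
    "backup_type": "tapes",
    "certified_with": "coretrustseal",
    "geo_location": "AT",
    "pid_system": ["doi"],
    "storage_type": "LTO-8 tape",
    "support_versioning": "yes",
}


def host_problems(host: dict) -> list:
    """Return human-readable problems found in a host record."""
    problems = []
    for required in ("title", "url"):  # both have cardinality 1
        if not host.get(required):
            problems.append(f"missing {required}")
    cert = host.get("certified_with")
    if cert is not None and cert not in CERTIFIED_WITH:
        problems.append(f"unknown certified_with value: {cert}")
    for pid in host.get("pid_system", []):
        if pid not in PID_SYSTEMS:
            problems.append(f"unknown pid_system value: {pid}")
    geo = host.get("geo_location")
    if geo is not None and not (len(geo) == 2 and geo.isalpha() and geo.isupper()):
        problems.append("geo_location is not a two-letter country code")
    return problems
```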

Security and privacy

 
id

ID of risk assessment

URI/PID  0..12  2EU/national security levels (restricted, confidential, secret, top secret)ClosedOrganization & PI3 
titleTitle of security measures String11

3generalClosedOrganization & PI 3Example: "Physical access control"
descriptionDescription of security and privacy measuresString0..11

3generalClosedOrganization & PI3Example: "Server with data must be kept in a locked room"
security_privacyTo list all issues and requirements related to security and privacyString0..11

3 (from organisational list)EU/national security levels (restricted, confidential, secret, top secret)ClosedOrganization & PI3These sound like reports compiled based on DMP
protection_level_dataWhat is the required level of data protection?

Term from Controlled Vocabulary




0..12

3EU/national security levels (restricted, confidential, secret, top secret)ClosedOrganization & PI, Technical resource provider3

Relates to Data Protection Level of the data set: 
Open access, Restricted access, Restricted access & controlled use; Restricted access & restricted use

Controlled List (from Service catalog)

confidentiality

Does the data contain confidential information?
(EU definition; law (julkisuuslaki, the Act on the Openness of Government Activities); agreements incl. trade secrets; classification from governmental bodies)

Allowed Values:

  • yes
  • no
  • unknown

Term from Controlled Vocabulary

12

DMP, 3EU/national security levels (restricted, confidential, secret, top secret)ClosedOrganisation, & PI, Open science services, Technical resource provider3

May contain a "yes" condition, after which it is indicated which datasets this relates to. Confidential, business secrets, sensitive geospatial data, sensitive biodiversity data, national security, trade secrets. Dataset-specific.
Yes / No / Unknown

Comment:  a joined classification for security levels. EU security levels here as an example.

personal_data 

Does the research handle personal data for research purposes (in any of the datasets used)? 

  • If yes → Triggers Risk assessment process & DPIA process & Ethical review process
  • no

Documented 

Term from Controlled Vocabulary

12

1 (Derived from Dataset section)EU/national security levels (restricted, confidential, secret, top secret)ClosedOrganization & PI3This is in RDA standard dataset specific attribute. This could be derived variable from dataset specific questions to project level. If information is needed as well at project level.
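
The comment above suggests deriving the project-level personal_data value from the dataset-level answers. One possible derivation is sketched below. It assumes that each dataset carries a personal_data value of "yes", "no" or "unknown", and that a single "yes" is enough to trigger the three processes named in the field description. The function names are hypothetical.

```python
TRIGGERED_BY_PERSONAL_DATA = ("risk assessment", "DPIA", "ethical review")


def project_personal_data(datasets: list) -> str:
    """Derive the project-level answer from dataset-level personal_data values."""
    answers = [d.get("personal_data", "unknown") for d in datasets]
    if "yes" in answers:
        return "yes"
    if answers and all(answer == "no" for answer in answers):
        return "no"
    # Assumption: missing or mixed no/unknown answers stay undecided.
    return "unknown"


def triggered_processes(datasets: list) -> list:
    """Return the processes to start when any dataset holds personal data."""
    if project_personal_data(datasets) == "yes":
        return list(TRIGGERED_BY_PERSONAL_DATA)
    return []
```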

personal_data_list

What personal data do you process? String12

3EU/national security levels (restricted, confidential, secret, top secret)ClosedOrganization & PI, Data protection team3Controlled list, e.g. MyCSC or TAU example

privacy-notice

Is there a need for a privacy notice?

Allowed Values:

  • yes
  • no
Term from Controlled Vocabulary12   3EU/national security levels (restricted, confidential, secret, top secret)ClosedData protection team

3Date & title for privacy notice, data transfer agreements

dpia


Should DPIA be done?

Allowed Values:

  • yes
  • no

 

Term from Controlled Vocabulary12  Requirement comes from the law

Voluntary question
EU/national security levels (restricted, confidential, secret, top secret)ClosedOrganization3

Good question: should this be in the DMP at all? In this context, it is also possible to assess whether a DPIA is actually being done. However, privacy information should be structured and compatible so that it can be asked for here if desired. 

Comment: pre-DPIA usually executed to see if a full DPIA is necessary. 

toms

Links to Technical & organisational measures 

URI 0..1 2  1/3EU / nationalPublicEveryone4

 

toms_description

Describe the project-specific technical and organisational measures (TOMs)

String / Term from Controlled Vocabulary 0..1 2   3EU / nationalPublicEveryone4

 

data_nnnn

Do you plan any data transfers or access outside the EEA?

Value0..12   



 

DPIA process 













dpia_id

If a DPIA exists, give its URI / DOI

URI 0..12   3EU/national security levels (restricted, confidential, secret, top secret)Closed Organization1 

privacy_notice_id

If a privacy notice exists, give its link / archive numberString 0..12   3EU/national security levels (restricted, confidential, secret, top secret)ClosedPI, Organization1Use case: the maDMP could transfer the number / link of the privacy notice to the data protection team once the notice has been completed, to indicate its status. 
pre_dpia

Has risk assessment been filled in?

(risk assessment/pre-dpia, selftest if DPIA is needed)

  • yes
  • no
Term from Controlled Vocabulary0..12

3EU/national security levels (restricted, confidential, secret, top secret )

Closed


Organization1
data_use_region

Will data be managed

  • In Europe,
  • Outside Europe











If a privacy notice or risk assessment exists, these fields are not asked but could be filled in automatically from the privacy notice and risk assessment.
    



 
personal_data_sp_categoryWhat special categories of personal data do you process? String 12  Requirement comes from the lawOrganizational/national security levels (restricted, confidential, secret, top secret )ClosedPI, Organization1List of the special categories of personal data processed

ethnic_origin

Do you process data on ethnic origin?
Allowed Values:

  • yes
  • no
  • unknown
Term from Controlled Vocabulary 12  3 / 1Organizational/national security levels (restricted, confidential, secret, top secret )ClosedPI, Organization1 

political_opinions

Do you process data on political opinions?
Allowed Values:
  • yes
  • no
  • unknown
Term from Controlled Vocabulary 12  3 / 1 Organizational/national security levels (restricted, confidential, secret, top secret )ClosedPI, Organization1 

religion_philosophical beliefs

Do you process data on religion or philosophical beliefs?
Allowed Values:
  • yes
  • no
  • unknown
Term from Controlled Vocabulary 12  3 / 1Organizational/national security levels (restricted, confidential, secret, top secret )ClosedPI, Organization1 

trade_union_membership

Do you process data on trade union membership?
Allowed Values:
  • yes
  • no
  • unknown
Term from Controlled Vocabulary 12  3 / 1Organizational/national security levels (restricted, confidential, secret, top secret )ClosedPI, Organization1 

data_concerning_health

Do you process data concerning the health of individuals?
Allowed Values:
  • yes
  • no
  • unknown
Term from Controlled Vocabulary 12  3 / 1Organizational/national security levels (restricted, confidential, secret, top secret )ClosedPI, Organization1 

sexual_orientation_or_activity

Do you process data on sexual orientation or activity?
Allowed Values:
  • yes
  • no
  • unknown
Term from Controlled Vocabulary 12  3 / 1Organizational/national security levels (restricted, confidential, secret, top secret )ClosedPI, Organization1 

genetic_or_biometric_data

Do you process genetic or biometric data for identifying the persons? 

Allowed Values:

  • yes
  • no
  • unknown
Term from Controlled Vocabulary12  3 / 1Organizational/national security levels (restricted, confidential, secret, top secret )ClosedPI, Organization1 

other_sp_category

Describe the other special categories of data that you process in the research.String 0..12  3 / 1Organizational/national security levels (restricted, confidential, secret, top secret )ClosedPI, Organization1 
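
The yes / no / unknown answers for the individual special categories could be collected into the summary field personal_data_sp_category automatically. The sketch below assumes the answers arrive as a dictionary keyed by the field names exactly as written in the model. The function name is hypothetical.

```python
SPECIAL_CATEGORY_FIELDS = (
    "ethnic_origin",
    "political_opinions",
    "religion_philosophical beliefs",  # spelled as in the model, including the space
    "trade_union_membership",
    "data_concerning_health",
    "sexual_orientation_or_activity",
    "genetic_or_biometric_data",
)


def flagged_special_categories(answers: dict) -> list:
    """Return the special-category fields answered with 'yes'."""
    flagged = [name for name in SPECIAL_CATEGORY_FIELDS
               if answers.get(name) == "yes"]
    # other_sp_category is free text; any non-empty value counts as flagged.
    if answers.get("other_sp_category"):
        flagged.append("other_sp_category")
    return flagged
```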
data_prosessing_basisBasis for data processingString12  3 / 1NationalClosedPI, Organization1
data_prosessing_sp_categoryBasis for processing special categories of personal dataString12  3 / 1National - (for enabling to provide optimal services)ClosedPI, Organization, Service provider1
data_transfer_outside_EU

Whether personal data is transferred outside the EU

 Allowed Values:

  • yes
  • no
  • unknown
Term from Controlled Vocabulary12  3 / 1National - (for enabling to provide optimal services)ClosedPI, Organization, Service provider1
data_transfer_countryTo which countries personal data is transferredString0..n2  3 / 1National - (for enabling to provide optimal services)ClosedPI, Organization, Service provider1
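
The fields data_transfer_outside_EU and data_transfer_country depend on each other. A consistency check could look like the sketch below. The rule that at least one country must be listed when the answer is "yes" is an assumption, since the model gives data_transfer_country the cardinality 0..n without stating a condition.

```python
def transfer_problems(record: dict) -> list:
    """Check that the transfer flag and the country list agree."""
    outside = record.get("data_transfer_outside_EU")
    countries = record.get("data_transfer_country", [])
    problems = []
    if outside not in ("yes", "no", "unknown"):
        problems.append("data_transfer_outside_EU must be yes, no or unknown")
    if outside == "yes" and not countries:
        problems.append("list at least one data_transfer_country")
    if outside == "no" and countries:
        problems.append("data_transfer_country given although no transfer is declared")
    return problems
```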
data_external_processors

Are there external processors

 Allowed Values:

  • yes
  • no
  • unknown
Term from Controlled Vocabulary12  3 / 1Local (Responsibility for organization)ClosedPI, Organization, Service provider1
personal_data_minimizedHow is the processing of personal data minimized?String12  3 / 1Local (Responsibility for organization)ClosedPI, Organization, Service provider1Anonymization, pseudonymization, removal of direct identifiers..., dataset-specific?
Note: Security and privacy required special attention so that the model fits the Finnish national context and can benefit from machine-actionable features.



License













license_refLink to license document.URI11

3general

Closed


PI, Organization4Dataset-specific - What kind of license is granted for the use of data
https://creativecommons.org/licenses/by/4.0/
start_date

If date is set in the future, it indicates embargo period.

Encoded using the relevant ISO 8601 Date and Time compliant string

Date


11

3generalClosedPI, Organization4
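
Since a start_date in the future indicates an embargo period, a consuming service could test for an embargo as sketched below. The function name and the optional today parameter (useful for testing) are assumptions. The date is expected as an ISO 8601 calendar date such as "2030-01-01".

```python
from datetime import date
from typing import Optional


def under_embargo(start_date: str, today: Optional[date] = None) -> bool:
    """Return True when the license start date lies in the future."""
    reference = today or date.today()
    # date.fromisoformat raises ValueError for a malformed date string.
    return date.fromisoformat(start_date) > reference
```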

Metadata














metadata_standard_id

Metadata Standard IDNested data structure11

2generalPublicPI, Research team, Reuse
http://www.dublincore.org/specifications/dublin-core/dcmi-terms/
descriptionDescriptionString0..11

1 / 3generalPublicPI, Research team, Reuse
provides taxonomy for...
languageLanguage of the metadata expressed using ISO 639-3Term from Controlled Vocabulary11

1 / 3generalPublicPI, Research team, Reuse

schema

Is the data built according to a specific schema?

 Allowed Values:

  • yes
  • no
Term from Controlled Vocabulary12

3generalPublicPI, Research team, Reuse

Relates to dataset. Ideally, metadata from existing datasets could be imported directly from e.g. Zenodo. Also, metadata could be brought in for any datasets published in the project. Infoflow might be easiest this way around rather than from DMP API to repository. 

vocabulary_linkWhich vocabularies are used?Term from Controlled Vocabulary12

3generalPublicPI, Research team, Reuse

formatWhat is the format of the metadata?Term from Controlled Vocabulary0..n2

3generalPublicPI, Research team, Reuse

location_docWhere is the documentation?String/URL12

3generalPublicPI, Research team, Reuse

generated

Is documentation generated automatically?

 Allowed Values:

  • yes
  • no
Term from Controlled Vocabulary12

3generalPublicPI, Research team, Reuse

access

Can the documentation be accessed?

 Allowed Values:

  • yes
  • no
Term from Controlled Vocabulary12

3generalPublicPI, Research team, Reuse

publish_methodology

Where has the methodology/workflow been published?

String /URL0..12

3generalPublicPI, Research team, Reuse
registration of research?
workflow

Is the workflow described?

Allowed Values:

  • yes
  • no
Term from Controlled Vocabulary12

3generalPublicPI, Research team, Reuse
Especially important in the case of large datasets where the data itself cannot be preserved but is regenerated when needed
documentationWhat does the documentation consist of?String12

3generalPublicPI, Research team, Reuse
Workflow, variable description, … ?
purposeWhat is the basic purpose of metadata?String0..12

3generalPublicPI, Research team, Reuse
Qvain, own CRIS, something else?
open

Is the discovery metadata open?

Allowed Values:

  • yes
  • no
Term from Controlled Vocabulary12

3generalPublicPI, Research team, Reuse

location_metadata

Landing page of metadataPID0..1








properties of metadata_standard_id













identifierIdentifier for the metadata standard used.String12

2



http://www.dublincore.org/specifications/dublin-core/dcmi-terms/
typeIdentifier type
Allowed Values:
  • url
  • other
Term from Controlled Vocabulary12

1 / 3generalPublicPI, Research team, Reuse
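
The nested metadata_standard_id structure (identifier plus type) and the language field can be checked together. The sketch below uses the example values from the rows above. The language value "eng", the dictionary layout and the function name are assumptions for illustration.

```python
example_metadata = {
    "metadata_standard_id": {
        "identifier": "http://www.dublincore.org/specifications/dublin-core/dcmi-terms/",
        "type": "url",
    },
    "description": "provides taxonomy for...",
    "language": "eng",  # assumed example of an ISO 639-3 code
}


def metadata_problems(metadata: dict) -> list:
    """Return problems found in a metadata record."""
    problems = []
    standard = metadata.get("metadata_standard_id") or {}
    if not standard.get("identifier"):
        problems.append("metadata_standard_id.identifier is required")
    if standard.get("type") not in ("url", "other"):
        problems.append("metadata_standard_id.type must be 'url' or 'other'")
    language = metadata.get("language", "")
    # Only the shape is checked here, not membership in the ISO 639-3 code list.
    if not (len(language) == 3 and language.isalpha() and language.islower()):
        problems.append("language should be a three-letter ISO 639-3 code")
    return problems
```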
