NOTE: This is the active version of the national maDMP reference model.

Towards structural maDMP template

The target of this workshop is to make progress on the maDMP reference data model. Slides

Below is the structure of the RDA standard. Elements from the RDA standard are marked in GREEN in the table below, and the column "Is this in the RDA template?" indicates whether a data element is part of the RDA standard.

The elements derived from our national workshop, which consist of questions for the DMP, are marked in BLUE.

The sections are grouped according to the RDA standard:
DMP, Project, Contact, Contributor, Funding, Cost, Dataset, Distribution, Host, License, Security and Privacy, Technical Resource, Metadata

In addition, we noted in the previous workshop that it would be important to cover the data life cycle and ethics. These can be incorporated into the RDA maDMP data model or suggested as additional sections. Note: This is to be discussed in the webinar on 16th Dec. (Other IDs to consider: RAiD)



Guidance on the way of working:

Target & Focus:

  • Our focus is to prepare a reference data model, not a DMP template.
  • Thus we focus on the information required, not on the questions, i.e. how the information should be asked. We can make examples of this later, but in this workshop we focus on the data model itself.
  • Review the contents of your group area.
  • Make further clarifications and cross-checks.
  • Improve the naming of the elements if needed, while respecting the original naming of the RDA standard.
  • Indicate what information is needed to provide or launch machine-actionability, and which fields can be filled automatically via a digital object or with AI, e.g. by extracting information from an existing source.

  • Link to ontologies, data spaces, repositories and other relevant sources of information when you notice gaps in the use of available auxiliary information.

  • Identify the purpose and user of the data elements.
  • Discuss in the group which national additions should be mandatory / strongly recommended in addition to the RDA standard.

  • If you cannot resolve a data element, mark it in bold with question marks in RED???

Documentation:

  • Document all notes and work in the table below.
  • Do not create any other solutions for documenting the work.
  • Aim to write in English. 

Reference:
You can refer back to the RDA maDMP data model, which is the core, but we can make suggestions for developing its machine-actionability. In addition, we can add DMP fields relevant to the national context. Note that this is a general data model and does not contain information specific to a scientific discipline.
https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard

OSTrails Plan-Track-Assess Pathways: https://zenodo.org/records/13145788

For use cases you can refer to:
Marttila, J., Manninen, S., Ahokas, M., Hindersson-Söderholm, T., Keckman-Koivuniemi, H. (2022). Dynaamiset DMP:t -työryhmän loppuraportti. https://zenodo.org/records/6601258
Marttila, J. & Manninen, S. (2022). Dynaamiset DMP:t -työryhmän toivekartoitus. https://zenodo.org/records/6594597


Some explanations to column headings below:

E. Interoperability from data source (DMP or other system)
Does the field provide or launch interoperability for maDMPs and other systems?
1=Yes; 2=Uses DO as source (automatically filled); 3=Manually filled in, or propose a DO

F. General, national or organizational? 
General for EU / EOSC / international needs; National relates to recommended practices in Finland; Organizational relates to organization-specific practices

G. Public / disclosed: is the information internal to the organisation or public?
Justify why the information needs to be disclosed only within the project/organization.

H. Who needs this field
Indicate purpose of the information if suggested to be added on RDA standard (minimum)

I. Is this in the RDA maDMP standard
Indicate whether field is in core maDMP standard 

J. Section in RDA maDMP standard
Add the section if it differs from the section in the national reference data model

K. Additional information:
Define use case, give example of question or comment


For string fields, suggest the maximum character length.


What information already exists elsewhere is a secondary concern, unless it is beneficial for it to be part of the maDMP. The use case needs to be defined.
National suggested fields are in BLUE; RDA Standard fields in GREEN.

Which suggested fields are MUST have, SHOULD or COULD have?
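The column scheme used in the tables of this document (E = interoperability source, J = phase, K = mandatory/voluntary) can be sketched as a typed record. This is only an illustration: the class and attribute names (`FieldSpec`, `Source`, `Phase`, `Obligation`) are assumptions made for the example and are not part of the RDA standard or of any agreed national model. The example values are those given for `title_dmp` in the DMP table.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Optional


class Source(IntEnum):
    """Column E: how the field value is obtained."""
    AUTOMATE = 1        # filled automatically by a system
    DIGITAL_OBJECT = 2  # filled from a digital object (DO)
    MANUAL = 3          # filled in by a person


class Phase(IntEnum):
    """Column J: phase in which the field is needed."""
    PLANNING = 1
    APPLIED = 2
    GRANTED = 3
    MANDATORY_UPDATE = 4
    FINAL_REPORTING = 5


class Obligation(Enum):
    """Column K: mandatory (M) or voluntary (O)."""
    MANDATORY = "M"
    VOLUNTARY = "O"


@dataclass(frozen=True)
class FieldSpec:
    """One row of the reference data model table (hypothetical shape)."""
    section: str                       # column A
    name: str                          # column B
    description: str                   # column C
    type: str                          # column D
    source: Source                     # column E
    scope: str                         # column F
    visibility: str                    # column G
    needed_by: str                     # column H
    in_rda: bool                       # column I
    phase: Optional[Phase]             # column J (may be empty)
    obligation: Optional[Obligation]   # column K (may be empty)
    comment: str = ""                  # column L


# Values taken from the title_dmp row of the DMP table.
title_dmp = FieldSpec(
    section="DMP", name="title_dmp", description="Title of a DMP",
    type="String", source=Source.MANUAL, scope="general",
    visibility="Public", needed_by="User", in_rda=True,
    phase=Phase.PLANNING, obligation=Obligation.MANDATORY,
    comment="Max 100 char",
)
```

Keeping the codes as enumerations rather than free text would let a tool reject rows such as a phase of 6 or an obligation other than M or O.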

A. maDMP structure | B. Name (core info for minimum DMP in bold) | C. Description | D. Type | E. Interoperability from data source (1=automate; 2=DO; 3=manual) | F. General for EU, EOSC, International / National / Organizational / Funder specific | G. Public / Closed | H. Who needs this field? | I. Is this in the RDA maDMP standard (x=yes) | J. Phase (1=Planning; 2=Applied; 3=Granted; 4=Mandatory update; 5=Final reporting) | K. Mandatory (M) / Voluntary (O) | L. Use case, example questions, comments









DMP | dmp_id | DMP id | Nested Data Structure | 1 | general | Public | System for interoperability | x | 1 | M | Request id for DMP. Where does this originate from, especially if using different tools/systems for DMPs?
 | type_dmp | A description of what kind of DMP to do | Controlled vocabulary: Student, Academic, National, International, Organization | 3 | Organisational / national / international | Public | System for appropriate DMP template | | 1 | M | Input formula should later be updated or extended to a richer format. Input profiles, for example: define a national typology for recommended use of DMPs (light, detailed); key issues: personal data, confidentiality of information, resource intensity, number of actors (outsiders).
 | title_dmp | Title of a DMP | String | 3 | general | Public | User | x | 1 | M | Max 100 char
(points to cost section) | cost_dmp | Give an estimate or aggregated sum of costs related to data management; where there are multiple instances, break the costs down into details. (Sum of the costs given in the cost section) | Nested Data Structure | 2 | general | Closed (see comment) | Funder | x | 2 | | Further development: this could be linked to the budget with the grant id. What costs are included? Needs clear guidelines. Closed early in the process, but depends on whether the DMP will be actively made public / "published" at some point during the data life-cycle.
 | created_dmp | Date and time of first version of a DMP | DateTime | 1 (system) | general | Public | Organization | x | | M | System recorded
 | modified_dmp | Must be set each time the DMP is modified. Indicates DMP version. Encoded using the relevant ISO 8601 Date and Time compliant string | DateTime | 1 (system) | general | Public | Organization | x | | M | System recorded
 | nextreview_dmp | Next review date to update DMP | DateTime | 2 / 3 | Organisational / Funder specific | Public | Organization, PI | | 3 | O | The research project benefits from timing the DMP update, and Data Support can better plan its assistance. Suggested addition to keep the DMP alive and up to date, e.g. for reporting purposes.
 | dataset | To describe data on a general level. | Nested Data Structure | 3 | general | Public | Organization, PI | x | 1 | M | At least one dataset should be defined. See "Dataset" in the table.
 | datalifecycle | To describe data lifecycle on a general level. | Nested Data Structure | 3 | general | Public | Organization, PI | | 1 | O |
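The system-recorded timestamps in the DMP table (created_dmp, modified_dmp) could be handled roughly as follows. This is a minimal sketch under stated assumptions: the class `DmpRecord` and the method `update_title` are hypothetical names, UTC is assumed as the time zone, and the 100-character limit is the one noted for title_dmp.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def now_iso() -> str:
    """Current time as an ISO 8601 string in UTC, second precision."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


@dataclass
class DmpRecord:
    """Hypothetical DMP-level record with system-recorded timestamps."""
    title_dmp: str
    created_dmp: str = field(default_factory=now_iso)
    modified_dmp: str = ""

    def __post_init__(self) -> None:
        # A new DMP has never been modified, so both timestamps start equal.
        if not self.modified_dmp:
            self.modified_dmp = self.created_dmp

    def update_title(self, new_title: str) -> None:
        """Change the title and refresh modified_dmp, as the table requires."""
        if len(new_title) > 100:
            raise ValueError("title_dmp is limited to 100 characters")
        self.title_dmp = new_title
        self.modified_dmp = now_iso()


dmp = DmpRecord(title_dmp="Example DMP")
dmp.update_title("Example DMP, revised")
```

Because every change goes through a method that refreshes modified_dmp, the timestamp can serve as the version indicator described in the table without the user having to maintain it.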
Group 1: Project, Contact, Contributor, Funding & Cost


Project | title_project | Name/Title of the project | String | 3 | general | Public | Everyone | x | 1 | M | If project information is not yet available anywhere, how much should be produced here? Is it possible to have multiple DMPs for one project or a maDMP without a funder or project?
 | project_id | Project identifier | Nested Data Structure | 2 | general | Public | Everyone | x | 1 | M | RAiD: https://raid.org/
 | start_project | Project start date. | Date | 3 (can trigger the update process, e.g. 3-6 months after start) | general | Public | Everyone | x | 1 | M | Encoded using the relevant ISO date and time compliant string
 | end_project | Project end date. | Date | 3 (can trigger the update process and reporting stage) | general | Public | Everyone | x | 1 | O | Encoded using the relevant ISO date and time compliant string. If the DMP is used for a continuous process, no end date is required.
 | duration | Project duration. | Time yy-mm | 1 | | | | | 2 | O | derived
 | description_project | Project short description | String | 1 (project_id links to long description), otherwise 3 | general | Public | Everyone | x | 1 | M | Short description, e.g. max 2000 characters; include a link to the project plan if needed (the project id field links to the longer description). Where will the master copy of this information be filled in? (From RAiD, but are there alternatives???)
 | discipline | Scientific discipline of project | UNESCO science classification; drill down via main categories | 2 / 3 | general | Public | Everyone | | 1 | O | 2 if analytics / AI can be used to suggest a value based on ORCID, Project_ID or Description; 3 if it needs to be added by the researcher. This can be used to guide instructions and lists of what is available. Fields of education and training 2013 (ISCED-F 2013). Keywords and free words allow mapping to ontologies and hence smart searches (whereas controlled vocabularies and taxonomies tend to force users to use whatever is close if there is no appropriate term available).
 | funding_project | Funding related to a project | Nested Data Structure | 2 (derived from Funding status & Grant_id) | general | Public | Everyone | x | 3 | O | Public after publishing the grant.
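Since the duration field is marked as derived, it could be computed from start_project and end_project instead of being typed in. The sketch below assumes that "yy-mm" means whole elapsed years and months, zero-padded; that interpretation and the function name `project_duration` are assumptions for illustration.

```python
from datetime import date


def project_duration(start_project: date, end_project: date) -> str:
    """Return the elapsed whole years and months as 'yy-mm'."""
    if end_project < start_project:
        raise ValueError("end_project must not be before start_project")
    months = (end_project.year - start_project.year) * 12
    months += end_project.month - start_project.month
    # The last month only counts once its day-of-month has been reached.
    if end_project.day < start_project.day:
        months -= 1
    years, remaining_months = divmod(months, 12)
    return f"{years:02d}-{remaining_months:02d}"


# A project running from 1 Jan 2025 to 30 Jun 2027 lasts 2 years, 5 whole months.
print(project_duration(date(2025, 1, 1), date(2027, 6, 30)))  # 02-05
```

If a DMP is used for a continuous process and has no end date, the derived field would simply stay empty.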
Contact | contact_id | ORCID of contact person for a DMP / principal (responsible) researcher | ORCID | 1 | general | Public | Everyone | x | 1 | O | This has its own attributes (name, ORCID, contact). If the contact person and principal researcher are different, do they need to be separated?
 | mbox | E-mail address | email | | general | Public | Everyone | x | 1 | M | from ORCID
 | name | Name of the contact person / principal researcher | String | 2 (from ORCID) / 3 | general | Public | Everyone | x | 1 | M | from ORCID
Contributor | #_Nested Data Structure if many contributors (and data controllers)
 | contributor_id | Contributor ORCID | ORCID | 2: digital authentication, e.g. by e-mail the contributor adds their ORCID, or it comes from the funding application. 3: has risk of errors for ORCID | general | Public | Everyone | x | 2 | O | Needs to be defined - or where could it be derived from? From the funding decision?
 | mbox_contributor | E-mail address | email | 2 / 3 (depending on whether the person has allowed sharing) | general | Public | Everyone | x | 2 | M |
 | name_contributor | Name of the contact person | String | 2 (from ORCID) / 3 | general | Public | Everyone | x | 2 | M |
 | role_contributor | Type of role, e.g. Work package leader / Data controller / Principal investigator / Author of data set | Controlled list | 2 / 3 | general | Public | Everyone | x | | M | Data controller is required for research data services. Use case for AI search from funding proposal by roles.
 | organization_contributor | Organization of contributing researcher | String | 2 (from ORCID/ROR) / or 3 | general | Public | Everyone | | 2 | O | If ROR exists this can be derived from ROR
 | ROR_contributor | ROR of organization of contributing researcher | ROR | 3 | general | Public | Everyone | | 1 | O | This has its own attributes (ROR)
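The table notes that manual entry (option 3) carries a risk of errors for ORCID. One way to catch typing errors is to verify the check character that every ORCID iD carries, which is computed with the ISO 7064 MOD 11-2 algorithm. The sketch below only checks the format and the check character; it does not confirm that the iD is registered. The function names are chosen for this example.

```python
import re

ORCID_PATTERN = re.compile(r"(\d{4})-(\d{4})-(\d{4})-(\d{3})([\dX])")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the string has the ORCID layout and a correct check character."""
    match = ORCID_PATTERN.fullmatch(orcid)
    if match is None:
        return False
    base_digits = "".join(match.group(1, 2, 3, 4))
    return orcid_check_character(base_digits) == match.group(5)


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

A form could run this check on contributor_id and contact_id before saving, so that a mistyped digit is rejected at entry time.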


Funding | #_Nested Data Structure if many funding sources for a large research program, unless defined that the DMP relates to a single grant decision | Does this information create any requirements for the DMP?
 | funder_id | Funder id, e.g. registry number of associated project / Y-tunnus / Business ID | Nested Data Structure | 2: ROR API via search option; 3 | general | Public | System | x | 1 | M | Nested structure used if there are many of these. Field is empty if none.
 | funder | Funder name | Nested Data Structure | 2 | general | Public | Everyone | | 1 | O | from Funding id
 | funding_submission_dl | Deadline for funding submission | Date | 2: select funding (Akatemia); 3 | | | | | 1 | O |
 | funding_decision_expected | Expected date for funding decision | Date | 2: select funding (Akatemia); 3 | | | | | 1 | O | Either known / estimated
 | funding_status | Phase of project life cycle: Planned, Applied, Granted, Rejected | Term from Controlled Vocabulary | 3 (nice-to-have feature: automatically derive from the grant ID whether the project is applied/granted) | general | Public | Everyone | x | 1-5 | O | from Funding id
 | grant_id | Grant ID of the associated project | Nested Data Structure | 2 if DOI (not currently); 3 | general | Public | Everyone | x | 3 | M | M if exists
 | start_fund | Funding (Project) start | Date | 2 | general | Public | Everyone | | 3 | O | from Grant id
 | end_fund | Funding (Project) end | Date | 2 | general | Public | Everyone | | 3 | O | from Grant id
 | duration_fund | Funding (Project) duration | Range | 2 | general | Public | Everyone | | 3 | O | derived from Funding id (in order to search databases)
 | ror_fund | Responsible organization for funding application, ROR | Nested Data Structure | 3: manually select from list | general | Public | Everyone | | 1 | O | ROR of PI. If there is a big consortium, would it be worth making WP-specific DMPs to make them actually work?
Cost | # list all cost object categories
 | currency_code | Allowed values defined by ISO 4217. Note: Default is EUR or could this be linked to Funder_Id? | Term from Controlled Vocabulary | 3 / 2 (from grant_id) | general | Closed/Public | Organization | x | 3 | M | from Funder id
 | description_cost | Description of costs. Note: Could this be linked to Grant ID for description of applied/granted budget? | String | 3 / 2 (from grant_id / application) | general | Closed/Public | Organization | x | 3 | O | from Grant id
 | title_cost | Title of costs. Note: Could this be linked to Grant ID for title of applied/granted budget? | String | 3 / 2 (from grant_id / application) | general | Closed/Public | Organization | x | 3 | M | from Grant id
 | value_cost | Value of costs. Note 1: Could this be linked to Grant ID for applied/granted budget? Note 2: Link with DMP / cost_dmp | Number | 3 / 2 (from grant_id / application) | general | Closed/Public | Organization | x | 3 | M | from Grant id
 | sum_cost_dmp | Sum of value of costs | Number | 1 | general | Closed/Public | Organization | | 3 | M | Automated sum of value_cost if multiple
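The automated sum_cost_dmp described in the Cost rows could look like the sketch below. It assumes that all cost items of one DMP share a currency (EUR is the default suggested in the table) and uses `Decimal` to avoid floating-point rounding in money values. The class `Cost` and the example cost items are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable


@dataclass(frozen=True)
class Cost:
    """One cost item, using the field names from the Cost section."""
    title_cost: str
    value_cost: Decimal
    currency_code: str = "EUR"   # ISO 4217 code; EUR is the suggested default
    description_cost: str = ""


def sum_cost_dmp(costs: Iterable[Cost]) -> Decimal:
    """Sum value_cost over all items; refuse to add different currencies."""
    items = list(costs)
    currencies = {item.currency_code for item in items}
    if len(currencies) > 1:
        raise ValueError(f"cannot sum mixed currencies: {sorted(currencies)}")
    return sum((item.value_cost for item in items), Decimal("0"))


# Hypothetical cost items, for illustration only.
costs = [
    Cost("Storage", Decimal("1200.50")),
    Cost("Data curation", Decimal("800")),
]
print(sum_cost_dmp(costs))  # 2000.50
```

Whether mixed currencies should be rejected or converted is an open design question; rejecting them keeps the sketch free of exchange-rate assumptions.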
Group 2: Dataset, Distribution 







Define what a dataset is (the RDA standard defines a dataset as raw data; can we use this?). If multiple datasets need to be defined, what is the distinctive feature needed in the DMP (ownership, source, source data, raw data, repository for workflow, data product, reference data, master data, models & code)? And what needs to be defined in the DMP data model glossary? See the RDA standard's explanations: https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard?tab=readme-ov-file


Dataset | #_Nested Data Structure if many datasets are used. The DMP has a "dataset" association that can relate to many datasets. Each data set can have multiple files/distributions. | Relationships to 1..* datasets should be defined at DMP level. A DMP can contain multiple datasets.


 | dataset_id | Dataset ID, identifier for a dataset | DOI, PID, URN, URL, handle, ark, other | 1 / 3 | general | Public | RPO (research performing organization), Repositories, Data catalogues | x | 2 | M (when published) | DOI, PID, URN, URL, handle, ark, other. The dataset may not exist when the DMP is defined. The DMP tool should provide a temporary ID before the dataset gets a PID in some way. The identifier should be typed.
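The two requirements noted for dataset_id (a typed identifier, and a temporary ID until a PID exists) can be sketched as below. The type list is the one given in the table; the class `DatasetId`, the `temporary` flag, the use of a `urn:uuid:` value as the temporary ID and the example DOI are assumptions for illustration.

```python
import uuid
from dataclasses import dataclass

# Identifier types as listed in the dataset_id row.
ID_TYPES = {"doi", "pid", "urn", "url", "handle", "ark", "other"}


@dataclass(frozen=True)
class DatasetId:
    """A typed dataset identifier (hypothetical shape)."""
    identifier: str
    type: str
    temporary: bool = False

    def __post_init__(self) -> None:
        if self.type not in ID_TYPES:
            raise ValueError(f"unknown identifier type: {self.type!r}")


def temporary_dataset_id() -> DatasetId:
    """Create a placeholder ID for a dataset that has no PID yet."""
    return DatasetId(f"urn:uuid:{uuid.uuid4()}", "urn", temporary=True)


planned = temporary_dataset_id()
# Placeholder DOI, for illustration only.
published = DatasetId("10.1234/example", "doi")
```

When the dataset is published, the DMP tool would replace the temporary identifier with the real PID, which is also the point at which the field becomes mandatory according to the table.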

 | title_dataset | Data set title / name | String | 2 | general | Public | Same as above, but humans need this instead of machines | x | | M | There can be many data sets; the information is related to one entity. A so-called metax entity, i.e. one must be able to express a wide variety of entities that then have attributes.

 | type_dataset | Data set type (indication: interview, questionnaire, photos, video, measurement, samples, simulation, code) | Partly (controlled vocabulary and "Other" option) | 2 / 3 | general | Public | Data Catalogues, Repositories, RPOs | x | | M | Associated with a single dataset; can include ready-made options, but also an open text field. What is the correct granularity level here? Resource intensity can affect the needs of the description. In general, it is instructed to describe so that the attribute applies to the entire dataset. By describing just one data set, it would be possible to create a so-called light DMP. This is an important option to keep. Some sort of defined and shared vocabulary of "data set types" is needed here. RDA Commons points to DataCite and COAR, but neither feels sufficient by itself. A national type list should be based on those, but enhanced to give perhaps subtypes.


 | format_dataset | Description of the dataset formats used during the active research, for example database, csv, xml, json. (Format of the dataset to be used; the format of the datasets to be published / distributed after the project is different.) | Controlled vocabulary | 2 | General | | | | | O | Relates to one data set. How does this relate to outputs other than datasets, such as code? Or code that is closely related to data usability, e.g. link or PID? Format vs. type: what is the difference? File format should be in distribution, not here.


 | personal_data (exists already under security and privacy) | Whether the dataset contains personal data | Boolean | 3 - Needs human in the loop | general | Closed | Data protection officers, RPOs (data protection/management experts), repository | x | | M | Associated with a single dataset; is this personal data the data of the data providers or of the target data? What is the role of individuals? Yes or No / Yes or No or Unknown. We think that "Unknown" is not an option here. Type of personal data will be in its own section. Can trigger automatic data protection processes.


 | sensitive_data | Whether there are legal restrictions that apply to using this data, e.g. military use, commercial restrictions, endangered species | Boolean | 3 - Needs human in the loop | general | Closed | Data protection officers, RPOs (data protection/management experts), repository | x | | M | Related to the dataset; how can we ensure that this is not asked except when it is likely? Yes or No / Yes or No or Unknown. Dual use and import controls? This should be yes/no. If we then need more information on why it is sensitive, that should be its own property. How does this relate to confidentiality in security & privacy? In the dataset section we presumably need to know whether there is sensitive/confidential information or not. That then triggers more questions in the security & privacy section.
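The comments on personal_data and sensitive_data describe both as strict yes/no answers that trigger follow-up steps. A minimal sketch of that triggering logic is given below; the follow-up names (`data_protection_process`, `security_privacy_questions`) and the function name are hypothetical labels, not agreed fields.

```python
from typing import List


def triggered_follow_ups(personal_data: bool, sensitive_data: bool) -> List[str]:
    """Return the follow-up steps triggered by the two dataset-level flags."""
    # "Unknown" is not accepted: both answers must be real booleans.
    for name, value in (("personal_data", personal_data),
                        ("sensitive_data", sensitive_data)):
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be True or False, got {value!r}")

    follow_ups = []
    if personal_data:
        follow_ups.append("data_protection_process")
    if sensitive_data:
        follow_ups.append("security_privacy_questions")
    return follow_ups


print(triggered_follow_ups(personal_data=True, sensitive_data=False))
# ['data_protection_process']
```

Keeping the dataset-level fields as plain booleans and moving the details (type of personal data, reason for sensitivity) into the triggered sections matches the suggestion in the comments that such details should be separate properties.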


 | data_sharing_issues | How any legal and ethical issues related to the sharing of data (e.g. ownership, copyright, sensitivity) will be resolved | String | | National | | National DMP template, Organization, PI | | | M |
Optional | data_sharing_contracts | | Boolean | | | | | | | O |
Optional | data_sharing_ownership | | Boolean | | | | | | | O |
Optional | data_sharing_copyright | | Boolean | | | | | | | O |
Optional | data_sharing_sensitivity | | Boolean | | | | | | | O |
New fields emerging | data_sharing_other ?









If no distribution but metadata available | data_landing_page | Give the link / PID to the landing page of the data | link / PID







Feedback to RDA standard in GitHub & suggested to be left out of the maDMP standard | description_dataset | Description | String | Fill in from this onwards | general | Public | Repository, Data catalogues, CRIS? | x | | O | Needs some kind of guidance on what level of description is needed. Is a length limit needed? We already have a name for the dataset. How much more description do we want/need at this point? We should not ask these in the DMP. They are about publishing and metadata. If somebody wants to combine DMP and CRIS, this information needs to be interoperable, but this is NOT part of the DMP. The same holds true for all red rows.

Feedback to RDA standard | distribution_dataset | Technical information on a specific instance of data | Nested Data Structure. Could be defined vocabulary | | general | | | x | | | This might need more clarification, as it relates to resources/infra needed.
Feedback to RDA standard | issued_dataset | Date the dataset was issued | Date | | general | | | x | | O |
Feedback to RDA standard | keyword_dataset | Keyword | String / Term from controlled vocabulary | | general | | | x | | O | Should be asked only when data is opened/catalogued.
Feedback to RDA standard | language_dataset | Language of the dataset expressed using ISO 639-3 | Term from controlled vocabulary | | general | | | x | | O |
Feedback to RDA standard | metadata_dataset | Describe metadata standards used | Nested Data Structure | | general | | | x | | = |












Move to security & privacy | security_privacy | To list all issues and requirements related to security and privacy | Terms from controlled vocabulary | 3 (from organisational list) | general | closed | | x | | | These sound like reports compiled based on DMP
Technical resource | #_Nested Data Structure if many datasets are used. Use dataset_id's to indicate ???
 | name_technical_resource | Name of the technical resource | String
 | description_technical_resource | To list all technical resources needed | String | 3 / 2 (from organisational or national list) | general | closed | | x | | | These sound like reports compiled based on DMP. So if you have confidential data, the DMP compiles a list of local services that CAN be used.

reuse_datasetIs previously collected data reused in this project
(Whether the data is collected, created or comes from elsewhere)
Boolean

PublicNational DMP template


Relates to one data set (does it? Or does it relate to whole research in this meaning)

Reuse of data is also information funders require. Also important here is the terms of use to the data.

If you're using data that is already published and is based on other data? So is this property of dataset or is this property of research?


source_datasetData sourcePartly (pid if possible, string if not)2 / 3




O

Relates to one data set, can include ready-made options, but also an open text field

Referencing can be really confusing. You can use data obtained from Twitter, or a dataset that somebody else compiled from Twitter... What do you reference here? Or do you make a derivative dataset based on an already existing dataset that was compiled from Twitter?


data_collectedData collected for this projectString3

National DMP template




data_producedData produced as an outcome of the projectString3

National DMP template




estimate_datasize

Give a rough estimate of the size of the data produced/collected 

Value3

National DMP template


Relates to one data set

How does this differ from dataset size from distribution?


data_resource_estimateProject data magnitudeValue1 (Add DMP datasets & estimated size)





???
Suggest to RDA TF to leave this out (see comments)!data_quality_assuranceData quality AssuranceString3general

x  

Is this even necessary in modern DMP? This is like asking "do you do your science properly?"

Could have some guiding meaning to wake up researchers to see what all is possible?

Could there be lists connected to datatype (so it should offer relevant lists)?

Our experts can help on risk management related to security, privacy, storage etc. But can we say anything relevant on quality control? Do we need to?

What is use case here?!? Is only case educating the researcher if he is ignorant on his methodology?


method_quality_assuranceWays of quality assuranceList:
TAU list as an example, should make a national, up-to-date list
3national



oAlso, something else, to be specific. Could these come according to the discipline?


Distribution (Preservation and sharing of material during the project) - OPTIONAL | access_url | A URL of the resource that gives access to a distribution of the dataset, e.g. landing page. | URI | | general | | | x | | O | In the case of a DMP, these should be used to describe active use of the data. Others should be in life-cycle.



title_datasetTitle is a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file.

general

x
M

available_until_urlIndicates how long this distribution will be/ should be available. Encoded using the relevant ISO 8601 Date and Time compliant stringDate
general

x
O

 | byte_size | Byte size (in RDA this is a number; CSC proposes a category list: S: < 10 TB, M: 10-50 TB, L: 50-100 TB, XL: 100-200 TB, XXL: > 200 TB) | Number or size category: S, M, L, XL, XXL | | general | | | x | | M | E.g. S < 10 TB, 10 <= M < 50 TB, 50 <= L <= 100 TB, 100 < XL <= 200 TB, 200 < XXL. Important as it affects which tools are available.
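If byte_size stays a number, as in RDA, the proposed size category could be derived from it rather than asked separately. The sketch below follows the boundaries listed for byte_size, with two assumptions that are not settled in the text: exactly 50 TB is assigned to L, and 1 TB is taken as 10^12 bytes. The function name is hypothetical.

```python
TB = 10**12  # assumption: decimal terabytes (10^12 bytes)


def size_category(byte_size: int) -> str:
    """Map a byte count to the proposed S / M / L / XL / XXL categories."""
    if byte_size < 0:
        raise ValueError("byte_size must not be negative")
    if byte_size < 10 * TB:
        return "S"
    if byte_size < 50 * TB:
        return "M"
    if byte_size <= 100 * TB:   # assumption: exactly 50 TB counts as L
        return "L"
    if byte_size <= 200 * TB:
        return "XL"
    return "XXL"


print(size_category(5 * TB))    # S
print(size_category(120 * TB))  # XL
```

Deriving the category keeps the RDA-compatible number as the single source of truth while still giving service providers the coarse category they need for choosing tools.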


 | data_access | Indicates access mode for data and data sharing. Allowed values: open, shared, closed | Term from Controlled Vocabulary | | general | | | x | | | This can change during the study. First I use it for 3 years as closed, then I open it. Should this state what I want to do after the active use, or what happens right now? → Should be the current publication status of the distribution. Dataset lifecycle documents the plan for the dataset.

description_distributionDescription is a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file.String
general

x
O

download_urlThe URL of the downloadable file in a given format. E.g. CSV file or RDF file.

general

x
O

formatFormat according to: https://www.iana.org/assignments/media-types/media-types.xhtml if appropriate, otherwise use the common name for this format

general

x
M
This should be at dataset level → if not distribution specific | license_dataset | List all licenses applied to a specific distribution of data. | | | general | | | x | | O |
Is this the same as access_url if the dataset is published??? | host_dataset | To provide information on quality of service provided by infrastructure (e.g. repository) where data is stored. Service URN | URN | | general | | | x | | O | Outside the own organization? Same as location_open_data in lifecycle


Dataset life cycle (extension to Dataset)
 | datalifecycle_description | Describe datasets created in the project, and after the project, at a general level, and how they are managed | String | 3 | general | Public / Restricted | PI, Organization, Service provider | | 1 | O | Funder and CSC need this information
Data life-cycle | | Where will the data be stored during the project? | URN from CSC Service Catalogue & list presented by organization; if something else, what? | 3 | National | Public/Closed | PI, Organization, Service provider | | 1 | M | Relates to a dataset; extra important if the data is subject to the Act on the Secondary Use of Data. Add to general data life-cycle. Specify by data set if needed.
 | data_users | With whom will the data be shared during the project? | Open, Inside Europe, Inside research consortium, Inside organization, complex structure | 3 | National | | PI, Organization, Service provider | | 1 | O | Refers to the technical solutions; will a DPA be needed? Are the joint controller agreement, NDAs etc. already elsewhere? Or does this refer to the consortium projects?
 | shareage_solution | How will the data be shared during the project? Define technical? | Choose from Service catalog | 3 | National | | PI, Organization, Service provider | | 1 | M |
NEW | version_mgmt_data | How are the data versions managed? | String | 3 | National | | PI, Organization, Service provider | | 4 | O / M | Mandatory for large data intensive projects (at CSC >50 TB)
NEW | retention_data | How is data retention managed? | String | 3 | National | | PI, Organization, Service provider | | 4 | O / M | Mandatory for large data intensive projects (at CSC >50 TB). A data retention plan is needed for managing the size of the project.
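The "O / M" entries for version_mgmt_data and retention_data depend on project size. A rule of that kind could be evaluated automatically, as in the sketch below. The 50 TB threshold is the CSC figure quoted in the comments; treating 1 TB as 10^12 bytes, the function name and the idea of passing the project's total data size as a parameter are assumptions.

```python
TB = 10**12  # assumption: decimal terabytes (10^12 bytes)
LARGE_PROJECT_THRESHOLD_BYTES = 50 * TB  # "At CSC >50 TB"

# Fields whose obligation is given as "O / M" in the table.
CONDITIONAL_FIELDS = {"version_mgmt_data", "retention_data"}


def obligation_for(field_name: str, project_bytes: int) -> str:
    """Return 'M' for large data-intensive projects, otherwise 'O'."""
    if field_name not in CONDITIONAL_FIELDS:
        raise KeyError(f"{field_name} has no size-dependent obligation")
    return "M" if project_bytes > LARGE_PROJECT_THRESHOLD_BYTES else "O"


print(obligation_for("retention_data", 80 * TB))     # M
print(obligation_for("version_mgmt_data", 2 * TB))   # O
```

The project size used here could come from the summed dataset size estimates, so that the obligation changes as soon as the estimates cross the threshold.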
NEW | exit_plan_data | What is the exit plan from computational and storage services at the end of the project? | String | 3 | National | | PI, Organization, Service provider | | 4 | M | An exit plan is needed to ensure that research data with value for re-use is saved within the available resources
 | backup_data | How will data be backed up during the project? To be planned by the researcher, or organization-specific solutions? | String | 3 | Organizational | ? | PI, Organization, Service provider | | 1 | M |
 | application_process_data | What applications are used to process data? | Controlled list: CSC Service Catalogue & organization services | 3 | General, National & Organizational | | PI, Organization, Service provider | | 1 | O | Affects the choice of storage environment (e.g. whether the video is only available for viewing or whether it needs to be available at the file level in an analysis program)
 | computing_environments | Which computing environments are needed for research? | Controlled list: CSC Service Catalogue & organization services | 3 | National & Organizational | | PI, Organization, Service provider | | 1 | O | Relates to data set
 | computing_capacity_CPU | How many core hours of CPU computing capacity are required? | Value | 3 | general | | PI, Organization, Service provider | | 1 | M |
 | computing_capacity_GPU | How many core hours of GPU computing capacity are required? | Value | 3 | general | | PI, Organization, Service provider | | 1 | M |
 | preservation_statement | Preservation Statement | String | 3 | general | | | x | 2 | M |

archiving_services_dataAre archiving services or long term preservation for data needed?BooleanHow to determine the value of data?National



M

Relates to data set

Is this long-term storage, e.g. 20 in Zenodo, archiving in institutional archive or something else?
PAS merged here. The storage word is problematic, clarify further if needed


data_close_justificationIf the project does not collect or produce any data fully or partially suitable for reuse, justify why the data cannot be made available even partially.String
National
National DMP template

OThis is mandatory if data is closed. Should there be dataset level field for dataset publication (open / closed) ?

location_open_dataWhere will the data be opened? Controlled list of data repositoriesSpecial requirements for data repositories for preliminary data?National National DMP template

O

FSD comments: It is essential for the repository/archive to know (in the case of research projects that have received a positive funding decision) what kind of data are planned to be opened in the repository/archive and by whom.

Covered under distribution maybe?

This field responds also to requirement of National DMP template on: where the data or a publishable portion of them will be made available after the end of the project


Moved here from Securitylength_storage_data How long the data is stored for the original research purpose
3National




Relates to dataset, original purpose

deletion_dataHow is data deleted/destroyed?Controlled list, including option of not deleting data and explanation as to why.3National



MCould be specified that this relates to unpublished data. Or data that are mentioned to be shared e.g. for 5 or 10 years, etc.

deletion_when_dataWhen is data deleted/destroyed?Date (Also freetext if not known, end of project etc?)




O
Could be specified that this relates to unpublished data.
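Because deletion_when_data may hold either a date or free text, a consuming tool has to tell the two apart. A minimal sketch, assuming dates are written in ISO 8601 form (YYYY-MM-DD), which the source does not specify:

```python
from datetime import date
from typing import Union


def parse_deletion_when(value: str) -> Union[date, str]:
    """Return a date when the value is an ISO date, else the free text as given."""
    text = value.strip()
    try:
        # ISO 8601 calendar date is an assumption, not part of the model.
        return date.fromisoformat(text)
    except ValueError:
        return text
```

A machine-actionable consumer can then act on `date` results and route free-text answers such as "end of project" to manual handling.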

See above: archiving_services_data
Will the data be archived? (Overlaps with the question: Are archiving services for data needed?)









archiving_when_data
When to archive?
Date. Same comment as above.
3
National



O
Active data can be deleted and archived at the same time

archiving_location_data
Where to archive?
Controlled list: CSC Service Catalogue & organization services
3
National



O
  
Group 3: Security, privacy, technical resources and metadata

 

A. maDMP structure

B.
Name

C.
Description

D.
Type

E. Interoperability from data source 
1=automate;
2=DO;
3=manual

F.  
General for EU, EOSC, International / National / Organizational / Funder specific

G.
Public /
Closed 

H.
Who needs this field? 

I.
Is this in the RDA maDMP standard
(x=yes)

 J. Phase 1=Planning
2=Applied
3=Granted
4=Mandatory update
5=Final reporting 

K. 
Mandatory (M) / Voluntary (O)

L. 
Use case
Example questions
Comments

Security and privacy (Legislation)
Note: Security and privacy need special attention to fit the national context in Finland and to benefit from machine-actionable features. Green = as in the RDA standard. Please make needed clarifications in BLUE.

 
title_security
Title of security measures
String
general
Organization x 1M
description_security_privacy
Description of security and privacy measures
Controlled Vocabulary
general
Organization x 1M
 National additions in Finland:










confidentiality
Does the data contain confidential information?
(EU definition?; law: Act on the Openness of Government Activities (julkisuuslaki); agreements incl. trade secrets; classification from governmental bodies)

 Term from Controlled Vocabulary:
Confidential, Classified 

DMP, 3
EU/national security levels (restricted, confidential, secret, top secret / käyttörajoitettu, luottamuksellinen, salainen, erittäin salainen)
Closed
Organisation, open science services
1M

May contain a "yes" condition, after which it is indicated which datasets this relates to. Confidential, business secrets, sensitive geospatial data, sensitive biodiversity data, national security, trade secrets. Dataset-specific.
Yes / No / Unknown

Comment: a joint classification for security levels. EU security levels are used here as an example.
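Since the "yes" condition is dataset-specific, a project-level confidentiality value could be computed from the dataset-level answers. The sketch below is one possible derivation rule, not an agreed one: any "yes" wins, otherwise any "unknown" (or no datasets at all) yields "unknown".

```python
from typing import Iterable


def project_confidentiality(dataset_values: Iterable[str]) -> str:
    """Derive a project-level yes/no/unknown from dataset-level answers.

    The precedence yes > unknown > no is an assumption for illustration.
    """
    values = [v.strip().lower() for v in dataset_values]
    if "yes" in values:
        return "yes"
    if "unknown" in values or not values:
        return "unknown"
    return "no"
```

With this rule a single confidential dataset marks the whole project as containing confidential information, which errs on the side of caution.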

In the RDA standard this is a dataset-specific attribute. If the information is also needed at project level, it could be a variable derived from the dataset-specific answers.
personal_data

Does the research handle personal data for research purposes? 

  • If yes → Triggers Risk assessment process &
  • DPIA process
  • Ethical review process

Documented 

Boolean

2 / 3
EU/national security levels (restricted, confidential, secret, top secret / käyttörajoitettu, luottamuksellinen, salainen, erittäin salainen)
Closed
Organization
1M
Optional. Part of DPIA process
pre_dpia

Has risk assessment been filled in?

(risk assessment / pre-DPIA: self-test of whether a DPIA is needed)


 Boolean
EU/national security levels (restricted, confidential, secret, top secret / käyttörajoitettu, luottamuksellinen, salainen, erittäin salainen)

Closed


Organization
1M
Optional Part of DPIA process

dpia


Should DPIA be done?

 

Boolean
Requirement comes from the law

Voluntary question
EU/national security levels (restricted, confidential, secret, top secret / käyttörajoitettu, luottamuksellinen, salainen, erittäin salainen)
Closed
Organization
1
O

Good question: should this be in the DMP at all? In this context it is also possible to assess whether a DPIA is actually being done. However, privacy information should be structured and compatible, so that it can be asked for here if desired. This should be optional, so that the question is not accidentally asked twice if it triggers another process.

Comment: a pre-DPIA is usually carried out to see whether a full DPIA is necessary.
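The trigger logic around personal_data and pre_dpia can be sketched as a small function. The process names are taken from the list above; treating an already completed pre-DPIA as "risk assessment no longer pending" is an assumption made for illustration.

```python
from typing import List, Optional


def triggered_processes(
    personal_data: bool, pre_dpia: Optional[bool] = None
) -> List[str]:
    """List the processes still to be started for a DMP.

    personal_data: answer to the personal_data field.
    pre_dpia: whether the risk assessment has already been filled in.
    """
    if not personal_data:
        return []
    steps = ["DPIA process", "ethical review process"]
    # Assumption: a completed pre-DPIA does not need to be triggered again.
    if not pre_dpia:
        steps.insert(0, "risk assessment (pre-DPIA)")
    return steps
```

Keeping the result as a list of process names lets a receiving service decide which of them it is responsible for.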














dpia_id

If a DPIA exists, give its URI / DOI

URI
EU/national security levels (restricted, confidential, secret, top secret / käyttörajoitettu, luottamuksellinen, salainen, erittäin salainen)
Closed
Organization
1M
If other risk assessments have been done:
riskassessment_id

Give URI / DOI

URI/PID
EU/national security levels (restricted, confidential, secret, top secret / käyttörajoitettu, luottamuksellinen, salainen, erittäin salainen)
Closed
Organization
1M

Conditional on "yes" in personal_data. If a DPIA exists, indicate which information is available for direct data retrieval, plus the date.





privacy-notice

Is there a need for a privacy notice?
Closed
Data protection team


1M
Date & title for privacy notice, data transfer agreements

privacy_notice_id

If a privacy notice exists, give its link / archive number
Closed
PI, Organization
1M
Use case: the maDMP could pass the number / link of the privacy notice to the data protection team once the notice has been made, to indicate its status.

personal_data_list

What personal data do you process?
Controlled list
TAU's list as an example; myCSC's list can also be found
Closed
1M
These should really be asked one by one to reach machine-actionability, so all options as Booleans.
If a data privacy notice / risk assessment exists, this is not asked but could be filled in automatically.
personal_data_sp_category
What special categories of personal data do you process?
String
Requirement comes from the law
Closed

1M
Categories of special categories of personal data

personal_data_sp_category_racial_or_ethnic origin

 Boolean 3 / 1
Closed

1M 

political opinions

 Boolean 
Closed

1M 

religion or philosophical beliefs

 Boolean 
Closed

1M 

trade union membership

 Boolean 
Closed

1M 

data concerning health

 Boolean 
Closed

1M 

sexual orientation or activity

 Boolean 
Closed

1M 

genetic and biometric data for identifying the person.

 Boolean

Closed

1M 

other_personal_data_sp_category


String 
Closed

1M 
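Asking each special category as its own Boolean, as suggested above, maps naturally onto a record with one flag per category. The sketch below is illustrative: apart from the racial or ethnic origin entry, the attribute names are hypothetical spellings of the category labels listed above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class SpecialCategories:
    """One Boolean per special category; attribute names are assumptions."""

    racial_or_ethnic_origin: bool = False
    political_opinions: bool = False
    religion_or_philosophical_beliefs: bool = False
    trade_union_membership: bool = False
    data_concerning_health: bool = False
    sexual_orientation_or_activity: bool = False
    genetic_and_biometric_data: bool = False
    other: str = ""  # free text, mirrors other_personal_data_sp_category

    def selected(self) -> List[str]:
        """Return the names of all categories that are processed."""
        chosen = [f.name for f in fields(self) if getattr(self, f.name) is True]
        if self.other.strip():
            chosen.append("other")
        return chosen
```

For example, `SpecialCategories(data_concerning_health=True).selected()` gives a list that a service can match directly against its own requirements, without parsing free text.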

data_prosessing_basis
Basis for data processing
String
National
Closed

1M

data_prosessing_sp_category
Basis for processing special categories of personal data
String
National (to enable optimal services)
Closed
PI, Organization, Service provider
1M

personal_data_transfer_outside_EU
Whether personal data is transferred outside the EU
Boolean
National (to enable optimal services)
Closed
PI, Organization, Service provider
1M

personal_data_transfer_country
To which countries personal data is transferred
String
National (to enable optimal services)
Closed
PI, Organization, Service provider
1M

personal_data_external_processors
Are there external processors?
Boolean
Local (responsibility of the organization)
Closed
PI, Organization, Service provider
1M

personal_data_minimized
How is the processing of personal data minimized?
String
Local (responsibility of the organization)
Closed
PI, Organization, Service provider
1M
Anonymization, pseudonymization, removal of direct identifiers..., dataset-specific?
In the RDA standard, ethical issues are part of the DMP domain. The FI pilot is considering whether there are grounds for a separate category, or for merging with license and adding user rights:
Rights, ethics & license





  
Ethical issues
ethical_issues_exist
To indicate whether there are ethical issues related to data that this DMP describes.
Allowed Values:
  • yes
  • no
  • unknown
List
general
Closed
PI, Organization, Service provider
x
1M
This is an important trigger, because in that case the DMP must be very good
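The three allowed values of ethical_issues_exist can be modelled as an enumeration, so that anything else is rejected early. In this sketch only "yes" acts as the trigger; how "unknown" should be handled is left open in the source and is an assumption here.

```python
from enum import Enum


class EthicalIssuesExist(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


def triggers_ethics_review(value: str) -> bool:
    """True when the answer is "yes"; raises ValueError for unlisted values."""
    answer = EthicalIssuesExist(value.strip().lower())
    # Assumption: "unknown" does not trigger anything by itself.
    return answer is EthicalIssuesExist.YES
```

Rejecting unlisted values keeps the field machine-actionable, because downstream services never have to interpret free text.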

ethical_issues_report
RDA: "To indicate where a protocol from a meeting with an ethical commitee can be found" or direct ID to report

URI



general
Closed
PI, Organization, Service provider
x
1M

Add link

Comment: Date when the decision was made


ethical_issues_description
Describe ethical issues directly in a DMP
String
general
Closed
PI, Organization, Service provider
x
1M

research_permit
Whether permission is required to collect data in the research data set
Boolean
general
Closed
PI, Organization
1M

Actual research permit


A. maDMP structure

B.
Name

C.
Description

D.
Type

E. Interoperability from data source 
1=automate;
2=DO;
3=manual

F.  
General for EU, EOSC, International / National / Organizational / Funder specific

G.
Public /
Closed 

H.
Who needs this field? 

I.
Is this in the RDA maDMP standard
(x=yes)

J. Phase 1=Planning
2=Applied
3=Granted
4=Mandatory update
5=Final reporting 

K. 
Mandatory (M) / Voluntary (O)

L. 
Use case
Example questions
Comments

Rights related to data
ownership_data_right
Who owns the data/rights related to the data?
Person / Organization
3
national



M
Person or organization? Dataset-specific? The organisation can be a research organisation, a customer organisation or an organisation that otherwise only owns the data (e.g. an archive)

ipr_copyright
Are there IPR or copyright issues?
Boolean
3
national



M

agreements_data_right
What agreements are needed related to the rights to the material?
String
3
national



M
Are data right agreements included here?

agreements
What other agreements are needed?
String
3




M
License
license_ref
Link to license document.
URI
3
general

x
M
Dataset-specific: what kind of license is granted for the use of data
https://creativecommons.org/licenses/by/4.0/

start_date
If date is set in the future, it indicates embargo period.
Date
3
general

x
M
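The embargo rule for the license start_date is simple enough to show directly. A minimal sketch; the function name and the optional `today` parameter (useful for testing) are illustrative, and a start date equal to today is treated as no longer embargoed.

```python
from datetime import date
from typing import Optional


def is_under_embargo(start_date: date, today: Optional[date] = None) -> bool:
    """A license start_date in the future indicates an embargo period."""
    today = today or date.today()
    return start_date > today
```

A repository or DMP tool could use such a check to decide whether a dataset may already be distributed under the referenced license.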

A. maDMP structure

B.
Name

C.
Description

D.
Type

E. Interoperability from data source 
1=automate;
2=DO;
3=manual

F.  
General for EU, EOSC, International / National / Organizational / Funder specific

G.
Public /
Closed 

H.
Who needs this field? 

I.
Is this in the RDA maDMP standard
(x=yes)

 J. Phase 1=Planning
2=Applied
3=Granted
4=Mandatory update
5=Final reporting 

K. 
Mandatory (M) / Voluntary (O)

L. 
Use case
Example questions
Comments

Metadata schema
Is the data built according to a specific schema?
Boolean
national




Relates to data set

Ideally, metadata from existing datasets could be imported directly from e.g. Zenodo. Also, metadata could be brought in for any datasets published in the project. Info flow might be easiest this way around rather than from DMP API to repository.


description
Description
String



x

provides taxonomy for...

language
Language of the metadata expressed using ISO 639-3
Term from Controlled Vocabulary
national

x



metadata_standard_id
Name this schema/link to schema
PID / other id
general

x

Relates to data set

identifier
String



x

http://www.dublincore.org/specifications/dublin-core/dcmi-terms/

type
Identifier type. Allowed values: URL, Other
Term from controlled vocabulary



x

URL
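Taken together, the description, language and metadata_standard_id (identifier + type) elements could be serialized as in the sketch below. Only the field names and the Dublin Core URL come from the table; the description text and the language code are placeholders, and the lowercase spelling of the type values is an assumption.

```python
import json

# Hypothetical metadata entry for one dataset.
metadata_entry = {
    "description": "Placeholder description of the metadata standard",
    "language": "eng",  # ISO 639-3 code, chosen for illustration
    "metadata_standard_id": {
        "identifier": "http://www.dublincore.org/specifications/dublin-core/dcmi-terms/",
        "type": "url",  # lowercase form is an assumption
    },
}

ALLOWED_ID_TYPES = {"url", "other"}


def is_valid_metadata_entry(entry: dict) -> bool:
    """Check that an identifier is present and its type is an allowed value."""
    std = entry.get("metadata_standard_id", {})
    return bool(std.get("identifier")) and std.get("type", "").lower() in ALLOWED_ID_TYPES


print(json.dumps(metadata_entry, indent=2))
```

Keeping the identifier and its type together in one nested object makes it possible to add other identifier kinds later without changing the surrounding structure.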

vocabulary_link
Are vocabularies also used?
Term from controlled vocabulary
3
national






format_documentation
What is the format of the documentation?
List
3
Local ?




something else, what

location_documentation
Where is the documentation?
String/URL
3
Local ?






generated_documentation
Is documentation generated automatically?
Boolean
3
Local ?






access_documentation
Can the documentation be accessed?
Boolean
3
Local ?






publish_methodology
Has the methodology/workflow been published somewhere?
Boolean
3
Local ?




registration of research?

workflow
Is the workflow described?
Boolean
3
Local ?



O
Especially important in the case of large datasets where the data itself cannot be preserved but is produced again if necessary

description_documentation
What does the documentation consist of?
String
3
Local ?



O
Workflow, variable description, … ?

metadata_purpose
What is the basic metadata described for?
String
3
Local ?



O
Qvain, own CRIS, something else?

metadata_open
Open the basic metadata?
Boolean
3
Local ?



O

metadata_location
Where is the basic metadata opened?
String / URL
3
Local ?





Security
description_security
Description of security measures
Controlled List
3
Closed
x


title_security
Name of the technical resource
String
3
Closed
x


Is a separate data controller role needed in the DMP?
access_control_id
Who is responsible for access control? Person / Role?
ORCID
National





Is a separate data controller role needed in the DMP?
access_control_name
Who is responsible for access control?
String








protection_level_data
What is the required level of data protection?
Controlled List (from Service catalog)






Relates to the Data Protection Level of the data set:
Open access; Restricted access; Restricted access & controlled use; Restricted access & restricted use
Is a separate data controller role needed in the DMP? Is this organization or project specific?
security_officer
Information Security Officer
String, derived from the storage location
Local

















Further comments from National DMP template:





Checked against Responsibilities from Academy templates (6.1) & (6.2)





→ For service design workshop in March
List all steps where responsibilities need to be defined
Person / Role?
















































