NOTE: This is the workspace of the workshop and an old version of the data model.

Towards a structured maDMP template

The goal of this workshop is to make progress on the maDMP reference data model.

The table below follows the structure of the RDA standard. Elements from the RDA standard are marked in GREEN, and the column "Is this in the RDA maDMP standard" indicates whether a data element is part of the RDA standard.

Elements derived from our national workshop on DMP questions are marked in BLUE.

The sections are grouped according to the RDA standard:
DMP, Project, Contact, Contributor, Funding, Cost, Dataset, Distribution, Host, License, Security and Privacy, Technical Resource, Metadata

In addition, we noted in the previous workshop that it would be important to cover the data life cycle and ethics. These can be incorporated into the RDA maDMP data model or proposed as additional sections. Note: This will be discussed in the webinar on 16 December. (Other IDs to consider: RAiD)


We will work in small groups. 
Group 1: Project, Contact, Contributor, Funding & Cost
Group 2: Dataset, Distribution 
Group 3: Security, privacy, technical resources and metadata


Guidance on working in small groups:

Target & Focus:

  • Our focus is to prepare a reference data model, not a DMP template.
  • We therefore focus on the information required, not on the questions, i.e. how the information should be asked. Examples of questions can be made later; in this workshop we focus on the data model itself.
  • Review the contents of your group area.
  • Make further clarifications and cross-checks.
  • Improve the naming of elements if needed, while respecting the original RDA standard naming.
  • Indicate what information is needed to provide or trigger machine-actionability, and which fields can be filled automatically via a digital object (DO) or with AI, e.g. by extracting information from an existing source.

  • Link to ontologies, data spaces, repositories and other relevant sources of information when you notice that available auxiliary information is not being used.

  • Identify the purpose and users of the data elements.
  • Also discuss in the group which national additions to the RDA standard should be mandatory or strongly recommended.

  • If you cannot resolve a data element, mark it in bold red with question marks (???).

Documentation:

  • All groups document their work in the table below.
  • Do not use any other means of documenting the work.
  • Aim to write in English. 

Reference:
You can refer back to the RDA maDMP data model, which is the core, but we can make suggestions for developing its machine-actionability. In addition, we can add DMP fields relevant to the national context. Note, however, that this is a general data model and does not contain discipline-specific information.
https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard

OSTrails Plan-Track-Assess Pathways: https://zenodo.org/records/13145788

For use cases you can refer to:
Marttila, J., Manninen, S., Ahokas, M., Hindersson-Söderholm, T., Keckman-Koivuniemi, H. (2022). Dynaamiset DMP:t -työryhmän loppuraportti [Final report of the Dynamic DMPs working group]. https://zenodo.org/records/6601258
Marttila, J. & Manninen, S. (2022). Dynaamiset DMP:t -työryhmän toivekartoitus [Wish-list survey of the Dynamic DMPs working group]. https://zenodo.org/records/6594597


Explanations of the column headings below:

E. Interoperability from data source (DMP or other system)
Does the field provide or enable interoperability between maDMPs and other systems?
1 = Yes; 2 = Uses a DO as source (automatically filled); 3 = Manually filled in, or propose a DO

F. General, national or organizational?
General: EU / EOSC / international needs; National: recommended practices in Finland; Organizational: organization-specific practices

G. Public / closed: the organisation's internal information, or public?
Justify why the information needs to be disclosed only within the project/organization.

H. Who needs this field?
Indicate the purpose of the information if it is suggested as an addition to the RDA standard (minimum).

I. Is this in the RDA maDMP standard?
Indicate whether the field is in the core maDMP standard.

J. Section in RDA maDMP standard
Add the section if it differs from the section in the national reference data model.

K. Additional information:
Define the use case, give an example question, or comment.


For string fields, suggest the maximum number of characters.


Information that is already available elsewhere is of secondary focus unless it is beneficial for it to be part of the maDMP; in that case, the use case needs to be defined.
National suggested fields are in BLUE; RDA Standard fields in GREEN.

Which suggested fields are MUST have, SHOULD have or COULD have?

A. maDMP structure

B.
Name / Core info for minimum DMP in bold

C.
Description

D.
Type

E. Interoperability from data source 
1=automate;
2=DO;
3=manual

F.  
General for EU, EOSC, International / National / Organizational / Funder specific

G.
Public /
Closed 

H.
Who needs this field? 

I.
Is this in the RDA maDMP standard
(x=yes)

J. Phase 
1=Planning
2=Applied
3=Granted
4=Mandatory update
5=Final reporting 

K. 
Mandatory (M) / Voluntary (O)

L. 
Use case
Example questions
Comments









DMP | dmp_id | DMP id | Nested Data Structure | 1 | general | Public


x | 1 | M

Request an ID for the DMP.

Where does this originate, especially if different tools/systems are used for DMPs?


type_dmp | A description of the kind of DMP to be created

Controlled vocabulary:

Student, Academic,
National,
International,
Organization 

3 | Organisational / national / international | Public | User / System
1 | M

The input form should later be updated or extended to a richer format.

Input profiles, for example: define a national typology for the recommended use of DMPs (light, detailed) based on key issues such as personal data, confidentiality of information, resource intensity and the number of actors (outsiders).


title_dmp | Title of a DMP | String | 3 | general | Public | User | x | 1 | M | Max 100 characters
(points to cost section) | cost_dmp

Give an estimate or aggregated sum of the costs related to data management; where there are several cost items, break them down in detail.
(Sum of the costs given in the Cost section)

→ Further development: this could be linked to the budget via the grant ID. Which costs are included? Clear guidelines are needed.

Nested Data Structure | 2 | general | Closed
(see comment)

x | 2
Closed early in the process, but this depends on whether the DMP will be actively made public / "published" at some point during the data life cycle.

created_dmp | Date and time of the first version of a DMP | DateTime | 1 (system) | general | Public | Organization | x | M | System-recorded

modified_dmp | Must be set each time the DMP is modified. Indicates the DMP version. Encoded as an ISO 8601 compliant date-time string | DateTime | 1 (system) | general | Public | Organization | x | M | System-recorded
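The following is a minimal Python sketch of how a DMP tool might record created_dmp and modified_dmp as ISO 8601 strings and enforce the proposed 100-character limit for title_dmp. The JSON key names (dmp, title, dmp_id, created, modified, dataset) are assumed to follow the RDA common standard's JSON serialisation and should be checked against its schema. new_dmp and touch are hypothetical helper names.

```python
import json
from datetime import datetime, timezone

MAX_TITLE_CHARS = 100  # limit proposed in this table for title_dmp


def _now_iso() -> str:
    """Current UTC time as an ISO 8601 string with an explicit +00:00 offset."""
    return datetime.now(timezone.utc).replace(microsecond=0).isoformat()


def new_dmp(title: str, dmp_id: str, id_type: str = "other") -> dict:
    """Create a minimal maDMP-style record; created and modified are system-recorded."""
    if len(title) > MAX_TITLE_CHARS:
        raise ValueError(f"title_dmp exceeds {MAX_TITLE_CHARS} characters")
    now = _now_iso()
    return {
        "dmp": {
            "title": title,
            "dmp_id": {"identifier": dmp_id, "type": id_type},
            "created": now,   # set once, when the first version is saved
            "modified": now,  # overwritten on every change (see touch)
            "dataset": [],    # at least one dataset must still be added (mandatory)
        }
    }


def touch(record: dict) -> None:
    """Record a modification: update 'modified' and leave 'created' untouched."""
    record["dmp"]["modified"] = _now_iso()


if __name__ == "__main__":
    dmp = new_dmp("Example project DMP", "tmp-0001")
    touch(dmp)
    print(json.dumps(dmp, indent=2))
```

Storing UTC with an explicit offset keeps timestamps comparable across systems. Whether tools should use UTC or local time is an assumption here; this table does not decide it.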

nextreview_dmp | Next review date for updating the DMP | DateTime | 2 / 3 | Organisational / Funder specific | Public | Organization, PI
2 | O | The research project benefits from timing the DMP update, and Data Support can better plan its assistance. Suggested as an addition to keep the DMP alive and up to date, e.g. for reporting purposes.

dataset | To describe data on a non-technical level | Nested Data Structure | 2 ??? | general | Public | x | 1 | M | At least one dataset should be defined. See "Dataset" in the table.
Group 1: Project, Contact, Contributor, Funding & Cost


Project | title_project | Name/title of the project | String | 3 | general | Public | Everyone | x | 1 | M | If project information is not yet available anywhere, how much should be produced here? Is it possible to have multiple DMPs for one project, or a maDMP without a funder or project?

project_id | Project identifier | Nested Data Structure | 2 | general | Public | Everyone | x | 1 | M | RAiD: https://raid.org/

start_project | Project start date | Date | 3 (can trigger the update process, e.g. 3-6 months after the start) | general | Public | Everyone | x | 1 | M | Encoded as an ISO 8601 compliant date string

end_project | Project end date, if known | Date | 3 (can trigger the update process & reporting stage) | general | Public | Everyone | x | 1 | O | Encoded as an ISO 8601 compliant date string. If the DMP is used for a continuous process, no end date is required.
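The start and end dates can trigger the update and reporting phases (column J, phases 4 and 5). The following minimal sketch assumes a hypothetical rule: one mandatory update a fixed number of months after start_project, and final reporting at end_project. The trigger names and the 6-month default are illustrative only.

```python
import calendar
from datetime import date
from typing import List, Optional, Tuple


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of a shorter month."""
    years, month_index = divmod(d.month - 1 + months, 12)
    year = d.year + years
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def review_triggers(start: date, end: Optional[date] = None,
                    months_after_start: int = 6) -> List[Tuple[str, date]]:
    """Hypothetical schedule: mandatory update N months after start, final reporting at end."""
    triggers = [("mandatory_update", add_months(start, months_after_start))]
    if end is not None:  # end_project is optional for continuous DMPs
        triggers.append(("final_reporting", end))
    return triggers


print(review_triggers(date(2025, 1, 15), date(2027, 12, 31), months_after_start=3))
```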

duration | Project duration | Time (yy-mm) | 1



2 | O | Derived
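Because duration is derived, it can be computed rather than entered. The following sketch assumes that duration means completed years and months between start_project and end_project, formatted as yy-mm. project_duration is a hypothetical helper.

```python
from datetime import date
from typing import Optional


def project_duration(start: date, end: Optional[date]) -> Optional[str]:
    """Derive 'yy-mm' (completed years and months) from start_project and end_project.

    Returns None when end_project is not given, e.g. for a continuous DMP.
    """
    if end is None:
        return None
    if end < start:
        raise ValueError("end_project precedes start_project")
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    years, rest = divmod(months, 12)
    return f"{years:02d}-{rest:02d}"


print(project_duration(date(2025, 3, 15), date(2026, 5, 14)))  # 01-01
```

The group would still need to agree on whether the end date counts as inclusive (e.g. whether a project ending on 31 December counts as a full year).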

description_project | Project short description
String | 1 (project_id links to the long description), otherwise 3 | general | Public | Everyone | x | 1 | M | Short description, e.g. max 2000 characters; include a link to the project plan if needed (the project_id field links to the longer description)
Where will the master copy of this information be filled in? (From RAiD, but are there alternatives???)

discipline | Scientific discipline of the project

UNESCO science classification


drill down via main categories

2 / 3 | general | Public | Everyone
1 | O

2 if analytics / AI can be used to suggest a value based on ORCID, project_id or the description

3 if it needs to be added by the researcher

This can be used to guide instructions and lists of what is available.
Fields of education and training 2013 (ISCED-F 2013). Keywords and free text allow mapping to ontologies and hence smart searches (whereas controlled vocabularies and taxonomies tend to force users to pick whatever is closest if no appropriate term is available).


funding_project | Funding related to a project | Nested Data Structure | 2 (derived from funding status & grant_id) | general | Public | Everyone | x | 3 | O | Public after the grant is published.
Contact | contact_id | ORCID of the contact person for a DMP / principal (responsible) researcher

ORCID

1 | general | Public | Everyone | x | 1 | O | This has its own attributes (name, ORCID, contact).
If the contact person and the principal researcher are different, do they need to be separated?

mbox | E-mail address | email | general | Public | Everyone | x | 1 | M | From ORCID

name | Name of the contact person / principal researcher | String | 2 (from ORCID) / 3 | general | Public | Everyone | x | 1 | M | From ORCID
Contributor 

#_Nested Data Structure if many contributors


 

  
  

contributor_id

Contributor ORCID


ORCID

2: Digital authentication, e.g. via e-mail, after which the contributor adds their ORCID; or the ORCID is taken from the funding application
3: Risk of errors in manually entered ORCIDs
general | Public | Everyone | x | 2 | O

Needs to be defined, or where could it be derived from? The funding decision?
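Manually entered ORCIDs are error-prone (option 3 above). ORCID iDs end in a check character computed with ISO 7064 MOD 11-2, so a tool can catch most typing errors before saving. The following is a minimal sketch. orcid_is_valid is a hypothetical helper; it accepts either the bare iD or the https://orcid.org/ URL form.

```python
import re

ORCID_URL_PREFIX = "https://orcid.org/"
_ORCID_RE = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_is_valid(orcid: str) -> bool:
    """Check the ORCID iD format and its ISO 7064 MOD 11-2 check character."""
    orcid = orcid.strip()
    if orcid.startswith(ORCID_URL_PREFIX):
        orcid = orcid[len(ORCID_URL_PREFIX):]
    if not _ORCID_RE.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:  # the 15 base digits
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


print(orcid_is_valid("https://orcid.org/0000-0002-1825-0097"))  # True
print(orcid_is_valid("0000-0002-1825-0098"))  # False (wrong check digit)
```

A valid checksum only shows that the iD is well formed. It does not prove that the iD belongs to the contributor; authenticated ORCID sign-in (option 2) is needed for that.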


mbox_contributor

E-mail address | email | 2 / 3 (depending on whether the person has allowed sharing) | general | Public | Everyone | x | 2 | M

name_contributor

Name of the contributor | String | 2 (from ORCID) / 3 | general | Public | Everyone | x | 2 | M

role_contributor

Type of role, e.g. work package leader / data controller / principal investigator / author of dataset | String | 2 / 3 | general | Public | Everyone | x | M | Use case for AI search from the funding proposal?

organization_contributor

Organization of contributing researcher

ROR
2 (from ORCID) or 3 (ROR) | general | Public | Everyone
2 | O | This has its own attributes (ROR). Should there be a separate field for organizations (ROR???)


Funding | #_Nested Data Structure if there are many funding sources for a large research programme, unless it is defined that the DMP relates to a single grant decision








Does this information create any requirements for the DMP?

funder_id | Funder ID, e.g. registry number of the associated project, Y-tunnus (Business ID) | Nested Data Structure | 2: ROR API via search option
3
general | Public | System | x | 1 | M | There can be many of these (or none???)

funder | Funder | Nested Data Structure | 2 | general | Public | Everyone
1 | O

from Funding id


funding_submission_dl
 Date

2: select funding (Akatemia, i.e. the Research Council of Finland)

3


  
1 | O



funding_decision_expected
 Date

2: select funding (Akatemia, i.e. the Research Council of Finland)

3


  
1 | O

Either known or estimated


funding_status | Phase of the project life cycle: Planned, Applied, Granted, Rejected | Term from controlled vocabulary (nice-to-have feature: automatically derive from the grant ID whether the project is applied/granted) | general | Public | Everyone | x | 1

from Funding id

(If granted then public???)
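The four funding_status values form a small controlled vocabulary with a natural order. The following sketch makes two assumptions that this table does not decide: hypothetical transition rules (planned → applied → granted or rejected), and the open question above that funding details become public only once granted.

```python
from enum import Enum


class FundingStatus(Enum):
    """funding_status values listed in the table."""
    PLANNED = "planned"
    APPLIED = "applied"
    GRANTED = "granted"
    REJECTED = "rejected"


# Hypothetical transition rules: granted and rejected are treated as final states
ALLOWED_TRANSITIONS = {
    FundingStatus.PLANNED: {FundingStatus.APPLIED},
    FundingStatus.APPLIED: {FundingStatus.GRANTED, FundingStatus.REJECTED},
    FundingStatus.GRANTED: set(),
    FundingStatus.REJECTED: set(),
}


def change_status(current: FundingStatus, new: FundingStatus) -> FundingStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def funding_is_public(status: FundingStatus) -> bool:
    """Open question in the table: treat funding details as public once granted."""
    return status is FundingStatus.GRANTED


status = change_status(FundingStatus.PLANNED, FundingStatus.APPLIED)
status = change_status(status, FundingStatus.GRANTED)
print(status.value, funding_is_public(status))  # granted True
```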


grant_id | Grant ID of the associated project | Nested Data Structure | 2 if DOI (not currently)
3
general

x | 3
from Funding id

start_fund | Funding (project) start | Date | 2



3
 from Funding id

end_fund | Funding (project) end | Date | 2



3
 from Funding id

duration_fund | Funding (project) duration | Range | 2



3
derived from Funding id (in order to search databases)

ror_fund | Responsible organization for the funding application, ROR | Nested Data Structure | 2: ROR API via search option
3: manually select from list




1
from Funding id;
ROR: if there is a large consortium, would it be worth making WP-specific DMPs so that they actually work?
Cost | # List all cost object categories









currency_code | Allowed values defined by ISO 4217.
Note: Default is EUR, or could this be linked to funder_id?
Term from controlled vocabulary | 3 / 2 (from grant_id)


x | 3 | M
description_cost | Description of costs
Note: Could this be linked to the grant ID for a description of the applied/granted budget?
String | 3 / 2 (from grant_id / application)


x | 3 | O | If needed
title_cost | Title of costs
Note: Could this be linked to the grant ID for the title of the applied/granted budget?
String | 3 / 2 (from grant_id / application)


x | 3 | M
value_cost | Value of costs
Note 1: Could this be linked to the grant ID for the applied/granted budget?
Note 2: Link with DMP / cost_dmp
Number | 3 / 2 (from grant_id / application)


x | 3 | M
  sum_cost_dmp 
1



3
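sum_cost_dmp (and cost_dmp in the DMP section) can be derived from the individual cost items. The following minimal sketch assumes that each cost item carries value and currency_code as in the rows above, and that EUR is the default when no currency is given. The code checks only that the currency is three capital letters, not that it is a valid ISO 4217 code. The function name sum_cost_dmp is illustrative.

```python
import re
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable

DEFAULT_CURRENCY = "EUR"  # proposed default in this table
_CURRENCY_SHAPE = re.compile(r"[A-Z]{3}")  # shape check only, not the full ISO 4217 list


def sum_cost_dmp(costs: Iterable[dict]) -> Dict[str, Decimal]:
    """Aggregate cost items into one total per currency.

    Amounts in different currencies are never added together.
    """
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for cost in costs:
        currency = cost.get("currency_code", DEFAULT_CURRENCY)
        if not _CURRENCY_SHAPE.fullmatch(currency):
            raise ValueError(f"not a three-letter currency code: {currency!r}")
        # Decimal(str(...)) avoids binary floating-point rounding in money sums
        totals[currency] += Decimal(str(cost["value"]))
    return dict(totals)


costs = [
    {"title": "Storage", "value": 1200, "currency_code": "EUR"},
    {"title": "Data curation", "value": 800.50},  # no currency given: EUR assumed
    {"title": "Transcription", "value": 300, "currency_code": "SEK"},
]
print(sum_cost_dmp(costs))
```

Keeping one total per currency avoids silently mixing currencies. Converting to a single figure would need an exchange-rate source and date, which this table does not define.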

Group 2: Dataset, Distribution 







Define what a dataset is (the RDA standard defines a dataset as raw data; can we use this?). If multiple datasets need to be defined, what distinguishing feature is needed in the DMP (ownership, source, source data, raw data, repository for workflow, data product, reference data, master data, models & code)? And what needs to be defined in the DMP data model glossary? See the RDA standard's explanations: https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard?tab=readme-ov-file


Dataset | #_Nested Data Structure if many datasets are used. Use dataset_ids to indicate ??? | The DMP has a "dataset" association that can relate to many datasets. Each dataset can have multiple files/distributions.







Relationships to 1..* datasets should be defined at DMP level. DMP can contain multiple datasets.


dataset_id | Dataset ID, identifier for a dataset | DOI, PID, URN, URL, handle, ark, other | 1 / 3 | general | Public | RPO (research performing organization), repositories, data catalogues | x | 2 | M (when published)

DOI, PID, URN, URL, handle, ark, other


The dataset may not exist when the DMP is defined. The DMP tool should somehow provide a temporary ID before the dataset gets a PID.


Identifier should be typed.
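The following sketch shows a typed dataset_id with a temporary placeholder. It assumes the identifier types listed in this row; the RDA schema's own allowed values should be checked before use. The temporary flag, the placeholder format dmp_id#dataset-<uuid>, and the helpers temporary_dataset_id and assign_pid are hypothetical. The DOI in the usage lines is a placeholder.

```python
import uuid
from dataclasses import dataclass
from enum import Enum


class IdType(str, Enum):
    """Identifier types listed in this row (check against the RDA schema)."""
    DOI = "doi"
    URN = "urn"
    URL = "url"
    HANDLE = "handle"
    ARK = "ark"
    OTHER = "other"


@dataclass(frozen=True)
class DatasetId:
    identifier: str
    type: IdType
    temporary: bool = False  # hypothetical flag, not an RDA field


def temporary_dataset_id(dmp_id: str) -> DatasetId:
    """Issue a local placeholder ID while the dataset has no PID yet."""
    return DatasetId(f"{dmp_id}#dataset-{uuid.uuid4()}", IdType.OTHER, temporary=True)


def assign_pid(current: DatasetId, pid: str, pid_type: IdType) -> DatasetId:
    """Replace a temporary ID with a typed PID once one exists."""
    if not current.temporary:
        raise ValueError("dataset already has a persistent identifier")
    return DatasetId(pid, pid_type)


tmp = temporary_dataset_id("dmp-123")
final = assign_pid(tmp, "https://doi.org/10.1234/example", IdType.DOI)  # placeholder DOI
print(tmp.type.value, final.type.value, final.temporary)  # other doi False
```

A real tool would probably also need to keep the mapping from each temporary ID to its PID, so that references made before publication still resolve.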

title_dataset | Dataset title / name | String | 2 | general | Public | Same as above, but needed by humans rather than machines | x
M | There can be many datasets; the information relates to one entity, a so-called Metax entity, i.e. one must be able to express a wide variety of entities that then have attributes.
To be checked | type_dataset | Dataset type (e.g. interview, questionnaire, photos, video, measurement, samples, simulation, code) | Partly (controlled vocabulary and "Other" option) | 2 / 3 | general | Public | Data catalogues, repositories, RPOs | x
M

Associated with a single dataset; can include ready-made options but also an open text field. What is the correct level of granularity here? Resource intensity can affect how much description is needed. In general, the guidance is to describe the data so that the attribute applies to the entire dataset. Describing just one dataset would make it possible to create a so-called light DMP. This is an important option to keep.

A defined, shared vocabulary of "dataset types" is needed here.

The RDA common standard points to DataCite and COAR, but neither feels sufficient on its own. A national type list should be built on those, perhaps enhanced with subtypes.


format_dataset | Description of the dataset formats used during active research, for example database, CSV, XML, JSON.

(Format of the dataset used during the project; the formats of datasets published / distributed after the project may differ.)
Yes | 2 | General



O

Relates to one dataset.

How does this relate to outputs other than datasets, such as code? Or to code that is closely related to data usability, e.g. via a link or PID?

Format vs. type? What is the difference?

File format should be in distribution, not here.

Is this a duplicate of what is now also planned in the Security and Privacy section? This is an RDA standard attribute at dataset level; in Security & Privacy below, the information is at project level. | personal_data (already exists under security and privacy) | Whether the dataset contains personal data | Boolean | 3 (needs a human in the loop) | general | Closed | Data protection officers, RPOs (data protection/management experts), repository | x
M

Associated with a single dataset. Is this personal data of the data providers or of the research subjects? What is the role of individuals? Yes or No / Yes, No or Unknown

We think that "Unknown" is not an option here.

Type of personal data will be in its own section.

Can trigger automatic data protection processes.

Is this a duplicate of what is now also planned in the Security and Privacy section? Same as above. | sensitive_data

Is the data subject to the Act on the Secondary Use of Data?

Should this phrasing include "Are there legal restrictions that apply to using this data, e.g. military use, commercial restrictions, endangered species?" 

Boolean | 3 (needs a human in the loop) | general | Closed | Data protection officers, RPOs (data protection/management experts), repository | x | M

Related to the dataset. How can we ensure that this is asked only when it is likely to apply?
Yes or No / Yes, No or Unknown

Dual use and import controls?

This should be yes/no. If we then need more information on why the data is sensitive, that should be its own property. How does this relate to confidentiality in Security & Privacy?

I guess that at dataset level we need to know whether there is sensitive/confidential information or not. That then triggers further questions in the Security & Privacy section.


data_sharing_issues | How any legal and ethical issues related to data sharing (e.g. ownership, copyright, sensitivity) will be resolved | String
National 
National DMP template, Organization, PI

M
Optional | data_sharing_contracts | Boolean


 

 
Optional | data_sharing_ownership | Boolean


 

 
Optional | data_sharing_copyright | Boolean


 

 
Optional | data_sharing_sensitivity | Boolean


 

 
New fields emerging | data_sharing_other



 

 

storage_active_data | Where will the data be stored during the project? | URN from the CSC Service Catalogue & the list provided by the organization; if something else, what?
National | ? | National DMP template

M | Relates to a dataset; especially important if the data is subject to the Act on the Secondary Use of Data
If no distribution but metadata available | data_landing_page | Give the link / PID to the landing page of the data | link / PID
 
 


 

data_users | With whom will the data be shared during the project? | Open, within Europe, within the research consortium, within the organization, complex structure
 
 


Refers to the technical solutions: will a DPA be needed? Are joint controller agreements, NDAs etc. already covered elsewhere? Or does this refer to consortium projects?

sharing_solution | How will the data be shared during the project? Define the technical solution? | Choose from the service catalogue
 
 


 

backup_data | How will the data be backed up during the project? To be planned by the researcher, or organization-specific solutions? | String
Organizational | ? | National DMP template

M 

application_process_data | What applications are used to process the data?
2 (use case for AI to extract information from the DMP to pre-fill available information from | General, National & Organizational



O | Affects the choice of storage environment (e.g. whether the video only needs to be viewable or must be available at file level in an analysis program)

computing_environments | Which computing environments are needed for the research? | Controlled list: CSC Service Catalogue & organization services
National & Organizational



O | Relates to the dataset
computing_capacity_CPU | How many CPU core hours of computing capacity are required? | Value
 



 
computing_capacity_GPU | How many GPU core hours of computing capacity are required? | Value
 



 

  

 


 

    
Feedback to RDA standard in GitHub & suggested to be left out of the maDMP standard | description_dataset | Description | String | Fill in from this onwards | general | Public | Repository, data catalogues, CRIS? | x | O

Needs some guidance on what level of description is needed. Is a length limit needed?

We already have a name for the dataset. How much more description do we want/need at this point?

We should not ask these in the DMP. They are about publishing and metadata. If somebody wants to combine the DMP and CRIS, this information needs to be interoperable, but it is NOT part of the DMP. The same holds true for all red rows.

Feedback to RDA standard | distribution_dataset | Technical information on a specific instance of data

Nested Data Structure

Could be defined vocabulary


general

x | This might need more clarification, as it relates to the resources/infrastructure needed.
Feedback to RDA standard | issued_dataset | Date on which the dataset was issued | Date
general

x  O
Feedback to RDA standard | keyword_dataset | Keyword | String / Term from controlled vocabulary
general

x | O | Should be asked only when the data is opened/catalogued.
Feedback to RDA standard | language_dataset | Language of the dataset expressed using ISO 639-3 | Term from controlled vocabulary
general

x  O
Feedback to RDA standard | metadata_dataset | Describe the metadata standards used | Nested Data Structure
general

x  =
Move to data life cycle | preservation_statement | Preservation statement | String
general

x | Should be in the life cycle section, and then structured.
Move to security & privacy | security_privacy | To list all issues and requirements related to security and privacy | Terms from controlled vocabulary | 3 (from organisational list) | general | closed
x | These sound like reports compiled from the DMP
Technical resource | #_Nested Data Structure if many datasets are used. Use dataset_ids to indicate ???









name_technical_resource | Name of the technical resource | String



   
description_technical_resource | To list all technical resources needed | String | 3 / 2 (from organisational or national list) | general | closed
x | These sound like reports compiled from the DMP. So if you have confidential data, the DMP compiles a list of local services that CAN be used.

reuse_dataset | Is previously collected data reused in this project?
(Whether the data is collected, created or comes from elsewhere)
Boolean

Public | National DMP template


Relates to one dataset (does it? Or does it relate to the research as a whole in this sense?)

Funders also require information on data reuse. The terms of use of the data are also important here.

What if you are using data that is already published and is itself based on other data? Is this a property of the dataset or of the research?


source_dataset | Data source | Partly (PID if possible, string if not)





O

Relates to one data set, can include ready-made options, but also an open text field

Referencing can be really confusing. You can use data obtained from Twitter, or a dataset that somebody else compiled from Twitter... What do you reference here? Or what if you make a derivative dataset based on an existing dataset compiled from Twitter?


data_collected | Data collected for this project



National DMP template




data_produced | Data produced as an outcome of the project



National DMP template




estimate_datasize

Give a rough estimate of the size of the data produced/collected 

Range


National DMP template


Relates to one dataset

How does this differ from the dataset size (byte_size) given in Distribution?


data_resource_estimate | Project data magnitude | Yes | Add DMP datasets & estimated size





???

data_quality_assurance | Data quality assurance | String
general

x  

Is this even necessary in modern DMP? This is like asking "do you do your science properly?"

Could it serve as guidance that makes researchers aware of what is possible?

Could there be lists tied to the data type (so that relevant lists are offered)?

Our experts can help with risk management related to security, privacy, storage etc. But can we say anything relevant on quality control? Do we need to?

What is the use case here?!? Is the only case educating researchers who are unfamiliar with their own methodology?


method_quality_assurance | Ways of quality assurance | List:
TAU list as an example; a national, up-to-date list should be made

national



O | Also something else, to be specified. Could these depend on the discipline?

process_quality_assurance | Is a quality assurance process applied?

Note: DMP questions should be project-specific, not organization-specific
Boolean
national / organizational ?



O | Is this necessary? More organization-specific than the DMP standard?


Distribution (preservation and sharing of material during the project) - OPTIONAL | access_url | A URL of the resource that gives access to a distribution of the dataset, e.g. a landing page | URI
general

x
O

In a DMP, these should be used to describe active use of the data. Everything else belongs in the life cycle section.



title_dataset | Title is a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file.

general

x
M

available_until_url | Indicates how long this distribution will be / should be available. Encoded as an ISO 8601 compliant date-time string | Date
general

x
O

byte_size

Byte size (in RDA this is a number; CSC proposes a category list):
S: < 10 TB, M: 10-50 TB, L: 50-100 TB, XL: 100-200 TB, XXL: > 200 TB
Number or size category: S, M, L, XL, XXL
general

x
M

E.g.
S < 10 TB, 10 TB <= M < 50 TB, 50 TB <= L < 100 TB,
100 TB <= XL < 200 TB, XXL >= 200 TB

Important, as it affects which tools are available.
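Once the boundaries are fixed, mapping the RDA byte_size number onto the proposed CSC categories is straightforward. The following sketch assumes decimal terabytes (10^12 bytes) and that each lower bound belongs to the larger category. size_category is a hypothetical helper.

```python
TB = 10**12  # assumption: decimal terabytes, not TiB

# Exclusive upper bound of each category; anything at or above 200 TB is XXL
_CATEGORY_BOUNDS = [(10 * TB, "S"), (50 * TB, "M"), (100 * TB, "L"), (200 * TB, "XL")]


def size_category(byte_size: int) -> str:
    """Map an RDA byte_size (number of bytes) to the proposed S/M/L/XL/XXL category."""
    if byte_size < 0:
        raise ValueError("byte_size cannot be negative")
    for upper, label in _CATEGORY_BOUNDS:
        if byte_size < upper:
            return label
    return "XXL"


print(size_category(3 * TB), size_category(50 * TB), size_category(250 * TB))  # S L XXL
```

Storing the exact byte_size and deriving the category keeps the RDA field intact while still giving the coarse category that matters for choosing tools.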


data_access | Indicates the access mode for data and data sharing.
Allowed Values:
  • open
  • shared
  • closed
Term from Controlled Vocabulary
general

x

This can change during the study: first I use the data as closed for 3 years, then I open it. Should this field record what I intend to do after active use, or the current state? → It should be the current publication status of the distribution. The dataset life cycle documents the plan for the dataset.

description_distribution | Description is a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file. | String
general

x
O

download_url | The URL of the downloadable file in a given format, e.g. a CSV or RDF file.

general

x
O

format | Format according to the IANA media types list (https://www.iana.org/assignments/media-types/media-types.xhtml) if appropriate; otherwise use the common name for the format

general

x
M
This should be at dataset level → if not distribution-specific | license_dataset | List all licenses applied to a specific distribution of data.

general

x
O
Is this the same as access_url if the dataset is published??? | host_dataset | To provide information on the quality of service provided by the infrastructure (e.g. repository) where the data is stored. Service URN | URN
general

x
O | Outside one's own organization? Same as location_open_data in the life cycle section


Dataset life cycle

(extension to Dataset)

  







What is the difference between distribution and data life cycle??? Should we have a dataset life cycle?


archiving_services_data | Are archiving services or long-term preservation needed for the data? | Boolean | How to determine the value of data? | National



M

Relates to the dataset

Is this long-term storage (e.g. 20 years in Zenodo), archiving in an institutional archive, or something else?
PAS merged here. The word "storage" is problematic; clarify further if needed.


data_close_justification | If the project does not collect or produce any data fully or partially suitable for reuse, justify why the data cannot be made available even partially. | String
National
National DMP template

O | This is mandatory if the data is closed. Should there be a dataset-level field for dataset publication (open / closed)?

location_open_data | Where will the data be opened? | Controlled list of data repositories | Special requirements for data repositories for preliminary data? | National | National DMP template

O

FSD comments: It is essential for the repository/archive to know (in the case of research projects that have received a positive funding decision) what kind of data are planned to be opened in the repository/archive and by whom.

Covered under distribution maybe?

This field also responds to the National DMP template requirement on where the data, or a publishable portion of them, will be made available after the end of the project.


Moved here from Security | length_storage_data | How long the data is stored for the original research purpose

National




Relates to dataset, original purpose

deletion_data | How is the data deleted/destroyed? | Controlled list, including the option of not deleting the data, with an explanation of why.
National



M | It could be specified that this relates to unpublished data, or to data stated to be shared for e.g. 5 or 10 years.

deletion_when_data | When is the data deleted/destroyed? | Date (also free text if not known, e.g. end of project?)




O | It could be specified that this relates to unpublished data.
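deletion_when_data and archiving_when_data may hold either a date or free text. The following sketch shows how a tool might resolve them. It assumes ISO 8601 dates and a hypothetical keyword end_of_project that resolves to end_project; any other free text is left for human review. The same function serves both fields, so data can be archived and deleted on the same date.

```python
from datetime import date
from typing import Optional

END_OF_PROJECT = "end_of_project"  # hypothetical keyword, not an agreed vocabulary term


def resolve_date(value: str, project_end: Optional[date]) -> Optional[date]:
    """Resolve a deletion_when_data / archiving_when_data value to a concrete date.

    Returns None for free text, or when the keyword is used but end_project is
    unknown, so that the value can be left for human review.
    """
    value = value.strip()
    if value == END_OF_PROJECT:
        return project_end
    try:
        return date.fromisoformat(value)  # e.g. "2030-12-31"
    except ValueError:
        return None


end = date(2027, 12, 31)
print(resolve_date("end_of_project", end), resolve_date("2030-12-31", end))
```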

See above:
archiving_services_data
Will the data be archived? (Overlaps with the question: Are archiving services needed for the data?)









archiving_when_data | When to archive? | Date. Same comment as above.
National



O | Active data can be deleted and archived at the same time.

archiving_location_data | Where to archive? | Controlled list
National



O
  
Group 3: Security, privacy, technical resources and metadata

 


Security and privacy (legislation) | Note: Security and privacy need special attention to fit the national context in Finland and benefit from machine-actionable features. GREEN is as in the RDA standard; please make the needed clarifications in BLUE.

 
 title_security
Title of security measures (?)
String
general
 x   
 description_security_privacy
Description of security and privacy measures
Controlled Vocabulary
general
   x
 National additions in Finland: 



     

confidentiality
Does the data contain confidential information?
(EU definition (question); law (julkisuuslaki, the Finnish Act on the Openness of Government Activities); agreements, including trade secrets; classification by governmental bodies)

Boolean / Term from Controlled Vocabulary:
Confidential, Classified 

DMP, 3
EU/national security levels (restricted, confidential, secret, top secret / käyttörajoitettu, luottamuksellinen, salainen, erittäin salainen)
Closed
Organisation, open science services

If the answer is "yes", indicate which datasets it relates to. Examples: confidential information, business secrets, sensitive geospatial data, sensitive biodiversity data, national security, trade secrets. Dataset-specific.
Yes / No / Unknown

Comment: a combined classification of security levels; the EU security levels are given here as an example.
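Since confidentiality is answered per dataset (Yes / No / Unknown), a project-level value could be derived from the dataset answers. A minimal sketch, assuming the rule "any yes gives yes; no only if every dataset says no; otherwise unknown". The rule and the names Answer, project_level and SECURITY_LEVELS are assumptions for illustration; the level pairs are those listed in the table.

```python
from __future__ import annotations

from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


# Combined security-level vocabulary as listed in the table: EU level -> Finnish term.
SECURITY_LEVELS = {
    "restricted": "käyttörajoitettu",
    "confidential": "luottamuksellinen",
    "secret": "salainen",
    "top secret": "erittäin salainen",
}


def project_level(answers: list[Answer]) -> Answer:
    """Derive a project-level answer from dataset-level answers (assumed rule)."""
    if Answer.YES in answers:
        return Answer.YES
    if answers and all(a is Answer.NO for a in answers):
        return Answer.NO
    # Mixed no/unknown answers, or no datasets yet.
    return Answer.UNKNOWN
```

Letting "yes" dominate errs on the side of caution: a single confidential dataset is enough to flag the project for the organisation and open science services.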

In the RDA standard this is a dataset-specific attribute. If the information is also needed at project level, it could be derived from the dataset-specific answers.

personal_data

Does the research handle personal data for research purposes? 

If yes → triggers:
  • Risk assessment process
  • DPIA process
  • Ethical review process

Documented 

Boolean

2 / 3
Closed




Optional Part of DPIA process

dpia


Should a DPIA be done?

 

Boolean
Requirement comes from the law

Voluntary question

Closed



Good question: should this be in the DMP at all? In this context it is also possible to make a real assessment of whether it is being done. However, privacy information should be structured and compatible so that it can be asked here if wished. This should be optional, so that the same question is not accidentally asked twice if it triggers another process.

Comment: a pre-DPIA is usually carried out to determine whether a full DPIA is necessary.
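The trigger chain for personal_data (risk assessment, DPIA and ethical review), together with the pre-DPIA step, could be encoded roughly as follows. This is a minimal sketch: the function name, the step labels and the rule that a full DPIA is added only when the pre-DPIA says it is needed are assumptions, not agreed workshop decisions.

```python
from __future__ import annotations


def triggered_processes(personal_data: bool, full_dpia_needed: bool | None = None) -> list[str]:
    """Return the follow-up processes triggered by a "yes" in personal_data.

    full_dpia_needed is the pre-DPIA outcome: None = pre-DPIA not done yet,
    True = full DPIA required, False = full DPIA not required.
    """
    if not personal_data:
        return []
    # Risk assessment and pre-DPIA are treated as one step, as in the dpia row.
    steps = ["risk assessment / pre-DPIA"]
    if full_dpia_needed:
        steps.append("DPIA")
    steps.append("ethical review")
    return steps
```

Keeping the pre-DPIA outcome as an explicit input means the DMP does not ask the DPIA question a second time when another process has already answered it.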

Optional Part of DPIA process
dpia

Is there a need for a DPIA? Has a risk assessment been filled in?

(risk assessment / pre-DPIA, self-test of whether a DMP is needed)

 

  

Closed






 

dpia_id

If a DPIA exists, give its URI / DOI

URI
Closed


 
If other risk assessments have been done
riskassessment_id

Give URI / DOI

URI/PID  
Closed



 

Conditional on "yes" in personal_data. If a DPIA exists, indicate which information in it is available for direct data retrieval, plus the date.





privacy-notice

Is there a need for a privacy notice?
   Data protection team




Date & title for privacy notice, data transfer agreements

privacy_notice_id

If a privacy notice exists, give its link / archive number
    


Use case: the maDMP could transfer the number / link of the privacy notice to the data protection team once the notice has been done, to indicate its status.

personal_data_list

What personal data do you process?
Controlled list
TAU's list as an example; a list can also be found in myCSC.
 Closed 


These should really be asked one by one to reach machine-actionability (MA), i.e. all options as Booleans.
If a privacy notice or risk assessment exists, this is not asked but could be filled in automatically.
personal_data_sp_category
What special categories of personal data do you process?
String
Requirement comes from the law
Closed



Special categories of personal data

personal_data_sp_category_racial_or_ethnic_origin

 Boolean 3 / 1
Closed



 

political opinions

 Boolean 
Closed



 

religious or philosophical beliefs

 Boolean 
Closed



 

trade union membership

 Boolean 
Closed



 

data concerning health

 Boolean 
Closed



 

sexual orientation or activity

 Boolean 
Closed



 

genetic and biometric data for identifying the person.

 Boolean

Closed



 

other_personal_data_sp_category


String 
Closed
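The comment above asks for the special categories to be answered one by one, as Booleans, and skipped (but prefilled) when a privacy notice or risk assessment already exists. A minimal sketch of that idea follows; the attribute names are shortened, hypothetical forms of the rows in the table, and needs_asking is an illustrative helper, not an agreed rule.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class SpecialCategories:
    """Hypothetical Boolean-per-category form of personal_data_sp_category."""
    racial_or_ethnic_origin: bool = False
    political_opinions: bool = False
    religious_or_philosophical_beliefs: bool = False
    trade_union_membership: bool = False
    health: bool = False
    sexual_orientation_or_activity: bool = False
    genetic_or_biometric_identification: bool = False
    other: str = ""  # other_personal_data_sp_category (free text)

    def selected(self) -> list[str]:
        """Names of the Boolean categories answered True (free text is excluded)."""
        return [f.name for f in fields(self) if getattr(self, f.name) is True]


def needs_asking(privacy_notice_id: str | None, riskassessment_id: str | None) -> bool:
    """Ask the category questions only if neither source document exists yet."""
    return not (privacy_notice_id or riskassessment_id)
```

One Boolean per category makes each answer directly queryable by services, whereas a single String field would need parsing before any automated use.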



 

# If there are several data controllers, list them here







 

DC
data_controller
Who is the data controller?
String / ORCID / ROR
National (as an indication of good data management in the project)




If a person, should we return to the identity issues?

Should be divided into fields like "controller_person/controller_organisation" → String/Boolean

ROR for organisations
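If data_controller is split into controller_person and controller_organisation, the identifier itself can route the value. ORCID iDs end in a check character computed with ISO 7064 MOD 11-2, so they can be validated offline; for ROR this sketch checks only the ID shape ("0", six Crockford base32 characters, two checksum digits) and does not verify the checksum. The function names and the "unknown" fallback are assumptions, and the ROR ID in the usage example is made up.

```python
import re

ORCID_PREFIX = "https://orcid.org/"
ROR_PREFIX = "https://ror.org/"
_ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
# ROR ID shape: "0" + 6 Crockford base32 characters + 2 checksum digits.
# The checksum itself is NOT verified here.
_ROR_RE = re.compile(r"^0[0-9a-hjkmnp-tv-z]{6}[0-9]{2}$")


def _strip_prefix(value: str, prefix: str) -> str:
    value = value.strip()
    return value[len(prefix):] if value.startswith(prefix) else value


def orcid_is_valid(orcid: str) -> bool:
    """Check the ORCID iD format and its ISO 7064 MOD 11-2 check character."""
    orcid = _strip_prefix(orcid, ORCID_PREFIX)
    if not _ORCID_RE.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def ror_looks_valid(ror_id: str) -> bool:
    """Shape check only (see comment above)."""
    return bool(_ROR_RE.match(_strip_prefix(ror_id, ROR_PREFIX)))


def classify_controller(identifier: str) -> str:
    """Route a data_controller value to a hypothetical person/organisation field."""
    if orcid_is_valid(identifier):
        return "controller_person"
    if ror_looks_valid(identifier):
        return "controller_organisation"
    return "unknown"  # free-text string: needs manual review
```

For example, classify_controller("https://orcid.org/0000-0002-1825-0097") returns "controller_person", while a plain organisation name falls back to "unknown".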


data_prosessing_basis
Basis for data processing
String
National






data_prosessing_sp_category
Basis for processing special categories of personal data
String
National (to enable optimal services to be provided)






personal_data_transfer_outside_EU
Whether personal data is transferred outside the EU
Boolean
National (to enable optimal services to be provided)






personal_data_transfer_country
To which countries is personal data transferred?
String
National (to enable optimal services to be provided)
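personal_data_transfer_outside_EU could be derived automatically from personal_data_transfer_country instead of being asked separately. A minimal sketch, assuming countries are recorded as ISO 3166-1 alpha-2 codes (the table only says String). Note that the field is named for the EU; if the intent is GDPR third-country transfers, the EEA states (IS, LI, NO) would also need to be treated as inside.

```python
from __future__ import annotations

# The 27 EU member states as ISO 3166-1 alpha-2 codes.
EU_MEMBER_STATES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR", "HU", "IE",
    "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE",
})


def transfer_outside_eu(countries: list[str]) -> bool:
    """Derive personal_data_transfer_outside_EU from the list of transfer countries."""
    return any(code.strip().upper() not in EU_MEMBER_STATES for code in countries)
```

Deriving the Boolean avoids the two fields contradicting each other.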






personal_data_external_processors
Are there external processors?
Boolean
Local (responsibility of the organisation)






personal_data_minimized
How is the processing of personal data minimized?
String
Local (responsibility of the organisation)




Anonymization, pseudonymization, removal of direct identifiers, etc. Dataset-specific?
In the RDA standard, ethical issues are part of the DMP domain. The FI pilot considers grounds for a separate category, or for merging them with License and adding user rights:
Rights, ethics & license





  
Ethical issues
ethical_issues_exist
To indicate whether there are ethical issues related to data that this DMP describes.
Allowed Values:
  • yes
  • no
  • unknown
List
general
Closed
x

This is an important trigger: if the answer is yes, the DMP must be especially thorough.

ethical_issues_report
RDA: "To indicate where a protocol from a meeting with an ethical commitee can be found", or a direct ID of the report

URI



general
Closed
x

Add link

Comment: Date when the decision was made


ethical_issues_description
Describe ethical issues directly in a DMP
String
general
Closed
x
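Because ethical_issues_exist acts as a trigger, a DMP tool could warn when the answer is "yes" but neither ethical_issues_report nor ethical_issues_description is filled in. The sketch below assumes that rule; it is a hypothetical validation, not a requirement of the RDA standard, where both follow-up fields are optional.

```python
from __future__ import annotations

ALLOWED_VALUES = {"yes", "no", "unknown"}


def ethics_warnings(
    ethical_issues_exist: str,
    ethical_issues_report: str | None = None,
    ethical_issues_description: str | None = None,
) -> list[str]:
    """Return human-readable warnings for the ethical issues fields (assumed rule)."""
    value = ethical_issues_exist.strip().lower()
    if value not in ALLOWED_VALUES:
        return [f"ethical_issues_exist must be one of {sorted(ALLOWED_VALUES)}"]
    if value == "yes" and not (ethical_issues_report or ethical_issues_description):
        return ["ethical issues exist: give ethical_issues_report (URI) or ethical_issues_description"]
    return []
```

A warning rather than a hard error keeps the field usable early in the planning phase, before the ethical committee decision exists.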



research_permit
Whether permission is required to collect the data in the research dataset
Boolean






Actual research permit



Rights related to data
ownership_data_right
Who owns the data / the rights related to the data?
Person / Organization
national




Person or organization? Dataset-specific? The organisation can be a research organisation, a customer organisation or an organisation that otherwise only owns the data (e.g. an archive)

ipr_copyright
Are there IPR or copyright issues?
Boolean
national






agreements_data_right
What agreements are needed related to the rights to the material?
String
national




Are data right agreements included here?

agreements
What other agreements are needed?
String






 
License
license_ref
Link to license document.
URI
general



x
Dataset-specific: what kind of license is granted for the use of the data
https://creativecommons.org/licenses/by/4.0/

start_date
If the date is set in the future, it indicates an embargo period.
Date
general



x
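The start_date rule (a future date indicates an embargo period) is simple enough to evaluate automatically, e.g. to show embargo status in a DMP tool. A minimal sketch; the function names are hypothetical and the license is assumed to apply from start_date itself.

```python
from __future__ import annotations

import datetime as dt


def embargo_active(start_date: dt.date, today: dt.date | None = None) -> bool:
    """True while the license start_date lies in the future (embargo period)."""
    today = today or dt.date.today()
    return start_date > today


def embargo_days_left(start_date: dt.date, today: dt.date | None = None) -> int:
    """Days until the license takes effect; 0 once it applies."""
    today = today or dt.date.today()
    return max(0, (start_date - today).days)
```

Passing today explicitly makes the check reproducible, for example when re-evaluating a DMP at the final reporting phase.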


Metadata
schema
Is the data built according to a specific schema?
Boolean
national




Relates to data set

Ideally, metadata from existing datasets could be imported directly from e.g. Zenodo. Metadata could also be brought in for any datasets published in the project. The information flow might be easiest in this direction, rather than from the DMP API to the repository.


description
Description
String



x

provides taxonomy for...

language
Language of the metadata expressed using ISO 639-3
Term from Controlled Vocabulary
national

x



metadata_standard_id
Name of this schema / link to the schema
PID / other id
general

x

Relates to data set

identifier
String



x

http://www.dublincore.org/specifications/dublin-core/dcmi-terms/

type
Identifier type. Allowed values: URL, Other
Term from controlled vocabulary



x

URL
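Put together, the metadata fields above (description, language, metadata_standard_id with identifier and type) form one metadata item. The sketch below builds such an item and serialises it to JSON, using the Dublin Core example from the table. The function name is hypothetical, the language check verifies only the three-letter shape of an ISO 639-3 code (not membership in the code list), and the lowercase "url" value follows the RDA JSON examples as far as known; check the casing against the RDA schema.

```python
import json
import re

# Shape check only: ISO 639-3 codes are three lowercase letters.
_ISO_639_3 = re.compile(r"^[a-z]{3}$")


def metadata_entry(description: str, language: str, standard_uri: str) -> dict:
    """Build one metadata item from the fields in the table above."""
    if not _ISO_639_3.match(language):
        raise ValueError(f"language must be an ISO 639-3 code, got {language!r}")
    return {
        "description": description,
        "language": language,
        "metadata_standard_id": {
            "identifier": standard_uri,
            "type": "url",  # allowed values per the table: URL, Other
        },
    }


entry = metadata_entry(
    "Dublin Core terms used for the dataset description",
    "eng",
    "http://www.dublincore.org/specifications/dublin-core/dcmi-terms/",
)
print(json.dumps(entry, indent=2))
```

If metadata is imported from a repository such as Zenodo, as suggested above, the same function could normalise the imported values before they are stored in the DMP.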

vocabulary_link
Are vocabularies also used?
Term from controlled vocabulary
3
national






format_documentation
What is the format of the documentation?
List
3
Local ?




Something else? What?

location_documentation
Where is the documentation?
String/URL
3
Local ?






generated_documentation
Is documentation generated automatically?
Boolean
3
Local ?






access_documentation
Can the documentation be accessed?
Boolean
3
Local ?






publish_methodology
Has the methodology/workflow been published somewhere?
Boolean
3
Local ?




registration of research?

workflow
Is the workflow described?
Boolean
3
Local ?



O
Especially important for large datasets where the data itself cannot be preserved but can be reproduced if necessary.

description_documentation
What does the documentation consist of?
String
3
Local ?



O
Workflow, variable description, … ?

metadata_purpose
What is the basic metadata described for?
String
3
Local ?



O
Qvain, own CRIS, something else?

metadata_open
Will the basic metadata be opened?
Boolean
3
Local ?



O

metadata_location
Where is the basic metadata opened?
String / URL
3
Local ?





Security
description_security
Description of security measures
Controlled List
3
Closed
x


 title_security
Name of the technical resource
String
3
Closed
x


Is a separate role from the data controller needed in the DMP?
access_control_id
Who is responsible for access control? Person / role?
ORCID
National





Is a separate role from the data controller needed in the DMP?
access_control_name
Who is responsible for access control?
String
 




 

protection_level_data
What is the required level of data protection?
Controlled List
 




Relates to the Data Protection Level of the dataset:
Open access, Restricted access, Restricted access & controlled use; Restricted access & restricted use
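protection_level_data is set per dataset, but a project-level value may be needed, for example to decide whether an access-control responsible must be named. A minimal sketch that takes the strictest level across datasets; the ordering of the four levels (in particular, "restricted use" being stricter than "controlled use") is an assumption, as are the names.

```python
from __future__ import annotations

from enum import IntEnum


class ProtectionLevel(IntEnum):
    """protection_level_data, ordered from least to most restrictive (assumed order)."""
    OPEN_ACCESS = 1
    RESTRICTED_ACCESS = 2
    RESTRICTED_ACCESS_CONTROLLED_USE = 3
    RESTRICTED_ACCESS_RESTRICTED_USE = 4


def project_protection_level(dataset_levels: list[ProtectionLevel]) -> ProtectionLevel | None:
    """Project-level value = strictest dataset level (None if there are no datasets)."""
    return max(dataset_levels, default=None)
```

An IntEnum makes the ordering explicit and comparable, so the project level is simply the maximum of the dataset levels.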
Is a separate role from the data controller needed in the DMP? Is this organisation- or project-specific?
security_officer
Information Security Officer
String derived from the storage location
Local

















Further comments from the National DMP template:





Checked against Responsibilities from Academy templates (6.1) & (6.2)





→ For the service design workshop in March
List all steps where responsibilities need to be defined
Person / Role?
















































