NOTE: This is the active version of the national maDMP reference model.
Towards a structured maDMP template
The goal of this workshop is to make progress on the maDMP reference data model. Slides
Below is the structure of the RDA standard. Elements from the RDA standard are marked in GREEN in the table below, and the column "Is this in the RDA template?" indicates whether a data element is part of the RDA standard.
Elements derived from our national workshop on DMP questions are marked in BLUE.
The sections are grouped according to the RDA standard:
DMP, Project, Contact, Contributor, Funding, Cost, Dataset, Distribution, Host, License, Security and Privacy, Technical Resource, Metadata.
In addition, we noted in the previous workshop that it would be important to cover Data life cycle and Ethics. These can be incorporated into the RDA maDMP data model or proposed as additional sections. Note: This will be discussed in the webinar on 16 December. (Other IDs to consider: RAiD)
Guidance on the way of working:
Target & Focus:
- Our focus is to prepare a reference data model, not a DMP template.
- Thus we focus on the information required, not on the questions, i.e. how the information should be asked. We can give examples of this later, but in this workshop we focus on the data model itself.
- Review the contents of your group area.
- Make further clarifications and cross-checks.
- Improve the naming of the elements if needed, while respecting the original RDA standard naming.
- Indicate what information is needed to provide or launch machine-actionability, and which fields can be filled automatically via a digital object or with AI, e.g. by extracting information from an existing source.
- Link to ontologies, data spaces, repositories and other relevant sources of information where you notice gaps in the use of available auxiliary information.
- Identify the purpose and user of the data elements.
- Discuss in the group which national additions should be mandatory or strongly recommended in addition to the RDA standard.
- If you cannot resolve a data element, mark it in bold with red question marks (???).
Documentation:
- Document all work in the table below.
- Do not create any other solutions for documenting the work.
- Aim to write in English.
Reference:
You can refer back to the RDA maDMP data model, which is the core, but we can make suggestions for developing its machine-actionability. In addition, we can add DMP fields relevant to the national context. Note, however, that this is a general data model and does not contain discipline-specific information.
https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard
OSTrails Plan-Track-Assess Pathways: https://zenodo.org/records/13145788
For use cases you can refer to:
Marttila, J., Manninen, S., Ahokas, M., Hindersson-Söderholm, T., & Keckman-Koivuniemi, H. (2022). Dynaamiset DMP:t -työryhmän loppuraportti [Final report of the Dynamic DMPs working group]. https://zenodo.org/records/6601258
Marttila, J., & Manninen, S. (2022). Dynaamiset DMP:t -työryhmän toivekartoitus [Wish list survey of the Dynamic DMPs working group]. https://zenodo.org/records/6594597
Explanations of the column headings used below:
E. Interoperability from data source (DMP or other system)
Does the field provide or launch interoperability between maDMPs and other systems?
1 = Yes; 2 = Uses a DO (digital object) as source (automatically filled); 3 = Manually filled in, or propose a DO
F. General, national or organizational?
General: EU / EOSC / international needs; National: recommended practices in Finland; Organizational: organization-specific practices
G. Public / disclosed: is this the organisation's internal information, or public?
Justify why the information needs to be disclosed within the project/organization.
H. Who needs this field
Indicate the purpose of the information if it is suggested as an addition to the RDA standard (minimum)
I. Is this in the RDA maDMP standard?
Indicate whether the field is in the core maDMP standard
J. Section in RDA maDMP standard
Add the section if it differs from the section in the national reference data model
K. Additional information:
Define the use case, give an example question, or comment
For string fields, suggest the maximum number of characters
Information that already exists elsewhere is of secondary importance unless it is beneficial as part of the maDMP. The use case needs to be defined.
National suggested fields are in BLUE; RDA Standard fields in GREEN.
Which suggested fields are MUST have, SHOULD or COULD have?
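To make the column conventions above concrete for tool builders, one row of the reference model can be represented as a small record type. This is a minimal sketch, assuming Python 3.8+; the names `Interoperability`, `DataElement` and `SCOPES`, the attribute names, and the boolean encoding of columns G, I and K are hypothetical choices for illustration, not part of the RDA standard or of this model. Where a row allows "2 / 3" in column E, the sketch simply picks one value.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Interoperability(IntEnum):
    """Codes used in column E (Interoperability from data source)."""
    YES = 1        # the field provides or launches interoperability
    FROM_DO = 2    # filled automatically, using a digital object (DO) as source
    MANUAL = 3     # filled in manually, or a DO is proposed


SCOPES = {"general", "national", "organizational"}  # column F


@dataclass(frozen=True)
class DataElement:
    """One row of the reference data model (hypothetical representation)."""
    section: str                         # A. maDMP structure, e.g. "DMP"
    name: str                            # B. field name, e.g. "title_dmp"
    interoperability: Interoperability   # E. 1 / 2 / 3
    scope: str                           # F. general / national / organizational
    public: bool                         # G. True = public, False = closed
    in_rda_standard: bool                # I. "x" in the table
    mandatory: bool                      # K. True = M, False = O
    max_chars: Optional[int] = None      # suggested max size for string fields

    def __post_init__(self) -> None:
        if self.scope not in SCOPES:
            raise ValueError(f"unknown scope: {self.scope!r}")

    @property
    def colour(self) -> str:
        """RDA standard fields are shown in GREEN, national additions in BLUE."""
        return "GREEN" if self.in_rda_standard else "BLUE"


# Two rows from the DMP table, encoded with this sketch
title_dmp = DataElement("DMP", "title_dmp", Interoperability.MANUAL,
                        "general", True, True, True, max_chars=100)
nextreview_dmp = DataElement("DMP", "nextreview_dmp", Interoperability.FROM_DO,
                             "organizational", True, False, False)
```

Keeping the colour as a derived property rather than a stored value means it cannot drift out of sync with the "in RDA standard" column.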
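| A. maDMP structure | B. Field | C. Description | D. Data type | E. Interoperability from data source | F. General / national / organizational | G. Public / closed | H. Who needs this field | I. In RDA maDMP standard | J. Phase | K. M / O | L. Additional information |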
|---|---|---|---|---|---|---|---|---|---|---|---|
| DMP | dmp_id | DMP id | Nested Data Structure | 1 | general | Public | System for interoperability | x | 1 | M | Request ID for the DMP. Where does this originate from, especially if different tools/systems are used for DMPs? |
| type_dmp | A description of what kind of DMP to create | Controlled vocabulary: | 3 | Organisational / national / international | Public | System for appropriate DMP template | 1 | M | The input form should later be updated or extended to a richer format. Input profiles, for example: (define a national typology for the recommended use of DMPs (light, detailed); key issues: personal data, confidentiality of information, resource intensity, number of actors (outsiders)) | ||
| title_dmp | Title of a DMP | String | 3 | general | Public | User | x | 1 | M | Max 100 char | |
| (points to cost section) | cost_dmp | Give an estimate or aggregated sum of the costs related to data management, and for multiple instances break the costs down into details. | Nested Data Structure | 2 | general | Closed (see comment) | Funder | x | 2 | → Further development: this could be linked to the budget via the grant ID. Which costs are included? Needs clear guidelines. Closed early in the process, but this depends on whether the DMP will be actively made public / "published" at some point during the data life cycle. | |
| created_dmp | Date and time of first version of a DMP | DateTime | 1 (system) | general | Public | Organization | x | 1 | M | System recorded | |
| modified_dmp | Must be set each time DMP is modified. Indicates DMP version. Encoded using the relevant ISO 8601 Date and Time compliant string | DateTime | 1 (system) | general | Public | Organization | x | 2 | M | System recorded | |
| nextreview_dmp | Next review date for updating the DMP | DateTime | 2 / 3 | Organisational / Funder specific | Public | Organization, PI | 3 | O | The research project benefits from timing the DMP update, and Data Support can plan its assistance better. Suggested as an addition to keep the DMP alive and up to date, e.g. for reporting purposes. | ||
| dataset | To describe data on a general level. | Nested Data Structure | 3 | general | Public | Organization, PI | x | 1 | M | At least one dataset should be defined. See "Dataset" in the table. | |
| datalifecycle | To describe data lifecycle on a general level. | Nested Data Structure | 3 | general | Public | Organization, PI | 1 | O | |||
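The system-recorded DMP fields above (created_dmp, modified_dmp) and the simple constraints on title_dmp (max 100 characters) and dataset (at least one must be defined) could be enforced by a DMP tool roughly as follows. This is a minimal sketch, assuming the DMP is held as a plain dictionary; the function names and the unsuffixed key names (`title`, `created`, `modified`, `dataset`) are illustrative assumptions.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def now_iso() -> str:
    """Current UTC time as an ISO 8601 string, e.g. '2024-12-16T09:30:00+00:00'."""
    return datetime.now(timezone.utc).replace(microsecond=0).isoformat()


def new_dmp(title: str, datasets: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Create a DMP record; created and modified are system recorded."""
    if len(title) > 100:                  # title_dmp: max 100 characters
        raise ValueError("title_dmp exceeds 100 characters")
    if not datasets:                      # dataset: at least one must be defined
        raise ValueError("at least one dataset must be defined")
    stamp = now_iso()
    return {"title": title, "created": stamp, "modified": stamp,
            "dataset": list(datasets)}


def update_dmp(dmp: Dict[str, Any], **changes: Any) -> Dict[str, Any]:
    """Return an updated copy: 'modified' is refreshed on every change,
    'created' is never overwritten."""
    if "created" in changes or "modified" in changes:
        raise ValueError("created/modified are system recorded")
    if len(changes.get("title", "")) > 100:
        raise ValueError("title_dmp exceeds 100 characters")
    updated = {**dmp, **changes}
    updated["modified"] = now_iso()
    return updated
```

Returning a new dictionary instead of mutating the old one keeps earlier versions available, which fits the idea that modified_dmp indicates the DMP version.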
| Group 1: Project, Contact, Contributor, Funding & Cost | |||||||||||
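| A. maDMP structure | B. Field | C. Description | D. Data type | E. Interoperability from data source | F. General / national / organizational | G. Public / closed | H. Who needs this field | I. In RDA maDMP standard | J. Phase (1 = Planning) | K. M / O | L. Additional information |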
| Project | title_project | Name/Title of the project | String | 3 | general | Public | Everyone | x | 1 | M | If project information is not yet available anywhere, how much should be produced here? Is it possible to have multiple DMPs for one project or a maDMP without a funder or project? |
| project_id | Project identifier | Nested Data Structure | 2 | general | Public | Everyone | x | 1 | M | RAiD: https://raid.org/ | |
| start_project | Project start date. | Date | 3 (can trigger the update process, e.g. 3-6 months after the start) | general | Public | Everyone | x | 1 | M | Encoded as an ISO 8601 compliant date string | |
| end_project | Project end date. | Date | 3 (can trigger the update process & reporting stage) | general | Public | Everyone | x | 1 | O | Encoded as an ISO 8601 compliant date string. If the DMP is used for a continuous process, no end date is required. | |
| duration | Project duration. | Time yy-mm | 1 | 2 | O | derived | |||||
| description_project | Project short description | String | 1 (project_id links to long description) otherwise 3 | general | Public | Everyone | x | 1 | M | Short description, e.g. max 2000 characters; include a link to the project plan if needed (the project ID field links to the longer description). Where will the master copy of this information be filled in? (From RAiD, but are there alternatives???) | |
| discipline | Scientific discipline of project | UNESCO science classification | 2 / 3 | general | Public | Everyone | 1 | O | 2 if analytics / AI can be used to suggest it based on ORCID, project_id or description; 3 if it needs to be added by the researcher. This can be used to guide instructions and lists of what is available. | ||
| funding_project | Funding related with a project | Nested Data Structure | 2 (Derived from Funding status & Grant_id) | general | Public | Everyone | x | 3 | O | Public after publishing the grant. | |
| Contact | contact_id | ORCID of the contact person for a DMP / principal (responsible) researcher | ORCID | 1 | general | Public | Everyone | x | 1 | O | This has its own attributes (name, ORCID, contact). If the contact person and the principal researcher are different, do they need to be separated? |
| mbox | E-mail address | 3 | general | Public | Everyone | x | 1 | M | from orcid | ||
| name | Name of the contact person / principal researcher | String | 2 (from ORCID) / 3 | general | Public | Everyone | x | 1 | M | from orcid | |
| Contributor | #_Nested Data Structure if many contributors (and data controllers) | ||||||||||
| contributor_id | Contributor ORCID | ORCID | 2: Digital authentication, e.g. by e-mail: the contributor adds their ORCID, or it is taken from the funding application; 3: Risk of errors in the ORCID | general | Public | Everyone | x | 2 | O | Needs to be defined. Where could this be derived from? From the funding decision? |
| mbox_contributor | E-mail address | 2 / 3 (depending on whether the person has allowed sharing) | general | Public | Everyone | x | 2 | M |||
| name_contributor | Name of the contributor | String | 2 (from ORCID) / 3 | general | Public | Everyone | x | 2 | M ||
| role_contributor | Type of role, e.g. Work package leader / Data controller / Principal investigator / Author of dataset | Controlled list | 2 / 3 | general | Public | Everyone | x | 2 | M | A data controller is required for research data services. Use case: AI search of the funding proposal by role. |
| organization_contributor | Organization of contributing researcher | String | 2 (from ORCID/ROR) / or 3 | general | Public | Everyone | 2 | O | If a ROR exists, this can be derived from it ||
| ROR_contributor | ROR of organization of contributing researcher | ROR | 3 | general | Public | Everyone | 1 | O | This has its own attributes (ROR) ||
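The derived `duration` field (Time yy-mm) and the update trigger mentioned for start_project could both be computed from the project dates. This is a minimal sketch, assuming that only completed calendar months count towards the duration and that a missing end_project means a continuous process; `months_between`, `project_duration` and `add_months` are hypothetical helper names, and clamping the day to 28 in `add_months` is a deliberate simplification.

```python
from datetime import date
from typing import Optional


def months_between(start: date, end: date) -> int:
    """Completed calendar months from start to end (a partial month is not counted)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def project_duration(start: date, end: Optional[date]) -> Optional[str]:
    """Derive the 'duration' field (yy-mm) from start_project and end_project.

    Returns None when there is no end date, i.e. the DMP is used for a
    continuous process.
    """
    if end is None:
        return None
    if end < start:
        raise ValueError("end_project is before start_project")
    years, months = divmod(months_between(start, end), 12)
    return f"{years:02d}-{months:02d}"


def add_months(d: date, n: int) -> date:
    """Shift a date by n months; the day is clamped to 28 (simplification)."""
    total = d.month - 1 + n
    return date(d.year + total // 12, total % 12 + 1, min(d.day, 28))


start = date(2024, 1, 15)
print(project_duration(start, date(2026, 7, 15)))  # 02-06
print(add_months(start, 3))                         # 2024-04-15: earliest suggested review
```

The same `add_months` helper could set nextreview_dmp 3-6 months after start_project, as the start_project row suggests.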
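| A. maDMP structure | B. Field | C. Description | D. Data type | E. Interoperability from data source | F. General / national / organizational | G. Public / closed | H. Who needs this field | I. In RDA maDMP standard | J. Phase | K. M / O | L. Additional information |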
| Funding | #_Nested Data Structure if there are many funding sources for a large research program, unless it is defined that the DMP relates to a single grant decision | Does this information create any requirements for the DMP? | |||||||||
| funder_id | Funder id / e.g. Registry number of associated project Y-tunnus / Business ID | Nested Data Structure | 2: ROR API via search option 3 | general | Public | System | x | 1 | M | Nested structure used if there are many of these. Field is empty if none | |
| funder | Funder name | Nested Data Structure | 2 | general | Public | Everyone | 1 | O | from Funding id | ||
| funding_submission_dl | Deadline for funding submission | Date | 2: select funding (Akatemia) 3 | 1 | O | ||||||
| funding_decision_expected | Expected date for funding decision | Date | 2: select funding (Akatemia) 3 | 1 | O | Either known / estimated | |||||
| funding_status | Phase of project life cycle: Planned, Applied, Granted, Rejected | Term from Controlled Vocabulary | 3 (nice-to-have feature: automatically derive from the grant ID whether the project has been applied for/granted) | general | Public | Everyone | x | 1-5 | O | from Funding id |
| grant_id | Grant ID of the associated project | Nested Data Structure | 2 if DOI (not currently) 3 | general | Public | Everyone | x | 3 | M | M if exists | |
| start_fund | Funding (Project) start | Date | 2 | general | Public | Everyone | 3 | O | from Grant id | ||
| end_fund | Funding (Project) end | Date | 2 | general | Public | Everyone | 3 | O | from Grant id | ||
| duration_fund | Funding (Project) duration | Range | 2 | general | Public | Everyone | 3 | O | derived from Funding id (in order to search databases) | ||
| ror_fund | Responsible organization for the funding application, ROR | Nested Data Structure | 3: manually select from list | general | Public | Everyone | 1 | O | ROR of the PI. If there is a big consortium, would it be worth making WP-specific DMPs so that they actually work? ||
| Cost | # list all cost object categories | ||||||||||
| currency_code | Allowed values defined by ISO 4217. Note: Default is EUR, or could this be linked to funder_id? | Term from Controlled Vocabulary | 3 / 2 (from grant_id) | general | Closed/Public | Organization | x | 3 | M | from Funder id |
| description_cost | Description of costs Note: Could this be linked to Grant ID for description of applied/granted budget? | String | 3 / 2 (from grant_id / application) | general | Closed/Public | Organization | x | 3 | O | from Grant id | |
| title_cost | Title of costs Note: Could this be linked to Grant ID for title of applied/granted budget? | String | 3 / 2 (from grant_id / application) | general | Closed/Public | Organization | x | 3 | M | from Grant id | |
| value_cost | Value of costs Note1: Could this be linked to Grant ID for applied/granted budget? Note2: Link with DMP / cost_dmp | Number | 3 / 2 (from grant_id / application) | general | Closed/Public | Organization | x | 3 | M | from Grant id | |
| sum_cost_dmp | Sum of value of costs | Number | 1 | general | Closed/Public | Organization | 3 | M | Automated sum of value_cost if multiple | ||
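sum_cost_dmp is described as an automated sum of value_cost. Because costs may be given in several ISO 4217 currencies, a tool should probably not add them together blindly. This is a minimal sketch, assuming each cost is a dictionary with the table's keys (`title_cost`, `value_cost`, optional `currency_code`) and EUR as the default currency; the function is hypothetical, and it only checks that a currency code looks like three upper-case letters rather than validating it against the full ISO 4217 list.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List


def sum_cost_dmp(costs: List[dict], default_currency: str = "EUR") -> Dict[str, Decimal]:
    """Sum value_cost per currency_code; amounts in different currencies are kept apart."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for cost in costs:
        currency = cost.get("currency_code", default_currency)
        # Shape check only (three upper-case letters), not a full ISO 4217 lookup
        if len(currency) != 3 or not (currency.isalpha() and currency.isupper()):
            raise ValueError(f"invalid currency code: {currency!r}")
        # str() avoids binary floating-point artefacts when value_cost is a float
        totals[currency] += Decimal(str(cost["value_cost"]))
    return dict(totals)


costs = [
    {"title_cost": "Storage", "value_cost": 1000, "currency_code": "EUR"},
    {"title_cost": "Data curation", "value_cost": "250.50"},  # defaults to EUR
]
print(sum_cost_dmp(costs))  # {'EUR': Decimal('1250.50')}
```

Using `Decimal` rather than `float` keeps monetary sums exact, which matters if the total is later compared with a granted budget.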
| Group 2: Dataset, Distribution | |||||||||||
| Define what a dataset is (the RDA standard defines a dataset as raw data; can we use this?). If multiple datasets need to be defined, what is the distinguishing feature needed in the DMP (ownership, source, source data, raw data, repository for workflow, data product, reference data, master data, models & code)? And what needs to be defined in the DMP data model glossary? See the RDA standard explanations: https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard?tab=readme-ov-file | |||||||||||
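| A. maDMP structure | B. Field | C. Description | D. Data type | E. Interoperability from data source | F. General / national / organizational | G. Public / closed | H. Who needs this field | I. In RDA maDMP standard | J. Phase | K. M / O | L. Additional information |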
| Dataset | #_Nested Data Structure if many datasets are used. | DMP has "dataset" association that can relate to many datasets. Each data set can have multiple files/distributions. | Relationships to 1..* datasets should be defined at DMP level. DMP can contain multiple datasets. | ||||||||
| dataset_id | Dataset ID, identifier for a dataset | DOI, PID, URN, URL, handle, ark, other | 1 / 3 | general | Public | RPO (research performing organization), Repositories, Data catalogues | x | 2 | M (when published) | DOI, PID, URN, URL, handle, ark, other. The dataset may not exist yet when the DMP is written, so the DMP tool should provide a temporary ID until the dataset gets a PID. The identifier should be typed. | |
| title_dataset | Dataset title / name | String | 2 | general | Public | Same as above, but for humans instead of machines | x | M | There can be many datasets; the information relates to one entity, a so-called Metax entity, i.e. it must be possible to express a wide variety of entities that then have attributes. | ||
| type_dataset | Dataset type (e.g. interview, questionnaire, photos, video, measurement, samples, simulation, code) | Partly (controlled vocabulary and "Other" option) | 2 / 3 | general | Public | Data Catalogues, Repositories, RPOs | x | M | Associated with a single dataset; can include ready-made options, but also an open text field. What is the correct granularity level here? Resource intensity can affect the needs of the description. In general, the instruction is to describe the data so that the attribute applies to the entire dataset. By describing just one dataset, it would be possible to create a so-called light DMP. This is an important option to keep. We need some sort of defined and shared vocabulary of "dataset types". The RDA Common Standard points to DataCite and COAR, but neither seems sufficient on its own. We should create a national type list based on those, enhanced with subtypes where needed. | ||
| format_dataset | Description of the dataset formats used during active research, for example database, csv, xml, json. (Format of the dataset to be used. The format of the datasets to be published / distributed after the project is different.) | Controlled vocabulary | 2 | General | O | Relates to one dataset. How does this relate to outputs other than datasets, such as code, or code closely related to data usability, e.g. a link or PID? Format vs. type: what is the difference? File format should be in distribution, not here. | ||||
| personal_data (already exists under security and privacy) | Whether the dataset contains personal data | Boolean | 3 - Needs human in the loop | general | Closed | Data protection officers, RPOs (data protection/management experts), repository | x | M | Associated with a single dataset. Is this personal data about the data providers, or personal data in the target data? What is the role of individuals? Yes / No, or Yes / No / Unknown: we think that "Unknown" is not an option here. The type of personal data will be in its own section. Can trigger automatic data protection processes. | ||
| sensitive_data | Whether there are legal restrictions that apply to using this data, e.g. military use, commercial restrictions, endangered species | Boolean | 3 - Needs human in the loop | general | Closed | Data protection officers, RPOs (data protection/management experts), repository | x | M | Related to the dataset. How can we ensure that this is only asked when it is likely to apply? Dual use and import controls? This should be yes/no. If we need more information on why the data is sensitive, that should be its own property. How does this relate to confidentiality in security & privacy? In the dataset section we probably only need to know whether there is sensitive/confidential information or not; a "yes" then triggers more questions in the security & privacy section. | ||
| data_sharing_issues | How any legal and ethical issues related to the sharing of data (e.g. ownership, copyright, sensitivity) will be resolved | String | National | National DMP template, Organization, PI | M | ||||||
| Optional | data_sharing_contracts | Boolean | O | ||||||||
| Optional | data_sharing_ownership | Boolean | O | ||||||||
| Optional | data_sharing_copyright | Boolean | O | ||||||||
| Optional | data_sharing_sensitivity | Boolean | O | ||||||||
| New fields emerging | data_sharing_other ? | ||||||||||
| If no distribution but metadata available | data_landing_page | Give the link / PID to landing page of data | link / PID | ||||||||
| Feedback to RDA standard in GitHub & suggested to be left out of the maDMP standard | description_dataset | Description | String | Fill in from this onwards | general | Public | Repository, Data catalogues, CRIS? | x | O | Needs some kind of guidance on what level of description is needed. Is a space limitation needed? We already have a name for the dataset; how much more description do we want/need at this point? We should not ask these in the DMP: they are about publishing and metadata. If somebody wants to combine the DMP and CRIS, this information needs to be interoperable, but it is NOT part of the DMP. The same holds true for all red rows. | |
| Feedback to RDA standard | distribution_dataset | Technical information on a specific instance of data | Nested Data Structure Could be defined vocabulary | general | x | O | This might need more clarification, as it relates to resources/infra needed. | ||||
| Feedback to RDA standard | issued_dataset | Date the dataset was issued | Date | general | x | O | |||||
| Feedback to RDA standard | keyword_dataset | Keyword | String / Term from controlled vocabulary | general | x | O | Should be asked only when data is opened/catalogued. | |||
| Feedback to RDA standard | language_dataset | Language of the dataset expressed using ISO 639-3 | Term from controlled vocabulary | general | x | O | |||||
| Feedback to RDA standard | metadata_dataset | Describe metadata standards used | Nested Data Structure | general | x | = | |||||
| Move to security & privacy | security_privacy | To list all issues and requirements related to security and privacy | Terms from controlled vocabulary | 3 (from organisational list) | general | closed | x | These sound like reports compiled based on DMP | |||
| Technical resource | #_Nested Data Structure if many datasets are used. Use dataset_id's to indicate ??? | ||||||||||
| name_technical_resource | Name of the technical resource | String | 3 | ||||||||
| description_technical_resource | To list all technical resources needed | String | 3 / 2 (from organisational or national list) | general | closed | x | These sound like reports compiled from the DMP: e.g. if you have confidential data, the DMP compiles a list of local services that CAN be used. | ||||
| reuse_dataset | Is previously collected data reused in this project? (Whether the data is collected, created or comes from elsewhere) | Boolean | Public | National DMP template | Relates to one dataset (does it? Or does it relate to the whole research in this sense?). Reuse of data is also information that funders require. The terms of use of the data are also important here. What if you are using data that is already published and is itself based on other data? So is this a property of the dataset or of the research? | |||||
| source_dataset | Data source | Partly (PID if possible, string if not) | 2 / 3 | O | Relates to one dataset; can include ready-made options, but also an open text field. Referencing can be really confusing: you can use data obtained from Twitter, or a dataset that somebody else compiled from Twitter... What do you reference here? Or do you make a derivative dataset based on an existing dataset compiled from Twitter? | ||||||
| data_collected | Data collected for this project | String | 3 | National DMP template | |||||||
| data_produced | Data produced as an outcome of the project | String | 3 | National DMP template | |||||||
| estimate_datasize | Give a rough estimate of the size of the data produced/collected | Value | 3 | National DMP template | Relates to one dataset. How does this differ from the dataset size given in distribution (byte_size)? | ||||||
| data_resource_estimate | Project data magnitude | Value | 1 (Add DMP datasets & estimated size) | ??? | |||||||
| Suggest to RDA TF to leave this out (see comments)! | data_quality_assurance | Data quality assurance | String | 3 | general | x | Is this even necessary in a modern DMP? This is like asking "do you do your science properly?" Could it have a guiding purpose, making researchers aware of what is possible? Could there be lists connected to the data type (offering relevant lists)? Our experts can help with risk management related to security, privacy, storage etc. But can we say anything relevant about quality control? Do we need to? What is the use case here?!? Is the only use case educating researchers who are unfamiliar with their own methodology? | ||||
| method_quality_assurance | Ways of quality assurance | List: TAU list as an example; a national, up-to-date list should be made | 3 | national | O | Also an "other" option, to be specified. Could these depend on the discipline? | ||||
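The dataset_id row notes that the dataset may not exist when the DMP is written, so the DMP tool should issue a typed temporary ID until a PID is assigned. This is a minimal sketch of that idea, assuming a `tmp:` prefix plus a random UUID as the temporary identifier and the identifier types listed in the row (lower-cased, with "PID" folded into the concrete types); the class, function and constant names and the prefix are hypothetical, and the DOI in the usage lines is a placeholder.

```python
import uuid
from dataclasses import dataclass

# Identifier types listed in the dataset_id row, lower-cased ("PID" folded into these)
ALLOWED_ID_TYPES = {"doi", "urn", "url", "handle", "ark", "other"}
TEMP_PREFIX = "tmp:"  # hypothetical prefix for tool-issued temporary IDs


@dataclass(frozen=True)
class DatasetId:
    identifier: str
    type: str  # typed identifier, as the dataset_id row requires

    def __post_init__(self) -> None:
        if self.type not in ALLOWED_ID_TYPES:
            raise ValueError(f"unsupported identifier type: {self.type!r}")

    @property
    def is_temporary(self) -> bool:
        return self.type == "other" and self.identifier.startswith(TEMP_PREFIX)


def temporary_dataset_id() -> DatasetId:
    """ID issued by the DMP tool before the dataset has a PID."""
    return DatasetId(identifier=f"{TEMP_PREFIX}{uuid.uuid4()}", type="other")


def assign_pid(dataset: dict, identifier: str, id_type: str) -> dict:
    """Return a copy of the dataset with its temporary ID replaced by a PID."""
    if not dataset["dataset_id"].is_temporary:
        raise ValueError("dataset already has a persistent identifier")
    return {**dataset, "dataset_id": DatasetId(identifier, id_type)}


dataset = {"title_dataset": "Interview recordings", "dataset_id": temporary_dataset_id()}
published = assign_pid(dataset, "10.1234/example-doi", "doi")  # placeholder DOI
print(published["dataset_id"].type)  # doi
```

Because the temporary ID is itself typed ("other"), the DMP validates against the same rules before and after publication; only the identifier value and type change.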
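| A. maDMP structure | B. Field | C. Description | D. Data type | E. Interoperability from data source | F. General / national / organizational | G. Public / closed | H. Who needs this field | I. In RDA maDMP standard | J. Phase (1 = Planning, 2 = Applied, 3 = Granted, 4 = Mandatory update, 5 = Final reporting) | K. M / O | L. Additional information |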
| Distribution (Preservation and sharing of material during the project) - OPTIONAL | access_url | A URL of the resource that gives access to a distribution of the dataset, e.g. a landing page. | URI | general | x | O | In a DMP, these fields should describe the active use of the data; everything else belongs in the data life cycle section. | ||||
| title_dataset | Title is a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file. | general | x | M | |||||||
| available_until_url | Indicates how long this distribution will be/ should be available. Encoded using the relevant ISO 8601 Date and Time compliant string | Date | general | x | O | ||||||
| byte_size | Byte size (in RDA this is a number; CSC proposes a category list: S: < 10 TB, M: 10-50 TB, L: 50-100 TB, XL: 100-200 TB, XXL: > 200 TB) | Number or size category: S, M, L, XL, XXL | general | x | M | Important, e.g. because it affects which tools are available. |||||
| data_access | Indicates the access mode for data and data sharing. Allowed values: | Term from Controlled Vocabulary | general | x | This can change during the study: first I use it for 3 years as closed, then I open it. Should this be what I want to do after active use, or what happens right now? → It should be the current publication status of the distribution. The dataset life cycle documents the plan for the dataset. | ||||||
| description_distribution | Description is a property in both Dataset and Distribution, in compliance with W3C DCAT. In some cases these might be identical, but in most cases the Dataset represents a more abstract concept, while the distribution can point to a specific file. | String | general | x | O | ||||||
| download_url | The URL of the downloadable file in a given format. E.g. CSV file or RDF file. | general | x | O | |||||||
| format | Format according to: https://www.iana.org/assignments/media-types/media-types.xhtml if appropriate, otherwise use the common name for this format | general | x | M | |||||||
| This should be at dataset level → if not distribution specific | license_dataset | List all licenses applied to a specific distribution of data. | general | x | O | ||||||
| Is this same as access_url if dataset is published??? | host_dataset | To provide information on quality of service provided by infrastructure (e.g. repository) where data is stored. Service URN | URN | general | x | O | Outside the own organization? Same as location_open_data in lifecycle | ||||
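The byte_size row mentions CSC's proposed category list as an alternative to a raw number. Mapping a number to a category could look like the sketch below. It assumes decimal terabytes (10^12 bytes) and treats each upper bound as exclusive, so exactly 10 TB is M and exactly 200 TB is XXL; the proposal itself does not say how the boundaries are handled, and `size_category` is a hypothetical name.

```python
TB = 10**12  # assumption: decimal terabytes

# (exclusive upper bound in TB, category) from the CSC proposal in the byte_size row
SIZE_CATEGORIES = [(10, "S"), (50, "M"), (100, "L"), (200, "XL")]


def size_category(byte_size: int) -> str:
    """Map a byte_size value to the proposed S / M / L / XL / XXL category."""
    if byte_size < 0:
        raise ValueError("byte_size must be non-negative")
    for upper_tb, label in SIZE_CATEGORIES:
        if byte_size < upper_tb * TB:
            return label
    return "XXL"  # 200 TB and above


for tb in (5, 10, 75, 150, 250):
    print(tb, size_category(tb * TB))
```

Storing the raw number and deriving the category keeps the RDA-compatible value intact while still giving service providers the coarse category they plan with.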
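| A. maDMP structure | B. Field | C. Description | D. Data type | E. Interoperability from data source | F. General / national / organizational | G. Public / closed | H. Who needs this field | I. In RDA maDMP standard | J. Phase (1 = Planning, 2 = Applied, 3 = Granted, 4 = Mandatory update, 5 = Final reporting) | K. M / O | L. Additional information |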
| Dataset life cycle (extension to Dataset) | datalifecycle_description | Describe, at a general level, the datasets created during and after the project and how they are managed | String | 3 | general | Public / Restricted | PI, Organization, Service provider | 1 | O | The funder and CSC need this information | |
| Data life-cycle | Where will the data be stored during the project? | URN from the CSC Service Catalogue & list presented by the organization; if something else, what? | 3 | National | Public/Closed | PI, Organization, Service provider | 1 | M | Relates to a dataset; especially important if the data is subject to the Act on the Secondary Use of Data. Add to the general data life cycle; specify by dataset if needed. | ||
| data_users | With whom will the data be shared during the project? | Open, Inside Europe, Inside research consortium, Inside organization, complex structure | 3 | National | PI, Organization, Service provider | 1 | O | Refers to the technical solutions: will a DPA be needed? Are the joint controller agreement, NDAs etc. already covered elsewhere? Or does this refer to consortium projects? | ||
| shareage_solution | How will the data be shared during the project? Define the technical solution? | Choose from Service catalog | 3 | National | PI, Organization, Service provider | 1 | M | ||||
| NEW | version_mgmt_data | How are data versions managed? | String | 3 | National | PI, Organization, Service provider | 4 | O / M | Mandatory for large data-intensive projects (at CSC > 50 TB) | ||
| NEW | retention_data | How is data retention managed? | String | 3 | National | PI, Organization, Service provider | 4 | O / M | Mandatory for large data-intensive projects (at CSC > 50 TB). A data retention plan is needed for managing the size of the project. | ||
| NEW | exit_plan_data | What is the exit plan from computational and storage services at the end of the project? | String | 3 | National | PI, Organization, Service provider | 4 | M | An exit plan is needed to ensure that research data with value for reuse is saved within the available resources | ||
| backup_data | How will data be backed up during the project? Planned by the researcher, or organization-specific solutions? | String | 3 | Organizational | ? | PI, Organization, Service provider | 1 | M | |||
| application_process_data | What applications are used to process data? | Controlled list CSC Service Catalogue & organization services | 3 | General, National & Organizational | PI, Organization, Service provider | 1 | O | Affects the choice of storage environment (e.g. whether the video is only available for viewing or whether it needs to be available at the file level in an analysis program) | ||
| computing_environments | Which computing environments are needed for research? | Controlled list CSC Service Catalogue & organization services | 3 | National & Organizational | PI, Organization, Service provider | 1 | O | Relates to dataset | ||
| computing_capacity_CPU | How many CPU core hours of computing capacity are required? | Value | 3 | general | PI, Organization, Service provider | 1 | M | ||||
| computing_capacity_GPU | How many GPU core hours of computing capacity are required? | Value | 3 | general | PI, Organization, Service provider | 1 | M | ||||
| preservation_statement | Preservation Statement | String | 3 | general | x | 2 | M | ||||
| archiving_services_data | Are archiving services or long term preservation for data needed? | Boolean | How to determine the value of data? | National | M | Relates to data set Is this long-term storage, e.g. 20 in Zenodo, archiving in institutional archive or something else? | |||||
| data_close_justification | If the project does not collect or produce any data fully or partially suitable for reuse, justify why the data cannot be made available even partially. | String | National | National DMP template | O | This is mandatory if data is closed. Should there be dataset level field for dataset publication (open / closed) ? | |||||
| location_open_data | Where will the data be opened? | Controlled list of data repositories | Special requirements for data repositories for preliminary data? | National | National DMP template | O | FSD comments: It is essential for the repository/archive to know (in the case of research projects that have received a positive funding decision) what kind of data are planned to be opened in the repository/archive and by whom. Covered under distribution maybe? This field responds also to requirement of National DMP template on: where the data or a publishable portion of them will be made available after the end of the project | ||||
| Moved here from Security | length_storage_data | How long the data is stored for the original research purpose | 3 | National | Relates to dataset, original purpose | ||||||
| deletion_data | How is data deleted/destroyed? | Controlled list, including option of not deleting data and explanation as to why. | 3 | National | M | Could be specified that this relates to unpublished data. Or data that are mentioned to be shared e.g. for 5 or 10 years, etc. | |||||
| deletion_when_data | When is data deleted/destroyed? | Date (Also freetext if not known, end of project etc?) | 3 | O | Could be specified that this relates to unpublished data. | ||||||
| see above: archiving_services_data | Will the data be archived? - (Overlapping with question: Are archiving services for data needed?) | ||||||||||
| archiving_when_data | When to archive? | Date. Same comment as above. | 3 | National | O | Active data can be deleted and archived at the same time | |||||
| archiving_location_data | Where to archive? | Controlled list CSC Service Catalogue & organization services | 3 | National | O | ||||||
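The preservation rows above imply a few conditional checks that a machine-actionable DMP tool could run. The Python sketch below is one possible reading, not an agreed rule set. It assumes each dataset's answers arrive as a flat dict keyed by the element names above, plus three hypothetical pieces: a `closed` flag (the dataset-level open/closed field still under discussion), a placeholder list of deletion methods, and a `deletion_explanation` field for the "explanation as to why".

```python
from datetime import date
from typing import List

# Hypothetical controlled list for deletion_data; the real list is still to be agreed.
DELETION_METHODS = {"secure_erase", "physical_destruction", "not_deleted"}


def check_preservation(record: dict) -> List[str]:
    """Return the problems found in one dataset's preservation answers."""
    problems = []
    # archiving_services_data is mandatory (M) and Boolean.
    if not isinstance(record.get("archiving_services_data"), bool):
        problems.append("archiving_services_data must be true or false")
    # data_close_justification is optional in general but mandatory for closed data.
    # 'closed' is a hypothetical dataset-level open/closed flag.
    if record.get("closed") and not record.get("data_close_justification"):
        problems.append("data_close_justification is required when data is closed")
    method = record.get("deletion_data")
    if method not in DELETION_METHODS:
        problems.append("deletion_data must come from the controlled list")
    elif method == "not_deleted" and not record.get("deletion_explanation"):
        # 'deletion_explanation' is a hypothetical field for the "explanation as to why".
        problems.append("explain why the data is not deleted")
    # deletion_when_data accepts a date or free text such as "end of project".
    when = record.get("deletion_when_data")
    if when is not None and not isinstance(when, (date, str)):
        problems.append("deletion_when_data must be a date or free text")
    return problems


example = {"archiving_services_data": True, "closed": True, "deletion_data": "not_deleted"}
print(check_preservation(example))
# ['data_close_justification is required when data is closed', 'explain why the data is not deleted']
```

Returning a list of problems rather than raising on the first one lets a DMP tool show all missing answers at once.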
| Group 3: Security, privacy, technical resources and metadata | |||||||||||
A. ma DMP structure | B. | C. | D. | E. Interoperability from data source | F. | G. | H. | I. | J. Phase 1=Planning 2=Applied 3=Granted 4=Mandatory update 5=Final reporting | K. | L. |
| Security and privacy (Legislation) | Note: Security and privacy need special attention to meet the national context in Finland and benefit from machine-actionable features. GREEN = as in the RDA standard; please make needed clarifications in BLUE. | ||||||||||
| title_security | Title of security measures | String | general | Organization | x | 1 | M | ||||
| description_security_privacy | Description of security and privacy measures | Controlled Vocabulary | general | Organization | x | 1 | M | ||||
| National additions in Finland: | |||||||||||
| confidentiality | Does the data contain confidential information (EU definition)? | Term from Controlled Vocabulary: | DMP, 3 | EU/national security levels (restricted, confidential, secret, top secret / käyttörajoitettu, luottamuksellinen, salainen, erittäin salainen) | Closed | Organisation, open science services | 1 | M | May contain a "yes" condition, after which it is indicated which datasets this relates to: confidential information, business secrets, sensitive geospatial data, sensitive biodiversity data, national security, trade secrets. Dataset-specific. Comment: a joint classification for security levels; EU security levels are given here as an example. | ||
| This is a dataset-specific attribute in the RDA standard. It could be derived at project level from the dataset-specific questions, if the information is also needed at project level. | personal_data | Does the research handle personal data for research purposes? Documented | Boolean | 2 / 3 | EU/national security levels (restricted, confidential, secret, top secret / käyttörajoitettu, luottamuksellinen, salainen, erittäin salainen) | Closed | Organization | 1 | M | ||
| Optional part of DPIA process | pre_dpia | Has a risk assessment been filled in? (risk assessment / pre-DPIA, self-test of whether a DPIA is needed) | Boolean | EU/national security levels (restricted, confidential, secret, top secret / käyttörajoitettu, luottamuksellinen, salainen, erittäin salainen) | Closed | Organization | 1 | M | |||
| Optional part of DPIA process | dpia | Should a DPIA be done? | Boolean | Requirement comes from the law; voluntary question | EU/national security levels (restricted, confidential, secret, top secret / käyttörajoitettu, luottamuksellinen, salainen, erittäin salainen) | Closed | Organization | 1 | O | Good question: should this be in the DMP at all? In this context, it is also possible to make a real assessment of whether a DPIA is being done. However, privacy information should be structured and compatible so that it can be asked for here if wished. This should be optional so that the same information is not accidentally asked twice if this triggers another process. Comment: a pre-DPIA is usually executed to see whether a full DPIA is necessary. | |
| dpia_id | If a DPIA exists, give its URI / DOI | URI | EU/national security levels (restricted, confidential, secret, top secret / käyttörajoitettu, luottamuksellinen, salainen, erittäin salainen) | Closed | Organization | 1 | M | ||||
| If other risk assessments have been done | riskassessment_id | Give URI / DOI | URI/PID | EU/national security levels (restricted, confidential, secret, top secret / käyttörajoitettu, luottamuksellinen, salainen, erittäin salainen) | Closed | Organization | 1 | M | |||
| Conditional on "yes" in personal_data; if a DPIA exists, indicate which information exists for direct data retrieval (+ date) | |||||||||||
| privacy-notice | Is there a need for a privacy notice? | Closed | Data protection team | 1 | M | Date & title for privacy notice, data transfer agreements | |||||
| privacy_notice_id | If a privacy notice exists, give link / archive number | Closed | PI, Organization | 1 | M | Use case: the maDMP could transfer the number / link of the privacy notice to the data protection team once it has been done, to indicate the status. | |||||
| personal_data_list | What personal data do you process? | Controlled list (TAU's list as an example; one can also be found in myCSC) | Closed | 1 | M | These should really be asked one by one to reach machine-actionability, i.e. all options as Booleans. | |||||
| If a privacy notice or risk assessment exists, this is not asked but could be filled in automatically. | personal_data_sp_category | What special categories of personal data do you process? | String | Requirement comes from the law | Closed | 1 | M | Special categories of personal data, listed below | |||
| personal_data_sp_category_racial_or_ethnic_origin | Boolean | 3 / 1 | Closed | 1 | M | ||||||
| political opinions | Boolean | Closed | 1 | M | |||||||
| religious or philosophical beliefs | Boolean | Closed | 1 | M | |||||||
| trade union membership | Boolean | Closed | 1 | M | |||||||
| data concerning health | Boolean | Closed | 1 | M | |||||||
| sexual orientation or activity | Boolean | Closed | 1 | M | |||||||
| genetic and biometric data for identifying a person | Boolean | Closed | 1 | M | |||||||
| other_personal_data_sp_category | String | Closed | 1 | M | |||||||
| data_prosessing_basis | Basis for data processing | String | National | Closed | 1 | M | |||||
| data_prosessing_sp_category | Basis for processing special categories of personal data | String | National (to enable optimal services) | Closed | PI, Organization, Service provider | 1 | M | ||||
| personal_data_transfer_outside_EU | Whether personal data is transferred outside the EU | Boolean | National (to enable optimal services) | Closed | PI, Organization, Service provider | 1 | M | ||||
| personal_data_transfer_country | To which countries personal data is transferred | String | National (to enable optimal services) | Closed | PI, Organization, Service provider | 1 | M | ||||
| personal_data_external_processors | Are there external processors? | Boolean | Local (responsibility of the organization) | Closed | PI, Organization, Service provider | 1 | M | ||||
| personal_data_minimized | How is the processing of personal data minimized? | String | Local (responsibility of the organization) | Closed | PI, Organization, Service provider | 1 | M | Anonymization, pseudonymization, removal of direct identifiers, ...; dataset-specific? | |||
| In the RDA standard, ethical issues are part of the DMP domain. The FI pilot considers grounds for a separate category, or for merging with License and adding user rights: | |||||||||||
| Rights, ethics & license | |||||||||||
| Ethical issues | ethical_issues_exist | To indicate whether there are ethical issues related to data that this DMP describes. Allowed values: yes, no, unknown | List | general | Closed | PI, Organization, Service provider | x | 1 | M | This is an important trigger: if yes, the DMP must be particularly thorough. | |
| ethical_issues_report | RDA: "To indicate where a protocol from a meeting with an ethical committee can be found", or a direct ID of the report | URI | general | Closed | PI, Organization, Service provider | x | 1 | M | Add link. Comment: also record the date when the decision was made. | ||
| ethical_issues_description | Describe ethical issues directly in a DMP | String | general | Closed | PI, Organization, Service provider | x | 1 | M | |||
| research_permit | Whether a permit is required to collect the data in the research dataset | Boolean | general | Closed | PI, Organization | 1 | M | Actual research permit | |||
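Most of the personal-data block is only asked after "yes" in personal_data, and some fields only make sense after a "yes" in an earlier question. The Python sketch below shows one way to express that dependency logic. Assumptions: special categories are stored as separate Booleans (as the personal_data_list comment suggests), their short keys are hypothetical, and the rules that data_prosessing_sp_category and personal_data_transfer_country become mandatory only after a preceding "yes" are an interpretation, not settled decisions. Element names are kept exactly as in the table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Special categories of personal data as separate Booleans.
# These short keys are hypothetical; the table does not fix them yet.
SPECIAL_CATEGORIES = (
    "racial_or_ethnic_origin",
    "political_opinions",
    "religious_or_philosophical_beliefs",
    "trade_union_membership",
    "health",
    "sexual_orientation_or_activity",
    "genetic_or_biometric",
)


@dataclass
class PersonalDataAnswers:
    personal_data: bool
    special_categories: Dict[str, bool] = field(default_factory=dict)
    # Field names below are kept exactly as in the table.
    data_prosessing_basis: Optional[str] = None
    data_prosessing_sp_category: Optional[str] = None
    personal_data_transfer_outside_EU: bool = False
    personal_data_transfer_country: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Fields made mandatory by the answers so far that are still empty."""
        if not self.personal_data:
            return []  # the block is only asked after "yes" in personal_data
        required = ["data_prosessing_basis"]
        # Assumption: a special-category basis is needed only if a category is ticked.
        if any(self.special_categories.get(c, False) for c in SPECIAL_CATEGORIES):
            required.append("data_prosessing_sp_category")
        # Assumption: countries are needed only if data is transferred outside the EU.
        if self.personal_data_transfer_outside_EU:
            required.append("personal_data_transfer_country")
        return [name for name in required if not getattr(self, name)]


answers = PersonalDataAnswers(
    personal_data=True,
    special_categories={"health": True},
    data_prosessing_basis="scientific research in the public interest",
    personal_data_transfer_outside_EU=True,
)
print(answers.missing_fields())
# ['data_prosessing_sp_category', 'personal_data_transfer_country']
```

If a privacy notice or risk assessment already holds these answers, a tool could pre-fill the dataclass from that source and only ask for whatever `missing_fields()` still reports.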
A. ma DMP structure | B. | C. | D. | E. Interoperability from data source | F. | G. | H. | I. | J. Phase 1=Planning 2=Applied 3=Granted 4=Mandatory update 5=Final reporting | K. | L. |
| Rights related to data | ownership_data_right | Who owns the data/rights related to the data? | Person / Organization | 3 | national | M | Person or organization? Dataset-specific? The organisation can be a research organisation, a customer organisation or an organisation that otherwise only owns the data (e.g. an archive) | ||||
| ipr_copyright | Are there IPR or copyright issues? | Boolean | 3 | national | M | ||||||
| agreements_data_right | What agreements are needed related to the rights to the material? | String | 3 | national | M | Are data right agreements included here? | |||||
| agreements | What other agreements are needed? | String | 3 | M | |||||||
| License | license_ref | Link to license document. | URI | 3 | general | x | M | Dataset-specific: what kind of license is granted for the use of the data, e.g. https://creativecommons.org/licenses/by/4.0/ | |||
| start_date | If the date is set in the future, it indicates an embargo period. | Date | 3 | general | x | M | |||||
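The start_date rule can be evaluated mechanically. The Python sketch below assumes each distribution carries a list of licenses, each with license_ref and start_date. It treats a distribution as embargoed when it has licenses but none has started yet, and, when several have started, picks the one with the latest start_date. That tie-breaking rule is an assumption, not something the table specifies.

```python
from datetime import date
from typing import List, Optional, TypedDict


class License(TypedDict):
    license_ref: str  # URI of the license document
    start_date: date  # date from which this license applies


def license_in_force(licenses: List[License], today: date) -> Optional[License]:
    """Return the license that applies on `today`, or None if none has started."""
    started = [lic for lic in licenses if lic["start_date"] <= today]
    # Assumption: among started licenses, the most recent start_date wins.
    return max(started, key=lambda lic: lic["start_date"], default=None)


def is_embargoed(licenses: List[License], today: date) -> bool:
    """Embargoed = at least one license exists, but every start_date is in the future."""
    return bool(licenses) and license_in_force(licenses, today) is None


cc_by = License(
    license_ref="https://creativecommons.org/licenses/by/4.0/",
    start_date=date(2026, 1, 1),
)
print(is_embargoed([cc_by], today=date(2025, 6, 1)))  # True
print(is_embargoed([cc_by], today=date(2026, 1, 1)))  # False
```

A distribution with no license at all is reported as not embargoed here, since the table does not say how that case should be read.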
A. ma DMP structure | B. | C. | D. | E. Interoperability from data source | F. | G. | H. | I. | J. Phase 1=Planning 2=Applied 3=Granted 4=Mandatory update 5=Final reporting | K. | L. |
| Metadata | schema | Is the data built according to a specific schema? | Boolean | national | Relates to dataset. Ideally, metadata from existing datasets could be imported directly from e.g. Zenodo. Metadata could also be brought in for any datasets published in the project. The information flow might be easiest in this direction rather than from the DMP API to the repository. | ||||||
| description | Description | String | x | provides taxonomy for... | |||||||
| language | Language of the metadata expressed using ISO 639-3 | Term from Controlled Vocabulary | national | x | |||||||
| metadata_standard_id | Name of the schema / link to the schema | PID / other id | general | x | Relates to dataset | ||||||
| identifier | String | x | http://www.dublincore.org/specifications/dublin-core/dcmi-terms/ | ||||||||
| type | Identifier type. Allowed values: URL, Other | Term from controlled vocabulary | x | URL | |||||||
| vocabulary_link | Are vocabularies also used? | Term from controlled vocabulary | 3 | national | |||||||
| format_documentation | What is the format of the documentation? | List | 3 | Local ? | Other (please specify) | ||||||
| location_documentation | Where is the documentation? | String/URL | 3 | Local ? | |||||||
| generated_documentation | Is documentation generated automatically? | Boolean | 3 | Local ? | |||||||
| access_documentation | Can the documentation be accessed? | Boolean | 3 | Local ? | |||||||
| publish_methodology | Has the methodology/workflow been published somewhere? | Boolean | 3 | Local ? | registration of research? | ||||||
| workflow | Is the workflow described? | Boolean | 3 | Local ? | O | Especially important in the case of large datasets, from which the data itself cannot be preserved, but is produced again if necessary | |||||
| description_documentation | What does the documentation consist of? | String | 3 | Local ? | O | Workflow, variable description, … ? | |||||
| metadata_purpose | What is the basic metadata described for? | String | 3 | Local ? | O | Qvain, own CRIS, something else? | |||||
| metadata_open | Open the basic metadata? | Boolean | 3 | Local ? | O | ||||||
| metadata_location | Where is the basic metadata opened? | String / URL | 3 | Local ? | |||||||
| Security | description_security | Description of security measures | Controlled List | 3 | Closed | x | |||||
| title_security | Name of the technical resource | String | 3 | Closed | x | ||||||
| Is a separate role from the data controller needed in the DMP? | access_control_id | Who is responsible for access control? Person / Role? | ORCID | National | |||||||
| Is a separate role from the data controller needed in the DMP? | access_control_name | Who is responsible for access control? | String | ||||||||
| protection_level_data | What is the required level of data protection? | Controlled List (from Service catalog) | Relates to the Data Protection Level of the dataset: Open access; Restricted access; Restricted access & controlled use; Restricted access & restricted use | ||||||||
| Is a separate role from the data controller needed in the DMP? Is this organization- or project-specific? | security_officer | Information Security Officer | String | derived from the storage location | Local | ||||||
| Further comments from the National DMP template: | |||||||||||
| Checked against Responsibilities from Academy templates (6.1) & (6.2) | |||||||||||
| → For service design workshop in March | List all steps where responsibilities need to be defined | Person / Role? | |||||||||
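For the Metadata rows, a sketch of how a tool could emit one metadata entry with the RDA element names description, language and metadata_standard_id (identifier + type). Assumptions: the identifier types URL and Other are lower-cased here as a guess about the JSON serialization, and the language check only verifies the shape of an ISO 639-3 code (three lowercase letters), not that the code exists in the official code table. The Dublin Core URL in the example is the one listed in the identifier row.

```python
import json
import re

# Shape check only: ISO 639-3 codes are three lowercase letters (e.g. "eng", "fin").
# Membership in the official code table is NOT checked here.
ISO_639_3_SHAPE = re.compile(r"[a-z]{3}")
# Identifier types from the table (URL, Other), lower-cased as an assumption.
ID_TYPES = {"url", "other"}


def metadata_entry(description: str, language: str, identifier: str, id_type: str) -> dict:
    """Build one metadata entry using the RDA element names from the table."""
    if not ISO_639_3_SHAPE.fullmatch(language):
        raise ValueError(f"language must look like an ISO 639-3 code, got {language!r}")
    if id_type not in ID_TYPES:
        raise ValueError(f"metadata_standard_id.type must be one of {sorted(ID_TYPES)}")
    return {
        "description": description,
        "language": language,
        "metadata_standard_id": {"identifier": identifier, "type": id_type},
    }


entry = metadata_entry(
    "Descriptive metadata for the survey dataset",
    "eng",
    "http://www.dublincore.org/specifications/dublin-core/dcmi-terms/",
    "url",
)
print(json.dumps(entry, indent=2))
```

Validating at construction time keeps malformed language codes or identifier types out of the DMP before it is exchanged with repositories or funder systems.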
