Metax V3 Datamodel and metadata in general
- General V3 API Guide: https://metax.fairdata.fi/v3/docs/user-guide/
- Swagger: https://metax.fairdata.fi/v3/swagger
- Metax V3 Schema: metax_v3_dataset_schema.json
Data Catalog
Metadata for datasets is always stored in a data catalog. In Metax, there are two common data catalogs: IDA and General catalogs.
IDA catalog: metadata for data stored in the Fairdata IDA service.
General catalog: metadata for data located outside of Fairdata (so-called remote resources).
Metadata created with Fairdata Qvain or through the Metax End User API is stored in these catalogs.
When an organization imports metadata for its datasets into Metax from its own metadata repository, the metadata is always imported into the organization's dedicated catalog. CSC creates the catalog, so the organization does not need to create it.
Catalog record
A single record stored in a data catalog describes a single dataset: it contains the dataset's descriptive information and its technical information.
In addition to descriptive information, a record contains, among other things, the following information:
- User who initially described the dataset
- Current and previous versions, and their identifiers (including the dataset "1464881e-637a-40b5-ab8a-8898618ae905" in the example, which has both a previous and a next version)
Example:
- Possible deprecation (yes/no)
- Origin of metadata (organization/user)
- Whether it's a cumulative dataset
Information about individual dataset records can be viewed in JSON format: https://metax.fairdata.fi/v3/datasets
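As a rough illustration of reading a record programmatically, the sketch below builds the URL of one dataset and decodes the response as JSON. The listing URL comes from the text above; the assumption that a single record is served at the listing URL followed by the dataset identifier, and the helper names dataset_url and fetch_json, are illustrative and not taken from this page.

```python
import json
from urllib.request import urlopen

# Listing URL taken from the text above.
DATASETS_URL = "https://metax.fairdata.fi/v3/datasets"


def dataset_url(dataset_id: str) -> str:
    """Return the URL of one dataset record.

    Assumption: a single record is served at <listing URL>/<dataset id>.
    """
    return f"{DATASETS_URL}/{dataset_id}"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Download a URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

If the assumption holds, fetch_json(dataset_url("1464881e-637a-40b5-ab8a-8898618ae905")) would return the record mentioned above as a Python dictionary.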
Code values / Value sets (reference data)
When storing dataset metadata, use code values whenever possible. Code values improve data interoperability.
A general list of code values used in Metax: https://metax.fairdata.fi/v3/docs/user-guide/reference-data
- Use the "limit" parameter to list a different number of codes than the default page size, for example: https://metax.fairdata.fi/v3/reference-data/fields-of-science?limit=10
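The same URL can be assembled in code. This is a minimal sketch: the base URL and the "limit" parameter come from the example above, while the helper name reference_data_url is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL taken from the example above.
REFERENCE_DATA_BASE = "https://metax.fairdata.fi/v3/reference-data"


def reference_data_url(code_list: str, limit: Optional[int] = None) -> str:
    """Build a reference data listing URL, optionally overriding the page size."""
    url = f"{REFERENCE_DATA_BASE}/{code_list}"
    if limit is not None:
        url += "?" + urlencode({"limit": limit})
    return url


print(reference_data_url("fields-of-science", limit=10))
```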
Fields for describing datasets
These fields contain all metadata related to the research data content (the descriptive metadata of a dataset). The following information is mandatory for all datasets:
- Persistent identifier (PID)
- Title
- Publish date
- Description
- Keywords
- Creator
- Publisher
- Access rights & license
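The list above can be turned into a simple pre-submission check. The sketch below is an illustration only, not how Metax validates records: it uses the field names given in the field descriptions of this guide and assumes that a path such as access_rights/access_type denotes a nested key. The function name is hypothetical.

```python
from typing import List

# Top-level field names as given in the field descriptions of this guide.
TOP_LEVEL_FIELDS = ["persistent_identifier", "title", "issued", "description", "keyword"]


def missing_mandatory_fields(dataset: dict) -> List[str]:
    """Return the mandatory fields that are absent or empty in a dataset dict."""
    missing = [name for name in TOP_LEVEL_FIELDS if not dataset.get(name)]

    # Creator and publisher are expressed as roles of entries in "actors".
    roles = {role for actor in dataset.get("actors", []) for role in actor.get("roles", [])}
    for role in ("creator", "publisher"):
        if role not in roles:
            missing.append(f"actors[{role}]")

    # Assumption: "access_rights/access_type" means a key nested in "access_rights".
    access_rights = dataset.get("access_rights") or {}
    for name in ("access_type", "license"):
        if not access_rights.get(name):
            missing.append(f"access_rights/{name}")
    return missing


draft = {
    "title": {"en": "Example dataset"},
    "keyword": ["example"],
    "actors": [{"roles": ["creator"]}],
}
print(missing_mandatory_fields(draft))
```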
The information provided in certain fields must adhere to the relevant code sets. The code sets used are indicated in the field descriptions below. The code sets used in Metax are listed in the reference data guide linked under "Code values / Value sets (reference data)" above.
Fields in Metax
Persistent identifier (mandatory)
Field name: persistent_identifier
A unique, persistent identifier for the dataset, usually a DOI or a URN.
Type of persistent identifier (when created by Fairdata)
Field name: generate_pid_on_publish
Example: "generate_pid_on_publish": "DOI"
Title (mandatory)
Field name: title/fi, title/en
Example:
A descriptive title for the dataset. The title should concisely describe the dataset's subject matter and be unique enough that it is unlikely to be used to describe another dataset.
Description (mandatory)
Field name: description/fi, description/en
Example:
A free-form description of the dataset. The description can include information on how the dataset was created, its purpose, structure, and processing. Also include details about the content, any shortcomings, and limitations.
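Title and description are both given per language. A minimal sketch of how such values might be assembled, assuming that field names like title/fi and description/en denote language keys inside a "title" or "description" object; the helper name localized and the sample texts are hypothetical.

```python
def localized(**translations: str) -> dict:
    """Return a language-keyed mapping, dropping empty or blank translations."""
    return {
        lang: text.strip()
        for lang, text in translations.items()
        if text and text.strip()
    }


# Hypothetical sample values for illustration.
record = {
    "title": localized(fi="Esimerkkiaineisto", en="Example dataset"),
    "description": localized(fi="", en="Measurements collected for an example study."),
}
print(record)
```

Dropping blank translations avoids sending an empty string for a language that has not been filled in.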
Publish date (mandatory)
Field name: issued
The formal publication date of the dataset.
If no publication date is provided, the publication date of the dataset is automatically set to the current date.
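The defaulting rule described above can be mirrored on the client side, for example to preview what a record will look like. This is a sketch of the rule, not Metax's implementation, and it assumes that "issued" is an ISO 8601 date string (YYYY-MM-DD).

```python
from datetime import date
from typing import Optional


def with_default_issued(dataset: dict, today: Optional[date] = None) -> dict:
    """Return a copy of the dataset with "issued" set to today's date if missing."""
    result = dict(dataset)
    if not result.get("issued"):
        # Assumption: "issued" is stored as an ISO 8601 date string.
        result["issued"] = (today or date.today()).isoformat()
    return result


print(with_default_issued({"title": {"en": "Example dataset"}}))
```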
Keywords (mandatory)
Field name: keyword
Appropriate keywords for the dataset. Keywords improve the findability of the dataset when precise terms are used.
Subject Headings
Field name: theme
Subject headings are selected from controlled vocabularies, ontologies, or classifications, such as terms from the KOKO ontology.
Example:
Actors
Individuals and organizations involved in the research or creation of the dataset. You can specify Creators, Publishers, Curators, Rights holders, and Other contributors. Creator and Publisher are mandatory information. Further information about different actors is provided below.
- Metax uses the Research.fi portal's organizational register as the source for organizations: https://research.fi/en/results/organizations
- Metax API for listing available organizations: https://metax.fairdata.fi/v3/organizations
Field name: actors
Actor Information:
- roles: Type of actor
  - creator:
    - A person or organization who originally produced the dataset.
    - Creator is a mandatory actor, and there can be multiple creators.
  - publisher:
    - An actor that has permission to distribute the dataset or that has made the dataset available.
    - Usually a research organization.
    - Publisher is a mandatory actor, and only one (either an individual OR an organization) can be added as the publisher.
  - curator:
    - A person (or organization) who is responsible for ongoing maintenance of the dataset and keeping it available. Data curators are specialists who collect, organize, clean and transform data to make it accessible for organizations and individuals.
    - Not mandatory. Multiple curators can be added.
  - rights_holder:
    - A person or organization who holds the copyright, neighboring rights, or moral rights of the dataset; usually the creator of the dataset or the creator's organization.
    - Not mandatory. Multiple rights holders can be added.
  - contributor:
    - Any other person or organization that has contributed significantly to the creation of the dataset (not quite creators).
    - Not mandatory. Multiple other contributors can be added.
- person/name
  - The name of the actor, if the actor is an individual
- person/email
  - The email address of the actor, if the actor is an individual
  - Etsin users can send messages via Etsin without seeing the actual email address.
- person/external_id
  - The identifier of the actor (e.g., ORCID), if the actor is an individual
- organization
  - Either the organization of a person or an organization-type actor
  - Organization is mandatory information for person actors
Example:
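As a hedged illustration of the actor rules above (at least one creator, exactly one publisher, an organization for every person actor), the sketch below checks a list of actors. The keys roles, person, name, email, and organization follow the field names in this section; the inner shape of the organization object (a "pref_label" mapping) and all sample values are assumptions made for the example.

```python
from typing import List


def actor_problems(actors: List[dict]) -> List[str]:
    """Return human-readable problems found in a list of actor dicts."""
    problems = []
    creators = [a for a in actors if "creator" in a.get("roles", [])]
    publishers = [a for a in actors if "publisher" in a.get("roles", [])]
    if not creators:
        problems.append("at least one creator is required")
    if len(publishers) != 1:
        problems.append(f"exactly one publisher is required, found {len(publishers)}")
    for index, actor in enumerate(actors):
        # A person actor must also name the organization it belongs to.
        if actor.get("person") and not actor.get("organization"):
            problems.append(f"actor {index}: person actors need an organization")
    return problems


# Hypothetical sample data; the organization shape is an assumption.
example_org = {"pref_label": {"en": "Example University"}}
actors = [
    {
        "roles": ["creator"],
        "person": {"name": "Example Person", "email": "person@example.org"},
        "organization": example_org,
    },
    {"roles": ["publisher"], "organization": example_org},
]
print(actor_problems(actors))
```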
Field of Science
Field name: field_of_science
The scientific discipline of the dataset.
Code List: Field of Science Classification
Example:
Access rights (mandatory)
Field name: access_rights/access_type
Example:
This information determines how the files of the published dataset can be accessed. This field does not affect the visibility of the dataset's metadata, which is always visible in Etsin after publication.
If the dataset's access type is anything other than "Open", a reason for restricting file downloads must also be selected. If "Embargo" is selected as the access type, the end date of the embargo must also be specified.
Code List: Access Type Code List and Restriction Grounds
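The two conditional rules for access rights can be sketched as a small check. This is a simplification under stated assumptions: access types are represented here as short strings ("open", "embargo") rather than code list entries, and the key names "restriction_grounds" (reason for the restriction) and "available" (embargo end date) are assumptions not confirmed by this page.

```python
from typing import List


def access_rights_problems(access_rights: dict) -> List[str]:
    """Check the conditional rules for a simplified access_rights dict."""
    problems = []
    access_type = access_rights.get("access_type")
    if not access_type:
        return ["access_type is required"]
    # Any access type other than "open" needs a reason for the restriction.
    if access_type != "open" and not access_rights.get("restriction_grounds"):
        problems.append("restriction grounds are required when access type is not open")
    # An embargo additionally needs an end date.
    if access_type == "embargo" and not access_rights.get("available"):
        problems.append("embargo end date is required when access type is embargo")
    return problems


print(access_rights_problems({"access_type": "open"}))
print(access_rights_problems({"access_type": "embargo"}))
```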
License (mandatory)
Field name: access_rights/license
Example: see "Access rights"
The license defines how the data in the dataset can be used (metadata is automatically CC0 licensed).
The license can be a standard license (URL) from the code list or, alternatively, a custom URL pointing to a website where the license is defined.
Code List: Licenses
Last modification date
Field name: modified
The most recent date when the dataset or its metadata was modified.
Citation Format
Field name: bibliographic_citation
The primary citation format for the dataset.
Project
Field name: projects
Example:
A project refers to the initiative from which the dataset was generated. A project is not mandatory, and multiple projects can be added.
For each project, the executing organizations (participating_organizations) and funding are defined.
Funding includes the funding agency (funder) and the identifier for the funding decision (funding_identifier). The identifier is not mandatory. A single project can have multiple funding decisions.
- Metax utilizes the Research.fi portal's organizational registry, including funding organizations: https://research.fi/en/results/organizations
- Metax API for listing available organizations: https://metax.fairdata.fi/v3/organizations
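A project entry as described above could be assembled as follows. The names participating_organizations, funder, and funding_identifier come from this section; the "title" and "funding" keys, the organization shape, and all sample values are assumptions made for illustration.

```python
from typing import Iterable, Optional


def make_funding(funder: dict, funding_identifier: Optional[str] = None) -> dict:
    """Build one funding entry; the funding identifier is optional."""
    entry = {"funder": funder}
    if funding_identifier:
        entry["funding_identifier"] = funding_identifier
    return entry


def make_project(title: dict, participating_organizations: Iterable[dict],
                 funding: Iterable[dict] = ()) -> dict:
    """Build one project entry with its organizations and funding decisions."""
    return {
        "title": title,  # assumption: projects carry a language-keyed title
        "participating_organizations": list(participating_organizations),
        "funding": list(funding),  # assumption: key name for funding decisions
    }


# Hypothetical sample values.
university = {"pref_label": {"en": "Example University"}}
funder = {"pref_label": {"en": "Example Funding Agency"}}
project = make_project(
    {"en": "Example project"},
    [university],
    [make_funding(funder, "EXAMPLE-123"), make_funding(funder)],
)
print(project)
```

A dataset's "projects" field would then hold a list of such entries, since multiple projects can be added.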
Language
Field name: language
The language in which the dataset or data is written.
Code List: ISO 639-3 codes
Example:
Geographical Area
Field name: spatial
Example:
The geographic area covered by the dataset (e.g., locations where observations were made).
Code List: YSO-paikat (YSO Places)
Time Period
Field name: temporal/start_date, temporal/end_date
Example:
The time period(s) covered by the dataset (e.g., the time during which observations were made). In datetime format (yyyy-MM-dd'T'HH:mm:ss.SSSXXX).
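A timestamp in this shape can be produced with Python's standard library. The sketch assumes the usual ISO 8601 form with colons in the time part, millisecond precision, and an explicit UTC offset; the helper names are hypothetical, and whether Metax also accepts plain dates is not stated here.

```python
from datetime import datetime, timezone


def format_timestamp(value: datetime) -> str:
    """Format a timezone-aware datetime with milliseconds and a UTC offset."""
    if value.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    return value.isoformat(timespec="milliseconds")


def make_temporal(start: datetime, end: datetime) -> dict:
    """Build a time period entry, refusing periods that end before they start."""
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")
    return {"start_date": format_timestamp(start), "end_date": format_timestamp(end)}


period = make_temporal(
    datetime(2020, 1, 1, tzinfo=timezone.utc),
    datetime(2020, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
)
print(period)
```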
Related publications and other material
Field name: relation
Example:
References to other datasets, publications, or resources that help understand and utilize this dataset.
Types of referenced datasets: Metax Resource Types
Types of references: Metax Relation Types
History and events (provenance)
Field name: provenance
Events or actions that the dataset has been subjected to. These include events related to dataset collection, analysis, or presentation.
Data Source - IDA
Files stored in the IDA service. The necessary fields are described below.
Files
Field name: fileset
Associates the dataset with one or more files in IDA.
Data Source - External source
URLs from external services where the files are located. The necessary fields are described below.
Remote resource
Field name: remote_resources
The specific storage location or data format of the dataset. For example, a database or a query interface to the data.
Use Category
Field name: use_category
Example:
The "type" of the associated data. The definition uses the code list for use categories.
Code List: Use Category Code List