Metax V3 Datamodel and metadata in general

Data Catalog 

Metadata for datasets is always stored in a data catalog. In Metax, there are two common data catalogs: IDA and General catalogs.

  • IDA catalog: metadata for data stored in the Fairdata IDA service.

  • General catalog: metadata for data located outside of Fairdata (so-called remote resources).

Metadata created with Fairdata Qvain or through the Metax End User API is stored in these catalogs.

When an organization imports metadata for its datasets into Metax from its own metadata repository, the metadata is always imported into the organization's dedicated catalog. CSC creates the catalog, so the organization does not need to create it.

The following information needs to be defined for a new catalog:

  • Identifier

  • Title

  • Publisher (organization responsible for dataset publication)

Optionally, the following information can also be defined:

  • Description (a brief textual description of the catalog)

  • General website for the catalog/organization

  • Logo (currently mainly used in Etsin description pages)

Information about data catalogs can be viewed in JSON format: https://metax.fairdata.fi/v3/data-catalogs

Catalog record

A single record stored in a data catalog describes a single dataset: it contains the dataset's descriptive information and technical information about the dataset.

In addition to descriptive information, a record contains, among other things, the following information:

  • User who initially described the dataset
  • Current and previous versions, and their identifiers (in the example, the dataset "1464881e-637a-40b5-ab8a-8898618ae905" has both a previous and a next version)

    • Example:

      "dataset_versions": [
      		{
      			"id": "374fb09f-4ec5-48a5-a73b-729f640480aa",
      			"title": {
      				"en": "Small V3 Dataset",
      				"fi": "Pieni V3 Testiaineisto"
      			},
      			"persistent_identifier": "urn:nbn:fi:fd-dummy-1b7fb192-1f49-4eda-a845-560bf92eb0e6",
      			"state": "published",
      			"created": "2024-04-25T10:20:00Z",
      			"removed": null,
      			"deprecated": null,
      			"version": 3
      		},
      		{
      			"id": "1464881e-637a-40b5-ab8a-8898618ae905",
      			"title": {
      				"en": "Small V3 Dataset",
      				"fi": "Pieni V3 Testiaineisto"
      			},
      			"persistent_identifier": "urn:nbn:fi:fd-dummy-f97fccad-191f-45c8-9f92-d08441a6c8b9",
      			"state": "published",
      			"created": "2024-04-25T10:18:41Z",
      			"removed": null,
      			"deprecated": null,
      			"version": 2
      		},
      		{
      			"id": "d33ee9d8-5694-4067-9450-4b9f55130755",
      			"title": {
      				"en": "Small V3 Dataset",
      				"fi": "Pieni V3 Testiaineisto"
      			},
      			"persistent_identifier": "urn:nbn:fi:fd-dummy-d0cc73cd-5a0e-47c1-bae5-c07e1a98e317",
      			"state": "published",
      			"created": "2024-04-25T10:18:21Z",
      			"removed": null,
      			"deprecated": null,
      			"version": 1
      		}
      	]
  • Possible deprecation (yes/no)

  • Origin of metadata (organization/user)

  • Whether it's a cumulative dataset

Information about individual dataset records can be viewed in JSON format: https://metax.fairdata.fi/v3/datasets
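
As a rough illustration of how the dataset_versions list can be read, the sketch below looks up the previous and next version of a given dataset. The function name neighbouring_versions is hypothetical, and the sketch assumes that the integer "version" field orders the entries, as it does in the example above.

```python
def neighbouring_versions(dataset_versions, dataset_id):
    """Return (previous_id, next_id) for dataset_id; either may be None."""
    # Order the entries by version number, oldest first.
    ordered = sorted(dataset_versions, key=lambda entry: entry["version"])
    ids = [entry["id"] for entry in ordered]
    index = ids.index(dataset_id)  # raises ValueError if the id is unknown
    previous_id = ids[index - 1] if index > 0 else None
    next_id = ids[index + 1] if index + 1 < len(ids) else None
    return previous_id, next_id


# Trimmed copy of the example above (only the fields the function reads).
versions = [
    {"id": "374fb09f-4ec5-48a5-a73b-729f640480aa", "version": 3},
    {"id": "1464881e-637a-40b5-ab8a-8898618ae905", "version": 2},
    {"id": "d33ee9d8-5694-4067-9450-4b9f55130755", "version": 1},
]
previous_id, next_id = neighbouring_versions(
    versions, "1464881e-637a-40b5-ab8a-8898618ae905"
)
```

Here previous_id is the identifier of version 1 and next_id the identifier of version 3.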

Code values / Value sets (reference data)

When storing dataset metadata, use code values whenever possible. Code values improve data interoperability.

A general list of code values used in Metax: https://metax.fairdata.fi/v3/docs/user-guide/reference-data

Fields for describing datasets

These fields contain all metadata related to research data content (the descriptive metadata of a dataset). The following information is mandatory for all datasets:

  • Persistent identifier (PID)

  • Title
  • Publish date
  • Description

  • Keywords
  • Creator

  • Publisher
  • Access rights & license
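
A client could check a draft against this list before sending it to Metax. The following is only a minimal sketch: the helper missing_mandatory_fields is hypothetical, the field names are taken from the field descriptions later on this page, and the real validation done by Metax may differ. It treats the PID as covered when generate_pid_on_publish is set, and it skips the publish date because Metax sets a missing one automatically.

```python
def missing_mandatory_fields(dataset):
    """Return the names of mandatory items that are absent or empty."""
    missing = []
    # Assumption: a PID is either given directly or requested on publish.
    if not (dataset.get("persistent_identifier")
            or dataset.get("generate_pid_on_publish")):
        missing.append("persistent_identifier")
    for field in ("title", "description", "keyword"):
        if not dataset.get(field):
            missing.append(field)
    # Collect every role that appears on any actor.
    roles = {
        role
        for actor in dataset.get("actors", [])
        for role in actor.get("roles", [])
    }
    for role in ("creator", "publisher"):
        if role not in roles:
            missing.append("actors/" + role)
    access_rights = dataset.get("access_rights") or {}
    for field in ("access_type", "license"):
        if not access_rights.get(field):
            missing.append("access_rights/" + field)
    return missing


draft = {
    "title": {"en": "Dataset Title"},
    "keyword": ["test"],
    "generate_pid_on_publish": "DOI",
    "actors": [{"roles": ["creator"]}],
}
missing = missing_mandatory_fields(draft)
```

For this draft the helper reports the description, the publisher, the access type and the license as missing.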

The information provided in certain fields must adhere to the relevant code sets. The code sets used are indicated in the field descriptions below. The code sets used by Metax can be viewed at: https://metax.fairdata.fi/v3/docs/user-guide/reference-data

Fields in Metax

Persistent identifier (mandatory)

Field name: persistent_identifier

A unique, persistent identifier for the dataset. Usually a DOI or a URN.

Type of persistent identifier (when created by Fairdata)

Field name: generate_pid_on_publish

Example: "generate_pid_on_publish": "DOI"

Title (mandatory)

Field name: title/fi, title/en

Example:

"title":{
	"fi":"Aineiston otsikko",
	"en":"Dataset Title"
}

A descriptive title for the dataset. The title should concisely describe the dataset's subject matter and be unique enough that it is unlikely to be used to describe another dataset.
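
Multilingual fields such as title and description are objects keyed by language code. When displaying them, a client typically picks one language and falls back to another. A minimal sketch, where the helper name pick_label and the default language order are assumptions made for illustration:

```python
def pick_label(multilingual, preferred=("en", "fi")):
    """Return the first non-empty value in the preferred language order."""
    for language in preferred:
        if multilingual.get(language):
            return multilingual[language]
    # Fall back to any other non-empty value (for example one stored as "und").
    for value in multilingual.values():
        if value:
            return value
    return ""


title = {"fi": "Aineiston otsikko", "en": "Dataset Title"}
english_title = pick_label(title)
finnish_title = pick_label(title, preferred=("fi", "en"))
```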

Description (mandatory)

Field name: description/fi, description/en

Example:

"description":{
	"fi":"Kuvaus",
	"en":"Description"
}

A free-form description of the dataset. The description can include information on how the dataset was created, its purpose, structure, and processing. Also include details about the content, any shortcomings, and limitations.

Publish date (mandatory)

Field name: issued

The formal publication date of the dataset.

If no publication date is provided, the publication date of the dataset is automatically set to the current date.
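
The defaulting rule can be pictured as follows. This is an illustration of the rule only, not the Metax implementation; the helper name with_default_issued is hypothetical, and the sketch assumes that issued is stored as an ISO 8601 date string.

```python
from datetime import date


def with_default_issued(dataset, today=None):
    """Return a copy of dataset whose 'issued' field is always filled in."""
    result = dict(dataset)
    if not result.get("issued"):
        # No publication date given: fall back to the current date.
        result["issued"] = (today or date.today()).isoformat()
    return result


published = with_default_issued(
    {"title": {"en": "Dataset Title"}}, today=date(2024, 4, 25)
)
```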

Keywords (mandatory)

Field name: keyword

Appropriate keywords for the dataset. Keywords improve the findability of the dataset when precise terms are used.

Subject Headings

Field name: theme

Subject headings are selected from controlled vocabularies, ontologies, or classifications, such as terms from the KOKO ontology.

Example:  

"theme": [
	{
		"url": "http://www.yso.fi/onto/koko/p36817"
	}
]

Actors

Individuals and organizations involved in the research or creation of the dataset. You can specify Creators, Publishers, Curators, Rights holders, and Other contributors. Creator and Publisher are mandatory information. Further information about different actors is provided below.

Field name: actors

Actor Information:

  • roles: Type of actor 
    • creator:
      • A person or organization who originally produced the dataset.
      • Creator is a mandatory actor, and there can be multiple creators.
    • publisher:
      • Actor that has permission to distribute the dataset or who has made the dataset available.
      • Usually a research organization.
      • Publisher is a mandatory actor, and only one (either an individual OR an organization) can be added as the publisher.
    • curator:
      • A person (or organization) who is responsible for ongoing maintenance of the dataset and keeping it available. Data curators are specialists who collect, organize, clean and transform data to make it accessible for organizations and individuals.
      • Not mandatory. Multiple curators can be added.
    • rights_holder:
      • A person or organization who holds the copyright, neighboring rights or moral rights of the dataset.
      • The rights holder is usually the creator of the dataset or the creator's organization.
      • Not mandatory. Multiple rights holders can be added.
    • contributor:
      • Any other person or organization that has contributed significantly to the creation of the dataset (not quite creators).
      • Not mandatory. Multiple other contributors can be added.
  • person/name
    • The name of the actor if it's an individual type actor
  • person/email
    • The email of the actor if it's an individual type actor
    • Etsin users are able to send messages via Etsin without seeing the actual email address.
  • person/external_identifier
    • The identifier of the actor (e.g., ORCID) if it's an individual type actor
  • organization
    • Either the organization of a person or an organization type actor
    • Organization is mandatory information for person actors

Example:

"actors": [
	{
		"roles": [
			"creator", "rights_holder"
		],
		"person": {
			"name": "Teemu Testihenkilö",
			"email": "teemu.testihenkilo@organisaatio.fi",
			"external_identifier": "0000-0001-1234-1234"
		},
		"organization": {
			"url": "http://uri.suomi.fi/codelist/fairdata/organization/code/09206320"
		}
	},
	{
		"roles": ["publisher"],
		"organization": {
			"url": "http://uri.suomi.fi/codelist/fairdata/organization/code/09206320"
		}
	}
]
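
The rules stated for actors (at least one creator, exactly one publisher, and an organization for every person actor) can be sketched as a small check. The helper actor_problems is hypothetical and only mirrors the rules described in this section; it is not the validation code Metax runs.

```python
def actor_problems(actors):
    """Return a list of human-readable problems found in an actors list."""
    problems = []
    creators = [a for a in actors if "creator" in a.get("roles", [])]
    publishers = [a for a in actors if "publisher" in a.get("roles", [])]
    if not creators:
        problems.append("at least one creator is required")
    if len(publishers) != 1:
        problems.append("exactly one publisher is required")
    for index, actor in enumerate(actors):
        # Organization is mandatory information for person actors.
        if actor.get("person") and not actor.get("organization"):
            problems.append(f"actor {index}: a person actor needs an organization")
    return problems


organization = {
    "url": "http://uri.suomi.fi/codelist/fairdata/organization/code/09206320"
}
example_actors = [
    {
        "roles": ["creator", "rights_holder"],
        "person": {"name": "Teemu Testihenkilö"},
        "organization": organization,
    },
    {"roles": ["publisher"], "organization": organization},
]
problems = actor_problems(example_actors)
```

The example actors above pass the check, so problems is an empty list.
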
Field of Science

Field name: field_of_science

The scientific discipline of the dataset.

Code List: Field of Science Classification

Example:

"field_of_science":[
	{
		"url":"http://www.yso.fi/onto/okm-tieteenala/ta114"
	},
	{
		"url":"http://www.yso.fi/onto/okm-tieteenala/ta115"
	}
]

Access rights (mandatory)

Field name: access_rights/access_type

Example:

"access_rights": {
	"access_type": {
		"url": "http://uri.suomi.fi/codelist/fairdata/access_type/code/restricted"
	},
	"restriction_grounds": [
		{
			"url": "http://uri.suomi.fi/codelist/fairdata/restriction_grounds/code/national_interest"
		}
	],
	"license": [
		{
			"url": "http://uri.suomi.fi/codelist/fairdata/license/code/CC-BY-4.0"
		}
	]
}

This information determines how the files of the published dataset can be accessed. This field does not affect the visibility of the dataset's metadata, which is always visible in Etsin after publication.

If the dataset's availability is anything other than "Open", a reason for restricting file downloads must also be selected. If "Embargo" is selected as the access type, the end date of the embargo must also be specified.
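
These two conditions can be expressed as a short check. Treat it as a sketch under assumptions: the helper access_rights_problems is hypothetical, the access type is read from the last segment of the code URL, the code names "open" and "embargo" are assumed to follow the same pattern as "restricted" in the example, and the embargo end date is assumed to be stored in a field named available.

```python
def access_rights_problems(access_rights):
    """Return problems found in an access_rights object."""
    url = (access_rights.get("access_type") or {}).get("url", "")
    # The code is the last path segment, e.g. ".../access_type/code/restricted".
    code = url.rstrip("/").rsplit("/", 1)[-1]
    if not code:
        return ["access_type is required"]
    problems = []
    if code != "open" and not access_rights.get("restriction_grounds"):
        problems.append("restriction_grounds is required unless access is open")
    # Assumption: the embargo end date lives in the 'available' field.
    if code == "embargo" and not access_rights.get("available"):
        problems.append("embargo requires an end date")
    return problems


base = "http://uri.suomi.fi/codelist/fairdata/"
restricted = {
    "access_type": {"url": base + "access_type/code/restricted"},
    "restriction_grounds": [
        {"url": base + "restriction_grounds/code/national_interest"}
    ],
}
restricted_problems = access_rights_problems(restricted)
```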

Code List: Access Type Code List and Restriction Grounds

License (mandatory)

Field name: access_rights/license

Example: see "Access rights"

The license defines how the data in the dataset can be used (metadata is automatically CC0 licensed).

The license can be a standard license (URL) from a code list or, alternatively, a URL to a website where the license is defined (custom URL).

Code List: Licenses

Last modification date

Field name: modified

The most recent date when the dataset or its metadata was modified.

Citation Format

Field name: bibliographic_citation

The primary citation format for the dataset.

Project

Field name: projects

Example:

"projects":[
	{
		"title":{
			"en":"Project ABC",
			"fi":"Projekti ABC"
		},
		"project_identifier":"Project identifier here",
		"participating_organizations":[
			{
				"url":"http://uri.suomi.fi/codelist/fairdata/organization/code/09206320"
			}
		],
		"funding": [
			{
				"funder": {
					"organization": {
						"url": "http://uri.suomi.fi/codelist/fairdata/organization/code/09206320"
					},
					"funder_type": {
						"url":"http://uri.suomi.fi/codelist/fairdata/funder_type/code/academy-of-finland"
					}
				},
				"funding_identifier": "Funding identifier here"
			},
			{
				"funder": {
					"organization": {
						"pref_label": {
							"fi": "organisaatio",
							"en": "organization"
						}
					},
					"funder_type": {
						"url":"http://uri.suomi.fi/codelist/fairdata/funder_type/code/academy-of-finland"
					}
				},
				"funding_identifier": "follow-up funding"
			}
		]
	}
]

A project refers to the initiative from which the dataset was generated. A project is not mandatory, and multiple projects can be added.

For each project, the executing organizations (participating_organizations) and funding are defined.

Funding includes the funding agency (funder) and the identifier for the funding decision (funding_identifier). The identifier is not mandatory. A single project can have multiple funding decisions.
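
Because a project can carry several funding decisions and the identifier is optional, code that reads the structure has to tolerate missing values. A minimal sketch, with funding_identifiers as a hypothetical helper name:

```python
def funding_identifiers(projects):
    """Collect every funding_identifier found in a list of projects."""
    found = []
    for project in projects:
        for funding in project.get("funding", []):
            identifier = funding.get("funding_identifier")
            if identifier:  # the identifier is not mandatory
                found.append(identifier)
    return found


# Trimmed version of the example above, plus one funding entry without an identifier.
example_projects = [
    {
        "title": {"en": "Project ABC"},
        "funding": [
            {"funding_identifier": "Funding identifier here"},
            {"funding_identifier": "follow-up funding"},
            {"funder": {"organization": {"pref_label": {"en": "organization"}}}},
        ],
    }
]
identifiers = funding_identifiers(example_projects)
```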

Metax utilizes the Research.fi portal's organizational registry, including funding organizations: https://research.fi/en/results/organizations

Language

Field name: language

The language in which the dataset or data is written.

Code List: ISO639-3 codes

Example:

"language":[
	{
		"url":"http://lexvo.org/id/iso639-3/eng"
	},
	{
		"url":"http://lexvo.org/id/iso639-3/fin"
	}
]

Geographical Area

Field name: spatial

Example:

"spatial":[
		{
			"custom_wkt": [
				"POINT(25.01213 60.17074)"
			],
			"reference": {
				"url": "http://www.yso.fi/onto/yso/p167902"
			},
			"full_address": "Kruunuvuorenranta",
			"geographic_name": "Helsinki",
			"altitude_in_meters": 23
		},
		{
			"custom_wkt": [
				"POINT(24.6361 60.15379)"
			],
			"reference": {
				"url": "http://www.yso.fi/onto/yso/p109504"
			},
			"full_address": "Kivenlahdentie",
			"geographic_name": "Espoo",
			"altitude_in_meters": 100
		}
]

The geographic area covered by the dataset (e.g., locations where observations were made).
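
The custom_wkt values in the example are WKT points, where the first number is the x coordinate (longitude) and the second the y coordinate (latitude). The sketch below extracts the two numbers from such a string. The helper parse_wkt_point is hypothetical and handles plain two-dimensional POINT values only; other WKT geometry types would need a proper WKT library.

```python
import re

# Matches a two-dimensional WKT point such as "POINT(25.01213 60.17074)".
_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt):
    """Return (longitude, latitude) for a WKT POINT, or None if it is not one."""
    match = _POINT.match(wkt)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


longitude, latitude = parse_wkt_point("POINT(25.01213 60.17074)")
```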

Code List: YSO-paikat

Time Period

Field name: temporal/start_date, temporal/end_date

Example:

"temporal":[
	{
		"end_date":"2021-12-07",
		"start_date":"2021-01-01"
	},
	{
		"start_date":"2021-12-28"
	}
]

The time period(s) covered by the dataset (e.g., the time during which observations were made). Values are given in datetime format (yyyy-MM-dd'T'HH:mm:ss.SSSXXX); the example above uses date-only values (yyyy-MM-dd).
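
A client may want to sanity-check time periods before submitting them. The following sketch assumes date-only strings (yyyy-MM-dd), as used in the example above, and allows open-ended periods with only a start date. The helper temporal_problems is hypothetical.

```python
from datetime import date


def temporal_problems(temporal):
    """Return problems found in a list of temporal periods."""
    problems = []
    for index, period in enumerate(temporal):
        try:
            start = period.get("start_date")
            end = period.get("end_date")
            start = date.fromisoformat(start) if start else None
            end = date.fromisoformat(end) if end else None
        except ValueError:
            problems.append(f"period {index}: not a valid yyyy-MM-dd date")
            continue
        # Only compare when both ends of the period are given.
        if start and end and start > end:
            problems.append(f"period {index}: start_date is after end_date")
    return problems


example_temporal = [
    {"end_date": "2021-12-07", "start_date": "2021-01-01"},
    {"start_date": "2021-12-28"},
]
period_problems = temporal_problems(example_temporal)
```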

Related publications and other material

Field name: relation

Example:

"relation":[
	{
		"entity":{
			"type":{
				"url":"http://uri.suomi.fi/codelist/fairdata/resource_type/code/collection"
			},
			"title":{
				"en":"testi",
				"und":"testi"
			},
			"description":{
				"en":"testi",
				"und":"testi"
			},
			"entity_identifier": "urn:nbn:fi:att:03cc54b6-b1f9-41a6-b6db-ddb417375017"
		},
		"relation_type":{
			"url":"http://purl.org/dc/terms/hasPart"
		}
	}
]

References to other datasets, publications, or resources that help understand and utilize this dataset.

Types of referenced datasets: Metax Resource Types

Types of references: Metax Relation Types

History and events (provenance)

Field name: provenance

Events or actions that the dataset has been subjected to. These include events related to dataset collection, analysis, or presentation.

Data Source - IDA

Files stored in the IDA service. The necessary fields are described below.

Files

Field name: fileset

Associates the dataset with one or more files in IDA.

Data Source - External source

URLs from external services where the files are located. The necessary fields are described below.

Remote resource

Field name: remote_resources

The specific storage location or data format of the dataset. For example, a database or a query interface to the data.

Use Category

Field name: use_category

Example:

"remote_resources":[
	{
		"title":{
			"en": "Data in remote location"
		},
		"access_url":"https://datasomewhere.fi",
		"download_url":"https://downloadsomewhere.fi",
		"use_category":{
			"url":"http://uri.suomi.fi/codelist/fairdata/use_category/code/outcome"
		}
	}
]

The "type" of the associated data. The definition uses the code list for use categories.

Code List: Use Category Code List

