DataCite Metadata Schema Properties

This page is based on the DataCite Metadata Schema 4.4. Some property names may conflict with the JSON returned by the DataCite API, which should be taken as the source of truth for the code itself.

The main ID represents a direct child of the main <resource/> tag: N

Subsequent numbers on the ID indicate a child of the tag with the main ID: N.M

Letters indicate an attribute of a tag, rather than a child: N.M.a
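As an illustration of the ID convention above, the following is a minimal sketch that splits an ID into its numeric tag path and optional attribute letter. The names parse_property_id and PropertyId are hypothetical and do not come from the DOI API code.

```python
import re
from typing import NamedTuple, Optional, Tuple


class PropertyId(NamedTuple):
    """Hypothetical container for a parsed ID such as "2.4.a"."""

    path: Tuple[int, ...]      # numeric parts: main ID followed by child IDs
    attribute: Optional[str]   # trailing letter if the ID names an attribute


# One or more dot-separated numbers, optionally followed by ".<letter>".
_ID_PATTERN = re.compile(r"^(\d+(?:\.\d+)*)(?:\.([a-z]))?$")


def parse_property_id(raw: str) -> PropertyId:
    match = _ID_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"Not a valid property ID: {raw!r}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    return PropertyId(numbers, match.group(2))
```

For example, parse_property_id("2.4.a") gives the path (2, 4) with attribute "a", i.e. an attribute of the nameIdentifier tag under Creator.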

Key

Source

  • Supported

  • Partially supported

  • Not supported

Actions / Possible Source

  • No action needed

  • Low priority

  • High priority

  • Not evaluated

NB: some attributes may be unsupported without requiring any action, because the attribute is not relevant for our purposes.

Mandatory Properties

Note that some (child) tags/attributes may not actually be mandatory if they can occur zero times. Similarly, some tags are optional, but their children are required.
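The note above can be made concrete with a small sketch: a tag or attribute is only effectively required if it, and every tag above it, has a minimum occurrence of at least one. The function names are hypothetical, and the occurrence strings are assumed to take the forms used in the tables on this page ("1", "0-1", "0-n", "1-n", "4-n").

```python
from typing import Iterable, Optional, Tuple


def parse_occurrence(text: str) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum); maximum is None when unbounded ("n")."""
    low, _, high = text.strip().partition("-")
    minimum = int(low)
    if not high:
        # A single number such as "1" means exactly that many occurrences.
        return minimum, minimum
    maximum = None if high == "n" else int(high)
    return minimum, maximum


def is_effectively_required(occurrence_chain: Iterable[str]) -> bool:
    """occurrence_chain lists occurrences from the top-level tag down to the item."""
    return all(parse_occurrence(item)[0] >= 1 for item in occurrence_chain)
```

For example, nameIdentifierScheme (2.4.a) has occurrence 1, but its parent nameIdentifier (2.4) is 0-n, so is_effectively_required(["1-n", "0-n", "1"]) is False.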

ID

Property

Occurrence

Source

Actions / Possible Source

1

Identifier

1

Prefix from settings, suffix from DataCite response

 

└── 1.a

identifierType

1

DataCite response

 

2

Creator

1-n

Specified by user in DG

 

├── 2.1

creatorName

1

User.fullName (nullable)

 

│ └── 2.1.a

nameType

0-1

“Personal”

 

├── 2.2

givenName

0-1

User.givenName (nullable)

 

├── 2.3

familyName

0-1

User.familyName (nullable)

 

├── 2.4

nameIdentifier

0-n

API now populates from DB

User.orcidId (nullable, defined for ~13% of DLS users). Other schemes are possible, but not represented in ICAT

│ ├── 2.4.a

nameIdentifierScheme

1

Hardcoded

“ORCID”

│ └── 2.4.b

schemeURI

0-1

Hardcoded

http://orcid.org

└── 2.5

affiliation

0-n

DataPublicationUser.Affiliation.name, hardcoded to the first Affiliation only, because the User table (which is used as the source) only supports a single string

Support multiple affiliations (ICAT 5 table, none defined yet for DLS)

├── 2.5.a

affiliationIdentifier

0-n

DataPublicationUser.Affiliation.pid hardcoded to first Affiliation only

Support multiple affiliations

├── 2.5.b

affiliationIdentifierScheme

1

“ROR” (hardcoded, but the ICAT schema indicates “Identifier such as ROR or ISNI”)

We hardcode to use the ROR to get an identifier from the name of the affiliation

└── 2.5.c

schemeURI

0-1

Hardcoded to https://ror.org/

Hardcode to whatever corresponds to 2.5.b (e.g. https://ror.org/ )

3

Title

1-n

Specified by user in DG (1)

 

└── 3.a

titleType

0-1

 

Types only relevant when not the “main” title, we only pass 1 title so this is not needed

4

Publisher

1

Settings

 

5

PublicationYear

1

Automatic

 

10

ResourceType

1

Hardcoded to Experimental Data or Experimental Dataset for Dataset, Collection respectively (worry that exposing a free text field to the user may result in inappropriate use, e.g. too long, too specific etc.)

Free text field, could:

  • expose to user as free text

  • expose to user as select element with sensible options

  • hardcode to something generic like “Experimental Data”

└── 10.a

resourceTypeGeneral

1

“Dataset”, “Collection”

 
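To show how the Creator rows (2 to 2.5.c) above fit together, here is a minimal sketch that assembles one creator entry using the hardcoded values from the table. The function and its parameters are hypothetical, and the JSON key names (name, nameIdentifiers, schemeUri and so on) are an assumption based on the DataCite JSON format; they should be checked against the API response, which is the source of truth.

```python
from typing import Any, Dict, Optional


def build_creator(
    full_name: str,
    given_name: Optional[str] = None,
    family_name: Optional[str] = None,
    orcid_id: Optional[str] = None,
    affiliation_name: Optional[str] = None,
    affiliation_pid: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble one creator entry; optional values are omitted when not set."""
    creator: Dict[str, Any] = {"name": full_name, "nameType": "Personal"}
    if given_name:
        creator["givenName"] = given_name
    if family_name:
        creator["familyName"] = family_name
    if orcid_id:
        # 2.4.a and 2.4.b are hardcoded, as in the table above.
        creator["nameIdentifiers"] = [
            {
                "nameIdentifier": orcid_id,
                "nameIdentifierScheme": "ORCID",
                "schemeUri": "http://orcid.org",
            }
        ]
    if affiliation_name:
        # Only the first affiliation is used, matching the current behaviour.
        affiliation: Dict[str, str] = {"name": affiliation_name}
        if affiliation_pid:
            affiliation["affiliationIdentifier"] = affiliation_pid
            affiliation["affiliationIdentifierScheme"] = "ROR"
            affiliation["schemeUri"] = "https://ror.org/"
        creator["affiliation"] = [affiliation]
    return creator
```

Since the User fields are nullable, a caller would need to decide what to do when no full name is available; this sketch simply assumes one is provided.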

Recommended Properties

ID

Property

Occurrence

Source

Possible Actions

6

Subject

0-n

 

Free text field, could:

  • Expose to user as (multiple) free text fields

  • Selector based off some externally defined ontology

  • Auto/selector based off ICAT Technique table (ICAT 5 table)

├── 6.a

subjectScheme

0-1

 

Dependent on there being a well defined scheme for 6

├── 6.b

schemeURI

0-1

 

Dependent on there being a well defined scheme for 6

├── 6.c

valueURI

0-1

 

Dependent on there being a well defined scheme for 6

└── 6.d

classificationCode

0-1

 

Dependent on there being a well defined scheme for 6

7

Contributor

0-n

Supported in the backend via optional contributorType sent with request (frontend NYI)

Currently everyone is mapped as a Creator, which is for the "main researchers involved". If we want to use Contributor, we can mostly source the same information in the same way as for Creator

├── 7.a

contributorType

1

 

This would need additional UI element compared to Creator

├── 7.1

contributorName

1

 

See Creator

│ └── 7.1.a

nameType

0-1

 

See Creator

├── 7.2

givenName

0-1

 

See Creator

├── 7.3

familyName

0-1

 

See Creator

├── 7.4

nameIdentifier

0-n

 

See Creator

│ ├── 7.4.a

nameIdentifierScheme

1

 

See Creator

│ └── 7.4.b

schemeURI

0-1

 

See Creator

└── 7.5

affiliation

0-n

 

See Creator

├── 7.5.a

affiliationIdentifier

0-n

 

See Creator

├── 7.5.b

affiliationIdentifierScheme

1

 

See Creator

└── 7.5.c

schemeURI

0-1

 

See Creator

8

Date

0-n

Automatic (1)

 

├── 8.a

dateType

1

“Created”

 

└── 8.b

dateInformation

0-1

 

Only needed if further clarification needed for the date

12

RelatedIdentifier

0-n

Supported in DOI API, not supported in DG

DOI API expects this to be provided in full from frontend, and then creates ICAT entries from it (including fields absent here, such as a title):

  • Manual entry (runs risk of inaccuracies?)

  • Structured entry (e.g. provide DOI and relationType, scrape/hardcode the rest?)

  • Add support for alternative approach - use hasPart for ICAT child entities with DOIs (but do/will we actually set these)?

├── 12.a

relatedIdentifierType

1

 

Controlled list of UID types, DOI is most relevant for us?

├── 12.b

relationType

1

 

Relatively well defined list, offer as a select element in UI?

├── 12.c

relatedMetadataScheme

0-1

 

Only relevant for HasMetadata and IsMetadataFor

├── 12.d

schemeURI

0-1

 

Only relevant for HasMetadata and IsMetadataFor

├── 12.e

schemeType

0-1

 

Only relevant for HasMetadata and IsMetadataFor

└── 12.f

resourceTypeGeneral

0-1

 

See 10.a

17

Description

0-n

Specified by user in DG (1)

 

└── 17.a

descriptionType

1

Can be provided with request, otherwise defaults to “Other”

Not sure how we manage to avoid setting this. Well defined list, offer a select in UI or just hardcode to “Abstract”?

18

GeoLocation

0-n

Statically set from settings file

Just hardcode this for the location of the facility (probably via settings file?)

├── 18.1

geoLocationPoint

0-1

Statically set from settings file

See 18

│ ├── 18.1.1

pointLongitude

1

Statically set from settings file

See 18

│ └── 18.1.2

pointLatitude

1

Statically set from settings file

See 18

├── 18.2

geoLocationBox

0-1

 

See 18

│ ├── 18.2.1

westBoundLongitude

1

 

See 18

│ ├── 18.2.2

eastBoundLongitude

1

 

See 18

│ ├── 18.2.3

southBoundLatitude

1

 

See 18

│ └── 18.2.4

northBoundLatitude

1

 

See 18

├── 18.3

geoLocationPlace

0-1

 

See 18

└── 18.4

geoLocationPolygon

0-n

 

See 18

├── 18.4.1

polygonPoint

4-n

 

See 18

│ ├── 18.4.1.1

pointLongitude

1

 

See 18

│ └── 18.4.1.2

pointLatitude

1

 

See 18

└── 18.4.2

inPolygonPoint

0-1

 

See 18

├── 18.4.2.1

pointLongitude

1

 

See 18

└── 18.4.2.2

pointLatitude

1

 

See 18
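Since GeoLocation (18) is statically set from the settings file, building it could look something like the sketch below. The settings keys facility_longitude and facility_latitude are hypothetical, the geoLocationPoint structure is an assumption based on the DataCite JSON format, and the coordinates in the usage example are placeholders rather than the real facility location.

```python
from typing import Any, Dict, List


def build_geo_locations(settings: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build a single geoLocationPoint (18.1) from hypothetical settings keys."""
    longitude = float(settings["facility_longitude"])
    latitude = float(settings["facility_latitude"])
    # The schema restricts longitude to [-180, 180] and latitude to [-90, 90].
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("pointLongitude must be between -180 and 180")
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("pointLatitude must be between -90 and 90")
    return [
        {
            "geoLocationPoint": {
                "pointLongitude": longitude,
                "pointLatitude": latitude,
            }
        }
    ]


# Placeholder coordinates only, for illustration.
example = build_geo_locations({"facility_longitude": -1.0, "facility_latitude": 51.0})
```

Validating the range when the settings are loaded means a misconfigured value fails early rather than when a DOI is minted.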

Optional Properties

ID

Property

Occurrence

Source

Possible Actions

9

Language

0-1

 

Hardcode to “en”

11

AlternateIdentifier

0-n

 

Could populate with ICAT id for the DataPublication

└── 11.a

alternateIdentifierType

1

 

Could populate with description of ICAT id for the DataPublication

13

Size

0-n

 

Sum all the fileSizes (might need to be careful to not double count if multiple layers of the hierarchy are included)

14

Format

0-n

 

Auto populate from constituent DatafileFormats (if set)

15

Version

0-1

 

May be relevant if we start versioning to account for removal of data?

16

Rights

0-n

Statically set from settings file

Dependent on data policy, but if desired settings file seems like best source

├── 16.a

rightsURI

0-1

Statically set from settings file

See 16

├── 16.b

rightsIdentifier

0-1

Statically set from settings file

See 16

├── 16.c

rightsIdentifierScheme

0-1

Statically set from settings file

See 16

└── 16.d

schemeURI

0-1

Statically set from settings file

See 16

19

FundingReference

0-n

 

ICAT entity exists, but not clear how it will be populated:

  • User inputs information when creating the DataPublication, this creates a FundingReference in ICAT (prone to error, burden on the user)

  • FundingReference is centrally created (by who?) and user just has to select it by some criteria when creating DataPublication

├── 19.1

funderName

1

 

FundingReference.funderName

├── 19.2

funderIdentifier

0-1

 

FundingReference.funderIdentifier

│ ├── 19.2.a

funderIdentifierType

0-1

 

Attempt to guess from 19.2?

│ └── 19.2.b

schemeURI

0-1

 

Attempt to guess from 19.2?

├── 19.3

awardNumber

0-1

 

FundingReference.awardNumber

│ └── 19.3.a

awardURI

0-1

 

Not a field in ICAT, difficult to guess?

└── 19.4

awardTitle

0-1

 

FundingReference.awardTitle

20

RelatedItem

0-n

 

Seems to be intended for “a journal or book of which the article or chapter is part” - this is not relevant for data, and any use cases for relations should (?) be covered by 12

├── 20.a

relatedItemType

0-n

 

See 20

├── 20.b

relationType

0-n

 

See 20

├── 20.1

relatedItemIdentifier

0-1

 

See 20

│ ├── 20.1.a

relatedItemIdentifierType

0-1

 

See 20

│ ├── 20.1.b

relatedMetadataScheme

0-1

 

See 20

│ ├── 20.1.c

schemeURI

0-1

 

See 20

│ └── 20.1.d

schemeType

0-1

 

See 20

├── 20.2

creator

0-n

 

See 20

│ ├── 20.2.1

creatorName

1

 

See 20

│ │ └── 20.2.1.a

nameType

0-1

 

See 20

│ ├── 20.2.2

givenName

0-1

 

See 20

│ └── 20.2.3

familyName

0-1

 

See 20

├── 20.3

title

1-n

 

See 20

│ └── 20.3.a

titleType

0-1

 

See 20

├── 20.4

publicationYear

0-1

 

See 20

├── 20.5

volume

0-1

 

See 20

├── 20.6

issue

0-1

 

See 20

├── 20.7

number

0-1

 

See 20

│ └── 20.7.a

numberType

0-1

 

See 20

├── 20.8

firstPage

0-1

 

See 20

├── 20.9

lastPage

0-1

 

See 20

├── 20.10

publisher

0-1

 

See 20

├── 20.11

edition

0-1

 

See 20

└── 20.12

contributor

0-n

 

See 20

├── 20.12.a

contributorType

1

 

See 20

├── 20.12.1

contributorName

1

 

See 20

│ └── 20.12.1.a

nameType

0-1

 

See 20

├── 20.12.2

givenName

0-1

 

See 20

└── 20.12.3

familyName

0-1

 

See 20
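For Size (13) above, one way to avoid double counting when several layers of the hierarchy are included is to de-duplicate on a file identifier before summing. This is a sketch only: it assumes each file can be represented as a hypothetical (datafile_id, file_size) pair, and it formats the result as a free text string in bytes.

```python
from typing import Dict, Iterable, Tuple


def total_size_bytes(datafiles: Iterable[Tuple[int, int]]) -> int:
    """Sum file sizes, counting each datafile ID once even if it appears repeatedly."""
    sizes_by_id: Dict[int, int] = {}
    for datafile_id, file_size in datafiles:
        sizes_by_id[datafile_id] = file_size
    return sum(sizes_by_id.values())


def format_size(datafiles: Iterable[Tuple[int, int]]) -> str:
    """Render the total as free text, e.g. for a single Size entry."""
    return f"{total_size_bytes(datafiles)} bytes"
```

For example, if a file with ID 1 is reached both through its Dataset and through the parent Investigation, it contributes to the total only once.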

Other Properties

These are sent to the DataCite API as JSON alongside the above, but do not form part of the schema (here for completeness).

ID

Property

Occurrence

Source

 

url

1

Settings and DataPublication.id

 

event

1

“publish”
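As a rough sketch of how url and event sit alongside the schema properties, the function below wraps them into a request body. The function name, its parameters and the way the url is built from a settings base URL and DataPublication.id are assumptions; the data/type/attributes envelope reflects the JSON:API style used by the DataCite REST API, and should be checked against the API documentation.

```python
from typing import Any, Dict


def build_doi_payload(
    doi: str,
    landing_page_base: str,
    data_publication_id: int,
    metadata: Dict[str, Any],
) -> Dict[str, Any]:
    """Combine schema metadata with the url and event properties."""
    attributes = dict(metadata)  # copy so the caller's dict is not modified
    attributes["doi"] = doi
    # Assumed url layout: base URL from settings followed by DataPublication.id.
    attributes["url"] = f"{landing_page_base.rstrip('/')}/{data_publication_id}"
    attributes["event"] = "publish"
    return {"data": {"type": "dois", "attributes": attributes}}
```

Keeping url and event out of the metadata dict until this last step makes it clearer which values come from the schema mapping and which are only there for the API request.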