Summary
This document provides background information about data standards, JSON-schemas in general, and the structure of the Wildlife Disease Data Standard (WDDS) specifically. WDDS focuses on making it easy to store disease data in a consistent and FAIR format.
What is a JSON-schema
A JSON-schema is a human- and machine-readable document that defines a
data standard by describing the structure, properties, and constraints
for a dataset. For those of us more accustomed to thinking about
spreadsheet files and data frames, a property is roughly equivalent to a
field or column. The JSON-schema defines the rules around the type of
data used in a particular property (character, numeric, logical, etc.) and
its values (e.g., massUnits must be one of kg, mg, or g;
latitude must be between -90 and 90; sampleID
must be unique). The schema also describes how those fields should be
combined into a coherent whole (i.e., the structure of the dataset).
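As a rough illustration of the kinds of rules described above, the sketch below writes two constraints in JSON-schema style as a Python dictionary and checks single values against them. The schema fragment and the `check` helper are hypothetical and written only for this example; they are not copied from the WDDS schema files, and a real project would rely on a full JSON-schema validation engine.

```python
import json

# Hypothetical schema fragment (illustrative only, not the WDDS schema itself).
SCHEMA = {
    "massUnits": {"type": "string", "enum": ["kg", "mg", "g"]},
    "latitude": {"type": "number", "minimum": -90, "maximum": 90},
}

# Map JSON-schema type names onto Python types for this sketch.
PY_TYPES = {"string": str, "number": (int, float)}


def check(prop, value):
    """Return a list of human-readable problems for one value (empty if none)."""
    rules = SCHEMA[prop]
    problems = []
    if not isinstance(value, PY_TYPES[rules["type"]]):
        # A value of the wrong type cannot be compared further, so stop here.
        problems.append(f"{prop}: expected type {rules['type']}")
        return problems
    if "enum" in rules and value not in rules["enum"]:
        problems.append(f"{prop}: {value!r} is not one of {rules['enum']}")
    if "minimum" in rules and value < rules["minimum"]:
        problems.append(f"{prop}: {value} is below minimum {rules['minimum']}")
    if "maximum" in rules and value > rules["maximum"]:
        problems.append(f"{prop}: {value} is above maximum {rules['maximum']}")
    return problems


print(json.dumps(SCHEMA, indent=2))  # the schema is itself plain JSON
print(check("massUnits", "lbs"))     # violates the controlled vocabulary
print(check("latitude", 45.5))       # conforms, so no problems are listed
```

Because the schema is plain JSON, the same document can be read by people and consumed by software, which is the property the paragraph above describes.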
In a JSON-schema, fields can have parent-child relationships. A field
may itself be a schema. For example, the data property in
this standard defines a data object that is a flat table
with constraints, types, and/or requirements. In this way, JSON-schema
allows for the construction of modular schema documents that can
leverage existing schemas (e.g., Darwin Core or DataCite).
Once we have created a schema, we can then validate data against it. The validation process happens via a validation engine and tells us if the data conform to the standard. If the data do not conform, then the validation engine tells us precisely where the data are non-conformant and what the data standard expected to see.
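To make the idea of a validation report concrete, here is a minimal sketch of a checker that records where each problem occurs and what was expected. The `RULES` dictionary, the `validate` function, the path format, and the sample values are all assumptions made for this example; an actual validation engine reads its rules from the schema and produces its own report format.

```python
# Hypothetical rules for two fields; a real engine would read these from the schema.
RULES = {
    "latitude": {"minimum": -90, "maximum": 90},
    "longitude": {"minimum": -180, "maximum": 180},
}


def validate(table):
    """Collect one error record per offending value, noting where it occurred."""
    errors = []
    for field, rules in RULES.items():
        if field not in table:
            errors.append({"path": field, "message": "required field is missing"})
            continue
        for row, value in enumerate(table[field]):
            if value is None:
                continue  # assumption: null items are permitted in this sketch
            if not rules["minimum"] <= value <= rules["maximum"]:
                errors.append({
                    "path": f"{field}/{row}",
                    "message": (
                        f"expected a value between {rules['minimum']} "
                        f"and {rules['maximum']}, found {value}"
                    ),
                })
    return errors


# Made-up data: the second latitude is out of range.
data = {"latitude": [17.25, 117.25], "longitude": [-88.77, None]}
for error in validate(data):
    print(error["path"], "->", error["message"])
```

Each record pairs a location (field and row position) with the expectation that was violated, which is the information a data producer needs to correct a file.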
For more detailed information, see JSON-Schema.org.
Wildlife Disease Data Standard (WDDS) Structure
The Wildlife Disease Data Standard is composed of two sub-schemas: (1)
disease_data and (2) project_metadata.
disease_data describes the structure and contents of the
wildlife disease data. It has certain required fields and is extensible.
These data should be stored as a tidy dataset in a flat
file such as a CSV. This component of the standard relies heavily on the Darwin Core data standard.
project_metadata describes the structure and contents of
the descriptive metadata, that is, metadata about the project that
enable discovery, identification, and attribution. This component of
the standard relies heavily on the DataCite
Metadata Schema.
Researchers may validate their data against each sub-schema
separately, or use them in tandem to validate an entire data package.
The term “data package” refers to a list or JSON object that contains
both the disease_data and project_metadata
components.
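A data package as described above can be pictured as a single JSON object with two keys. The sketch below is illustrative only: the two top-level keys come from the standard, while the field values, the abbreviated field lists, and the `missing_components` helper are placeholders invented for this example.

```python
import json

# Hypothetical, heavily abbreviated content; only the two top-level keys
# (disease_data and project_metadata) come from the standard's description.
data_package = {
    "disease_data": {
        "sampleID": ["OS BZ19-114"],
        "latitude": [17.25],
        "longitude": [-88.77],
    },
    "project_metadata": {
        "titles": [{"title": "Example bat coronavirus survey"}],
        "publicationYear": "2024",
    },
}


def missing_components(package):
    """List which of the two expected components are absent from a package."""
    expected = ("disease_data", "project_metadata")
    return [name for name in expected if name not in package]


print(json.dumps(data_package, indent=2))
print(missing_components(data_package))
```

Keeping the two components under separate keys is what allows each one to be validated on its own or together as a package.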
Important vocabulary
Property: synonymous with field or column in a
table. A property corresponds to a particular attribute (e.g., age,
collectedBy, latitude) of the data.
Required: A property that must be included for a given
schema or object within a schema.
Type: The type
of data. Common values include array, object, string, number,
integer, null, and boolean.
Array: A comma-separated group of values. Similar to a
vector in R but a little more flexible.
Array Items: Array items define acceptable values for
an array.
- minItems: how many items must be present in the array
- minimum: inclusive, smallest value allowed in an array
- maximum: inclusive, largest value allowed in an array
- enum: controlled vocabulary for an array
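The array keywords listed above can be sketched as a small checker. This is an illustration under stated assumptions, not the behaviour of any particular validation engine: `check_array` is a hypothetical helper, and treating null (`None`) items as always acceptable is an assumption made for brevity. The limits used in the calls mirror the collectionDay and ageUnits fields of the standard.

```python
def check_array(values, min_items=None, minimum=None, maximum=None, enum=None):
    """Check a list against minItems, minimum, maximum, and enum."""
    problems = []
    if min_items is not None and len(values) < min_items:
        problems.append(f"needs at least {min_items} item(s), found {len(values)}")
    for position, value in enumerate(values):
        if value is None:
            continue  # assumption: null items are allowed
        if minimum is not None and value < minimum:
            problems.append(f"item {position}: {value} is below {minimum}")
        if maximum is not None and value > maximum:
            problems.append(f"item {position}: {value} is above {maximum}")
        if enum is not None and value not in enum:
            problems.append(f"item {position}: {value!r} is not in the vocabulary")
    return problems


AGE_UNITS = ["years", "months", "days", "hours", "minutes", "seconds"]

print(check_array([15, 32], min_items=1, minimum=1, maximum=31))  # day 32 fails
print(check_array(["years", "weeks"], min_items=1, enum=AGE_UNITS))
print(check_array([], min_items=1))
```

Both minimum and maximum are inclusive, so a collectionDay of 1 or 31 passes while 0 or 32 does not.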
Best practices for free text fields
We recommend that data producers use controlled vocabularies or ontologies when filling out free text fields. We recognize that selecting an appropriate vocabulary can be challenging and recommend the following platforms for finding appropriate terms.
Table S1. Recommended ontology hosting and search platforms with distinct funding sources.
| Name | URL | 
|---|---|
| Ontobee | https://ontobee.org/ | 
| Ontology Lookup Service | https://www.ebi.ac.uk/ols4/ | 
| BioPortal | https://bioportal.bioontology.org/ | 
All three platforms allow users to search for terms stored in ontologies, explore relationships between terms, and find analogues. A user will have to explore a given ontology to find the most appropriate term. In Table S2 we list specific ontologies or authorities that may be appropriate for a given field.
Table S2. Recommended ontologies or authorities for specific fields.
| Field | URL | 
|---|---|
| Host Identification | https://www.gbif.org/species/search | 
| Gene Target | https://www.ebi.ac.uk/ols4/ontologies/go | 
| Sample Collection Method | http://purl.obolibrary.org/obo/OBI_0000659 | 
| Sample Collection Body Part | https://www.ebi.ac.uk/ols4/ontologies/uberon | 
| Sample Collection Material | http://purl.obolibrary.org/obo/OBI_0001479 | 
disease_data
Type: object
Description: REQUIRED Wildlife
disease data. Stored in tidy form.
Required Fields: sampleID, latitude, longitude,
sampleCollectionMethod, hostIdentification, detectionTarget,
detectionMethod, detectionOutcome, parasiteIdentification
Reference: schemas/disease_data.json
- sampleID
 Type: array
 Description: REQUIRED A researcher-generated unique ID for the sample: usually a unique string of both characters and integers (e.g., OS BZ19-114 to indicate an oral swab taken from animal BZ19-114; see worked example below), to avoid conflicts that can arise when datasets are merged with number-only notation for samples. Ideally, sample names should be kept consistent across all online databases and physical resources (e.g., museum collections or project-specific sample archives).
 Array Items
  - type: string, null
  - minItems: 1
- animalID
 Type: array
 Description: A researcher-generated unique ID for the individual animal from which the sample was collected: usually a unique string of both characters and integers (e.g., BZ19-114 to indicate animal 114 sampled in 2019 in Belize). Ideally, animal names should again be kept consistent across online databases and physical resources.
 Array Items
  - type: string, null
  - minItems: 1
- latitude
 Type: array
 Description: REQUIRED Latitude of the collection site in decimal format. See http://rs.tdwg.org/dwc/terms/decimalLatitude
 Array Items
  - type: number, null
  - minItems: 1
  - maximum: 90
  - minimum: -90
- longitude
 Type: array
 Description: REQUIRED Longitude of the collection site in decimal format. See http://rs.tdwg.org/dwc/terms/decimalLongitude
 Array Items
  - type: number, null
  - minItems: 1
  - maximum: 180
  - minimum: -180
- spatialUncertainty
 Type: array
 Description: Coordinate uncertainty from GPS recordings, post-hoc digitization, or systematic alterations (e.g., jittering or rounding) expressed in meters. See http://rs.tdwg.org/dwc/terms/coordinateUncertaintyInMeters
 Array Items
  - type: number, null
  - minItems: 1
  - minimum: 0
- collectionDay
 Type: array
 Description: The day of the month on which the specimen was collected. See http://rs.tdwg.org/dwc/terms/day
 Array Items
  - type: integer, null
  - minItems: 1
  - minimum: 1
  - maximum: 31
- collectionMonth
 Type: array
 Description: The month in which the specimen was collected. See http://rs.tdwg.org/dwc/terms/month
 Array Items
  - type: integer, null
  - minItems: 1
  - minimum: 1
  - maximum: 12
- collectionYear
 Type: array
 Description: The year in which the specimen was collected. See http://rs.tdwg.org/dwc/terms/year
 Array Items
  - type: integer, null
- sampleCollectionMethod
 Type: array
 Description: REQUIRED The technique used to acquire the sample and/or the tissue from which the sample was acquired (e.g. visual inspection; swab; wing punch; necropsy).
 Example Values: visual inspection, swab, wing punch, necropsy
 Array Items
  - type: string
  - minItems: 1
- sampleMaterial
 Type: array
 Description: Organic tissue or fluid being collected (e.g., “liver”; “blood”; “skin”; “whole organism”).
 Example Values: liver, blood, skin, whole organism
 Array Items
  - type: string, null
  - minItems: 1
- sampleCollectionBodyPart
 Type: array
 Description: Part of the animal body that samples are generated or collected from (e.g., “rectum”; “wing”).
 Example Values: rectum, wing
 Array Items
  - type: string, null
  - minItems: 1
- hostIdentification
 Type: array
 Description: REQUIRED The Linnaean classification of the animal from which the sample was collected, reported at the lowest possible level (ideally, species binomial name: e.g., Odocoileus virginianus or Ixodes scapularis). As necessary, researchers may also include an additional field indicating when uncertainty exists in the identification of the host organism (see Adding new fields). See http://rs.tdwg.org/dwc/terms/scientificName
 Array Items
  - type: string, null
  - minItems: 1
  - not: [HOMOhomo]{4} [SAPIENSsapiens]{7}
- organismSex
 Type: array
 Description: The sex of the individual animal from which the sample was collected. See http://rs.tdwg.org/dwc/terms/sex
 Example Values: male, female, hermaphrodite
 Array Items
  - type: string, null
  - minItems: 1
- liveCapture
 Type: array
 Description: Whether the individual animal from which the sample was collected was alive at the time of capture. Should be TRUE or FALSE; lethal sampling should be recorded as TRUE as this field describes the organism at the time of capture.
 Array Items
  - type: boolean, null
  - minItems: 1
- hostLifeStage
 Type: array
 Description: The life stage of the animal from which the sample was collected (as appropriate for the organism) (e.g., juvenile, adult). See http://rs.tdwg.org/dwc/terms/lifeStage
 Example Values: juvenile, adult, larva
 Array Items
  - type: string, null
  - minItems: 1
- age
 Type: array
 Description: The numeric age of the animal from which the sample was collected, at the time of sample collection, if known (e.g., in monitored populations).
 Array Items
  - type: number, null
  - minItems: 1
  - minimum: 0
- ageUnits
 Type: array
 Description: The units in which age is measured (usually years).
 Array Items
  - type: string, null
  - enum: years, months, days, hours, minutes, seconds
  - minItems: 1
- mass
 Type: array
 Description: The mass of the animal from which the sample was collected, at the time of sample collection.
 Array Items
  - type: number, null
  - minItems: 1
  - minimum: 0
- massUnits
 Type: array
 Description: The units that mass is recorded in (e.g., kg).
 Example Values: kg, g, mg, kilogram, milligram
 Array Items
  - type: string, null
  - minItems: 1
- length
 Type: array
 Description: The numeric length of the animal from which the sample was collected, at the time of sample collection.
 Array Items
  - type: number, null
  - minItems: 1
  - minimum: 0
- lengthMeasurement
 Type: array
 Description: The axis of measurement for the organism being measured (e.g., snout-vent length or just SVL; wing length; primary feather).
 Example Values: snout-vent length, intertegular distance, primary feather
 Array Items
  - type: string, null
  - minItems: 1
- lengthUnits
 Type: array
 Description: The units that length is recorded in (e.g., meters).
 Example Values: mm, meters, cm, km
 Array Items
  - type: string, null
  - minItems: 1
- organismQuantity
 Type: array
 Description: A number or enumeration value for the quantity of organisms. See http://rs.tdwg.org/dwc/terms/organismQuantity
 Example Values: 1, 1.4, 12
 Array Items
  - type: number, null
  - minItems: 1
  - minimum: 0
- organismQuantityUnits
 Type: array
 Description: The units that organism quantity is recorded in (e.g. “individuals”). See http://rs.tdwg.org/dwc/iri/organismQuantityType
 Example Values: individual, biomass, Braun-Blanquet scale
 Array Items
  - type: string, null
  - minItems: 1
- detectionTarget
 Type: array
 Description: REQUIRED The taxonomic identity of the parasite being screened for in the sample. This will often be coarser than the identity of a specific parasite identified in the sample: for example, in a study screening for novel bat coronaviruses, the entire family Coronaviridae might be the target; in a parasite dissection, the targets might be Acanthocephala, Cestoda, Nematoda, and Trematoda. For deep sequencing approaches (e.g., metagenomic and metatranscriptomic viral discovery), researchers should report each alignment target used as a new test to maximize reporting of negative data, or alternatively, select a subset that reflect specific study objectives and the focus of analysis (e.g., specific viral families). See http://rs.tdwg.org/dwc/terms/associatedOccurrences
 Array Items
  - type: string, null
  - minItems: 1
- detectionMethod
 Type: array
 Description: REQUIRED The type of test performed to detect the parasite or parasite-specific antibody (e.g., ‘qPCR’, ‘ELISA’)
 Array Items
  - type: string, null
  - minItems: 1
- forwardPrimerSequence
 Type: array
 Description: The sequence of the forward primer used for parasite detection (e.g., for a pan-coronavirus primer: 5’ CDCAYGARTTYTGYTCNCARC 3’). (Strongly encouraged if applicable, e.g., for PCR.)
 Example Values: 5’ CDCAYGARTTYTGYTCNCARC 3’
 Array Items
  - type: string, null
  - minItems: 1
- reversePrimerSequence
 Type: array
 Description: The sequence of the reverse primer used for parasite detection (e.g., 5’ RHGGRTANGCRTCWATDGC 3’). (Strongly encouraged if applicable, e.g., for PCR.)
 Example Values: 5’ RHGGRTANGCRTCWATDGC 3’
 Array Items
  - type: string, null
  - minItems: 1
- geneTarget
 Type: array
 Description: The parasite gene targeted by the primer (e.g. “RdRp” for PCR.).
 Example Values: RdRp
 Array Items
  - type: string, null
  - minItems: 1
- primerCitation
 Type: array
 Description: Citation(s) for the primer(s) (ideally doi, or other permanent identifier for a work, e.g. PMID).
 Example Values: https://doi.org/10.1016/j.virol.2007.06.009, Complete genome sequence of bat coronavirus HKU2 from Chinese horseshoe bats revealed a much smaller spike gene with a different evolutionary lineage from the rest of the genome, PMC7103351, https://openalex.org/works/w2036144053
 Array Items
  - type: string, null
  - minItems: 1
- probeTarget
 Type: array
 Description: Antibody or antigen targeted for detection. (Strongly encouraged if applicable, e.g., for ELISA.)
 Array Items
  - type: string, null
  - minItems: 1
- probeType
 Type: array
 Description: Antibody or antigen used for detection. (Strongly encouraged if applicable, e.g., for ELISA.)
 Array Items
  - type: string, null
  - minItems: 1
- probeCitation
 Type: array
 Description: Citation(s) for the probe(s) (ideally doi, or other permanent identifier for a work, e.g. PMID).
 Array Items
  - type: string, null
  - minItems: 1
- detectionOutcome
 Type: array
 Description: REQUIRED The test result (i.e., positive, negative, or inconclusive). To avoid ambiguity, these specific values are suggested over numeric values (0 or 1). See http://rs.tdwg.org/dwc/terms/occurrenceStatus
 Array Items
  - type: string, null
  - minItems: 1
- detectionMeasurement
 Type: array
 Description: Any numeric measurement of parasite detection that is more detailed than simple positive or negative results (e.g., viral titer, parasite counts, sequence reads).
 Array Items
  - type: number, null
  - minItems: 1
- detectionMeasurementUnits
 Type: array
 Description: Units for quantitative measurements of parasite intensity or test results (e.g., Ct, TCID50/mL, or parasite count).
 Example Values: Ct, TCID50/mL, parasite count
 Array Items
  - type: string, null
  - minItems: 1
- parasiteIdentification
 Type: array
 Description: REQUIRED The identity of a parasite detected by the test, if any, reported to the lowest possible taxonomic level, either as a Linnaean binomial classification or within the convention of a relevant taxonomic authority (e.g., Borrelia burgdorferi or Zika virus). Parasite identification may be more specific than detection target.
 Example Values: Zika virus, Borrelia burgdorferi, Onchocerca volvulus
 Array Items
  - type: string, null
  - minItems: 1
- parasiteID
 Type: array
 Description: A researcher-generated unique ID for an individual parasite (primarily useful in nested cases where this ID is used as an animal ID in another row, such as pathogen testing of a blood-feeding arthropod removed from a vertebrate host).
 Example Values: 001, TICK201923
 Array Items
  - type: string, null
  - minItems: 1
- parasiteLifeStage
 Type: array
 Description: The life stage of the parasite from which the sample was collected (as appropriate for the organism) (e.g., juvenile, adult).
 Example Values: juvenile, adult, sporozoite
 Array Items
  - type: string, null
  - minItems: 1
- genbankAccession
 Type: array
 Description: The GenBank accession for any parasite genetic sequence(s), if appropriate. Accession numbers or other identifiers for related data stored on another platform should be added in a different field (e.g. GISAID Accession, Immport Accession). See http://rs.tdwg.org/dwc/terms/otherCatalogNumbers
 Example Values: U49845 | U49846, U11111
 Array Items
  - type: string, null
  - minItems: 1
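Two of the disease_data rules listed above, the required fields and the `not` pattern on hostIdentification, can be sketched in a few lines. The example table holds invented placeholder values stored column-wise as arrays; the helper names are hypothetical, and applying the pattern with an unanchored `re.search` is an assumption about how a validation engine evaluates it.

```python
import re

# Required fields as listed for disease_data above.
REQUIRED = [
    "sampleID", "latitude", "longitude", "sampleCollectionMethod",
    "hostIdentification", "detectionTarget", "detectionMethod",
    "detectionOutcome", "parasiteIdentification",
]

# Pattern taken from the `not` rule on hostIdentification.
# Assumption: the rule is applied as an unanchored search.
HUMAN_PATTERN = re.compile(r"[HOMOhomo]{4} [SAPIENSsapiens]{7}")


def missing_required(table):
    """Return the required fields that are absent from the table."""
    return [field for field in REQUIRED if field not in table]


def human_rows(table):
    """Return row positions whose hostIdentification matches the excluded pattern."""
    hosts = table.get("hostIdentification", [])
    return [
        row for row, host in enumerate(hosts)
        if host is not None and HUMAN_PATTERN.search(host)
    ]


# Invented placeholder rows; parasiteIdentification is left out on purpose.
table = {
    "sampleID": ["OS BZ19-114", "OS BZ19-115"],
    "latitude": [17.25, 17.25],
    "longitude": [-88.77, -88.77],
    "sampleCollectionMethod": ["swab", "swab"],
    "hostIdentification": ["Odocoileus virginianus", "Homo sapiens"],
    "detectionTarget": ["Coronaviridae", "Coronaviridae"],
    "detectionMethod": ["qPCR", "qPCR"],
    "detectionOutcome": ["negative", "negative"],
}

print(missing_required(table))
print(human_rows(table))
```

The pattern is built from character classes rather than a literal string, so it matches any capitalization of the binomial, including "HOMO SAPIENS" and "homo sapiens".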
project_metadata
Type: object
Description: REQUIRED Metadata
for a project that largely follows the DataCite data standard.
Required Fields: methodology, creators, titles,
publicationYear, language, descriptions, fundingReferences
Reference: schemas/project_metadata.json
- methodology
 Type: object
 Description: REQUIRED A broad categorization of how data were collected.
 Properties
  - eventBased
   Type: boolean
   Description: Whether or not research was conducted in response to a known or suspected infectious disease outbreak, observed animal morbidity or mortality, etc.
  - archival
   Type: boolean
   Description: Whether samples were from an archival source (e.g., museum collections, biobanks).
- creators
 Type: array
 Description: REQUIRED The full names of the creators. Should be in the format familyName, givenName.
 Array Items
  - name
   Type: string
   Description: REQUIRED DataCite name
  - nameType
   Type: string
   Description: DataCite nameType
  - givenName
   Type: string
   Description: DataCite givenName
  - familyName
   Type: string
   Description: DataCite familyName
  - nameIdentifiers
   Type: array
   Description: DataCite nameIdentifiers
   Array Items
    - nameIdentifier
     Type: string
     Description: REQUIRED DataCite nameIdentifier
    - nameIdentifierScheme
     Type: string
     Description: REQUIRED DataCite nameIdentifierScheme
    - schemeUri
     Type: string
     Description: DataCite schemeUri
  - affiliation
   Type: array
   Description: DataCite affiliation
   Array Items
    - name
     Type: string
     Description: REQUIRED DataCite name
    - affiliationIdentifier
     Type: string
     Description: DataCite affiliationIdentifier
    - affiliationIdentifierScheme
     Type: string
     Description: DataCite affiliationIdentifierScheme
    - schemeUri
     Type: string
     Description: DataCite schemeUri
  - lang
   Type: string
   Description: DataCite lang
- titles
 Type: array
 Description: REQUIRED A name or title by which a resource is known.
 Array Items
  - title
   Type: string
   Description: REQUIRED DataCite title
  - titleType
   Type: string
   Description: DataCite titleType
  - lang
   Type: string
   Description: DataCite lang
- identifier
 Type: array
 Description: A unique string that identifies a resource.
 Array Items
  - identifier
   Type: string
   Description: REQUIRED DataCite identifier
  - identifierType
   Type: string
   Description: DataCite identifierType
- subjects
 Type: array
 Description: Subject, keyword, classification code, or key phrase describing the resource.
 Array Items
  - subject
   Type: string
   Description: REQUIRED DataCite subject
  - subjectScheme
   Type: string
   Description: DataCite subjectScheme
  - schemeUri
   Type: string
   Description: DataCite schemeUri
  - valueUri
   Type: string
   Description: DataCite valueUri
  - classificationCode
   Type: string
   Description: DataCite classificationCode
  - lang
   Type: string
   Description: DataCite lang
- publicationYear
 Type: string
 Description: REQUIRED The year when the data was or will be made publicly available.
- rights
 Type: array
 Description: Any rights information for this resource.
 Array Items
  - rights
   Type: string
   Description: DataCite rights
  - rightsUri
   Type: string
   Description: DataCite rightsUri
  - rightsIdentifier
   Type: string
   Description: DataCite rightsIdentifier
  - rightsIdentifierScheme
   Type: string
   Description: DataCite rightsIdentifierScheme
  - schemeUri
   Type: string
   Description: DataCite schemeUri
  - lang
   Type: string
   Description: DataCite lang
- descriptions
 Type: array
 Description: REQUIRED All additional information that does not fit in any of the other categories. May be used for technical information or detailed information associated with a scientific instrument.
 Array Items
  - description
   Type: string
   Description: REQUIRED DataCite description
  - descriptionType
   Type: string
   Description: REQUIRED DataCite descriptionType
  - lang
   Type: string
   Description: DataCite lang
- language
 Type: string
 Description: REQUIRED The primary language of the resource.
- fundingReferences
 Type: array
 Description: REQUIRED Name and other identifying information of a funding provider.
 Array Items
  - funderName
   Type: string
   Description: REQUIRED DataCite funderName
  - funderIdentifier
   Type: string
   Description: DataCite funderIdentifier
  - funderIdentifierType
   Type: string
   Description: DataCite funderIdentifierType
  - awardNumber
   Type: string
   Description: DataCite awardNumber
  - awardUri
   Type: string
   Description: DataCite awardUri
  - awardTitle
   Type: string
   Description: DataCite awardTitle
- relatedIdentifiers
 Type: array
 Description: DataCite relatedIdentifiers
 Array Items
  - relationType
   Type: string
   Description: REQUIRED DataCite relationType
  - relatedMetadataScheme
   Type: string
   Description: DataCite relatedMetadataScheme
  - schemeUri
   Type: string
   Description: DataCite schemeUri
  - schemeType
   Type: string
   Description: DataCite schemeType
  - resourceTypeGeneral
   Type: string
   Description: DataCite resourceTypeGeneral
  - relatedIdentifier
   Type: string
   Description: REQUIRED DataCite relatedIdentifier
  - relatedIdentifierType
   Type: string
   Description: REQUIRED DataCite relatedIdentifierType
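The nesting of required fields in project_metadata can be illustrated with a small check that looks at both the top-level keys and the keys inside array items. This is a sketch under stated assumptions: `check_metadata` is a hypothetical helper, only a subset of the nested REQUIRED markers listed above is covered, and every value in the example record is an invented placeholder.

```python
# Top-level required fields as listed for project_metadata above.
REQUIRED_TOP = [
    "methodology", "creators", "titles", "publicationYear",
    "language", "descriptions", "fundingReferences",
]

# A subset of the per-item REQUIRED markers from the field list above.
REQUIRED_ITEMS = {
    "creators": ["name"],
    "titles": ["title"],
    "descriptions": ["description", "descriptionType"],
    "fundingReferences": ["funderName"],
}


def check_metadata(metadata):
    """Report missing top-level keys and missing keys inside array items."""
    problems = [f"{key}: missing" for key in REQUIRED_TOP if key not in metadata]
    for key, item_keys in REQUIRED_ITEMS.items():
        for position, item in enumerate(metadata.get(key, [])):
            for item_key in item_keys:
                if item_key not in item:
                    problems.append(f"{key}/{position}/{item_key}: missing")
    return problems


# Invented placeholder record; descriptionType is omitted on purpose.
metadata = {
    "methodology": {"eventBased": False, "archival": False},
    "creators": [{"name": "Doe, Jane", "givenName": "Jane", "familyName": "Doe"}],
    "titles": [{"title": "Example wildlife disease survey"}],
    "publicationYear": "2024",
    "language": "en",
    "descriptions": [{"description": "Placeholder abstract."}],
    "fundingReferences": [{"funderName": "Example Funder"}],
}

print(check_metadata(metadata))
```

Note that publicationYear is written as a string, matching the type given for that field above.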