Wildlife Disease Data
disease_data.Rmd
This vignette describes how to prepare data for validation against the wildlife disease data standard. We will 1) read an example Excel spreadsheet, 2) do some light reformatting, and 3) produce a JSON object.
Required fields
The data must contain the following fields.
Field | Description |
---|---|
sampleID | A researcher-generated unique ID for the sample: usually a unique string of both characters and integers (e.g., OS BZ19-114 to indicate an oral swab taken from animal BZ19-114; see worked example below), to avoid conflicts that can arise when datasets are merged with number-only notation for samples. Ideally, sample names should be kept consistent across all online databases and physical resources (e.g., museum collections or project-specific sample archives). |
animalID | A researcher-generated unique ID for the individual animal from which the sample was collected: usually a unique string of both characters and integers (e.g., BZ19-114 to indicate animal 114 sampled in 2019 in Belize). Ideally, animal names should again be kept consistent across online databases and physical resources. |
latitude | Latitude of the collection site in decimal format. |
longitude | Longitude of the collection site in decimal format. |
collectionDay | The day of the month on which the specimen was collected. |
collectionMonth | The month in which the specimen was collected. |
collectionYear | The year in which the specimen was collected. |
collectionMethodAndOrTissue | The technique used to acquire the sample and/or the tissue from which the sample was extracted (e.g., oropharyngeal swab). |
hostIdentification | The Linnaean classification of the animal from which the sample was collected, reported at the lowest possible level (ideally, species binomial name: e.g., Odocoileus virginianus or Ixodes scapularis). As necessary, researchers may also include an additional field indicating when uncertainty exists in the identification of the host organism (see Adding new fields). |
detectionTarget | The taxonomic identity of the parasite being screened for in the sample. This will often be coarser than the identity of a specific parasite identified in the sample: for example, in a study screening for novel bat coronaviruses, the entire family Coronaviridae might be the target; in a parasite dissection, the targets might be Acanthocephala, Cestoda, Nematoda, and Trematoda. For deep sequencing approaches (e.g., metagenomic and metatranscriptomic viral discovery), researchers should report each alignment target used as a new test to maximize reporting of negative data, or alternatively, select a subset that reflects specific study objectives and the focus of analysis (e.g., specific viral families). |
detectionMethod | The type of test performed to detect the parasite or parasite-specific antibody (e.g., ‘qPCR’, ‘ELISA’). |
detectionOutcome | The test result (i.e., positive, negative, or inconclusive). To avoid ambiguity, these specific values are suggested over numeric values (0 or 1). |
parasiteIdentification | The identity of a parasite detected by the test, if any, reported to the lowest possible taxonomic level, either as a Linnaean binomial classification or within the convention of a relevant taxonomic authority (e.g., Borrelia burgdorferi or Zika virus). Parasite identification may be more specific than detection target. |
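
Several of the field descriptions above imply simple value checks that can be run before validation. The following is a minimal base R sketch under stated assumptions: the `samples` data frame is a hypothetical two-row illustration, and the checks are not part of wddsWizard or of the standard's own validation.

```r
# Hypothetical two-row data frame using field names from the table above.
samples <- data.frame(
  sampleID = c("OS BZ19-114", "RS BZ19-114"),
  latitude = c(17.7643, 17.7643),
  longitude = c(-88.6521, -88.6521),
  collectionDay = c(23, 23),
  collectionMonth = c(4, 4),
  collectionYear = c(2019, 2019),
  detectionOutcome = c("positive", "negative"),
  stringsAsFactors = FALSE
)

# Sample IDs are meant to be unique.
ids_unique <- anyDuplicated(samples$sampleID) == 0

# Outcomes should be words rather than 0/1 codes.
allowed_outcomes <- c("positive", "negative", "inconclusive")
outcome_ok <- samples$detectionOutcome %in% allowed_outcomes

# Decimal coordinates must fall within valid ranges.
coords_ok <- abs(samples$latitude) <= 90 & abs(samples$longitude) <= 180

# Day, month, and year should combine into a real calendar date;
# as.Date() returns NA for impossible dates such as 31 February.
collection_date <- as.Date(
  sprintf(
    "%04d-%02d-%02d",
    samples$collectionYear, samples$collectionMonth, samples$collectionDay
  ),
  format = "%Y-%m-%d"
)
date_ok <- !is.na(collection_date)
```

Rows where any of `outcome_ok`, `coords_ok`, or `date_ok` is `FALSE` are worth inspecting by hand before moving on.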
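Read in and clean up the Excel spreadsheet
## read the example spreadsheet from inst/extdata
becker_data <- readxl::read_xlsx(path = here::here("inst/extdata/example_data/Becker_demo_dataset.xlsx"))
## convert column names to lowerCamelCase, the naming style of the standard's fields
becker_data_prelim <- janitor::clean_names(becker_data, case = "lower_camel")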
Check for required fields
# list the required fields that are missing from the dataset
required_field_check <- wddsWizard::data_required_fields %in% names(becker_data_prelim)
wddsWizard::data_required_fields[!required_field_check]
#> [1] "sampleID" "animalID"
#> [3] "collectionMethodAndOrTissue"
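
Two of the three missing fields differ from existing columns only by letter case (`sampleId` versus `sampleID`), because of the lowerCamelCase conversion applied earlier. A minimal sketch of spotting such near misses in base R follows; `required_fields` and `current_names` are hypothetical stand-ins for `wddsWizard::data_required_fields` and `names(becker_data_prelim)`, shortened for illustration.

```r
# Hypothetical stand-ins for the real required-field vector and column names.
required_fields <- c("sampleID", "animalID", "collectionMethodAndOrTissue", "latitude")
current_names <- c("sampleId", "animalId", "collectionMethod", "latitude")

# Required fields with no exact match among the current column names.
missing_fields <- setdiff(required_fields, current_names)

# Look for columns that differ from a missing field only by letter case.
case_match <- match(tolower(missing_fields), tolower(current_names))
suggestions <- data.frame(
  required = missing_fields,
  candidate = current_names[case_match],
  stringsAsFactors = FALSE
)
suggestions
```

A missing field with an `NA` candidate, such as `collectionMethodAndOrTissue` here, has no case-insensitive match and has to be mapped by hand.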
Rename fields to match the standard
# inspect the column that will become collectionMethodAndOrTissue
becker_data_prelim$collectionMethod
#> [1] "Oral swab" "Rectal swab"
# dplyr::rename() takes new_name = old_name pairs
becker_data_clean <- becker_data_prelim |>
  dplyr::rename(
    "sampleID" = "sampleId",
    "animalID" = "animalId",
    "collectionMethodAndOrTissue" = "collectionMethod"
  )
# check that all required fields are in the data
all(wddsWizard::data_required_fields %in% names(becker_data_clean))
#> [1] TRUE
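
When many columns need renaming, keeping the mapping in a named vector can be easier to maintain than a long `rename()` call. This is a minimal base R sketch, not the package's method; `df` and `rename_map` are hypothetical, with `rename_map` holding old names as names and new names as values.

```r
# Hypothetical one-row data frame with the pre-rename column names.
df <- data.frame(
  sampleId = "OS BZ19-114",
  animalId = "BZ19-114",
  collectionMethod = "Oral swab",
  latitude = 17.7643,
  stringsAsFactors = FALSE
)

# Lookup vector: names are the old column names, values are the new ones.
rename_map <- c(
  sampleId = "sampleID",
  animalId = "animalID",
  collectionMethod = "collectionMethodAndOrTissue"
)

# Rename only the columns that appear in the lookup; leave the rest alone.
hits <- names(df) %in% names(rename_map)
names(df)[hits] <- unname(rename_map[names(df)[hits]])
names(df)
```

Columns absent from the lookup, such as `latitude`, keep their original names.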
Prep for JSON
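## prepare the cleaned data for conversion to JSON
becker_prepped <- wddsWizard::prep_data(becker_data_clean)
## wrap the prepped data in a list so it appears under the "data" key
becker_data <- list(data = becker_prepped)
jsonlite::toJSON(becker_data, pretty = TRUE)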
#> {
#> "data": {
#> "sampleID": ["OS BZ19-95", "RS BZ19-95"],
#> "animalID": ["BZ19-114", "BZ19-114"],
#> "latitude": [17.7643, 17.7643],
#> "longitude": [-88.6521, -88.6521],
#> "collectionDay": [23, 23],
#> "collectionMonth": [4, 4],
#> "collectionYear": [2019, 2019],
#> "collectionMethodAndOrTissue": ["Oral swab", "Rectal swab"],
#> "hostIdentification": ["Desmodus rotundus", "Desmodus rotundus"],
#> "organismSex": ["male", "male"],
#> "deadOrAlive": ["alive", "alive"],
#> "hostLifeStage": ["subadult", "subadult"],
#> "mass": [0.023, 0.023],
#> "massUnits": ["kg", "kg"],
#> "detectionTarget": ["Coronaviridae", "Coronaviridae"],
#> "detectionMethod": ["semi-nested PCR", "semi-nested PCR"],
#> "primerSequence": ["RdRp", "RdRp"],
#> "primerCitation": ["doi:10.3390/v9120364", "doi:10.3390/v9120364"],
#> "detectionOutcome": ["positive", "negative"],
#> "parasiteIdentification": ["Alphacoronavirus", null],
#> "genBankAccession": ["OM240578", null]
#> }
#> }