Getting Started

library(wddsWizard)
library(jsonvalidate)

This package is home to the Wildlife Disease Data Standard. It provides template csv files for storing data and a set of functions for validating datasets against the standard.

The Whole game

Flat data files (csv/xlsx) are imported into R, lightly restructured, and then converted to JSON and validated against the Wildlife Disease Data Standard. Data sets either pass (🎉) or fail with informative errors.

The data standard is designed to be flexible and accessible. It is composed of two components: disease data and project metadata. The disease data component describes the contents and structure of data related to the detection (or not) of a parasite in a given host. The project metadata component describes the contents and structure of data related to the creation of the disease data component. The disease data component allows us to create a collection of datasets that can be re-used, aggregated, and shared, while the project metadata component provides context for the data, makes it easier to find the dataset, and gives clear information about attribution and use (rights/license).

Using this package, you can validate one or both components of the data standard. It will be helpful to open the Data Standard Terms in a separate browser tab as you go through the vignette and explore templates.

If you use the templates provided, you can focus on the field descriptions and largely ignore types and array items.

Templates

We have created several templates to make it easier to get started using the Wildlife Disease Data Standard.

# list templates
wdds_template()
#> [1] "disease_data_template.csv"     "disease_data_template.xlsx"   
#> [3] "project_metadata_template.csv"

You can make your own copies of the template files with the use_template function.


use_template("disease_data_template.csv",file_name = "my_interesting_disease_data.csv",open = TRUE)

use_template("project_metadata_template.csv",file_name = "my_project_metadata.csv",open = TRUE)

The templates may contain additional fields that are not strictly required. Required fields are clearly marked in Data Standard Terms documentation.

The following fields are required for disease data:

wddsWizard::disease_data_required_fields
#> [1] "sampleID"               "latitude"               "longitude"             
#> [4] "sampleCollectionMethod" "hostIdentification"     "detectionTarget"       
#> [7] "detectionMethod"        "detectionOutcome"       "parasiteIdentification"

The following fields are required for project metadata:

wddsWizard::project_metadata_required_fields
#> [1] "methodology"       "creators"          "titles"           
#> [4] "publicationYear"   "language"          "descriptions"     
#> [7] "fundingReferences"

Disease Data

Disease data are expected to be in a “tidy” form (think CSV or XLS where each row is an observation and each column is a property). Each column in the table that is part of the standard will be validated. You may include additional columns as needed.

Disease data and project metadata can be validated separately. In the code below we will read in a csv file, do some light wrangling, transform the data to JSON, and then validate the data.


## read in the data
my_disease_data <- read.csv(file = here::here("inst/extdata/example_data/my_interesting_disease_data.csv"))

# clean up field names to match JSON schema
my_disease_data <- clean_field_names(my_disease_data)

### Check for required Fields -

# check that all required fields are in the data
all(wddsWizard::disease_data_required_fields %in% names(my_disease_data))
#> [1] TRUE

## Prep for JSON

my_disease_data_prepped <- prep_data(my_disease_data)

## make the JSON!
my_disease_data_json <- my_disease_data_prepped |>
  jsonlite::toJSON(pretty = TRUE)

### validate the JSON 

schema <- here::here("inst/extdata/wdds_schema/schemas/disease_data.json")

# this creates a function that we can use to validate our data
dd_validator <- jsonvalidate::json_validator(schema,engine = "ajv")

# use the validator to check if the disease data conforms to the disease_data component of the standard
dd_validation <- dd_validator(my_disease_data_json,verbose = TRUE)

## check for errors!

errors <- attributes(dd_validation)

if(!dd_validation){
  errors$errors  
} else {
  print("Valid disease data!😁")
}
#> [1] "Valid disease data!😁"

Project Metadata

Project metadata largely follow the Datacite Metadata Schema. Again, the data standard allows you to include additional properties.

Note that if you are comfortable with JSON, it may be easier to write project metadata directly as JSON.

In the example below, we will use project metadata created from the project metadata template to create JSON that can be validated against the project metadata component of the data standard.


# read in project metadata created from template
my_project_metadata <- read.csv(here::here("inst/extdata/example_data/my_project_metadata.csv"))

# prepare project metadata
my_project_metadata_prepped <- prep_from_metadata_template(my_project_metadata)


# check that all required fields are in the project metadata
all(wddsWizard::project_metadata_required_fields %in% names(my_project_metadata_prepped))
#> [1] TRUE

# convert to json

my_project_metadata_json <- my_project_metadata_prepped |>
  jsonlite::toJSON(pretty  = TRUE)

# validate against project metadata schema

schema <- here::here("inst/extdata/wdds_schema/schemas/project_metadata.json")

pm_validator <- jsonvalidate::json_validator(schema,engine = "ajv")

pm_validation <- pm_validator(my_project_metadata_json,verbose = TRUE)

## check for errors!

errors <- attributes(pm_validation)

if(!pm_validation){
  errors$errors  
} else {
  print("Valid project metadata!😁")
}
#> [1] "Valid project metadata!😁"

See the vignettes on Project Metadata and Wildlife Disease Data for more details on preparing those components.

Combine disease data and project metadata

Finally we will check the disease data and the project metadata against the standard.

Combine components

The first thing we have to do is combine the disease data and project metadata components in a list and check that we have all the required fields.


## use append so that you do not add levels to your list

data_package <- list(disease_data = my_disease_data_prepped,
                     project_metadata = my_project_metadata_prepped)

# check that all required fields are in the data

req_field_check <- wddsWizard::schema_required_fields %in% names(data_package)

if(all(!req_field_check)){
  wddsWizard::schema_required_fields[!req_field_check]  
} else {
  print("all required fields present 🥳")
}
#> [1] "all required fields present 🥳"

Make JSON

Next we will convert the data_package so that it can be validated.


# convert to json

data_package_json <- jsonlite::toJSON(data_package,pretty = TRUE)

Validate your json!

Here we will use the {jsonvalidate} package to make sure data_package_json conforms to the wildlife disease data standard.


schema <- here::here("inst/extdata/wdds_schema/wdds_schema.json")

wdds_validator <- jsonvalidate::json_validator(schema,engine = "ajv")

project_validation <- wdds_validator(data_package_json,verbose = TRUE)

if(project_validation){
  print("Your data package is valid! 🎊 ")
} else {
errors <- attributes(project_validation)
errors$errors
}
#> [1] "Your data package is valid! 🎊 "

Handling Errors

You’re going to get some