Checks both file-level properties (file name, extension, location, etc.) and the model output data, i.e. the contents of the file.

Usage

validate_submission(
  hub_path,
  file_path,
  round_id_col = NULL,
  validations_cfg_path = NULL,
  skip_submit_window_check = FALSE,
  skip_check_config = FALSE,
  submit_window_ref_date_from = c("file", "file_path")
)

Arguments

hub_path

Either a character string path to a local Modeling Hub directory, or an object of class <SubTreeFileSystem> created with s3_bucket() or gs_bucket() by providing a string S3 or GCS bucket name or a path to a Modeling Hub directory stored in the cloud. For more details, consult the Using cloud storage (S3, GCS) article in the arrow package documentation. The hub must be fully configured, with valid admin.json and tasks.json files in the hub-config directory.

file_path

Character string. Path to the file being validated, relative to the hub's model-output directory.

round_id_col

Character string. The name of the column containing round IDs. Usually the value of the round_id property of the round in the hub's tasks.json config file.

validations_cfg_path

Path to validations.yml file. If NULL defaults to hub-config/validations.yml.

skip_submit_window_check

Logical. Whether to skip the submission window check.

skip_check_config

Logical. Whether to skip the hub config validation check.

submit_window_ref_date_from

Character string, either "file" or "file_path". Whether the reference date used to determine relative submission windows is taken from the file contents ("file") or from the round ID in the file's file_path ("file_path"). "file" requires that the file can be read. Only applicable when a round is configured to determine submission windows relative to the value in a date column in model output files. Not applicable when explicit submission window start and end dates are provided in the hub's config.
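As a rough sketch of how the submission window arguments might be used, the snippet below re-validates a file from a round whose window has already closed. It reuses the bundled test hub and file from the Examples section. The helper name validate_closed_round() is hypothetical and not part of hubValidations.

```r
# Hypothetical helper (not part of hubValidations): re-validate a file from a
# round whose submission window has already closed.
validate_closed_round <- function(hub_path, file_path) {
  hubValidations::validate_submission(
    hub_path,
    file_path,
    skip_submit_window_check = TRUE
  )
}

# Only runs when hubValidations (with its bundled test hub) is installed.
if (requireNamespace("hubValidations", quietly = TRUE)) {
  hub_path <- system.file("testhubs/simple", package = "hubValidations")
  file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"

  # Option 1: skip the submission window check entirely.
  print(validate_closed_round(hub_path, file_path))

  # Option 2: keep the window check, but take the reference date from the
  # round ID in the file path rather than from the file contents.
  print(
    hubValidations::validate_submission(
      hub_path,
      file_path,
      submit_window_ref_date_from = "file_path"
    )
  )
}
```

Skipping the window check is mainly useful for archived or test submissions. For live submissions the check would normally be left on.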

Value

An object of class hub_validations. Each named element contains a hub_check class object reflecting the result of a given check. The function returns early if a check returns an error.

For more details on the structure of <hub_validations> objects, including how to access more information on individual checks, see the article on <hub_validations> S3 class objects.
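Because the return value is a collection of named check results, it can be inspected with ordinary list tools. The sketch below assumes the object behaves like a named list and that failing checks inherit from the check_failure or check_error classes listed under Details. The helper failed_check_names() is hypothetical and not part of hubValidations.

```r
# Hypothetical helper (not part of hubValidations): names of the checks whose
# result inherits from one of the failure classes.
failed_check_names <- function(validations) {
  failed <- vapply(
    validations,
    function(check) inherits(check, c("check_failure", "check_error")),
    logical(1)
  )
  names(validations)[failed]
}

# Only runs when hubValidations (with its bundled test hub) is installed.
if (requireNamespace("hubValidations", quietly = TRUE)) {
  hub_path <- system.file("testhubs/simple", package = "hubValidations")
  file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
  v <- hubValidations::validate_submission(hub_path, file_path)

  print(names(v))               # names of the checks that were run
  print(class(v[[1]]))          # class of the first check result
  print(failed_check_names(v))  # checks that failed or errored
}
```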

Details

Details of checks performed by validate_submission()

| Name | Check | Early return | Fail output | Extra info |
|---|---|---|---|---|
| valid_config | Hub config valid | TRUE | check_error | |
| submission_time | Current time within file submission window | FALSE | check_failure | |
| file_exists | File exists at `file_path` provided | TRUE | check_error | |
| file_name | File name valid | TRUE | check_error | |
| file_location | File located in correct team directory | FALSE | check_failure | |
| round_id_valid | File round ID is a valid hub round ID | TRUE | check_error | |
| file_format | File format is accepted hub/round format | TRUE | check_error | |
| metadata_exists | Model metadata file exists in expected location | TRUE | check_error | |
| file_read | File can be read without errors | TRUE | check_error | |
| valid_round_id_col | Round ID var from config exists in data column names. Skipped if `round_id_from_var` is FALSE in config. | FALSE | check_failure | |
| unique_round_id | Round ID column contains a single unique round ID. Skipped if `round_id_from_var` is FALSE in config. | TRUE | check_error | |
| match_round_id | Round ID from file contents matches round ID from file name. Skipped if `round_id_from_var` is FALSE in config. | TRUE | check_error | |
| colnames | File column names match expected column names for round (i.e. task ID names + hub standard column names) | TRUE | check_error | |
| col_types | File column types match expected column types from config. Mainly applicable to parquet & arrow files. | FALSE | check_failure | |
| valid_vals | Columns (excluding `value` column) contain valid combinations of task ID / output type / output type ID values | TRUE | check_error | error_tbl: table of invalid task ID/output type/output type ID value combinations |
| rows_unique | Columns (excluding `value` column) contain unique combinations of task ID / output type / output type ID values | FALSE | check_failure | |
| req_vals | Columns (excluding `value` column) contain all required combinations of task ID / output type / output type ID values | FALSE | check_failure | missing_df: table of missing task ID/output type/output type ID value combinations |
| value_col_valid | Values in `value` column are coercible to data type configured for each output type | FALSE | check_failure | |
| value_col_non_desc | Values in `value` column are non-decreasing as output_type_ids increase for all unique task ID / output type value combinations. Applies to `quantile` or `cdf` output types only. | FALSE | check_failure | error_tbl: table of rows affected |
| value_col_sum1 | Values in the `value` column of `pmf` output type data for each unique task ID combination sum to 1. | FALSE | check_failure | error_tbl: table of rows affected |
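To make the last few content checks more concrete, here is a minimal base R sketch of the ideas behind rows_unique, value_col_non_desc and value_col_sum1, applied to a toy table. It is an illustration under assumptions and not the package's implementation: the `location` column stands in for the task ID columns, the data are invented, and the numeric tolerance for the sum check is an arbitrary choice.

```r
# Toy model output (invented data). `location` stands in for task ID columns.
tbl <- data.frame(
  location = c("US", "US", "US", "US", "US"),
  output_type = c("quantile", "quantile", "quantile", "pmf", "pmf"),
  output_type_id = c("0.25", "0.5", "0.75", "low", "high"),
  value = c(10, 12, 11, 0.4, 0.6),
  stringsAsFactors = FALSE
)

# rows_unique idea: no duplicated combination of the non-value columns.
id_cols <- c("location", "output_type", "output_type_id")
rows_unique <- anyDuplicated(tbl[, id_cols]) == 0

# value_col_non_desc idea: within each task ID combination, quantile values
# must not decrease as the quantile level (output_type_id) increases.
quantiles <- tbl[tbl$output_type == "quantile", ]
quantiles <- quantiles[
  order(quantiles$location, as.numeric(quantiles$output_type_id)),
]
non_desc <- tapply(
  quantiles$value,
  quantiles$location,
  function(v) all(diff(v) >= 0)
)

# value_col_sum1 idea: pmf values for each task ID combination sum to 1,
# compared with a small tolerance (the tolerance here is an assumption).
pmf <- tbl[tbl$output_type == "pmf", ]
pmf_sums <- tapply(pmf$value, pmf$location, sum)
sums_to_one <- abs(pmf_sums - 1) < 1e-8

print(rows_unique)
print(non_desc)
print(sums_to_one)
```

In this toy table the 0.75 quantile (11) is lower than the 0.5 quantile (12), so the non-decreasing condition is violated for "US", while the pmf values satisfy the sum-to-1 condition.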

Examples

hub_path <- system.file("testhubs/simple", package = "hubValidations")
file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
validate_submission(hub_path, file_path)
#> ✔ simple: All hub config files are valid.
#> ✔ 2022-10-08-team1-goodmodel.csv: File exists at path
#>   model-output/team1-goodmodel/2022-10-08-team1-goodmodel.csv.
#> ✔ 2022-10-08-team1-goodmodel.csv: File name "2022-10-08-team1-goodmodel.csv" is
#>   valid.
#> ✔ 2022-10-08-team1-goodmodel.csv: File directory name matches `model_id`
#>   metadata in file name.
#> ✔ 2022-10-08-team1-goodmodel.csv: `round_id` is valid.
#> ✔ 2022-10-08-team1-goodmodel.csv: File is accepted hub format.
#> ✔ 2022-10-08-team1-goodmodel.csv: Metadata file exists at path
#>   model-metadata/team1-goodmodel.yaml.
#> ✔ 2022-10-08-team1-goodmodel.csv: File could be read successfully.
#> ✔ 2022-10-08-team1-goodmodel.csv: `round_id_col` name is valid.
#> ✔ 2022-10-08-team1-goodmodel.csv: `round_id` column "origin_date" contains a
#>   single, unique round ID value.
#> ✔ 2022-10-08-team1-goodmodel.csv: All `round_id_col` "origin_date" values match
#>   submission `round_id` from file name.
#> ✔ 2022-10-08-team1-goodmodel.csv: Column names are consistent with expected
#>   round task IDs and std column names.
#> ✔ 2022-10-08-team1-goodmodel.csv: Column data types match hub schema.
#> ✔ 2022-10-08-team1-goodmodel.csv: `tbl` contains valid values/value
#>   combinations.
#> ✔ 2022-10-08-team1-goodmodel.csv: All combinations of task ID
#>   column/`output_type`/`output_type_id` values are unique.
#> ✔ 2022-10-08-team1-goodmodel.csv: Required task ID/output type/output type ID
#>   combinations all present.
#> ✔ 2022-10-08-team1-goodmodel.csv: Values in column `value` all valid with
#>   respect to modeling task config.
#> ✔ 2022-10-08-team1-goodmodel.csv: Values in `value` column are non-decreasing
#>   as output_type_ids increase for all unique task ID value/output type
#>   combinations of quantile or cdf output types.
#> ℹ 2022-10-08-team1-goodmodel.csv: No pmf output types to check for sum of 1.
#>   Check skipped.
#> ! 2022-10-08-team1-goodmodel.csv: Submission time must be within accepted
#>   submission window for round.  Current time "2024-04-03 06:20:09 UTC" is
#>   outside window 2022-10-02 EDT--2022-10-09 23:59:59 EDT.