| Title: | Unified Framework for Data Quality Control |
|---|---|
| Description: | An easy framework for setting up a quality control workflow on a dataset. Includes a range of functions for establishing an adaptable data quality control process. |
| Authors: | Luis Garcez [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-8637-7946>) |
| Maintainer: | Luis Garcez <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-28 10:12:16 UTC |
| Source: | https://github.com/luisgarcez11/qualitycontrol |
An example dataset related to amyotrophic lateral sclerosis (ALS).
als_data
A list
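
| Variable | Description |
|---|---|
| subjid | Subject ID |
| p1 | ALSFRS-R 1 |
| p2 | ALSFRS-R 2 |
| p3 | ALSFRS-R 3 |
| p4 | ALSFRS-R 4 |
| p5 | ALSFRS-R 5 |
| p6 | ALSFRS-R 6 |
| p7 | ALSFRS-R 7 |
| p8 | ALSFRS-R 8 |
| p9 | ALSFRS-R 9 |
| x1r | ALSFRS-R R1 |
| x2r | ALSFRS-R R2 |
| x3r | ALSFRS-R R3 |
| age_at_baseline | Age at baseline |
| age_at_onset | Age at onset |
| onset | Region of onset |
| baseline_date | Baseline date |
| death_date | Death date |
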
An example dataset containing a Quality Control mapping
als_data_qc_mapping
A list of 3 tibbles.

| Element | Description |
|---|---|
| missing | Table with all the 'missing' tests. |
| inconsistencies | Table with all the 'inconsistencies' tests. |
| range | Table with all the 'out of range' tests. |

QC dataset using a specific variable mapping
qc_data(data, qc_mapping, output_file = NULL)

| Argument | Description |
|---|---|
| data | A data frame, data frame extension (e.g. a |
| qc_mapping | A list of data frames or data frame extensions (e.g. a |
| output_file | (optional) File path ended in |

A data frame containing all the findings.
qc_data(als_data, als_data_qc_mapping)
read_qc_mapping reads an .xlsx file that contains
the QC mapping.
read_qc_mapping(path)
path |
Path to the Excel file to be read. The file should contain 3 tabs named missing, inconsistencies and range. Each tab corresponds to one QC mapping table. QC mapping
The columns specified above should contain specific values:
|
A list containing all the QC mapping tables
Test if variable values are duplicated
test_duplicated(data, variable)

| Argument | Description |
|---|---|
| data | Data to be tested. |
| variable | The variable to be tested. |

A data frame containing all the findings regarding the applied test.
test_duplicated(als_data, 'subjid')
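
To illustrate the idea behind a duplication test, here is a minimal base R sketch. It is not the package's implementation: `find_duplicated()` and the `toy` data frame are hypothetical names introduced for this example, and the shape of the returned findings is an assumption.

```r
# Toy data: a hypothetical stand-in, not the packaged als_data.
toy <- data.frame(
  subjid = c("A01", "A02", "A02", "A03"),
  age_at_baseline = c(61, 54, 54, 70),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag every row whose value occurs more than once.
find_duplicated <- function(data, variable) {
  values <- data[[variable]]
  # Combine both scan directions so the first occurrence is flagged as well.
  is_dup <- duplicated(values) | duplicated(values, fromLast = TRUE)
  data.frame(
    row = which(is_dup),
    variable = rep(variable, sum(is_dup)),
    value = as.character(values[is_dup]),
    stringsAsFactors = FALSE
  )
}

dup_findings <- find_duplicated(toy, "subjid")
print(dup_findings)
```

In this toy data, rows 2 and 3 share the same `subjid`, so both are reported.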
Test the inconsistencies between variables on a dataset
test_inconsistencies(data, variable1, variable2, relation)

| Argument | Description |
|---|---|
| data | Data to be tested. |
| variable1 | The first variable to be tested. |
| variable2 | The second variable to be tested. |
| relation | String such as 'greater_than', 'greater_than_or_equal', 'lower_than_or_equal' or 'lower_than'. |

A data frame containing all the findings regarding the applied test.
test_inconsistencies(als_data, 'baseline_date', 'death_date', relation = 'lower_than')
test_inconsistencies(als_data, 'age_at_baseline', 'age_at_onset', relation = 'greater_than')
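
The following base R sketch shows one way such a pairwise check could work. It assumes that `relation` names the relationship expected between `variable1` and `variable2`, which the examples above suggest but the documentation does not state explicitly. `find_inconsistencies()` and `toy` are hypothetical names, and skipping rows with missing values is also an assumption.

```r
# Toy data: a hypothetical stand-in, not the packaged als_data.
toy <- data.frame(
  baseline_date = as.Date(c("2015-03-01", "2016-07-15", "2018-01-10")),
  death_date = as.Date(c("2017-05-20", "2016-02-01", NA))
)

# Hypothetical helper: return the row numbers where the expected
# relation `variable1 <relation> variable2` does not hold.
find_inconsistencies <- function(data, variable1, variable2, relation) {
  compare <- switch(relation,
    greater_than = `>`,
    greater_than_or_equal = `>=`,
    lower_than = `<`,
    lower_than_or_equal = `<=`,
    stop("Unknown relation: ", relation)
  )
  holds <- compare(data[[variable1]], data[[variable2]])
  # Assumption: rows with a missing value on either side are skipped.
  which(!is.na(holds) & !holds)
}

inconsistent_rows <- find_inconsistencies(
  toy, "baseline_date", "death_date", relation = "lower_than"
)
print(inconsistent_rows)
```

Here only row 2 is reported, because its baseline date falls after its death date; row 3 has no death date and is left out of the comparison.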
Test the variable missingness on a dataset
test_missing(data, variable)

| Argument | Description |
|---|---|
| data | Data to be tested. |
| variable | The variable to be tested. |

A data frame containing all the findings regarding the applied test.
test_missing(als_data, 'p8')
test_missing(als_data, 'p1')
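
As a rough illustration of a missingness test, the base R sketch below reports the rows where a variable is `NA`. `find_missing()` and `toy` are hypothetical names, not part of the package, and whether the package also treats other values (for example empty strings) as missing is not stated in the documentation.

```r
# Toy data: a hypothetical stand-in, not the packaged als_data.
toy <- data.frame(
  subjid = c("A01", "A02", "A03"),
  p8 = c(4, NA, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one finding per row where the variable is NA.
find_missing <- function(data, variable) {
  missing_rows <- which(is.na(data[[variable]]))
  data.frame(
    row = missing_rows,
    variable = rep(variable, length(missing_rows)),
    stringsAsFactors = FALSE
  )
}

missing_findings <- find_missing(toy, "p8")
print(missing_findings)
```

A variable with no missing values yields a data frame with zero rows, which keeps the result type the same in both cases.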
Test the range of a variable on a dataset
test_range(data, variable, type, categories = NULL, lower_value = NULL, upper_value = NULL)

| Argument | Description |
|---|---|
| data | Data to be tested. |
| variable | The variable to be tested. |
| type | String such as 'categorical', 'date' or 'numeric'. |
| categories | Only to be filled if |
| lower_value | Only to be filled if |
| upper_value | Only to be filled if |

A data frame containing all the findings regarding the applied test.
test_range(als_data, 'onset', c('bulbar', 'respiratory', 'spinal'), type = 'categorical')
test_range(als_data, 'age_at_baseline', lower_value = 20, upper_value = 100, type = 'numeric')
test_range(als_data, 'age_at_onset', lower_value = 20, upper_value = 100, type = 'numeric')
test_range(als_data, 'baseline_date', lower_value = '2000-01-01', upper_value = '2022-01-01', type = 'date')
test_range(als_data, 'death_date', lower_value = '2000-01-01', upper_value = '2022-01-01', type = 'date')
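
The three range types can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: `find_out_of_range()` and `toy` are hypothetical names, the bounds are assumed to be inclusive, and missing values are assumed to be left to the missingness test rather than reported here.

```r
# Toy data: a hypothetical stand-in, not the packaged als_data.
toy <- data.frame(
  onset = c("bulbar", "spinal", "limb", NA),
  age_at_baseline = c(61, 17, 54, 70),
  baseline_date = as.Date(c("2015-03-01", "1999-12-31", "2016-07-15", "2018-01-10")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the row numbers whose value falls outside
# the allowed categories or outside [lower_value, upper_value].
find_out_of_range <- function(data, variable, type, categories = NULL,
                              lower_value = NULL, upper_value = NULL) {
  values <- data[[variable]]
  in_range <- switch(type,
    categorical = values %in% categories,
    numeric = values >= lower_value & values <= upper_value,
    date = {
      dates <- as.Date(values)
      dates >= as.Date(lower_value) & dates <= as.Date(upper_value)
    },
    stop("Unknown type: ", type)
  )
  # Assumption: missing values are not reported as out of range.
  which(!is.na(values) & !in_range)
}

bad_onset <- find_out_of_range(toy, "onset", type = "categorical",
                               categories = c("bulbar", "respiratory", "spinal"))
bad_age <- find_out_of_range(toy, "age_at_baseline", type = "numeric",
                             lower_value = 20, upper_value = 100)
bad_date <- find_out_of_range(toy, "baseline_date", type = "date",
                              lower_value = "2000-01-01", upper_value = "2022-01-01")
print(list(onset = bad_onset, age = bad_age, date = bad_date))
```

In the toy data, row 3 has an onset region outside the allowed categories, and row 2 has both an age below 20 and a baseline date before 2000-01-01.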