qualitycontrol

qualitycontrol

The goal of qualitycontrol is to set a data quality control framework

Installation

You can install the qualitycontrol from GitHub with:

# install.packages("devtools")
devtools::install_github("luisgarcez11/qualitycontrol")

Data

The als_data dataset will be used to guide you through the package functionality. This data is not real, but based on data retrieved from Amyotrophic Lateral Sclerosis patients.

library(qualitycontrol)
als_data
#>    subjid p1 p2 p3 p4 p5 p6 p7 p8 p9 x1r x2r x3r age_at_baseline age_at_onset
#> 1       1  4  1  1  3  4  3  4  3  4   2   2   1              51           46
#> 2       2  4  4  4  1  1  3  3  1  4   1   2   4              82           77
#> 3       3  2  3  1  4  3  1  3  1  1   4   3   1              85           80
#> 4       4  3  2  1  1  4  1  3  2  4   4   3   3              77           72
#> 5       5  3  2  1  3  3  4  4  3  4   1   4   2              85           80
#> 6       6  2  2  1  4  1  4  4  3  1   3   5   2              73           68
#> 7       7  1  4  2  4  3  3  2  3  4   1   2   2              65           60
#> 8       8  2  2  4  4  3  2  1  2  3   3   1   1              50           62
#> 9       9  3  1  1  4  4  2  4  1  1   2   2   4              65           46
#> 10     10  3  4  1  4  3  2  3  2  1   4   3   1              81           76
#> 11     11  1  3  1  3  3  4  1 NA  3   3   2   4              51           46
#> 12     12  1  4  3  2  3  2  2 NA  1   3   2   3              50           45
#> 13     13  1  1  4  1  1  3  4 NA  2   2   3   1              82           77
#> 14     14  3  2  2  4  3  3  3  3  2   3   4   1              76           71
#> 15     15  3  4  2  2  2  3  1  3  4   4   1   4              87          376
#> 16     16  3  3  2  4  3  3  1  1  2   2   4   1              50           45
#> 17     17  3  2  3  1  4  1  3  2  1   4   4   2              85           80
#> 18     18  4  1  3  1  3  1  3  2  2   4   3   4              57           52
#> 19     19  1  3  3  2  2  2  3  2  3   2   3   2              74           69
#> 20     20  2  2  4  2  3  4  2  4  1   4   1   3              59           54
#> 21     21  2  3  3  2  3  2  4  4  1   1   3   3              79           74
#> 22     22  4  3  1  1  3  4  2  1  4   1   2   3              53           48
#> 23     23  3  3  4  3  4  1  3  4  3   2   2   2              45           40
#> 24     24  4  1  1  2  4  2  4  4  4   4   2   1              72           67
#> 25     25  4  3  1  3  3  4  3  2  3   3   4   2              77           72
#> 26     26  2  1  1  2  4  2  4  1  2   3   2   4              65           60
#> 27     27  1  1  1  1  1  1  3  3  2   2   1   1              54           49
#> 28     28  3  1  1  3  1  4  1  2  2   2   3   4              50          -23
#> 29     29  2  3  1  3  1  4  4  1  3   2   4   1              85           80
#> 30     30  3  1  2  1  3  1  2  4  1   1   2   4              85           80
#> 31     30  3  3  1  4  2  2  1  4  3   3   1   3              53           48
#>          onset baseline_date death_date
#> 1       bulbar    2003-03-26 2010-10-18
#> 2        bulba    2003-07-03 2019-06-24
#> 3       spinal    2007-01-27 9999-12-30
#> 4       bulbar    2010-11-27 2018-01-04
#> 5       bulbar    2006-10-25 2017-10-13
#> 6       spinal    2007-04-30 2010-05-08
#> 7       spinal    2002-11-15 2019-04-06
#> 8       spinal    2002-12-13 2018-05-04
#> 9       spinal    2005-06-02 2013-08-11
#> 10      bulbar    2004-06-02 2016-05-20
#> 11      bulbar    2007-03-09 2016-09-26
#> 12      bulbar    2005-01-11 2010-06-20
#> 13      bulbar    2010-12-22 2019-07-05
#> 14      bulbar    2008-10-14 2013-08-14
#> 15      spinal    2005-09-15 2010-07-20
#> 16      spinal    2007-07-05 2010-08-28
#> 17 respiratory    2002-08-19 2011-10-17
#> 18      spinal    2002-06-30 2020-12-17
#> 19 respiratory    2010-07-18 2016-05-15
#> 20      spinal    2004-08-15 2015-03-15
#> 21      bulbar    2006-04-07 2013-03-16
#> 22      bulbar    2002-06-01 2016-06-21
#> 23      bulbar    2007-08-12 2017-04-01
#> 24      bulbar    2006-08-12 2002-12-02
#> 25 respiratory    2006-08-11 2016-03-03
#> 26      spinal    2005-01-04 2011-10-05
#> 27 respiratory    2009-08-25 2015-03-11
#> 28      bulbar    2002-05-11 2017-11-09
#> 29      bulbar    2004-07-27 2014-03-27
#> 30      bulbar    2005-11-11 2015-05-30
#> 31      bulbar    2008-02-27 2014-07-05

QC mapping

The als_data_qc_mapping is an R list which contains 3 tables specifying all the tests used for quality control.

Missing

als_data_qc_mapping$missing
#> # A tibble: 13 × 3
#>    qc_type    variable type   
#>    <chr>      <chr>    <chr>  
#>  1 duplicated subjid   text   
#>  2 missing    p1       numeric
#>  3 missing    p2       numeric
#>  4 missing    p3       numeric
#>  5 missing    p4       numeric
#>  6 missing    p5       numeric
#>  7 missing    p6       numeric
#>  8 missing    p7       numeric
#>  9 missing    p8       numeric
#> 10 missing    p9       numeric
#> 11 missing    x1r      numeric
#> 12 missing    x2r      numeric
#> 13 missing    x3r      numeric

Inconsistencies

als_data_qc_mapping$inconsistencies
#> # A tibble: 2 × 6
#>   qc_type             variable1       type1   relation     variable2    type2  
#>   <chr>               <chr>           <chr>   <chr>        <chr>        <chr>  
#> 1 inconsistent_values age_at_baseline numeric greater_than age_at_onset numeric
#> 2 inconsistent_values baseline_date   date    lower_than   death_date   date

Out of range values

als_data_qc_mapping$range
#> # A tibble: 16 × 6
#>    qc_type variable        type        lower_value upper_value categories       
#>    <chr>   <chr>           <chr>       <chr>       <chr>       <chr>            
#>  1 range   p1              numeric     1           4           <NA>             
#>  2 range   p2              numeric     1           4           <NA>             
#>  3 range   p3              numeric     1           4           <NA>             
#>  4 range   p4              numeric     1           4           <NA>             
#>  5 range   p5              numeric     1           4           <NA>             
#>  6 range   p6              numeric     1           4           <NA>             
#>  7 range   p7              numeric     1           4           <NA>             
#>  8 range   p8              numeric     1           4           <NA>             
#>  9 range   p9              numeric     1           4           <NA>             
#> 10 range   x1r             numeric     1           4           <NA>             
#> 11 range   x2r             numeric     1           4           <NA>             
#> 12 range   x3r             numeric     1           4           <NA>             
#> 13 range   age_at_baseline numeric     20          100         <NA>             
#> 14 range   age_at_onset    numeric     20          100         <NA>             
#> 15 range   death_date      date        2000-01-01  2022-01-01  <NA>             
#> 16 range   onset           categorical <NA>        <NA>        bulbar, respirat…

qc_data function

qc_data takes as arguments the data to be quality controlled and the QC mapping containing the tests to be applied.

qc_data(als_data, als_data_qc_mapping)
#> # A tibble: 13 × 19
#>    subjid p1    p2    p3    p4    p5    p6    p7    p8    p9    x1r   x2r  
#>    <chr>  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#>  1 30     3     1     2     1     3     1     2     4     1     1     2    
#>  2 30     3     3     1     4     2     2     1     4     3     3     1    
#>  3 11     1     3     1     3     3     4     1     <NA>  3     3     2    
#>  4 12     1     4     3     2     3     2     2     <NA>  1     3     2    
#>  5 13     1     1     4     1     1     3     4     <NA>  2     2     3    
#>  6 6      2     2     1     4     1     4     4     3     1     3     5    
#>  7 15     3     4     2     2     2     3     1     3     4     4     1    
#>  8 28     3     1     1     3     1     4     1     2     2     2     3    
#>  9 3      2     3     1     4     3     1     3     1     1     4     3    
#> 10 2      4     4     4     1     1     3     3     1     4     1     2    
#> 11 8      2     2     4     4     3     2     1     2     3     3     1    
#> 12 15     3     4     2     2     2     3     1     3     4     4     1    
#> 13 24     4     1     1     2     4     2     4     4     4     4     2    
#> # ℹ 7 more variables: x3r <chr>, age_at_baseline <chr>, age_at_onset <chr>,
#> #   onset <chr>, baseline_date <chr>, death_date <chr>, finding <chr>

This will return a table with all the findings. If you want to save it, you can specify the path to be saved in output_file.