Pre-processing of clinical data for clinical data review report

Laure Cougnaud, Michela Pasetto

June 17, 2024

This vignette shows the functionalities of the clinDataReview package for annotating, filtering and transforming the data.

The package provides utility functions that automate standard data pre-processing steps.

Note that these functions are mainly useful in combination with the parameters specified in the ‘config’ files of the clinical data reports (see the dedicated reporting vignette).

For this vignette, we will use example data available in the clinUtils package.

library(clinDataReview)

1 Data format

The input dataset for the clinical data review should be a data.frame containing clinical data. Such data is typically imported from SAS data files (sas7bdat) or SAS transport files (xpt).
Multiple files can be imported at once with the clinUtils::loadDataADaMSDTM function.

The variable labels stored in the SAS datasets are also used for titles and captions.

A few ADaM datasets are included in the clinUtils package for demonstration purposes. They are available in the dataADaMCDISCP01 object, together with the corresponding variable labels (stored in its labelVars attribute).

library(clinUtils)

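data(dataADaMCDISCP01)
# variable labels are stored as an attribute of the list of datasets
labelVars <- attr(dataADaMCDISCP01, "labelVars")

# extract the laboratory (chemistry), subject-level and adverse event datasets
dataLB <- dataADaMCDISCP01$ADLBC
dataDM <- dataADaMCDISCP01$ADSL
dataAE <- dataADaMCDISCP01$ADAE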
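The labels are what turns variable codes into readable titles and captions. As a minimal sketch, assuming labelVars behaves as a named character vector (names are variable names, values are labels), a label lookup with a fallback to the variable name could look as follows. The toy vector and the getLabelOrName helper are hypothetical, not part of clinUtils or clinDataReview.

```r
# Hypothetical toy label vector: names are variable names, values are labels
toyLabelVars <- c(
  USUBJID = "Unique Subject Identifier",
  VISIT = "Visit Name"
)

# Hypothetical helper: return the label of each variable, falling back
# to the variable name itself when no label is available
getLabelOrName <- function(vars, labels) {
  res <- unname(labels[vars])        # NA for variables without a label
  res[is.na(res)] <- vars[is.na(res)]
  res
}

getLabelOrName(c("USUBJID", "VISIT", "ETHNIC"), toyLabelVars)
```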

2 Annotate data

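The annotateData function adds metadata to a specific domain/dataset. In the example below, the ethnicity (ETHNIC) and treatment arm (ARM) of each subject are retrieved from the subject-level dataset (ADSL) and added to the laboratory data by subject identifier (USUBJID).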

dataLBAnnot <- annotateData(
    data = dataLB, 
    annotations = list(data = dataDM, vars = c("ETHNIC", "ARM")), 
    verbose = TRUE
)
## Data annotated with variable(s): ETHNIC ('ETHNIC'), ARM ('ARM') from the 'custom' dataset based on the variable(s):  USUBJID ('USUBJID').
knitr::kable(
    head(dataLBAnnot), 
    caption = paste("Laboratory parameters annotated with",
        "demographics information with the `annotateData` function"
    )
)
Laboratory parameters annotated with demographics information with the annotateData function
STUDYID SUBJID USUBJID TRTP TRTPN TRTA TRTAN TRTSDT TRTEDT AGE AGEGR1 AGEGR1N RACE RACEN SEX COMP24FL DSRAEFL SAFFL AVISIT AVISITN ADY ADT VISIT VISITNUM PARAM PARAMCD PARAMN PARCAT1 AVAL BASE CHG A1LO A1HI R2A1LO R2A1HI BR2A1LO BR2A1HI ANL01FL ALBTRVAL ANRIND BNRIND ABLFL AENTMTFL LBSEQ LBNRIND LBSTRESN DATASET ETHNIC ARM
CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81 Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65 1 WHITE 1 M Y Y Baseline 0 -9 2013-08-14 SCREENING 1 1 Sodium (mmol/L) SODIUM 18 CHEM 139.00 139.00 NA 132.0 147.0 1.053030 0.9455782 1.053030 0.9455782 81.50 N N Y 26 NORMAL 139.00 ADLBC NOT HISPANIC OR LATINO Xanomeline High Dose
CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81 Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65 1 WHITE 1 M Y Y Baseline 0 -9 2013-08-14 SCREENING 1 1 Potassium (mmol/L) K 19 CHEM 4.00 4.00 NA 3.4 5.4 1.176471 0.7407407 1.176471 0.7407407 4.10 N N Y 19 NORMAL 4.00 ADLBC NOT HISPANIC OR LATINO Xanomeline High Dose
CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81 Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65 1 WHITE 1 M Y Y Baseline 0 -9 2013-08-14 SCREENING 1 1 Chloride (mmol/L) CL 20 CHEM 109.00 109.00 NA 94.0 112.0 1.159574 0.9732143 1.159574 0.9732143 62.00 N N Y 11 NORMAL 109.00 ADLBC NOT HISPANIC OR LATINO Xanomeline High Dose
CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81 Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65 1 WHITE 1 M Y Y Baseline 0 -9 2013-08-14 SCREENING 1 1 Bilirubin (umol/L) BILI 21 CHEM 8.55 8.55 NA 3.0 21.0 2.850000 0.4071429 2.850000 0.4071429 22.95 N N Y 6 NORMAL 8.55 ADLBC NOT HISPANIC OR LATINO Xanomeline High Dose
CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81 Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65 1 WHITE 1 M Y Y Baseline 0 -9 2013-08-14 SCREENING 1 1 Alkaline Phosphatase (U/L) ALP 22 CHEM 88.00 88.00 NA 31.0 110.0 2.838710 0.8000000 2.838710 0.8000000 77.00 N N Y 2 NORMAL 88.00 ADLBC NOT HISPANIC OR LATINO Xanomeline High Dose
CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81 Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65 1 WHITE 1 M Y Y Baseline 0 -9 2013-08-14 SCREENING 1 1 Gamma Glutamyl Transferase (U/L) GGT 23 CHEM 43.00 43.00 NA 10.0 61.0 4.300000 0.7049180 4.300000 0.7049180 48.50 N N Y 15 NORMAL 43.00 ADLBC NOT HISPANIC OR LATINO Xanomeline High Dose
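Conceptually, this annotation behaves like a left join of subject-level variables onto the laboratory records by USUBJID, as the verbose message indicates. The base R sketch below illustrates that idea on hypothetical toy data; it is not the package's implementation.

```r
# Hypothetical toy laboratory data: several records per subject
lab <- data.frame(
  USUBJID = c("S1", "S1", "S2", "S3"),
  PARAMCD = c("ALT", "BILI", "ALT", "ALT"),
  LBSTRESN = c(34, 8.55, 20, 15)
)
# Hypothetical toy subject-level data: one record per subject
dm <- data.frame(
  USUBJID = c("S1", "S2", "S3"),
  ETHNIC = c("NOT HISPANIC OR LATINO", "HISPANIC OR LATINO",
    "NOT HISPANIC OR LATINO"),
  ARM = c("Placebo", "Xanomeline Low Dose", "Xanomeline High Dose")
)

# The subject-level data should contain one record per subject,
# otherwise the join would duplicate laboratory records
stopifnot(!anyDuplicated(dm$USUBJID))

# Left join: keep all laboratory records, add ETHNIC and ARM by subject
labAnnot <- merge(lab, dm[, c("USUBJID", "ETHNIC", "ARM")],
  by = "USUBJID", all.x = TRUE)
labAnnot
```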

3 Filter data

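The filterData function filters a dataset. Here, setting rev = TRUE reverses the selection: the records of the placebo patients are removed, so only the records of patients treated with Xanomeline are retained.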

dataLBAnnotTreatment <- filterData(
    data = dataLBAnnot, 
    filters = list(var = "ARM", value = "Placebo", rev = TRUE), 
    verbose = TRUE
)
## 354 records with ARM ('ARM') %in% 'Placebo' are filtered in the data.
knitr::kable(
    unique(dataLBAnnotTreatment[, c("USUBJID", "ARM")]), 
    caption = paste("Subset of laboratory parameters filtered",
        "to exclude placebo patients"
    )
)
Subset of laboratory parameters filtered to exclude placebo patients
USUBJID ARM
1 01-701-1148 Xanomeline High Dose
397 01-701-1192 Xanomeline Low Dose
793 01-701-1211 Xanomeline Low Dose
1363 01-718-1371 Xanomeline High Dose
1615 01-718-1427 Xanomeline High Dose
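The reversed filter amounts to negating a `%in%` condition. The sketch below reproduces this selection in base R on hypothetical toy data; note that `%in%` returns FALSE for missing values, so in this sketch records with a missing ARM would be kept.

```r
# Hypothetical toy annotated data
labAnnot <- data.frame(
  USUBJID = c("S1", "S1", "S2", "S3"),
  ARM = c("Placebo", "Placebo", "Xanomeline Low Dose", "Xanomeline High Dose")
)

# Condition ARM %in% "Placebo"; reversing it keeps the complement
isPlacebo <- labAnnot$ARM %in% "Placebo"
labTreated <- labAnnot[!isPlacebo, , drop = FALSE]

# Report the number of removed records
message(sum(isPlacebo), " records with ARM %in% 'Placebo' are removed.")
labTreated
```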

4 Transform data

The transformData function converts data to a different format.

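In the example below, the laboratory data is converted from a long format, with one record per endpoint, visit and subject, to a wide format, with one record per visit and subject. The values of each endpoint (here ALT and BILI, as identified by PARAMCD) are stored in separate columns, e.g. LBSTRESN.ALT and LBSTRESN.BILI. As reported by the warnings, the other variables that vary within a visit and subject (e.g. PARAM, AVAL) are taken from the first matching record; the same applies when several records exist for the same endpoint, visit and subject.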

eDishData <- transformData(
    data = subset(dataLB, PARAMCD %in% c("ALT", "BILI")),
    transformations = list(
        type = "pivot_wider",
        varsID = c("USUBJID", "VISIT"), 
        varsValue = c("LBSTRESN", "LBNRIND"),
        varPivot = "PARAMCD"
    ),
    verbose = TRUE,
    labelVars = labelVars
)
## Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying, : some constant variables
## (AVISIT,AVISITN,PARAM,PARAMN,AVAL,BASE,CHG,A1LO,A1HI,R2A1LO,R2A1HI,BR2A1LO,BR2A1HI,ANL01FL,ALBTRVAL,LBSEQ) are really varying
## Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying, : multiple rows match for PARAMCD=BILI: first taken
## Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying, : multiple rows match for PARAMCD=ALT: first taken
## Data is converted to a wide format with variables: 'LBSTRESN', 'LBNRIND' for different: 'PARAMCD' by 'Unique Subject Identifier', 'Visit Name' pivoted to different columns.
knitr::kable(head(eDishData))
STUDYID SUBJID USUBJID TRTP TRTPN TRTA TRTAN TRTSDT TRTEDT AGE AGEGR1 AGEGR1N RACE RACEN SEX COMP24FL DSRAEFL SAFFL AVISIT AVISITN ADY ADT VISIT VISITNUM PARAM PARAMN PARCAT1 AVAL BASE CHG A1LO A1HI R2A1LO R2A1HI BR2A1LO BR2A1HI ANL01FL ALBTRVAL ANRIND BNRIND ABLFL AENTMTFL LBSEQ DATASET LBSTRESN.BILI LBNRIND.BILI LBSTRESN.ALT LBNRIND.ALT
4 CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81 Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65 1 WHITE 1 M Y Y Baseline 0 -9 2013-08-14 SCREENING 1 1 Bilirubin (umol/L) 21 CHEM 8.55 8.55 NA 3 21 2.85 0.4071429 2.85 0.4071429 22.95 N N Y 6 ADLBC 8.55 NORMAL 34 NORMAL
40 CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81 Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65 1 WHITE 1 M Y Y Week 2 2 14 2013-09-05 WEEK 2 4 Bilirubin (umol/L) 21 CHEM 8.55 8.55 0.00 3 21 2.85 0.4071429 2.85 0.4071429 22.95 N N 43 ADLBC 8.55 NORMAL 41 NORMAL
76 CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81 Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65 1 WHITE 1 M Y Y Week 4 4 28 2013-09-19 WEEK 4 5 Bilirubin (umol/L) 21 CHEM 8.55 8.55 0.00 3 21 2.85 0.4071429 2.85 0.4071429 22.95 N N 78 ADLBC 8.55 NORMAL 35 NORMAL
112 CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81 Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65 1 WHITE 1 M Y Y Week 6 6 42 2013-10-03 WEEK 6 7 Bilirubin (umol/L) 21 CHEM 8.55 8.55 0.00 3 21 2.85 0.4071429 2.85 0.4071429 22.95 N N 108 ADLBC 8.55 NORMAL 31 NORMAL
148 CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81 Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65 1 WHITE 1 M Y Y Week 8 8 57 2013-10-18 WEEK 8 8 Bilirubin (umol/L) 21 CHEM 8.55 8.55 0.00 3 21 2.85 0.4071429 2.85 0.4071429 22.95 N N 138 ADLBC 8.55 NORMAL 31 NORMAL
184 CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81 Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65 1 WHITE 1 M Y Y Week 12 12 87 2013-11-17 WEEK 12 9 Bilirubin (umol/L) 21 CHEM 6.84 8.55 -1.71 3 21 2.28 0.3257143 2.85 0.4071429 Y 24.66 N N 168 ADLBC 6.84 NORMAL 39 NORMAL
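The column naming (`<variable>.<PARAMCD>`) and the `reshapeWide` warnings above correspond to base R's `stats::reshape` with `direction = "wide"`. The minimal sketch below pivots hypothetical toy data the same way, assuming exactly one record per subject, visit and endpoint (so no warning is raised).

```r
# Hypothetical toy long data: one record per subject * visit * endpoint
labLong <- data.frame(
  USUBJID = rep(c("S1", "S2"), each = 4),
  VISIT = rep(rep(c("SCREENING 1", "WEEK 2"), each = 2), times = 2),
  PARAMCD = rep(c("ALT", "BILI"), times = 4),
  LBSTRESN = c(34, 8.55, 41, 8.55, 20, 5, 22, 6),
  LBNRIND = "NORMAL"
)

# Long to wide: one record per subject * visit, one column per
# value variable and endpoint (e.g. LBSTRESN.ALT, LBNRIND.BILI)
labWide <- stats::reshape(
  labLong,
  direction = "wide",
  idvar = c("USUBJID", "VISIT"),
  timevar = "PARAMCD",
  v.names = c("LBSTRESN", "LBNRIND")
)
labWide
```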

5 Process data

The processData function executes several of the pre-processing steps described in the previous sections at once, in the order in which they are listed in the processing parameter.

dataLBAnnotTreatment2 <- processData(
    data = dataLB,
    processing = list(
        list(annotate = list(data = dataDM, vars = c("ETHNIC", "ARM"))),
        list(filter = list(var = "ARM", value = "Placebo", rev = TRUE))
    ),
    verbose = TRUE
)

# same result as calling annotateData and filterData successively
identical(dataLBAnnotTreatment, dataLBAnnotTreatment2)
## [1] TRUE
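As the identical() check suggests, the processing steps are applied one after the other, each on the output of the previous one. The sketch below illustrates this chaining pattern in base R with `Reduce`; the step functions and toy data are hypothetical stand-ins for the annotation and filtering, not the package's implementation.

```r
# Hypothetical toy data
lab <- data.frame(USUBJID = c("S1", "S2", "S3"), LBSTRESN = c(34, 20, 15))
dm <- data.frame(
  USUBJID = c("S1", "S2", "S3"),
  ARM = c("Placebo", "Xanomeline Low Dose", "Xanomeline High Dose")
)

# Hypothetical step functions, each taking and returning a data.frame
annotateStep <- function(data) {
  merge(data, dm, by = "USUBJID", all.x = TRUE)
}
filterStep <- function(data) {
  data[!data$ARM %in% "Placebo", , drop = FALSE]
}

# Apply the steps in order, each on the output of the previous one
steps <- list(annotateStep, filterStep)
result <- Reduce(function(data, step) step(data), steps, lab)

# Same result as calling the steps one after the other
identical(result, filterStep(annotateStep(lab)))
```

The order matters: filtering on ARM only works once the annotation step has added that variable.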

6 Appendix

6.1 Session info

R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0

locale:
[1] C

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] clinUtils_0.2.0 clinDataReview_1.6.1 knitr_1.47

loaded via a namespace (and not attached):
 [1] plotly_4.10.4 sass_0.4.9 utf8_1.2.4 generics_0.1.3 tidyr_1.3.1 xml2_1.3.6 stringi_1.8.4 jsonvalidate_1.3.2
 [9] hms_1.1.3 digest_0.6.35 magrittr_2.0.3 evaluate_0.24.0 grid_4.4.0 bookdown_0.39 fastmap_1.2.0 plyr_1.8.9
[17] jsonlite_1.8.8 httr_1.4.7 purrr_1.0.2 fansi_1.0.6 crosstalk_1.2.1 viridisLite_0.4.2 scales_1.3.0 lazyeval_0.2.2
[25] jquerylib_0.1.4 cli_3.6.2 rlang_1.1.4 munsell_0.5.1 base64enc_0.1-3 cachem_1.1.0 yaml_2.3.8 tools_4.4.0
[33] parallel_4.4.0 dplyr_1.1.4 colorspace_2.1-0 ggplot2_3.5.1 DT_0.33 forcats_1.0.0 vctrs_0.6.5 R6_2.5.1
[41] lifecycle_1.0.4 stringr_1.5.1 htmlwidgets_1.6.4 pkgconfig_2.0.3 pillar_1.9.0 bslib_0.7.0 gtable_0.3.5 data.table_1.15.4
[49] glue_1.7.0 Rcpp_1.0.12 haven_2.5.4 xfun_0.44 tibble_3.2.1 tidyselect_1.2.1 htmltools_0.5.8.1 rmarkdown_2.27
[57] compiler_4.4.0