| Variable | Missing |
|---|---|
| Slide_Label | 0 |
| Dx_Paige | 0 |
| Dx_Report | 0 |
| Dx_Research | 0 |
25 Reproducibility and QC
26 Reproducibility Plan
- Use `renv` to lock package versions; avoid installing packages during render. Initialize once, snapshot after dependency changes, and commit `renv.lock`.
- Move data derivation into scripted steps (e.g., `targets`/`drake` or a simple `01_ingest.R` + `02_clean.R` pipeline) that read the raw Excel files in `_first_phase`/`_second_phase` and write the `_temp_*.RDS` artifacts deterministically.
- Set seeds for any resampling/bootstrap; record session info at the end of each render (keep `summary.qmd` as is).
- Document data provenance: source file name, timestamp, and row counts before/after filters; write these to a small metadata table that is saved alongside each `_temp_*.RDS`.
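The seeding, session-info, and metadata steps in this plan can be sketched in base R. This is a minimal illustration under stated assumptions: the helper `save_with_metadata()`, the `_meta.RDS` suffix, the toy data frame, and all file names are hypothetical and not taken from the project.

```r
# Hypothetical helper (not from the project): save a cleaned data frame and a
# one-row metadata table next to it.
save_with_metadata <- function(raw, keep, out_path, source_file) {
  cleaned <- raw[keep, , drop = FALSE]
  meta <- data.frame(
    source_file = source_file,
    written_at  = format(Sys.time(), "%Y-%m-%d %H:%M:%S", tz = "UTC"),
    rows_before = nrow(raw),
    rows_after  = nrow(cleaned),
    stringsAsFactors = FALSE
  )
  saveRDS(cleaned, out_path)
  # Assumed naming convention: "<artifact>_meta.RDS" beside the artifact.
  saveRDS(meta, sub("\\.RDS$", "_meta.RDS", out_path))
  meta
}

# Toy stand-in for one ingested sheet (illustrative values only).
raw  <- data.frame(id = 1:5, seconds = c(10, 250, NA, 400, 35))
keep <- !is.na(raw$seconds) & raw$seconds <= 300

out_path <- file.path(tempdir(), "_temp_example.RDS")
meta <- save_with_metadata(raw, keep, out_path, source_file = "example_raw.xlsx")

# Seed immediately before any resampling so the draw is repeatable.
set.seed(2024)
boot_means <- replicate(200, mean(sample(raw$seconds[keep], replace = TRUE)))

set.seed(2024)
boot_again <- replicate(200, mean(sample(raw$seconds[keep], replace = TRUE)))
same_draws <- identical(boot_means, boot_again)

# Record session info at the end of the run.
session_path <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), session_path)
```

Keeping the metadata in a separate small file means it can be inspected without loading the full artifact; whether that or a single combined file suits this project better is a judgment call.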
27 Data Provenance Checklist
- Track how `_temp_agreement_decision.RDS`, `_temp_duration.RDS`, `_temp_all_data_duration_no_outlier.RDS`, and `_temp_subjective.RDS` are created. Store the script path, input file hashes, and filter criteria in an attribute.
- Avoid manual editing of intermediate files; prefer regenerating from raw data.
- Keep a short `RUNME.md` describing the order of scripts to rebuild all RDS/Excel outputs.
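One way to store the script path, input file hashes, and filter criteria in an attribute is sketched below. The helper `attach_provenance()`, the attribute name `"provenance"`, and the toy input file are assumptions made for illustration; `tools::md5sum()` ships with base R.

```r
# Hypothetical helper (not from the project): attach a provenance record
# to an object as a custom attribute.
attach_provenance <- function(x, script_path, input_files, filter_criteria) {
  attr(x, "provenance") <- list(
    script_path     = script_path,
    input_md5       = tools::md5sum(input_files),  # named by file path
    filter_criteria = filter_criteria
  )
  x
}

# Toy input file standing in for a raw data export (illustrative only).
input_file <- tempfile(fileext = ".csv")
writeLines(c("id,seconds", "1,10", "2,400"), input_file)

raw     <- read.csv(input_file)
cleaned <- raw[raw$seconds <= 300, , drop = FALSE]

# Attach provenance as the last step before saving.
cleaned <- attach_provenance(
  cleaned,
  script_path     = "02_clean.R",
  input_files     = input_file,
  filter_criteria = "seconds <= 300"
)

out_path <- file.path(tempdir(), "_temp_example_provenance.RDS")
saveRDS(cleaned, out_path)

# The attribute survives the saveRDS()/readRDS() round trip.
restored <- readRDS(out_path)
prov     <- attr(restored, "provenance")
```

Some transformations may drop custom attributes, so attaching the record immediately before `saveRDS()` is the safer order.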
28 Quick Missingness and Outlier Screens
Note for Pathologists: Quality Control: Summary of missing data in key diagnostic fields.
Note for Pathologists: Quality Control: Sanity check for diagnosis duration, showing outlier counts (diagnoses taking > 300 seconds).
| n | over_300s | max_seconds |
|---|---|---|
| 6248 | 0 | 297.455 |
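Screens of this kind can be computed with a few lines of base R. The sketch below runs on a toy data frame, not the study data; the column name `duration_seconds` and the toy values are assumptions, while the four diagnostic field names and the 300-second threshold follow the tables and notes in this section.

```r
# Toy stand-in for the analysis data (illustrative values only).
dx <- data.frame(
  Slide_Label      = c("S1", "S2", "S3", "S4"),
  Dx_Paige         = c("benign", "malignant", NA, "benign"),
  Dx_Report        = c("benign", "malignant", "benign", "benign"),
  Dx_Research      = c("benign", "malignant", "benign", "benign"),
  duration_seconds = c(12.5, 301.2, 88.0, 297.455),  # assumed column name
  stringsAsFactors = FALSE
)

# Missingness screen: count NA values per key diagnostic field.
key_fields  <- c("Slide_Label", "Dx_Paige", "Dx_Report", "Dx_Research")
missing_tbl <- data.frame(
  Variable = key_fields,
  Missing  = unname(colSums(is.na(dx[key_fields])))
)

# Outlier screen: diagnoses taking longer than 300 seconds.
threshold   <- 300
outlier_tbl <- data.frame(
  n           = sum(!is.na(dx$duration_seconds)),
  over_300s   = sum(dx$duration_seconds > threshold, na.rm = TRUE),
  max_seconds = max(dx$duration_seconds, na.rm = TRUE)
)
```

A result with `over_300s` equal to 0 and `max_seconds` below 300, as in the table above, is what one would expect from a file that has already had outliers removed.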
29 Rendering Hygiene
- Keep `_common.R` limited to loading packages and shared options; perform installations only when setting up the environment (e.g., `renv::restore()` before render).
- Prefer chunk-level `eval: false` for exploratory code instead of commenting out blocks; retain current analyses untouched.
- When adding new figures/tables, place assets in dedicated folders (e.g., `img_qc`) to avoid clashing with existing outputs.
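The dedicated-folder advice can be enforced with a small path helper. The sketch below is one possible approach, not the project's existing code: `qc_asset_path()`, its refusal to overwrite, and the file name `qc_missingness.csv` are hypothetical; only the folder name `img_qc` comes from the text above.

```r
# Hypothetical helper (not from the project): build a path inside a dedicated
# QC folder, creating the folder if needed and refusing to overwrite a file.
qc_asset_path <- function(filename, dir = "img_qc", root = ".") {
  out_dir <- file.path(root, dir)
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(out_dir, filename)
  if (file.exists(path)) {
    stop("Refusing to overwrite existing asset: ", path)
  }
  path
}

# Demo in a throwaway directory so nothing in a real project is touched.
demo_root <- tempfile("render_demo_")
qc_path   <- qc_asset_path("qc_missingness.csv", root = demo_root)
write.csv(data.frame(Variable = "Dx_Paige", Missing = 0), qc_path,
          row.names = FALSE)

# A second request for the same name now fails instead of clobbering the file.
second_try <- try(qc_asset_path("qc_missingness.csv", root = demo_root),
                  silent = TRUE)
```

Failing loudly on a name clash is a deliberate choice here; a project that regenerates assets on every render might prefer to overwrite instead.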