CNV Autoencoder Experiment Summary Report

Report Date: 2025-12-18
Generated: 2025-12-19 00:16:50
Experiments: 4

1. Executive Summary

Objective: Detect Copy Number Variations (CNVs) in cell-free DNA (cfDNA) samples using a Variational Autoencoder (VAE) trained on normal samples. Anomaly detection is based on reconstruction error: normal samples should reconstruct with low error, while cfDNA samples carrying cancer-derived CNVs should reconstruct with high error.
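The scoring idea in the objective can be sketched as follows. This is a minimal illustration, assuming mean squared error as the reconstruction error (the report does not state which error the pipeline uses); `reconstruction_error` and `flag_anomalies` are hypothetical helper names, and the profiles are toy values, not experiment data.

```python
from statistics import mean

def reconstruction_error(original, reconstructed):
    """Mean squared error between one sample's profile and its reconstruction (assumed metric)."""
    if len(original) != len(reconstructed):
        raise ValueError("profiles must have the same length")
    return mean((x - y) ** 2 for x, y in zip(original, reconstructed))

def flag_anomalies(errors, threshold):
    """Return True for each sample whose error exceeds the threshold."""
    return [e > threshold for e in errors]

# Toy profiles only: the first is reconstructed well, the second poorly.
normal_like = reconstruction_error([0.0, 0.1, -0.1, 0.0], [0.0, 0.1, -0.1, 0.1])
cnv_like = reconstruction_error([0.0, 0.9, 1.1, 0.0], [0.0, 0.1, 0.0, 0.0])
flags = flag_anomalies([normal_like, cnv_like], threshold=0.05)
print(flags)  # → [False, True]
```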

Experiments Conducted: 4
Best AUROC (target: 0.8): 0.907
Best Separation (target: 1.5x): 3.03x
Overall Status: Target Met

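Key Finding: Across the 4 experiments, the best AUROC is 0.907 (+0.082 over the first run) and the best separation is 3.03x (+0.16x over the first run). Both come from holdout_latent32_beta0, which meets every target. The two runs with beta_max = 0.01 fall short of the AUROC target.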

2. Data Overview

Dataset Samples Features Notes
Normal (Train) 395 21,574 genes ENCODE blacklist filtered, gene-level aggregation
Normal (Val) 49
Normal (Test) 49
cfDNA (Test) 500
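The split sizes above are consistent with holding out 10% of the 493 normal samples for test and another 10% for validation. The sketch below reproduces those counts. It assumes a shuffled index split with a validation fraction of 0.1; only `--test-split 0.1` and `--seed 42` appear in the reproducibility commands, and the actual logic in `scripts/train.py` may differ. `split_indices` is a hypothetical name.

```python
import random

def split_indices(n_samples, test_frac=0.1, val_frac=0.1, seed=42):
    """Shuffle sample indices and cut them into train/val/test (assumed scheme)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test

# 395 + 49 + 49 = 493 normal samples in total.
train, val, test = split_indices(395 + 49 + 49)
print(len(train), len(val), len(test))  # → 395 49 49
```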

3. Experiments Comparison

3.1 Summary Table

Note: For holdout experiments, Test = held-out normal test subset vs cfDNA (primary). Optional AUROC shown in parentheses is computed using all normal samples vs cfDNA and is for reference only (not a held-out split).

Metric holdout_latent16_beta0 holdout_latent16_beta001 holdout_latent32_beta0 holdout_latent32_beta001 Target
Run date 2025-12-18 2025-12-18 2025-12-18 2025-12-18 -
AUROC (Test) 0.825 0.754 0.907 0.741 ≥ 0.8
AUROC (All normals) 0.804 0.711 0.900 0.709 -
Separation Ratio (Test) 2.87x 1.64x 3.03x 1.53x ≥ 1.5x
Cohen's d 0.50 0.51 0.68 0.47 ≥ 0.5
AUPRC (Test) 0.976 0.968 0.988 0.964 ≥ 0.5
AUPRC (All normals) 0.775 0.715 0.871 0.706 -
Sensitivity @ 95% Spec 43.2% 41.0% 66.0% 33.6% ≥ 30%
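When reading the AUPRC values, it may help to keep the class balance in mind: a scorer that ranks samples at random has an expected AUPRC equal to the positive-class prevalence. The sketch below computes that baseline for both comparisons. It assumes cfDNA is the positive class and that "all normals" means the 395 + 49 + 49 = 493 normal samples from Section 2; `chance_auprc` is a hypothetical helper.

```python
def chance_auprc(n_positive, n_negative):
    """Expected AUPRC of a random scorer: the positive-class prevalence."""
    return n_positive / (n_positive + n_negative)

# Holdout test: 500 cfDNA vs 49 normals.
holdout_baseline = chance_auprc(500, 49)
# Reference comparison: 500 cfDNA vs all 493 normals (assumed count).
all_normals_baseline = chance_auprc(500, 395 + 49 + 49)

print(f"holdout: {holdout_baseline:.3f}, all normals: {all_normals_baseline:.3f}")
# → holdout: 0.911, all normals: 0.504
```

Under these assumptions, the holdout AUPRC values sit on a much higher chance baseline than the all-normals values, so the two columns are not directly comparable.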

3.2 Metric Trends

Figure: Metric trends chart. Performance metrics across experiments; dashed lines show targets.

4. Experiment Details

1. holdout_latent16_beta0 (holdout)

Run ID: 88bc9ed08ea542a1a7840216703eca66 | Date: 2025-12-18 00:30 - 00:33 CST | Duration: ~3 minutes | Status: FINISHED | Sample Context: Holdout Test (49 Normal vs 500 cfDNA)

Configuration

Parameter Value Parameter Value
Latent Dim 16 Learning Rate 0.0003
Hidden Channels [16, 32, 64] Beta Max 0.0
Kernel Sizes [15, 9, 5] Beta Anneal Epochs 20

Results (Figures use holdout test samples)

Test = held-out normal test subset vs cfDNA (primary). All normals (when shown) uses all normal samples vs cfDNA for reference only.

Metric Test (Holdout) All Normals (Ref) Target Status
AUROC 0.825 0.804 ≥ 0.8 PASS
AUPRC 0.976 0.775 ≥ 0.5 PASS
Separation Ratio 2.87x - ≥ 1.5x PASS
Cohen's d 0.50 - ≥ 0.5 PASS
Normal Error (mean ± std) 0.028 ± 0.033 - -
cfDNA Error (mean ± std) 0.080 ± 0.109 - -

Operating Point Analysis

Trade-offs between sensitivity and specificity at different thresholds. For cancer screening, high sensitivity (95%+) is typically prioritized.

Specificity Sensitivity (Recall) FPR Threshold Note
99.0% 7.8% 2.0% 0.2466
95.0% 40.0% 6.1% 0.0436
90.0% 48.6% 10.2% 0.0369
80.0% 63.0% 20.4% 0.0289
46.9% 95.0% 53.1% 0.0202 95% sensitivity operating point
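Each row of an operating-point table corresponds to one threshold on the reconstruction error. The sketch below shows how specificity, sensitivity, and FPR follow from a threshold, and why the FPR column moves in coarse steps: with only 49 held-out normals, each false positive shifts FPR by 1/49. It assumes samples are flagged when their error is strictly above the threshold; `operating_point` is a hypothetical helper and the error lists are toy values.

```python
def operating_point(normal_errors, cfdna_errors, threshold):
    """Specificity, sensitivity and FPR when flagging errors above the threshold."""
    false_pos = sum(e > threshold for e in normal_errors)
    true_pos = sum(e > threshold for e in cfdna_errors)
    fpr = false_pos / len(normal_errors)
    return {
        "specificity": 1.0 - fpr,
        "sensitivity": true_pos / len(cfdna_errors),
        "fpr": fpr,
    }

# Toy errors only (not taken from the experiments).
point = operating_point([0.01, 0.02, 0.03, 0.04], [0.02, 0.05, 0.06, 0.10], threshold=0.035)
print(point)  # specificity 0.75, sensitivity 0.75, fpr 0.25

# FPR values reachable with 49 normals for 1, 3, 5 and 10 false positives.
achievable_fpr = [round(100 * k / 49, 1) for k in (1, 3, 5, 10)]
print(achievable_fpr)  # → [2.0, 6.1, 10.2, 20.4]
```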

Figures

train_error_distribution.png: Training Error Distribution (Normal vs cfDNA)
roc_curve.png: ROC Curve
roc_curve_all_samples.png: ROC Curve (All Samples - for comparison)
fig1_error_distribution.png: Detailed Error Distribution with Statistics
fig2_sashimi_plot.png: Genome-wide Error Profile (Sashimi Plot)
fig2_sashimi_plot_difference.png: Error Difference (cfDNA - Normal)
fig3_sample_profiles.png: Sample Reconstruction Profiles

2. holdout_latent16_beta001 (holdout)

Run ID: 1ef6b476f675419e972f31f27625e741 | Date: 2025-12-18 00:33 - 00:34 CST | Duration: ~1 minute | Status: FINISHED | Sample Context: Holdout Test (49 Normal vs 500 cfDNA)

Configuration

Parameter Value Parameter Value
Latent Dim 16 Learning Rate 0.0003
Hidden Channels [16, 32, 64] Beta Max 0.01
Kernel Sizes [15, 9, 5] Beta Anneal Epochs 20

Results (Figures use holdout test samples)

Test = held-out normal test subset vs cfDNA (primary). All normals (when shown) uses all normal samples vs cfDNA for reference only.

Metric Test (Holdout) All Normals (Ref) Target Status
AUROC 0.754 0.711 ≥ 0.8 FAIL
AUPRC 0.968 0.715 ≥ 0.5 PASS
Separation Ratio 1.64x - ≥ 1.5x PASS
Cohen's d 0.51 - ≥ 0.5 PASS
Normal Error (mean ± std) 0.048 ± 0.031 - -
cfDNA Error (mean ± std) 0.079 ± 0.063 - -

Operating Point Analysis

Trade-offs between sensitivity and specificity at different thresholds. For cancer screening, high sensitivity (95%+) is typically prioritized.

Specificity Sensitivity (Recall) FPR Threshold Note
99.0% 3.6% 2.0% 0.2610
95.0% 40.0% 6.1% 0.0648
90.0% 62.0% 10.2% 0.0494
80.0% 64.0% 20.4% 0.0485
0.0% 100.0% 100.0% 0.0283 95% sensitivity operating point

Figures

train_error_distribution.png: Training Error Distribution (Normal vs cfDNA)
roc_curve.png: ROC Curve
roc_curve_all_samples.png: ROC Curve (All Samples - for comparison)
fig1_error_distribution.png: Detailed Error Distribution with Statistics
fig2_sashimi_plot.png: Genome-wide Error Profile (Sashimi Plot)
fig2_sashimi_plot_difference.png: Error Difference (cfDNA - Normal)
fig3_sample_profiles.png: Sample Reconstruction Profiles

3. holdout_latent32_beta0 (holdout)

Run ID: dfef8db798bf42dbbfc77f5be3aa59ec | Date: 2025-12-18 00:33 - 00:36 CST | Duration: ~3 minutes | Status: FINISHED | Sample Context: Holdout Test (49 Normal vs 500 cfDNA)

Configuration

Parameter Value Parameter Value
Latent Dim 32 Learning Rate 0.0003
Hidden Channels [16, 32, 64] Beta Max 0.0
Kernel Sizes [15, 9, 5] Beta Anneal Epochs 20

Results (Figures use holdout test samples)

Test = held-out normal test subset vs cfDNA (primary). All normals (when shown) uses all normal samples vs cfDNA for reference only.

Metric Test (Holdout) All Normals (Ref) Target Status
AUROC 0.907 0.900 ≥ 0.8 PASS
AUPRC 0.988 0.871 ≥ 0.5 PASS
Separation Ratio 3.03x - ≥ 1.5x PASS
Cohen's d 0.68 - ≥ 0.5 PASS
Normal Error (mean ± std) 0.015 ± 0.012 - -
cfDNA Error (mean ± std) 0.045 ± 0.046 - -
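The separation ratio and Cohen's d in this table can be approximately recovered from the error means and standard deviations. The sketch below assumes the separation ratio is the mean cfDNA error divided by the mean normal error, and that Cohen's d uses a pooled standard deviation over the 49 normal and 500 cfDNA test samples. These definitions are inferred from the reported numbers, not confirmed by the report, and the helper names are hypothetical.

```python
import math

def separation_ratio(mean_normal, mean_cfdna):
    """Ratio of mean cfDNA error to mean normal error (assumed definition)."""
    return mean_cfdna / mean_normal

def cohens_d(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Cohen's d with a pooled standard deviation (assumed definition)."""
    pooled_var = ((n_a - 1) * std_a ** 2 + (n_b - 1) * std_b ** 2) / (n_a + n_b - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var)

# Rounded summary statistics from the table above.
ratio = separation_ratio(0.015, 0.045)
d = cohens_d(0.015, 0.012, 49, 0.045, 0.046, 500)
print(f"separation ~ {ratio:.2f}x, Cohen's d ~ {d:.2f}")
# → separation ~ 3.00x, Cohen's d ~ 0.68
```

The ratio comes out at 3.00x rather than the reported 3.03x because the inputs here are already rounded to three decimals.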

Operating Point Analysis

Trade-offs between sensitivity and specificity at different thresholds. For cancer screening, high sensitivity (95%+) is typically prioritized.

Specificity Sensitivity (Recall) FPR Threshold Note
99.0% 11.6% 2.0% 0.0952
95.0% 63.8% 6.1% 0.0224
90.0% 73.8% 10.2% 0.0189
80.0% 87.2% 22.4% 0.0157
63.3% 95.2% 36.7% 0.0129 95% sensitivity operating point

Figures

train_error_distribution.png: Training Error Distribution (Normal vs cfDNA)
roc_curve.png: ROC Curve
roc_curve_all_samples.png: ROC Curve (All Samples - for comparison)
fig1_error_distribution.png: Detailed Error Distribution with Statistics
fig2_sashimi_plot.png: Genome-wide Error Profile (Sashimi Plot)
fig2_sashimi_plot_difference.png: Error Difference (cfDNA - Normal)
fig3_sample_profiles.png: Sample Reconstruction Profiles
latent_space_analysis.png: Latent Space Analysis (PCA, t-SNE, ROC)

4. holdout_latent32_beta001 (holdout)

Run ID: 4277be6f338b4f0ea44f58c2cc668fca | Date: 2025-12-18 00:36 - 00:37 CST | Duration: ~1 minute | Status: FINISHED | Sample Context: Holdout Test (49 Normal vs 500 cfDNA)

Configuration

Parameter Value Parameter Value
Latent Dim 32 Learning Rate 0.0003
Hidden Channels [16, 32, 64] Beta Max 0.01
Kernel Sizes [15, 9, 5] Beta Anneal Epochs 20

Results (Figures use holdout test samples)

Test = held-out normal test subset vs cfDNA (primary). All normals (when shown) uses all normal samples vs cfDNA for reference only.

Metric Test (Holdout) All Normals (Ref) Target Status
AUROC 0.741 0.709 ≥ 0.8 FAIL
AUPRC 0.964 0.706 ≥ 0.5 PASS
Separation Ratio 1.53x - ≥ 1.5x PASS
Cohen's d 0.47 - ≥ 0.5 FAIL
Normal Error (mean ± std) 0.040 ± 0.031 - -
cfDNA Error (mean ± std) 0.060 ± 0.045 - -

Operating Point Analysis

Trade-offs between sensitivity and specificity at different thresholds. For cancer screening, high sensitivity (95%+) is typically prioritized.

Specificity Sensitivity (Recall) FPR Threshold Note
99.0% 1.2% 2.0% 0.2468
95.0% 32.0% 6.1% 0.0557
90.0% 55.0% 10.2% 0.0413
80.0% 66.6% 20.4% 0.0378
0.0% 100.0% 100.0% 0.0212 95% sensitivity operating point

Figures

train_error_distribution.png: Training Error Distribution (Normal vs cfDNA)
roc_curve.png: ROC Curve
roc_curve_all_samples.png: ROC Curve (All Samples - for comparison)
fig1_error_distribution.png: Detailed Error Distribution with Statistics
fig2_sashimi_plot.png: Genome-wide Error Profile (Sashimi Plot)
fig2_sashimi_plot_difference.png: Error Difference (cfDNA - Normal)
fig3_sample_profiles.png: Sample Reconstruction Profiles

5. Synthesized Analysis

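What Improved
Raising the latent dimension from 16 to 32 at beta_max = 0.0 increased test AUROC from 0.825 to 0.907, separation from 2.87x to 3.03x, and Cohen's d from 0.50 to 0.68.
What Didn't Work
Setting beta_max = 0.01 lowered test AUROC at both latent sizes (0.754 at latent 16, 0.741 at latent 32), and both of those runs miss the ≥ 0.8 AUROC target.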

Remaining Gaps to Targets

Metric Current Best Target Gap % to Target
AUROC 0.907 ≥ 0.8 Met 113%
Separation Ratio 3.03x ≥ 1.5x Met 202%
Cohen's d 0.68 ≥ 0.5 Met 136%
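The "% to Target" column appears to be the best value divided by the target, expressed as a percentage. The sketch below checks that reading against the AUROC and Separation Ratio rows; the formula is inferred from those two rows and `percent_of_target` is a hypothetical helper.

```python
def percent_of_target(best, target):
    """Best value as a percentage of the target (assumed formula)."""
    return 100.0 * best / target

auroc_pct = percent_of_target(0.907, 0.8)
separation_pct = percent_of_target(3.03, 1.5)
print(f"AUROC: {auroc_pct:.0f}%, Separation Ratio: {separation_pct:.0f}%")
# → AUROC: 113%, Separation Ratio: 202%
```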

5.5 Latent Space Analysis (Best Model)

This section analyzes the latent space representation of the best-performing model (holdout_latent32_beta0) and shows how normal and cfDNA samples are encoded.

Figure: Latent space visualization showing PCA, t-SNE projections, error distributions, and ROC curve for the best model.

6. Suggested Next Direction

Synthesized from analysis of all 4 experiments and parsed from the latest experiment report.

7. Appendix

7.1 Full Configuration Comparison

Parameter holdout_latent16_beta0 holdout_latent16_beta001 holdout_latent32_beta0 holdout_latent32_beta001
latent_dim 16 16 32 32
hidden_channels [16, 32, 64] [16, 32, 64] [16, 32, 64] [16, 32, 64]
kernel_sizes [15, 9, 5] [15, 9, 5] [15, 9, 5] [15, 9, 5]
learning_rate 0.0003 0.0003 0.0003 0.0003
beta_max 0.0 0.01 0.0 0.01
beta_anneal_epochs 20 20 20 20
clip_values 3.0 3.0 3.0 3.0
batch_size 32 32 32 32
epochs 200 200 200 200
patience 30 30 30 30
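The beta_max and beta_anneal_epochs parameters control how strongly the KL term is weighted in the VAE loss. The sketch below shows one common way such a schedule is implemented. It assumes a linear warm-up from 0 to beta_max over the annealing epochs and a loss of the form reconstruction + beta * KL; the schedule actually used by `scripts/train.py` is not stated in the report, and `kl_weight` and `vae_loss` are hypothetical names.

```python
def kl_weight(epoch, beta_max, anneal_epochs=20):
    """Linear warm-up of the KL weight from 0 to beta_max (assumed schedule)."""
    if anneal_epochs <= 0:
        return beta_max
    return beta_max * min(1.0, epoch / anneal_epochs)

def vae_loss(recon_loss, kl_div, epoch, beta_max, anneal_epochs=20):
    """Total loss: reconstruction term plus the annealed KL term."""
    return recon_loss + kl_weight(epoch, beta_max, anneal_epochs) * kl_div

# With beta_max = 0.01 the weight ramps up, then stays flat after epoch 20.
weights = [kl_weight(e, 0.01) for e in (0, 10, 20, 200)]

# With beta_max = 0.0 the KL term never contributes, whatever the epoch.
loss_beta0 = vae_loss(recon_loss=1.0, kl_div=2.0, epoch=50, beta_max=0.0)
```

Under this form of the loss, the beta_max = 0.0 runs optimize reconstruction only, which is worth keeping in mind when comparing them with the beta_max = 0.01 runs.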

7.2 Reproducibility Commands

holdout_latent16_beta0 (88bc9ed0)

uv run python scripts/train.py \
    --normal-data /data/cnv_autoencoder/processed/gene_aggregated/normal_batch.npz \
    --cfdna-data /data/cnv_autoencoder/processed/gene_aggregated/cfdna_batch.npz \
    --checkpoint-dir /data/cnv_autoencoder/checkpoints/holdout_latent16_beta0 \
    --mlflow-uri /data/cnv_autoencoder/mlflow \
    --experiment-name cnv_vae_holdout_split \
    --run-name holdout_latent16_beta0 \
    --exclude-list configs/sample_exclude_list.csv \
    --test-split 0.1 \
    --latent-dim 16 \
    --hidden-channels 16 32 64 \
    --kernel-sizes 15 9 5 \
    --learning-rate 0.0003 \
    --beta-max 0.0 \
    --clip-values 3.0 \
    --epochs 200 \
    --patience 30 \
    --seed 42

holdout_latent16_beta001 (1ef6b476)

uv run python scripts/train.py \
    --normal-data /data/cnv_autoencoder/processed/gene_aggregated/normal_batch.npz \
    --cfdna-data /data/cnv_autoencoder/processed/gene_aggregated/cfdna_batch.npz \
    --checkpoint-dir /data/cnv_autoencoder/checkpoints/holdout_latent16_beta001 \
    --mlflow-uri /data/cnv_autoencoder/mlflow \
    --experiment-name cnv_vae_holdout_split \
    --run-name holdout_latent16_beta001 \
    --exclude-list configs/sample_exclude_list.csv \
    --test-split 0.1 \
    --latent-dim 16 \
    --hidden-channels 16 32 64 \
    --kernel-sizes 15 9 5 \
    --learning-rate 0.0003 \
    --beta-max 0.01 \
    --clip-values 3.0 \
    --epochs 200 \
    --patience 30 \
    --seed 42

holdout_latent32_beta0 (dfef8db7)

uv run python scripts/train.py \
    --normal-data /data/cnv_autoencoder/processed/gene_aggregated/normal_batch.npz \
    --cfdna-data /data/cnv_autoencoder/processed/gene_aggregated/cfdna_batch.npz \
    --checkpoint-dir /data/cnv_autoencoder/checkpoints/holdout_latent32_beta0 \
    --mlflow-uri /data/cnv_autoencoder/mlflow \
    --experiment-name cnv_vae_holdout_split \
    --run-name holdout_latent32_beta0 \
    --exclude-list configs/sample_exclude_list.csv \
    --test-split 0.1 \
    --latent-dim 32 \
    --hidden-channels 16 32 64 \
    --kernel-sizes 15 9 5 \
    --learning-rate 0.0003 \
    --beta-max 0.0 \
    --clip-values 3.0 \
    --epochs 200 \
    --patience 30 \
    --seed 42

holdout_latent32_beta001 (4277be6f)

uv run python scripts/train.py \
    --normal-data /data/cnv_autoencoder/processed/gene_aggregated/normal_batch.npz \
    --cfdna-data /data/cnv_autoencoder/processed/gene_aggregated/cfdna_batch.npz \
    --checkpoint-dir /data/cnv_autoencoder/checkpoints/holdout_latent32_beta001 \
    --mlflow-uri /data/cnv_autoencoder/mlflow \
    --experiment-name cnv_vae_holdout_split \
    --run-name holdout_latent32_beta001 \
    --exclude-list configs/sample_exclude_list.csv \
    --test-split 0.1 \
    --latent-dim 32 \
    --hidden-channels 16 32 64 \
    --kernel-sizes 15 9 5 \
    --learning-rate 0.0003 \
    --beta-max 0.01 \
    --clip-values 3.0 \
    --epochs 200 \
    --patience 30 \
    --seed 42

7.3 Figure Paths Reference

Experiment Analysis Directory
holdout_latent16_beta0 /data/cnv_autoencoder/analysis/88bc9ed08ea542a1a7840216703eca66/
holdout_latent16_beta001 /data/cnv_autoencoder/analysis/1ef6b476f675419e972f31f27625e741/
holdout_latent32_beta0 /data/cnv_autoencoder/analysis/dfef8db798bf42dbbfc77f5be3aa59ec/
holdout_latent32_beta001 /data/cnv_autoencoder/analysis/4277be6f338b4f0ea44f58c2cc668fca/