Running the same sample multiple times results in different PASS/FAIL result #39

kmavrommatis · 2024-12-05T10:37:38Z

Hi,
I have a very fragmented tumor sample which has been segmented by various methods (FACETS, SCLUST, Battenberg) and i am using CNAqc to check its quality and potentially identify the best approach to analyze this genome.
My thoughts are that given the current segmentation I could use CNAqc to identify the segments that are more likely to be correct vs more likely to be wrong. Furthermore, I am hoping that if a sample fails I can get rerun it with different arguments through the same algorithms e.g. provide a different ploidy estimate etc.

I have noticed though that if I run the same cnaqc object through the analyze_peaks function multiple times I get different results.
For example

Running:

xlist=lapply( 1:10, function(y){analyze_peaks(x, n_bootstrap=10)}) %>% magrittr::set_names( paste0( "rep_",1:10))

I get

sapply(xlist, function(x){
    x$peaks_analysis$QC
})
  rep_1 rep_2 rep_3 rep_4 rep_5 rep_6 rep_7 rep_8 rep_9 rep_10
  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> 
FAIL  PASS  FAIL  FAIL  PASS  FAIL  PASS  PASS  FAIL  PASS

While

xlist2=lapply( 1:10, function(y){analyze_peaks(x)}) %>% magrittr::set_names( paste0( "rep_",1:10))

I get

sapply(xlist2, function(x){
    x$peaks_analysis$QC
})
  rep_1 rep_2 rep_3 rep_4 rep_5 rep_6 rep_7 rep_8 rep_9 rep_10
  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> 
PASS  PASS  PASS  PASS  PASS  FAIL  PASS  PASS  PASS  PASS

These samples fail because their purity correction is high (> 8%). However, in the instances where the samples PASS the purity correction is <2%.
Any advice on this?
Thanks
K

The text was updated successfully, but these errors were encountered:

caravagn · 2024-12-17T14:12:07Z

Hi, can you post the basic plots of the genome? segmentation, statistics for the QCed segments, VAFs, peak analysis etc.

It's hard otherwise give you a helpful answer.

G

caravagn · 2024-12-17T14:12:28Z

@vvvirgy please provide some support if possible.

kmavrommatis · 2025-01-07T17:22:21Z

cnas.rds.gz
mutations.rds.gz

Hi, thanks for your response.
I am not sure what information would be useful so I attach two files, one with mutations and the other with the CN segmentation and here is the code I run:

reps=10
seed=12323

 x=CNAqc::init(
    mutations=mut,  # mutations predicted using GATK Mutect2 on tumor/matched normal 
    cna=cnvs,
    purity=0.76,   #purity is estimated by SCLUST
    sample='test',
    ref='hg38'
  )
set.seed(seed)
  xlist2=rep(list(x) , reps )

xlist2=mclapply( xlist2, function(y){
      CNAqc::analyze_peaks(y)
    }, mc.cores=10)
xlist2=xlist2  %>% magrittr::set_names( paste0( "rep_",1:reps))


qc = sapply(xlist2, function(x){x$peaks_analysis$QC})
qc

returns

 rep_1  rep_2  rep_3  rep_4  rep_5  rep_6  rep_7  rep_8  rep_9 rep_10 
"FAIL" "PASS" "PASS" "FAIL" "PASS" "PASS" "FAIL" "PASS" "PASS" "FAIL"

R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 14.7.2
CNAqc_1.0.0

graceooh · 2025-01-09T22:41:40Z

Hi, thank you for this great tool. I would also like to back this up and say I had similar experiences where the same input data gave different PASS and FAIL statuses after peaks analysis.

caravagn · 2025-01-11T11:43:01Z

require(dplyr)

reps=10
seed=12323

mut = readRDS("~/Downloads/mutations.rds")
cnvs = readRDS("~/Downloads/cnas.rds")

x=CNAqc::init(
  mutations=mut,  # mutations predicted using GATK Mutect2 on tumor/matched normal 
  cna=cnvs,
  purity=0.76,   #purity is estimated by SCLUST
  sample='test',
  ref='hg38'
)

require(CNAqc)

plot_segments(x)

# Check for arm overfragmentation
x = CNAqc::detect_arm_overfragmentation(x)
plot_arm_fragmentation(x)

# Inspect by chr
inspect_segment(x, n = 10)

plot_data_histogram(x, which = 'VAF')

# Your code now
set.seed(seed)

xlist2=rep(list(x) , reps )
xlist2=parallel::mclapply( xlist2, function(y){
  CNAqc::analyze_peaks(y)
}, mc.cores=10)

xlist2=xlist2  %>% magrittr::set_names( paste0( "rep_",1:reps))


qc = sapply(xlist2, function(x){x$peaks_analysis$QC})
qc

# Now,look at one PASS/ FAIL, you realise that peaks are WEAK in your 
# data for both 1:0 and 2:1 
xlist2[[2]] %>% plot_peaks_analysis()
xlist2[[1]] %>% plot_peaks_analysis()

# now, the randomness is due to the Bmix analysis (mixture model) inside
# CNAqc. That algorithm samples initial conditions randomly and is obvisouly
# sensitive to the fact that you barely have peaks in 1:0/2:1.

# nonetheless, the mclapply you use is not required. You should use the
# bootstrap function by CNAqc that is done exactly to do that and compute
# an aggregated score. This is also more generous to work with your data 
# that has these complex peaks
x = CNAqc::analyze_peaks(x, n_bootstrap = 10)
plot_peaks_analysis(x)

# If you wanto to double check I would suggest
x_chr = x %>% 
  split_by_chromosome() %>% 
  lapply(function(w) {analyze_peaks(w)})

Now, this sample is difficult because, for a VAF-peak analysis it basically has NO signal in 1:0 and 2:1 states. I am inclined to think that:

your purity estimate is OK
your diploid calls are OK
your segmentation is OK
I would not trust too much 1:0 and 2:1 states because they miss the peak expected at high frequency

The last point: those 1:0 and 2:1 could be true, of course. If true, they would suggest that those CNAs have happened very late in the evolution, and that the sample MRCA collapses (in time) towards that instant. So they would be recent events.

Honestly, if these events are important to you, you need to digthe BAF and DR and double check with those too.

That's what we do usually.

kmavrommatis · 2025-01-14T13:35:48Z

Thank you,
that is very helpful.

Implementing your comments generated more questions :) that I will ask in a subsequent issue.
Thanks again for your help

caravagn assigned vvvirgy Dec 17, 2024

caravagn added the help wanted Extra attention is needed label Dec 17, 2024

kmavrommatis mentioned this issue Jan 14, 2025

Running CNAqc analyze_peaks by chromosome does not populate the object for fragmented samples #40

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Running the same sample multiple times results in different PASS/FAIL result #39

Running the same sample multiple times results in different PASS/FAIL result #39

kmavrommatis commented Dec 5, 2024

caravagn commented Dec 17, 2024

caravagn commented Dec 17, 2024

kmavrommatis commented Jan 7, 2025

graceooh commented Jan 9, 2025

caravagn commented Jan 11, 2025

kmavrommatis commented Jan 14, 2025

Running the same sample multiple times results in different PASS/FAIL result #39

Running the same sample multiple times results in different PASS/FAIL result #39

Comments

kmavrommatis commented Dec 5, 2024

caravagn commented Dec 17, 2024

caravagn commented Dec 17, 2024

kmavrommatis commented Jan 7, 2025

graceooh commented Jan 9, 2025

caravagn commented Jan 11, 2025

kmavrommatis commented Jan 14, 2025