
Checking the effect of using two mono-cultures vs. a fake co-culture in the misassignment calculation

Xin He edited this page May 27, 2020 · 4 revisions

For some experiments, we do not have the co-culture samples required to estimate the misassignment rate in the triple-culture. In this case, we can either use the two mono-culture samples directly or merge them into a fake co-culture sample.

Here we try to understand how this choice affects the final result, and why. We use the bbb round3 data (rat) for testing.

  • Genes with the largest difference in misassignment rate between the two choices of reference samples (p_mono: mono-cultures, p_co: fake co-culture).
# A tibble: 20 x 5
   gene               gene_name      p_mono   p_co difference
   <chr>              <chr>           <dbl>  <dbl>      <dbl>
 1 ENSRNOG00000012199 Sox2           4490.  2598.      1892. 
 2 ENSRNOG00000046657 AABR07008242.1  193.  1483.      1291. 
 3 ENSRNOG00000055837 AABR07022469.1  155.  1091.       937. 
 4 ENSRNOG00000054139 LOC100911132    124.   875.       751. 
 5 ENSRNOG00000049070 Rack1           120.   821.       701. 
 6 ENSRNOG00000054846 AABR07022393.1  113.   808.       695. 
 7 ENSRNOG00000030807 LOC100912427     35.1  197.       162. 
 8 ENSRNOG00000012728 Nkx2-2          337.   200.       137. 
 9 ENSRNOG00000057353 LOC100911394    252.   145.       107. 
10 ENSRNOG00000052159 LOC100910481    183.   105.        78.3
11 ENSRNOG00000051439 LOC100910181    176.   101.        75.0
12 ENSRNOG00000059624 AABR07015889.1  166.    97.1       69.3
13 ENSRNOG00000018685 AABR07028237.1  142.    84.7       57.7
14 ENSRNOG00000057759 AABR07027854.1  139.    82.9       56.5
15 ENSRNOG00000050206 Shank2          136.    81.4       54.5
16 ENSRNOG00000011305 Sox10           136.    81.4       54.5
17 ENSRNOG00000026065 Slit1           127.    75.7       51.2
18 ENSRNOG00000061294 LOC100909833    108.    62.8       45.6
19 ENSRNOG00000020009 Npas4            99.8   58.6       41.2
20 ENSRNOG00000009206 Fezf2            90.5   51.6       38.8
  • Correlation between the misassignment rates estimated from the fake co-culture (co) samples and from the mono-culture (mono) samples.

image

We suspect this is caused by the different species composition of the reads in the samples. In the fake HM samples the mouse-to-human ratio is roughly 1:1, but in the HMR samples mouse reads are much more abundant and there are few human reads.

   Sample            mouse   rat human
   <chr>             <dbl> <dbl> <dbl>
   05_EC               0     0   100  
   06_EC               0     0   100  
   07_EC               0     0   100  
   08_EC               0     0   100  
   12_Pc               0    99.9   0  
   11_Pc               0   100     0  
   09_Pc               0   100     0  
   10_Pc               0   100     0  
   15_As_replacement 100     0     0  
   14_As             100     0     0  
   13_As             100     0     0  
   16_As             100     0     0  
   fake_03_HM         46.7   0    53.3
   fake_04_HM         50.3   0    49.7
   fake_01_HM         55.9   0    44.1
   fake_02_HM         60.7   0    39.3
   04_HMR             67.3  28.8   3.9
   02_HMR             74.4  17.6   8  
   01_HMR             75.9  16.6   7.5
   03_HMR             53.2  44.1   2.7
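
To make the composition gap concrete, the mouse-to-human ratio can be computed per sample from the table above. This is only a small sketch in base R; the data frame and column names are made up for illustration, and the numbers are copied by hand from the table.

```r
# Read composition (mouse and human columns) copied from the table above.
composition <- data.frame(
  sample = c("fake_01_HM", "fake_02_HM", "fake_03_HM", "fake_04_HM",
             "01_HMR", "02_HMR", "03_HMR", "04_HMR"),
  mouse  = c(55.9, 60.7, 46.7, 50.3, 75.9, 74.4, 53.2, 67.3),
  human  = c(44.1, 39.3, 53.3, 49.7, 7.5, 8.0, 2.7, 3.9)
)

# Mouse-to-human read ratio per sample.
composition$mouse_to_human <- composition$mouse / composition$human

# Label each sample as fake co-culture (HM) or real triple-culture (HMR).
composition$group <- ifelse(grepl("^fake_", composition$sample), "fake_HM", "HMR")

print(composition)

# Range of the ratio within each group.
ratio_range <- tapply(composition$mouse_to_human, composition$group, range)
print(ratio_range)
```

With these numbers the fake HM samples stay between roughly 0.9 and 1.5, while the HMR samples range from roughly 9 to 20, so the reference samples and the target samples differ in mouse-to-human balance by about an order of magnitude.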

Let's look at some genes

image

  • ENSRNOG00000012199 Sox2

> 0.588 * c(5.023813, 4.695743, 1.267060, 2.472318) / c(0.133, 0.241, 0.0394, 0.0283)
[1] 22.21054 11.45683 18.90942 51.36830

> P_co$P_condition_debug$P%>%filter(gene %in% c('ENSRNOG00000012199'))
# A tibble: 1 x 11
  gene                   p `human,mouse_counts` fake_01_HM fake_02_HM fake_03_HM fake_04_HM `01_HMR` `02_HMR` `03_HMR` `04_HMR`
  <chr>              <dbl>                <dbl>      <dbl>      <dbl>      <dbl>      <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1 ENSRNOG00000012199 2598.                0.588     0.0103     0.0262       1.22       1.06    0.133    0.241   0.0394   0.0283

> P_co$P_condition_debug$D
$`human,mouse`
  04_HMR   02_HMR   03_HMR   01_HMR 
2.472318 4.695743 1.267060 5.023813 

> P$`human,mouse`$p[6466,]
# A tibble: 1 x 6
  gene               `01_HMR` `02_HMR` `03_HMR` `04_HMR`     p
  <chr>                 <dbl>    <dbl>    <dbl>    <dbl> <dbl>
1 ENSRNOG00000012199    2214.    1144.    1892.    5143. 2598.
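
The per-sample values for Sox2 in the fake co-culture case can be approximately reproduced from the debug tables above. The sketch below assumes that each per-sample estimate is 100 × (reference count) × D / (value in the HMR sample), and that `p` is the mean over the four HMR samples. This is inferred from the printed numbers, not taken from the pipeline code, and the variable names are illustrative only.

```r
samples <- c("01_HMR", "02_HMR", "03_HMR", "04_HMR")

# `human,mouse_counts` for ENSRNOG00000012199 (Sox2) in the fake co-culture reference.
ref_count <- 0.588

# Per-sample factor D for `human,mouse`, keyed by sample name because the
# debug output lists the samples in a different order (04, 02, 03, 01).
D <- c("04_HMR" = 2.472318, "02_HMR" = 4.695743,
       "03_HMR" = 1.267060, "01_HMR" = 5.023813)

# Sox2 value in each HMR sample, from the P_condition_debug$P table.
sample_count <- c("01_HMR" = 0.133, "02_HMR" = 0.241,
                  "03_HMR" = 0.0394, "04_HMR" = 0.0283)

# Assumed formula: per-sample estimate, expressed as a percentage.
p_per_sample <- 100 * ref_count * D[samples] / sample_count[samples]
print(round(p_per_sample))

# Assumed summary: mean over samples.
p_co <- mean(p_per_sample)
print(p_co)
```

The results land close to the reported per-sample values (2214, 1144, 1892, 5143) and to `p` = 2598. The small deviations are expected because the inputs are the rounded numbers shown in the printed tibbles.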

> P_mono$P_condition_debug$P%>%filter(gene %in% c('ENSRNOG00000012199'))
# A tibble: 1 x 18
  gene               p.human p.mouse     p human_counts `05_EC` `06_EC` `07_EC` `08_EC` mouse_counts `13_As` `14_As` `15_As_replacement` `16_As` `01_HMR` `02_HMR` `03_HMR` `04_HMR`
  <chr>                <dbl>   <dbl> <dbl>        <dbl>   <dbl>   <dbl>   <dbl>   <dbl>        <dbl>   <dbl>   <dbl>               <dbl>   <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1 ENSRNOG00000012199       0   4490. 4490.            0       0       0       0       0         1.09  0.0181  0.0429                2.58    2.07    0.133    0.241   0.0394   0.0283
> P_mono$P_condition_debug$D
$human
    04_HMR     02_HMR     03_HMR     01_HMR 
0.13513955 0.45660661 0.06097529 0.45306509 

$mouse
  04_HMR   02_HMR   03_HMR   01_HMR 
2.337179 4.239137 1.206085 4.570748
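
Applying the same arithmetic to the mono-culture numbers suggests where the Sox2 gap comes from. As before, the formula (100 × reference count × D / value in the HMR sample, averaged over samples) is an assumption inferred from the printed values, and the variable names are illustrative only.

```r
samples <- c("01_HMR", "02_HMR", "03_HMR", "04_HMR")

# Sox2 value in each HMR sample (identical in both debug tables).
sample_count <- c("01_HMR" = 0.133, "02_HMR" = 0.241,
                  "03_HMR" = 0.0394, "04_HMR" = 0.0283)

# Reference counts for Sox2: mouse mono-culture vs. fake human,mouse co-culture.
mouse_counts <- 1.09
co_counts <- 0.588

# Per-sample factor D, keyed by sample name (the debug output order is 04, 02, 03, 01).
D_mouse <- c("04_HMR" = 2.337179, "02_HMR" = 4.239137,
             "03_HMR" = 1.206085, "01_HMR" = 4.570748)
D_co <- c("04_HMR" = 2.472318, "02_HMR" = 4.695743,
          "03_HMR" = 1.267060, "01_HMR" = 5.023813)

# Assumed formula, applied to both reference choices.
p_mono <- 100 * mouse_counts * D_mouse[samples] / sample_count[samples]
p_co <- 100 * co_counts * D_co[samples] / sample_count[samples]
print(c(mono = mean(p_mono), co = mean(p_co)))

# Split the mono/co ratio into its two factors.
count_ratio <- mouse_counts / co_counts   # reference count: mono vs. fake co
d_ratio <- D_mouse[samples] / D_co[samples]  # D: mouse-only vs. human,mouse
print(count_ratio)
print(round(d_ratio, 3))
```

Under this assumption the mean for the mono case comes out near the reported `p.mouse` of 4490 (human contributes 0 for this gene). The D factors differ by less than 10% between the two cases, so most of the gap for Sox2 comes from the reference count itself: merging the mouse samples with human samples that have no Sox2 signal roughly halves it (1.09 vs. 0.588).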