Checking the effect of using two mono-culture samples vs. a fake co-culture sample during misassignment calculation
Xin He edited this page May 27, 2020 · 4 revisions
For some experiments, we do not have the co-culture samples required to estimate the misassignment rate in the triple-culture. In this case, we can either use the two mono-culture samples directly or merge them into a fake co-culture sample.
Here we try to understand how this choice affects the final result, and why. We use the bbb round3 data (rat) for testing.
- Genes whose misassignment rate differs most depending on the reference samples (p_mono: two mono-culture samples; p_co: fake co-culture sample).
# A tibble: 20 x 5
gene gene_name p_mono p_co difference
<chr> <chr> <dbl> <dbl> <dbl>
1 ENSRNOG00000012199 Sox2 4490. 2598. 1892.
2 ENSRNOG00000046657 AABR07008242.1 193. 1483. 1291.
3 ENSRNOG00000055837 AABR07022469.1 155. 1091. 937.
4 ENSRNOG00000054139 LOC100911132 124. 875. 751.
5 ENSRNOG00000049070 Rack1 120. 821. 701.
6 ENSRNOG00000054846 AABR07022393.1 113. 808. 695.
7 ENSRNOG00000030807 LOC100912427 35.1 197. 162.
8 ENSRNOG00000012728 Nkx2-2 337. 200. 137.
9 ENSRNOG00000057353 LOC100911394 252. 145. 107.
10 ENSRNOG00000052159 LOC100910481 183. 105. 78.3
11 ENSRNOG00000051439 LOC100910181 176. 101. 75.0
12 ENSRNOG00000059624 AABR07015889.1 166. 97.1 69.3
13 ENSRNOG00000018685 AABR07028237.1 142. 84.7 57.7
14 ENSRNOG00000057759 AABR07027854.1 139. 82.9 56.5
15 ENSRNOG00000050206 Shank2 136. 81.4 54.5
16 ENSRNOG00000011305 Sox10 136. 81.4 54.5
17 ENSRNOG00000026065 Slit1 127. 75.7 51.2
18 ENSRNOG00000061294 LOC100909833 108. 62.8 45.6
19 ENSRNOG00000020009 Npas4 99.8 58.6 41.2
20 ENSRNOG00000009206 Fezf2 90.5 51.6 38.8
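To make the table easier to read, the sketch below recomputes the `difference` column for four rows copied from it and adds the ratio `p_mono / p_co`. It assumes that `difference` is the absolute difference of the two estimates, which matches the printed values up to rounding; the `ratio` and `higher_with` columns are additions for illustration and are not part of the original output.

```r
# Four rows copied from the table above (values as printed, i.e. rounded).
top <- data.frame(
  gene_name = c("Sox2", "AABR07008242.1", "Rack1", "Nkx2-2"),
  p_mono    = c(4490, 193, 120, 337),
  p_co      = c(2598, 1483, 821, 200)
)

# Assumption: `difference` is the absolute difference of the two estimates.
top$difference <- abs(top$p_mono - top$p_co)

# Illustrative extras: how far apart the estimates are in relative terms,
# and which reference choice gives the larger estimate.
top$ratio       <- top$p_mono / top$p_co
top$higher_with <- ifelse(top$p_mono > top$p_co, "mono", "fake_co")

# Largest difference first, as in the table.
top <- top[order(-top$difference), ]
print(top)
```

For the rows used here, the ratio falls into two groups: genes such as Sox2 and Nkx2-2 get an estimate roughly 1.7 times higher with the mono-culture samples, while genes such as AABR07008242.1 and Rack1 get an estimate several times higher with the fake co-culture sample.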
- Correlation of the misassignment estimates obtained with the fake co-culture (co) or the mono-culture samples.
We suspect this is caused by the different species composition of the reads in the samples. In the fake HM samples the mouse-to-human ratio is roughly 1:1, but in the HMR samples mouse reads are far more abundant and there are very few human reads.
Sample mouse rat human
<chr> <dbl> <dbl> <dbl>
05_EC 0 0 100
06_EC 0 0 100
07_EC 0 0 100
08_EC 0 0 100
12_Pc 0 99.9 0
11_Pc 0 100 0
09_Pc 0 100 0
10_Pc 0 100 0
15_As_replacement 100 0 0
14_As 100 0 0
13_As 100 0 0
16_As 100 0 0
fake_03_HM 46.7 0 53.3
fake_04_HM 50.3 0 49.7
fake_01_HM 55.9 0 44.1
fake_02_HM 60.7 0 39.3
04_HMR 67.3 28.8 3.9
02_HMR 74.4 17.6 8
01_HMR 75.9 16.6 7.5
03_HMR 53.2 44.1 2.7
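The composition gap can be quantified directly from the table. The sketch below copies the fake HM and HMR rows and computes the mouse-to-human ratio and the mouse share of the combined human and mouse reads. It assumes the three species columns are read percentages per sample; the two derived column names are made up for this example.

```r
# Rows copied from the composition table above (assumed to be percentages).
comp <- data.frame(
  sample = c("fake_01_HM", "fake_02_HM", "fake_03_HM", "fake_04_HM",
             "01_HMR", "02_HMR", "03_HMR", "04_HMR"),
  mouse  = c(55.9, 60.7, 46.7, 50.3, 75.9, 74.4, 53.2, 67.3),
  rat    = c(0, 0, 0, 0, 16.6, 17.6, 44.1, 28.8),
  human  = c(44.1, 39.3, 53.3, 49.7, 7.5, 8, 2.7, 3.9)
)

# Mouse reads per human read in each sample.
comp$mouse_per_human <- comp$mouse / comp$human

# Mouse share among human + mouse reads only (rat reads ignored).
comp$mouse_share_hm <- comp$mouse / (comp$mouse + comp$human)

print(comp[, c("sample", "mouse_per_human", "mouse_share_hm")])
```

In the fake HM samples the ratio stays below 2, whereas in the HMR samples there are more than 9 mouse reads per human read, so the fake co-culture is a poor match for the human-to-mouse balance of the triple-culture.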
Let's look at some genes
- ENSRNOG00000012199 Sox2
> 0.588 * c(5.023813, 4.695743, 1.267060, 2.472318) / c(0.133, 0.241, 0.0394, 0.0283)
[1] 22.21054 11.45683 18.90942 51.36830
> P_co$P_condition_debug$P%>%filter(gene %in% c('ENSRNOG00000012199'))
# A tibble: 1 x 11
gene p `human,mouse_counts` fake_01_HM fake_02_HM fake_03_HM fake_04_HM `01_HMR` `02_HMR` `03_HMR` `04_HMR`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ENSRNOG00000012199 2598. 0.588 0.0103 0.0262 1.22 1.06 0.133 0.241 0.0394 0.0283
> P_co$P_condition_debug$D
$`human,mouse`
04_HMR 02_HMR 03_HMR 01_HMR
2.472318 4.695743 1.267060 5.023813
> P$`human,mouse`$p[6466,]
# A tibble: 1 x 6
gene `01_HMR` `02_HMR` `03_HMR` `04_HMR` p
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ENSRNOG00000012199 2214. 1144. 1892. 5143. 2598.
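The numbers above can be tied together with a short calculation. This is a sketch inferred from the printed values, not the pipeline's actual code: it assumes the per-sample estimate is `100 * reference_count * D / sample_count` and that `p` is the mean over the four HMR samples. The factor 100 and the mean are assumptions that reproduce the printed output to within rounding.

```r
# Values copied from the debug output for Sox2 (ENSRNOG00000012199).
ref_count <- 0.588                      # `human,mouse_counts` (fake co-culture)
D_co <- c("01_HMR" = 5.023813, "02_HMR" = 4.695743,
          "03_HMR" = 1.267060, "04_HMR" = 2.472318)
hmr  <- c("01_HMR" = 0.133, "02_HMR" = 0.241,
          "03_HMR" = 0.0394, "04_HMR" = 0.0283)

# Assumed formula: per-sample estimate, scaled by 100.
per_sample <- 100 * ref_count * D_co / hmr

# Assumed aggregation: plain mean across the HMR samples.
p_co <- mean(per_sample)

print(round(per_sample))
print(p_co)
```

The result lands close to the reported per-sample values (2214, 1144, 1892, 5143) and to `p` (2598); the small deviations are expected because the inputs are copied at printed precision.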
> P_mono$P_condition_debug$P%>%filter(gene %in% c('ENSRNOG00000012199'))
# A tibble: 1 x 18
gene p.human p.mouse p human_counts `05_EC` `06_EC` `07_EC` `08_EC` mouse_counts `13_As` `14_As` `15_As_replacement` `16_As` `01_HMR` `02_HMR` `03_HMR` `04_HMR`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ENSRNOG00000012199 0 4490. 4490. 0 0 0 0 0 1.09 0.0181 0.0429 2.58 2.07 0.133 0.241 0.0394 0.0283
> P_mono$P_condition_debug$D
$human
04_HMR 02_HMR 03_HMR 01_HMR
0.13513955 0.45660661 0.06097529 0.45306509
$mouse
04_HMR 02_HMR 03_HMR 01_HMR
2.337179 4.239137 1.206085 4.570748
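The same kind of check can be run for the mono-culture case. The sketch below assumes, as an inference from the printed values only, that `p.mouse` is the mean of `100 * mouse_counts * D_mouse / sample_count` and that `p = p.human + p.mouse`. It also verifies that the `human,mouse` D values printed for the fake co-culture run equal the sum of the separate human and mouse D values.

```r
# Values copied from the debug output for Sox2 (ENSRNOG00000012199).
hmr     <- c("01_HMR" = 0.133, "02_HMR" = 0.241,
             "03_HMR" = 0.0394, "04_HMR" = 0.0283)
D_human <- c("01_HMR" = 0.45306509, "02_HMR" = 0.45660661,
             "03_HMR" = 0.06097529, "04_HMR" = 0.13513955)
D_mouse <- c("01_HMR" = 4.570748, "02_HMR" = 4.239137,
             "03_HMR" = 1.206085, "04_HMR" = 2.337179)
D_co    <- c(5.023813, 4.695743, 1.267060, 2.472318)  # `human,mouse`, same order

# The combined D is the sum of the per-species D values (up to print rounding).
max_gap <- max(abs(D_human + D_mouse - D_co))

# Assumed formula, per species, then summed.
human_counts <- 0
mouse_counts <- 1.09
p_human <- mean(100 * human_counts * D_human / hmr)
p_mouse <- mean(100 * mouse_counts * D_mouse / hmr)
p_mono  <- p_human + p_mouse

print(max_gap)
print(p_mono)
```

Under these assumptions `p_mono` comes out within about 1% of the reported 4490. Comparing the inputs of the two runs, D changes only slightly (`D_mouse` is about 91 to 95% of the combined D), while the reference count nearly halves from 1.09 to 0.588, so for this gene most of the gap between `p_mono` and `p_co` appears to come from the reference count.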