short-read sequence assemblies for variation detection #1556

C-YONG · 2024-12-03T14:25:50Z

Hello Developers,

I've successfully built a pangenome of 7 fugu genomes using Minigraph-Cactus (MC) without any problems. Thank you for the helpful documentation on GitHub. Now I'm planning to find more SVs by adding short-read sequence data.
Should I perform haplotype sampling or allele frequency filtering?
Or should I just use clip filtering?
Thank you for your support, and I look forward to hearing from you soon!

Best Regards

glennhickey · 2024-12-03T16:59:31Z

As shown here, haplotype sampling (--haplo) gives better performance than allele frequency filtering.

So my suggestion is to use haplotype sampling, but since we have only tested it on human, I can't guarantee that it will outperform frequency filtering (--filter 2 --giraffe) on fugu.

C-YONG · 2024-12-04T01:00:52Z

Hello! The VG Giraffe best practices document says: "With a small number of haplotypes (e.g. 10), the default graph is usually a good choice."
Does this mean we don't need filtering (d2) or haplotype sampling?
I would like to understand whether such filtering or haplotype sampling is necessary for graphs constructed with a small number of genomes. For such simple graphs, is the default already sufficient?
