Assembly does not improve #172
Comments
Hi @gubrins, did you find a solution? I'm just another user of SALSA. Maybe your combined.bed file isn't as it should be. Did you use the method described at https://github.com/ArimaGenomics/mapping_pipeline to clean up, map, and prepare the sorted bed file? What does `cat final.bam.stats` give you? Something like this, with substantial "All inter" mappings?

All 110216052

Chris
Hi Chris, thanks for your interest! Unfortunately, I did not manage to improve it. The bed file should be fine, because I followed exactly the same procedure as for the other species: it worked for one and not for the other. So I suspect that the Hi-C data is not as good as I thought.
It seems that your "All inter" counts in your bam stats file are low or zero. Is that true? That would confirm poor Hi-C data.
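A rough way to check the same thing directly on the bed file is to count how many read pairs have their two mates on different contigs, since only those pairs can link contigs during scaffolding. The sketch below is not part of SALSA or the Arima pipeline. It assumes a tab-separated bed file in which column 1 is the contig, column 4 is the read name, and the two mates of a pair share a name ending in `/1` and `/2`. The function name `count_pair_types` and the toy records are made up for illustration.

```python
from collections import defaultdict


def count_pair_types(bed_lines):
    """Count intra-contig, inter-contig, and unpaired reads in BED records.

    Assumption: column 1 is the contig name and column 4 is the read name,
    with mates named '<read>/1' and '<read>/2'.
    """
    contigs_by_read = defaultdict(list)
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        contig, name = fields[0], fields[3]
        read_id = name.rsplit("/", 1)[0]  # drop the /1 or /2 mate suffix
        contigs_by_read[read_id].append(contig)

    counts = {"intra": 0, "inter": 0, "unpaired": 0}
    for contigs in contigs_by_read.values():
        if len(contigs) != 2:
            counts["unpaired"] += 1
        elif contigs[0] == contigs[1]:
            counts["intra"] += 1
        else:
            counts["inter"] += 1
    return counts


# Toy records (hypothetical), not real data from this issue.
example = [
    "ctg1\t0\t150\tr1/1\t60\t+",
    "ctg2\t500\t650\tr1/2\t60\t-",
    "ctg1\t10\t160\tr2/1\t60\t+",
    "ctg1\t900\t1050\tr2/2\t60\t-",
    "ctg3\t5\t155\tr3/1\t60\t+",
]
print(count_pair_types(example))
```

This keeps every read name in memory, so it is better run on a sample of the bed file (for example the first few million lines) than on the whole file. A very small "inter" count relative to "intra" would point in the same direction as low "All inter" numbers in the bam stats.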
Hi,
I am working with two closely related species, and I have HiFi and Hi-C data for both. I followed exactly the same procedure for both species. For species 1, SALSA gives me a better assembly. For species 2, however, the N50 after SALSA is the same as it was before scaffolding.
During the assembly, I get this error: "WARNING: Not enough Hi-C reads for scaffolding." What does this mean?
This is the summary I get from gfastats after the scaffolding:
```
+++Summary+++:
scaffolds: 356
Total scaffold length: 1502913456
Average scaffold length: 4221667.01
Scaffold N50: 67491308
Scaffold auN: 81379285.48
Scaffold L50: 7
Largest scaffold: 203202437
contigs: 403
Total contig length: 1502889956
Average contig length: 3729255.47
Contig N50: 67491308
Contig auN: 81095181.14
Contig L50: 7
Largest contig: 203202437
gaps: 47
Total gap length: 23500
Average gap length: 500.00
Gap N50: 500
Gap auN: 500.00
Gap L50: 24
Largest gap: 500
Base composition (ACGT): 448804358, 302773097, 302741034, 448571467
GC content %: 40.29
soft-masked bases: 0
paths: 356
```
As you can see, both scaffold and contig N50 are the same: 67491308
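For reference, the summary lists 403 contigs, 47 gaps, and 356 scaffolds (403 - 47 = 356), so if the input assembly had no gaps, some joins were made even though the N50 did not move. An unchanged N50 is possible when the joined sequences are all short. The sketch below shows how N50 is computed from a list of sequence lengths, so that before and after values can be compared from `.fai` index files such as `purged.fa.fai`. The helper names `n50` and `lengths_from_fai` are illustrative, and the toy lengths are invented.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def lengths_from_fai(path):
    """Read sequence lengths from a samtools faidx index (.fai) file.

    Column 2 of each tab-separated line is the sequence length.
    """
    with open(path) as handle:
        return [int(line.split("\t")[1]) for line in handle if line.strip()]


# Toy lengths (hypothetical): joining the two shortest sequences
# reduces the sequence count but leaves the N50 unchanged.
before = [100, 80, 60, 5, 3]
after = [100, 80, 60, 8]
print(n50(before), n50(after))
```

In the toy example the count drops from 5 to 4 sequences while the N50 stays at 80, which mirrors the pattern in the summary above.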
And I also add this, just in case it helps:
This is the code I used to scaffold the assembly:
run_pipeline.py --assembly purged.fa --length purged.fa.fai --bed combined.bed --enzyme GANTC --output scaffolded
Any help would be appreciated!
Thanks in advance!