Majority of MAGs removed #17

Open
jronwalker opened this issue Nov 20, 2024 · 1 comment


@jronwalker

Hello,

I recently ran the `metator pipeline` command with scaffolds from metaSPAdes and reads from HiC sequencing. The command used was:
metator pipeline --assembly all_MAH_C_scaffolds.fasta --forward 0_1367337_S8_R1_001.fastq.gz --reverse 0_1367337_S8_R2_001.fastq.gz --algorithm louvain --aligner bowtie2 --aligner-mode normal --cluster-matrix False --depth None --edge 0 --enzyme None --force False --iterations 100 --rec-iter 10 --junctions NNNNN --no-clean-up False --normalization empirical_hit --outdir metator_mah_c --overlap 80 --prefix None --rec-overlap 90 --min-quality 30 --res-param 1.0 --size 500000 --start fastq --scaffold False --threads 16 --tmpdir ./tmp

I got the following warning throughout the output stream: BiopythonDeprecationWarning: GC is deprecated; please use gc_fraction instead. warnings.warn
My main concern is that we went from 83,147 overlapping MAGs down to 3 MAGs in the next step, and these were the final output MAGs. All three were classified as "Others bins" rather than as HQ, MQ, or LQ. There was also a warning and an error at the end of the log file:

`INFO :: HQ MAGs: 0 Total Size: 0
INFO :: MQ MAGs: 0 Total Size: 0
INFO :: LQ MAGs: 0 Total Size: 0
INFO :: Contaminated potential MAGs: 0 Total Size: 0
INFO :: Others bins: 3 Total Size: 4936961
WARNING :: No pairs alignment files found in metator_mah_c, no heatmap will be generated.
/nethome/jrw14219/miniforge3/envs/metator/lib/python3.10/site-packages/metator/figures.py:918: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
contigs_data["size_weight"] = (contigs_data["Size"] / min_size).apply(int)
INFO ::

INFO :: ## date: 2024-11-19 23:32:28`

I am using MetaTOR v1.3.2. We have about 3M scaffolds and 148M HiC reads. Any information you could provide on what may be going on, or on how we can optimize the parameters, would be greatly appreciated.

Thank you!

@mmarbout
Contributor

mmarbout commented Jan 6, 2025

Hi
Thanks for using our pipeline, and sorry for the delayed answer. So far, the program appears to have worked fine. The first message is only a warning caused by an old version of Biopython (which is required to use miComplete). The last warning also comes from an old dependency version and is not a problem. The program should have provided you with a log file that gives some information about your HiC data. What is the mapping rate of your fastq files? What is the 3D ratio? It is possible that your data do not have enough 3D signal...
Martial
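
For anyone who wants to sanity-check the two numbers asked about above, here is a rough Python sketch. It is not MetaTOR code and does not parse the MetaTOR log. It assumes you have a whitespace-separated pairs file in the 4DN-style column order (readID, contig1, pos1, contig2, pos2, ...) and that you know the total number of sequenced read pairs. The function name `pair_stats` and the example records are hypothetical, and using the fraction of inter-contig pairs as a proxy for "3D signal" is an assumption that may not match how MetaTOR defines its 3D ratio.

```python
import io


def pair_stats(pairs_lines, total_read_pairs):
    """Return (mapping_rate, inter_contig_fraction) for an iterable of lines.

    Assumption: each non-header line is whitespace-separated with the columns
    readID, contig1, pos1, contig2, pos2, ... (4DN-style pairs layout).
    """
    aligned = 0
    inter = 0
    for line in pairs_lines:
        line = line.strip()
        # Skip blank lines and header lines, which start with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        contig1, contig2 = fields[1], fields[3]
        aligned += 1
        if contig1 != contig2:
            inter += 1
    # Guard against division by zero on empty input.
    mapping_rate = aligned / total_read_pairs if total_read_pairs else 0.0
    inter_fraction = inter / aligned if aligned else 0.0
    return mapping_rate, inter_fraction


# Hypothetical toy data: 4 aligned pairs out of 10 sequenced pairs.
example = """## pairs format v1.0
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
r1 NODE_1 100 NODE_1 900 + -
r2 NODE_1 250 NODE_7 40 + +
r3 NODE_2 10 NODE_2 5000 - +
r4 NODE_3 77 NODE_9 1200 - -
"""
rate, inter = pair_stats(io.StringIO(example), total_read_pairs=10)
print(f"mapping rate: {rate:.2f}, inter-contig fraction: {inter:.2f}")
# prints "mapping rate: 0.40, inter-contig fraction: 0.50"
```

With a real file, you would pass an open file handle instead of the `io.StringIO` toy input. A very low mapping rate or very few inter-contig pairs would be consistent with the "not enough 3D signal" explanation, but the values reported in the MetaTOR log remain the reference.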
