Get single marker gene from read #108

Open
Lcornet opened this issue May 19, 2022 · 0 comments
Lcornet commented May 19, 2022

I have a FASTA file with a marker gene, and I would like to extract that gene from raw Illumina reads.

The marker is in a fasta file with only one sequence:

LLX10@00074518
MAIEDNPYVFRFEGRLWVSEEPRETAAAQLRAQREWDRQNARLQHWWVAISVSAVAGVAV
TLYLGTSAGLAPAIYLVLLPIGFGAGAVLGALVNKRFFAPELQHGSLPPRPELAKLTRIP
SRVARAAPDNASARDLIDWSTRGFVD
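
As a quick sanity check on the input, the sketch below parses FASTA-formatted text and reports, for each record, its length and whether it contains only IUPAC nucleotide letters. The names `read_fasta` and `is_nucleotide` are hypothetical helpers, not part of MATAM, and the leading `>` on the header line is an assumption (it does not appear in the listing above).

```python
# Hypothetical helpers for inspecting a FASTA record; not part of MATAM.
IUPAC_NUCLEOTIDES = set("ACGTUNRYSWKMBDHV")


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_nucleotide(sequence):
    """True if the sequence is non-empty and uses only IUPAC nucleotide letters."""
    return bool(sequence) and set(sequence.upper()) <= IUPAC_NUCLEOTIDES


# The marker from this issue, with an assumed ">" header prefix.
example = """>LLX10@00074518
MAIEDNPYVFRFEGRLWVSEEPRETAAAQLRAQREWDRQNARLQHWWVAISVSAVAGVAV
TLYLGTSAGLAPAIYLVLLPIGFGAGAVLGALVNKRFFAPELQHGSLPPRPELAKLTRIP
SRVARAAPDNASARDLIDWSTRGFVD
"""

records = read_fasta(example)
for header, seq in records:
    print(header, len(seq), is_nucleotide(seq))
```

The marker contains letters outside the nucleotide alphabet (for example E, L and P), which suggests it is a protein sequence. Whether that is the reason the cleaning step rejected it is not stated in the log.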

I tried to build a custom database with MATAM, but I get this error:

$ singularity exec --bind /scratch/ulg/bioec/lcornet/matam:/mnt matam.sif matam_db_preprocessing.py -i /mnt/marker.fasta -d /mnt/marker/ --cpu 1 --max_memory 10000 -v

#################################
MATAM db pre-processing
#################################

CMD: /opt/miniconda/opt/matam-1.6.0/scripts/matam_db_preprocessing.py --verbose --cpu 1 --max_memory 10000 --min_length 10 --max_consecutive_n 5 --clustering_id_threshold 0.95 --db_dir /mnt/marker --input_ref /mnt/marker.fasta

INFO - Starting ref db pre-processing
INFO - Extracting taxonomies from reference DB
INFO - Cleaning reference db
1 sequences were rejected
INFO - Starting ref db clustering
INFO - Clustering sequences @ 95 pct id
vsearch v2.15.2_linux_x86_64, 251.8GB RAM, 64 cores
https://github.com/torognes/vsearch

Reading file /mnt/marker/marker.cleaned.fasta 100%
0 nt in 0 seqs
minseqlength 32: 1 sequence discarded.
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 0
Singletons: 0
Traceback (most recent call last):
File "/opt/miniconda/opt/matam-1.6.0/scripts/fasta_clean_name.py", line 62, in <module>
sequence_id = header.split()[0]
IndexError: list index out of range
INFO - Renaming output files as MATAM db files
INFO - Indexing complete ref db

WARNING: no write permissions in directory /tmpscratch: No such file or directory
will try /tmp/.

Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, [email protected]
Laurent Noé, [email protected]
Hélène Touzet, [email protected]

Parameters summary:
K-mer size: 19
K-mer interval: 1
Maximum positions to store per unique K-mer: 10000

Total number of databases to index: 1

Begin indexing file /mnt/marker/marker_NR95.complete.fasta under index name /mnt/marker/marker_NR95.complete:

ERROR: at least one of your reads is shorter than the seed length 19, please filter out all reads shorter than 19 to continue index construction.

Collecting sequence distribution statistics ..
INFO - Indexing clustered ref db
The input file is empty, an index was not built.

Output MATAM db: /mnt/marker/marker_NR95

matam_db_preprocessing.py terminated with some errors. Check the log for additional infos
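
The `IndexError` in the traceback is what Python raises when `split()[0]` is applied to an empty or whitespace-only string. That would be consistent with `fasta_clean_name.py` reading an empty header from the empty clustered file, although the log does not confirm it. The sketch below reproduces the behavior and shows a guarded variant; `first_token` is a hypothetical helper, not MATAM code.

```python
def first_token(header):
    """Return the first whitespace-delimited token of a header, or None if there is none."""
    tokens = header.split()
    return tokens[0] if tokens else None


# Splitting an empty string yields an empty list, so indexing it fails
# with the same exception type as in the traceback.
try:
    "".split()[0]
except IndexError as err:
    print("IndexError:", err)

# The guarded helper returns None instead of raising.
print(first_token("LLX10@00074518 some description"))
print(first_token(""))
```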
