-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
issue with build #78
Comments
Hi @mhassan, Sorry about that. There was a bug (introduced by another change) that meant we always required a species directory even when using All the best, Mike |
Hi Mike, So I edited the species_wrapper in tracer_func.py to include "Btau": 'Bos_taurus' Which clears the error but then igblastn produces this error: CMD failed: USAGE DESCRIPTION Use '-help' to print detailed descriptions of command line argumentsError: Argument "organism". Illegal value, expected I've tried the default --seq_method and 'assembly' but none seems to work. Musa |
Ah. Sorry about this. I’ll be happy to help make this work. It’d be really helpful if you could send me the following things (either through here or directly through my email mjt.stubbington [at] gmail.com).
1. The command that you used to build your Btau reference
2. The command you’re running that gives the igblastn error below
3. The reference sequences that you’re using as input to `build`
4. FASTQ files for one of your cells so that I can test them as input once everything else is working
I think we made some breaking changes to `build` when updating the other functions and it wasn’t tested fully. Sorry you have to be the test-case but this is really useful and I should be able to get a working version to you soon.
All the best,
Mike
… On 20 Sep 2018, at 23:39, mhassan ***@***.***> wrote:
Hi Mike,
Thanks, that resolved the issue. However, am now having a problem with igblastn. Initially there was an error with the species:
igblast_species = species_mapper[species]
KeyError: 'Btau
So I edited the species_wrapper in tracer_func.py to include "Btau": 'Bos_taurus'
Which clears the error but then igblastn produces this error:
CMD failed:
/exports/mhassan/igBLAST/ncbi-igblast-1.9.0/bin/igblastn -germline_db_V ./bos/Btau/igblast_dbs/TCR_V.fa -germline_db_D ./bos/Btau/igblast_dbs/TCR_D.fa -germline_db_J ./bos/Btau/igblast_dbs/TCR_J.fa -domain_system imgt -organism Bos_taurus -ig_seqtype TCR -show_translation -num_alignments_V 5 -num_alignments_D 5 -num_alignments_J 5 -outfmt 3 -auxiliary_data optional_file/Bos_taurus_gl.aux -query /exports/mhassan/Tim/TCRs/A7/Trinity_output/A7_TCR_A.Trinity.fasta
USAGE
igblastn [-h] [-help] [-germline_db_V germline_database_name]
[-num_alignments_V int_value] [-germline_db_V_seqidlist filename]
[-germline_db_D germline_database_name] [-num_alignments_D int_value]
[-germline_db_D_seqidlist filename]
[-germline_db_J germline_database_name] [-num_alignments_J int_value]
[-germline_db_J_seqidlist filename] [-auxiliary_data filename]
[-min_D_match min_D_match] [-D_penalty D_penalty] [-J_penalty J_penalty]
[-num_clonotype num_clonotype] [-clonotype_out clonotype_out]
[-allow_vdj_overlap] [-organism germline_origin]
[-domain_system domain_system] [-ig_seqtype sequence_type]
[-focus_on_V_segment] [-extend_align5end] [-min_V_length Min_V_Length]
[-min_J_length Min_J_Length] [-show_translation] [-db database_name]
[-dbsize num_letters] [-gilist filename] [-seqidlist filename]
[-negative_gilist filename] [-negative_seqidlist filename]
[-taxids taxids] [-negative_taxids taxids] [-taxidlist filename]
[-negative_taxidlist filename] [-entrez_query entrez_query]
[-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
[-subject subject_input_file] [-subject_loc range] [-query input_file]
[-sra accession] [-out output_file] [-evalue evalue]
[-word_size int_value] [-gapopen open_penalty] [-gapextend extend_penalty]
[-qcov_hsp_perc float_value] [-max_hsps int_value]
[-xdrop_ungap float_value] [-xdrop_gap float_value]
[-xdrop_gap_final float_value] [-searchsp int_value]
[-sum_stats bool_value] [-penalty penalty] [-reward reward] [-no_greedy]
[-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
[-best_hit_score_edge float_value] [-window_size int_value]
[-off_diagonal_range int_value] [-lcase_masking] [-query_loc range]
[-strand strand] [-parse_deflines] [-outfmt format] [-show_gis]
[-num_descriptions int_value] [-num_alignments int_value]
[-line_length line_length] [-max_target_seqs num_sequences]
[-num_threads int_value] [-remote] [-version]
DESCRIPTION
Nucleotide-Nucleotide BLAST for immunoglobulin sequences 2.8.0+
Use '-help' to print detailed descriptions of command line arguments
Error: Argument "organism". Illegal value, expected human',mouse', rabbit',rat', rhesus_monkey':Bos_taurus'
Error: (CArgException::eConstraint) Argument "organism". Illegal value, expected human',mouse', rabbit',rat', rhesus_monkey':Bos_taurus'
I've tried the default --seq_method and 'assembly' but none seems to work.
Musa
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#78 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABFwhr8uAvLoZVIYSz-zp4aZSG1oSC98ks5udBkggaJpZM4WxE3O>.
|
Hi @mhassan, I think this is fixed in the A few notes:
Hope that's helpful. Please let me know how you get on and if you need anything else. All the best, Mike |
Thanks Mike for the tweaks. They seem to have resolved the initial issues but another issue came up, which seems to be an open issue raised by another user #65. My initial thought was that the igblastn is looking in the wrong resource directory i.e. not the one in tracer.conf file, but that doesn't appear to be the case. |
Well, I'm glad that the first part seems to be fixed. Have you installed IgBLAST according to the instructions here by downloading the Does |
The igBLAST path was properly set, however, 'internal_data' and 'optional_file' needed to be in the bin sub-directory. |
Tracer builds a unique salmon index for each cell that includes the specific reconstructed TCR sequences. It does this by appending the reconstructed TCR sequences to the end of the reference transcriptome fasta file that’s specified in the config and then building the index. This enables it to quantify expression of the reconstructed sequences - this is then used for filtering in cases where more than two sequences are reconstructed for a locus.
I’ve not seen that salmon error before. Happy to help with it though. Please send me the full error output from tracer.
Thanks!
… On 30 Sep 2018, at 10:42, mhassan ***@***.***> wrote:
The igBLAST path was properly set, however, 'internal_data' and 'optional_file' needed to be in the bin sub-directory.
Am wondering though why tracer needs to build salmon index even when a pre-made index is given. This seems to cause a fatal error in my case in which tracer stops with:
"salmon quant was invoked improperly" error message
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#78 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABFwhm0x95_Oeo2mj3jpDWla-dZYiwLNks5ugQKMgaJpZM4WxE3O>.
|
here is the tracer run including the error message at the end. ##Finding recombinant-derived reads## ##Assembling Trinity Contigs## ##Running IgBLAST## No library type specified for salmon in the configuration file. Using automatic detection (--libType A). ##Running Salmon## ##Making Salmon indices## The newest version, available at https://github.com/COMBINE-lab/salmon/releases index ["/mhassan2/Tim/TCRs/A7/expression_quantification/salmon_index/A7_transcriptome.idx"] did not previously exist . . . creating it [Step 1 of 4] : counting k-mers Replaced 544 non-ATCG nucleotides ##Quantifying with Salmon## The newest version, available at https://github.com/COMBINE-lab/salmon/releases salmon (mapping-based) v0.6.0[ program ] => salmon[ command ] => quant[ index ] => { /mhassan2/Tim/TCRs/A7/expression_quantification/salmon_index/A7_transcriptome.idx }[ libType ] => { A }[ threads ] => { 8 }[ mates1 ] => { Tim/CowSeq_2053_G9_11d_PC_cDNA_A7_PRO1834_S1_libDNA_TAGCGCTC-ATAGAGAG_L006_R1_val_1.fq.gz }[ mates2 ] => { Tim/CowSeq_2053_G9_11d_PC_cDNA_A7_PRO1834_S1_libDNA_TAGCGCTC-ATAGAGAG_L006_R2_val_2.fq.gz }[ output ] => { /mhassan2/Tim/TCRs/A7/expression_quantification }Logs will be written to /mhassan2/Tim/TCRs/A7/expression_quantification/logs |
Thanks. It looks like you’re running a really old version of Salmon (v0.6.1). Can you upgrade to a newer version?
M
… On 30 Sep 2018, at 11:20, mhassan ***@***.***> wrote:
here is the tracer run including the error message at the end.
##Finding recombinant-derived reads##
Resuming with existing TCR_A reads
Resuming with existing TCR_B reads
##Assembling Trinity Contigs##
Resuming with existing Trinity output
##Running IgBLAST##
##Running IgBLAST##
Performing IgBlast on ['TCR_A', 'TCR_B']
##TCR_A##
##TCR_B##
No library type specified for salmon in the configuration file. Using automatic detection (--libType A).
No kmer length specified for salmon in the configuration file. Using default value of 31.
##Running Salmon##
##Making Salmon indices##
Version Info: ### A newer version of Salmon is available. ####
The newest version, available at https://github.com/COMBINE-lab/salmon/releases
contains important bug fixes and improvements; please upgrade at your
earliest convenience.
index ["/mhassan2/Tim/TCRs/A7/expression_quantification/salmon_index/A7_transcriptome.idx"] did not previously exist . . . creating it
[2018-09-30 18:33:30.945] [jLog] [info] building index
RapMap Indexer
[Step 1 of 4] : counting k-mers
counted k-mers for 20000 transcriptsElapsed time: 0.515421s
Replaced 544 non-ATCG nucleotides
Clipped poly-A tails from 16 transcripts
Building rank-select dictionary and saving to disk done
Elapsed time: 0.00164497s
Writing sequence data to file . . . done
Elapsed time: 0.0960587s
[info] Building 32-bit suffix array (length of generalized text is 46217282)
Building suffix array . . . success
saving to disk . . . done
Elapsed time: 0.143047s
done
Elapsed time: 5.08416s
processed 19000000 positionstcmalloc: large alloc 1073741824 bytes == 0x57032000 @
processed 38000000 positionstcmalloc: large alloc 2147483648 bytes == 0x97032000 @
processed 46000000 positions
khash had 40737272 keys
saving hash to disk . . . done
Elapsed time: 3.87736s
[2018-09-30 18:34:09.634] [jLog] [info] done building index
##Quantifying with Salmon##
Version Info: ### A newer version of Salmon is available. ####
The newest version, available at https://github.com/COMBINE-lab/salmon/releases
contains important bug fixes and improvements; please upgrade at your
earliest convenience.
salmon (mapping-based) v0.6.0
[ program ] => salmon
[ command ] => quant
[ index ] => { /mhassan2/Tim/TCRs/A7/expression_quantification/salmon_index/A7_transcriptome.idx }
[ libType ] => { A }
[ threads ] => { 8 }
[ mates1 ] => { Tim/CowSeq_2053_G9_11d_PC_cDNA_A7_PRO1834_S1_libDNA_TAGCGCTC-ATAGAGAG_L006_R1_val_1.fq.gz }
[ mates2 ] => { Tim/CowSeq_2053_G9_11d_PC_cDNA_A7_PRO1834_S1_libDNA_TAGCGCTC-ATAGAGAG_L006_R2_val_2.fq.gz }
[ output ] => { /mhassan2/Tim/TCRs/A7/expression_quantification }
Logs will be written to /mhassan2/Tim/TCRs/A7/expression_quantification/logs
Exception : [unknown library format string : A]
salmon/v0.6.1-pre/bin/salmon quant was invoked improperly.
For usage information, try salmon/v0.6.1-pre/bin/salmon quant --help
Exiting.
[2018-09-30 18:34:09.807] [jointLog] [info] parsing read library format
Traceback (most recent call last):
File "tracer/tracer", line 21, in
launch()
File "/mhassan2/tracer/tracerlib/launcher.py", line 43, in launch
Task().run()
File "/mhassan2/tracer/tracerlib/tasks.py", line 373, in run
self.quantify(cell)
File "/mhassan2/tracer/tracerlib/tasks.py", line 618, in quantify
self.fragment_sd, salmon_libType, salmon_kmerLen)
File "/mhassan2/tracer/tracerlib/tracer_func.py", line 1472, in quantify_with_salmon
subprocess.check_call(salmon_command)
File "python/2.7.10/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['salmon/v0.6.1-pre/bin/salmon', 'quant', '-i', '/mhassan2/Tim/TCRs/A7/expression_quantification/salmon_index/A7_transcriptome.idx', '-l', 'A', '-p', '8', '-1', 'Tim/CowSeq_2053_G9_11d_PC_cDNA_A7_PRO1834_S1_libDNA_TAGCGCTC-ATAGAGAG_L006_R1_val_1.fq.gz', '-2', 'Tim/CowSeq_2053_G9_11d_PC_cDNA_A7_PRO1834_S1_libDNA_TAGCGCTC-ATAGAGAG_L006_R2_val_2.fq.gz', '-o', '/mhassan2/Tim/TCRs/A7/expression_quantification']' returned non-zero exit status 1
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Probably was an issue with the salmon version. Though haven't tested it yet, reverting to kallisto (for which I have the latest version) worked without any errors or warnings. |
Great. I’ll merge the branch that fixed the build errors into master and then I’ll close this issue.
M
… On 30 Sep 2018, at 12:00, mhassan ***@***.***> wrote:
Probably was an issue with the salmon version. Though haven't tested it yet, reverting to kallisto (for which I have the latest version) worked without any errors or warnings.
Thanks.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Thanks for the help Mike. |
You're welcome. Hope you find out some cool things to do with cow TCRs! |
I have installed and successfully run the test. I am now trying to use 'tracer build' to create a new resource for my organism of choice (Btau). However, I keep getting the error I keep getting the error "assert os.path.isdir(resources_root), "Species not found in resources". I didn't expect it to find the spp since that's what am trying to create. Nevertheless I made a folder and move all the V, J, C, and D sequences in it, but still get this error.
Do you have any idea how to go about it.
The text was updated successfully, but these errors were encountered: