Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue with build #78

Closed
mhassan opened this issue Sep 19, 2018 · 14 comments
Closed

issue with build #78

mhassan opened this issue Sep 19, 2018 · 14 comments

Comments

@mhassan
Copy link

mhassan commented Sep 19, 2018

I have installed and successfully run the test. I am now trying to use 'tracer build' to create a new resource for my organism of choice (Btau). However, I keep getting the error I keep getting the error "assert os.path.isdir(resources_root), "Species not found in resources". I didn't expect it to find the spp since that's what am trying to create. Nevertheless I made a folder and move all the V, J, C, and D sequences in it, but still get this error.
Do you have any idea how to go about it.

@mstubb
Copy link
Member

mstubb commented Sep 20, 2018

Hi @mhassan,

Sorry about that. There was a bug (introduced by another change) that meant we always required a species directory even when using build. Please can you try the latest version from the master branch and let me know if it works for you now?

All the best,

Mike

@mhassan
Copy link
Author

mhassan commented Sep 20, 2018

Hi Mike,
Thanks, that resolved the issue. However, am now having a problem with igblastn. Initially there was an error with the species:
igblast_species = species_mapper[species]
KeyError: 'Btau

So I edited the species_wrapper in tracer_func.py to include "Btau": 'Bos_taurus'

Which clears the error but then igblastn produces this error:

CMD failed:
/exports/mhassan/igBLAST/ncbi-igblast-1.9.0/bin/igblastn -germline_db_V ./bos/Btau/igblast_dbs/TCR_V.fa -germline_db_D ./bos/Btau/igblast_dbs/TCR_D.fa -germline_db_J ./bos/Btau/igblast_dbs/TCR_J.fa -domain_system imgt -organism Bos_taurus -ig_seqtype TCR -show_translation -num_alignments_V 5 -num_alignments_D 5 -num_alignments_J 5 -outfmt 3 -auxiliary_data optional_file/Bos_taurus_gl.aux -query /exports/mhassan/Tim/TCRs/A7/Trinity_output/A7_TCR_A.Trinity.fasta

USAGE
igblastn [-h] [-help] [-germline_db_V germline_database_name]
[-num_alignments_V int_value] [-germline_db_V_seqidlist filename]
[-germline_db_D germline_database_name] [-num_alignments_D int_value]
[-germline_db_D_seqidlist filename]
[-germline_db_J germline_database_name] [-num_alignments_J int_value]
[-germline_db_J_seqidlist filename] [-auxiliary_data filename]
[-min_D_match min_D_match] [-D_penalty D_penalty] [-J_penalty J_penalty]
[-num_clonotype num_clonotype] [-clonotype_out clonotype_out]
[-allow_vdj_overlap] [-organism germline_origin]
[-domain_system domain_system] [-ig_seqtype sequence_type]
[-focus_on_V_segment] [-extend_align5end] [-min_V_length Min_V_Length]
[-min_J_length Min_J_Length] [-show_translation] [-db database_name]
[-dbsize num_letters] [-gilist filename] [-seqidlist filename]
[-negative_gilist filename] [-negative_seqidlist filename]
[-taxids taxids] [-negative_taxids taxids] [-taxidlist filename]
[-negative_taxidlist filename] [-entrez_query entrez_query]
[-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
[-subject subject_input_file] [-subject_loc range] [-query input_file]
[-sra accession] [-out output_file] [-evalue evalue]
[-word_size int_value] [-gapopen open_penalty] [-gapextend extend_penalty]
[-qcov_hsp_perc float_value] [-max_hsps int_value]
[-xdrop_ungap float_value] [-xdrop_gap float_value]
[-xdrop_gap_final float_value] [-searchsp int_value]
[-sum_stats bool_value] [-penalty penalty] [-reward reward] [-no_greedy]
[-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
[-best_hit_score_edge float_value] [-window_size int_value]
[-off_diagonal_range int_value] [-lcase_masking] [-query_loc range]
[-strand strand] [-parse_deflines] [-outfmt format] [-show_gis]
[-num_descriptions int_value] [-num_alignments int_value]
[-line_length line_length] [-max_target_seqs num_sequences]
[-num_threads int_value] [-remote] [-version]

DESCRIPTION
Nucleotide-Nucleotide BLAST for immunoglobulin sequences 2.8.0+

Use '-help' to print detailed descriptions of command line arguments

Error: Argument "organism". Illegal value, expected human', mouse', rabbit', rat', rhesus_monkey': Bos_taurus'
Error: (CArgException::eConstraint) Argument "organism". Illegal value, expected human', mouse', rabbit', rat', rhesus_monkey': Bos_taurus'

I've tried the default --seq_method and 'assembly' but none seems to work.

Musa

@mstubb
Copy link
Member

mstubb commented Sep 21, 2018 via email

@mstubb
Copy link
Member

mstubb commented Sep 26, 2018

Hi @mhassan,

I think this is fixed in the build_iss78 branch (https://github.com/Teichlab/tracer/tree/build_iss78). Please can you download this and see if it works for you? You shouldn't need to edit 'tracer_func.py' now.

A few notes:

  1. This was breaking because we added functionality to use an 'auxiliary file' from IgBLAST to detect CDR3s based on annotations of CDR locations. These files are available with IgBLAST but obviously don't exist for custom species. So, Tracer now won't attempt to do this for custom reference builds.

  2. Please could you manually check the first few assembled sequences to see if Tracer is reporting their 'productive' status correctly. Does Tracer call them 'productive' if they are in-frame/lack stop codons and vice versa? If not, our method for doing this may be too mouse/human-specific and will need revising.

  3. You'll need to supply a Bos taurus reference transcriptome and to add its path into your config file. I made one using the instructions at http://www.nxn.se/valent/2016/10/3/the-first-steps-in-rna-seq-expression-analysis-single-cell-and-other and you can download it from http://www.ensembl.org/biomart/martresults/248?file=martquery_0926102305_185.txt.gz. I know nothing about cow genomics though so this may not be optimal!

  4. In building Tracer references from your cow fasta files, I noticed the following:

  • Your TRB_C.fa file has too many entries. build can only use a single constant region sequence. So long as they don't differ too much, you can pick a consensus.

  • There was no TRD_C.fa file in the set that you sent me. Didn't matter for my testing though.

Hope that's helpful. Please let me know how you get on and if you need anything else.

All the best,

Mike

@mhassan
Copy link
Author

mhassan commented Sep 30, 2018

Thanks Mike for the tweaks. They seem to have resolved the initial issues but another issue came up, which seems to be an open issue raised by another user #65.
"BLAST query/options error: Germline annotation database human/human_TR_V could not be found in [internal_data] directory
Please refer to the BLAST+ user manual"

My initial thought was that the igblastn is looking in the wrong resource directory i.e. not the one in tracer.conf file, but that doesn't appear to be the case.
Musa

@mstubb
Copy link
Member

mstubb commented Sep 30, 2018

Well, I'm glad that the first part seems to be fixed.

Have you installed IgBLAST according to the instructions here by downloading the internal_data directory and setting your $IGDATA environment variable?

Does tracer test work or just crash in the same place?

@mhassan
Copy link
Author

mhassan commented Sep 30, 2018

The igBLAST path was properly set, however, 'internal_data' and 'optional_file' needed to be in the bin sub-directory.
Am wondering though why tracer needs to build salmon index even when a pre-made index is given. This seems to cause a fatal error in my case in which tracer stops with:
"salmon quant was invoked improperly" error message

@mstubb
Copy link
Member

mstubb commented Sep 30, 2018 via email

@mhassan
Copy link
Author

mhassan commented Sep 30, 2018

here is the tracer run including the error message at the end.

##Finding recombinant-derived reads##
Resuming with existing TCR_A reads
Resuming with existing TCR_B reads

##Assembling Trinity Contigs##
Resuming with existing Trinity output

##Running IgBLAST##
##Running IgBLAST##
Performing IgBlast on ['TCR_A', 'TCR_B']
##TCR_A##
##TCR_B##

No library type specified for salmon in the configuration file. Using automatic detection (--libType A).
No kmer length specified for salmon in the configuration file. Using default value of 31.

##Running Salmon##

##Making Salmon indices##
Version Info: ### A newer version of Salmon is available. ####

The newest version, available at https://github.com/COMBINE-lab/salmon/releases
contains important bug fixes and improvements; please upgrade at your
earliest convenience.

index ["/mhassan2/Tim/TCRs/A7/expression_quantification/salmon_index/A7_transcriptome.idx"] did not previously exist . . . creating it
[2018-09-30 18:33:30.945] [jLog] [info] building index
RapMap Indexer

[Step 1 of 4] : counting k-mers
counted k-mers for 20000 transcriptsElapsed time: 0.515421s

Replaced 544 non-ATCG nucleotides
Clipped poly-A tails from 16 transcripts
Building rank-select dictionary and saving to disk done
Elapsed time: 0.00164497s
Writing sequence data to file . . . done
Elapsed time: 0.0960587s
[info] Building 32-bit suffix array (length of generalized text is 46217282)
Building suffix array . . . success
saving to disk . . . done
Elapsed time: 0.143047s
done
Elapsed time: 5.08416s
processed 19000000 positionstcmalloc: large alloc 1073741824 bytes == 0x57032000 @
processed 38000000 positionstcmalloc: large alloc 2147483648 bytes == 0x97032000 @
processed 46000000 positions
khash had 40737272 keys
saving hash to disk . . . done
Elapsed time: 3.87736s
[2018-09-30 18:34:09.634] [jLog] [info] done building index

##Quantifying with Salmon##
Version Info: ### A newer version of Salmon is available. ####

The newest version, available at https://github.com/COMBINE-lab/salmon/releases
contains important bug fixes and improvements; please upgrade at your
earliest convenience.

salmon (mapping-based) v0.6.0

[ program ] => salmon

[ command ] => quant

[ index ] => { /mhassan2/Tim/TCRs/A7/expression_quantification/salmon_index/A7_transcriptome.idx }

[ libType ] => { A }

[ threads ] => { 8 }

[ mates1 ] => { Tim/CowSeq_2053_G9_11d_PC_cDNA_A7_PRO1834_S1_libDNA_TAGCGCTC-ATAGAGAG_L006_R1_val_1.fq.gz }

[ mates2 ] => { Tim/CowSeq_2053_G9_11d_PC_cDNA_A7_PRO1834_S1_libDNA_TAGCGCTC-ATAGAGAG_L006_R2_val_2.fq.gz }

[ output ] => { /mhassan2/Tim/TCRs/A7/expression_quantification }

Logs will be written to /mhassan2/Tim/TCRs/A7/expression_quantification/logs
Exception : [unknown library format string : A]
salmon/v0.6.1-pre/bin/salmon quant was invoked improperly.
For usage information, try salmon/v0.6.1-pre/bin/salmon quant --help
Exiting.
[2018-09-30 18:34:09.807] [jointLog] [info] parsing read library format
Traceback (most recent call last):
File "tracer/tracer", line 21, in
launch()
File "/mhassan2/tracer/tracerlib/launcher.py", line 43, in launch
Task().run()
File "/mhassan2/tracer/tracerlib/tasks.py", line 373, in run
self.quantify(cell)
File "/mhassan2/tracer/tracerlib/tasks.py", line 618, in quantify
self.fragment_sd, salmon_libType, salmon_kmerLen)
File "/mhassan2/tracer/tracerlib/tracer_func.py", line 1472, in quantify_with_salmon
subprocess.check_call(salmon_command)
File "python/2.7.10/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['salmon/v0.6.1-pre/bin/salmon', 'quant', '-i', '/mhassan2/Tim/TCRs/A7/expression_quantification/salmon_index/A7_transcriptome.idx', '-l', 'A', '-p', '8', '-1', 'Tim/CowSeq_2053_G9_11d_PC_cDNA_A7_PRO1834_S1_libDNA_TAGCGCTC-ATAGAGAG_L006_R1_val_1.fq.gz', '-2', 'Tim/CowSeq_2053_G9_11d_PC_cDNA_A7_PRO1834_S1_libDNA_TAGCGCTC-ATAGAGAG_L006_R2_val_2.fq.gz', '-o', '/mhassan2/Tim/TCRs/A7/expression_quantification']' returned non-zero exit status 1

@mstubb
Copy link
Member

mstubb commented Sep 30, 2018 via email

@mhassan
Copy link
Author

mhassan commented Sep 30, 2018

Probably was an issue with the salmon version. Though haven't tested it yet, reverting to kallisto (for which I have the latest version) worked without any errors or warnings.
Thanks.

@mstubb
Copy link
Member

mstubb commented Sep 30, 2018 via email

@mhassan
Copy link
Author

mhassan commented Sep 30, 2018

Thanks for the help Mike.

@mstubb
Copy link
Member

mstubb commented Oct 1, 2018

You're welcome. Hope you find out some cool things to do with cow TCRs!

@mstubb mstubb closed this as completed Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants