Newer python versions and Bio Alphabet #112

Open
NathanSiemers opened this issue May 23, 2021 · 5 comments

@NathanSiemers

Hello, I'm trying to build a working tracer on a more recent version of Python (3.8.10). Bio.Alphabet has since been removed from Biopython, and the recommendation is that calls to it (IUPAC) can be removed from most code without a problem.

Is it feasible to do this? Are there any known successes or issues with later versions of Python?

Thank you.

  File "/usr/local/lib/python3.8/site-packages/tracer-0.5-py3.8.egg/tracerlib/tracer_func.py", line 29, in <module>
    from Bio.Alphabet import IUPAC
  File "/usr/local/lib/python3.8/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
    raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type``

@NathanSiemers
Author

I tested removing the import calls in __init__.py and one other file, and tracer loaded correctly, but I haven't made a test run yet.

@mstubb
Member

mstubb commented May 24, 2021

Hi Nathan,

Thanks for this! I'll be happy to accept a PR that updates this if you'd like to submit one.

All the best,

Mike

@NathanSiemers
Author

I've spent several days working on a pull request. I removed the Bio Alphabet dependencies and changed how the Seq objects are created so that they no longer depend on Bio Alphabet IUPAC. I have also been editing the Dockerfile to bring the packages up to modern versions and to run the tests. I can send you what I have so far, but there's an error in 'tracer test'. There still seems to be an obscure call to Bio Alphabet in the pickle dump/load that I find difficult to trace. Probably in part because I'm not a Python hacker, I can't resolve this one. Some help from the group would be appreciated.

(Fragment of the tracer test output below. I can't find a remaining reference to Bio Alphabet anywhere in the code base.)

##Running Kallisto##
##Making Kallisto indices##

[build] loading fasta file /tracer/test_data/results/cell1/expression_quantification/kallisto_index/cell1_transcriptome.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
from 654 target sequences
[build] warning: replaced 3 non-ACGUT characters in the input sequence
with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ... done
[build] creating equivalence classes ... done
[build] target de Bruijn graph has 781463 contigs and contains 113560426 k-mers

##Quantifying with Kallisto##

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 131,104
[index] number of k-mers: 113,560,426
[index] number of equivalence classes: 460,618
[quant] running in paired-end mode
[quant] will process pair 1: /tracer/test_data/cell1_1.fastq
/tracer/test_data/cell1_2.fastq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 1,135 reads, 1,042 reads pseudoaligned
[quant] estimated average fragment length: 106.333
[ em] quantifying the abundances ... done
[ em] the Expectation-Maximization algorithm ran for 52 rounds

##Filtering by read count##
Traceback (most recent call last):
  File "/usr/local/bin/tracer", line 11, in <module>
    load_entry_point('tracer==0.5', 'console_scripts', 'tracer')()
  File "/usr/local/lib/python3.7/dist-packages/tracer-0.5-py3.7.egg/tracerlib/launcher.py", line 43, in launch
    Task().run()
  File "/usr/local/lib/python3.7/dist-packages/tracer-0.5-py3.7.egg/tracerlib/tasks.py", line 1230, in run
    loci=['A', 'B'], species='Mmus').run()
  File "/usr/local/lib/python3.7/dist-packages/tracer-0.5-py3.7.egg/tracerlib/tasks.py", line 766, in run
    cl = pickle.load(pkl)
  File "/usr/local/lib/python3.7/dist-packages/Bio/Alphabet/__init__.py", line 21, in <module>
    "Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information."
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.

@NathanSiemers
Author

I think the error is hard to trace because Bio Alphabet objects are embedded in the pickled (.pkl) reference test data, in directories like this:

https://github.com/Teichlab/tracer/tree/master/test_data/results/cell2/unfiltered_TCR_seqs

If that's true, then the error arises because a modern Python environment (with a current Biopython) cannot load the old pickled reference test results.

N

(some text strings from the pkl file below)

S'alphabet'p154g0(cBio.AlphabetHasStopCodonp155g2Ntp156Rp157(dp158S'stop_symbol'p159S'*'p160sg154g0(cBio.Alphabet.IUPACExtendedIUPACProteinp161g2Ntp162Rp163sS'letters'
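
One way to check this diagnosis without unpickling anything is to scan a pickle's opcodes for the classes it would try to import. The following is a minimal sketch using the standard-library `pickletools` module; the helper name `referenced_globals` and the sample bytes are illustrative assumptions, not part of tracer, and the sample only mimics the fragment quoted above.

```python
import pickletools


def referenced_globals(data):
    """Return the 'module.name' targets a pickle would import, without loading it."""
    found = set()
    for opcode, arg, _pos in pickletools.genops(data):
        # GLOBAL and INST carry "module name" as one space-separated string.
        if opcode.name in ("GLOBAL", "INST"):
            module, name = arg.split(" ", 1)
            found.add(module + "." + name)
    return found


# Hand-written protocol-0 bytes that mimic the fragment quoted above;
# this is illustrative data, not a real tracer pickle.
sample = (
    b"cBio.Alphabet\nHasStopCodon\np0\n0"
    b"cBio.Alphabet.IUPAC\nExtendedIUPACProtein\np1\n."
)

for target in sorted(referenced_globals(sample)):
    print(target)
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same helper. Note that this sketch only covers the older text-style opcodes; pickles written with protocol 4 or later use `STACK_GLOBAL`, which pushes the module and class names as separate strings and would need extra handling.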

@mstubb
Member

mstubb commented Jun 1, 2021

Thanks Nathan.

Yes, I think you're right that the error comes from the test trying to load the old pickled files that were created with a previous version.

I think that a solution here would be to use an environment with the old BioPython to load those pickled files and then write them out as some kind of parseable text file (not as a pickle).

The pickles are representations of a Cell object (class Cell(object):) and its Recombinant objects (class Recombinant(object):).

These classes aren't very complex so you could write out a text file containing their instance variables.

You could then switch to an environment with the new version of BioPython, recreate the objects using the values in your text file, and then repickle them. Those pickles should then be compatible and the test should pass.

Cheers,

Mike
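
The export and rebuild steps described above could look roughly like the sketch below, which uses JSON as the text format. The `Cell` and `Recombinant` classes here are hypothetical stand-ins with made-up attributes; the real tracerlib classes have different instance variables, and any Biopython `Seq` values would need converting with `str(...)` before export.

```python
import json


class Recombinant:
    """Hypothetical stand-in; the real tracerlib class has different fields."""

    def __init__(self, contig_name, locus, productive, dna_seq):
        self.contig_name = contig_name
        self.locus = locus
        self.productive = productive
        self.dna_seq = dna_seq  # plain str here; use str(seq) for a Bio.Seq.Seq


class Cell:
    """Hypothetical stand-in for tracerlib's Cell."""

    def __init__(self, name, recombinants):
        self.name = name
        self.recombinants = recombinants


def cell_to_json(cell):
    """Old environment: write the instance variables out as plain JSON text."""
    payload = {
        "name": cell.name,
        "recombinants": [vars(r) for r in cell.recombinants],
    }
    return json.dumps(payload, indent=2, sort_keys=True)


def cell_from_json(text):
    """New environment: rebuild the objects from the JSON text."""
    payload = json.loads(text)
    recombinants = [Recombinant(**fields) for fields in payload["recombinants"]]
    return Cell(payload["name"], recombinants)


original = Cell("cell1", [Recombinant("contig_1", "A", True, "ATGGCC")])
text = cell_to_json(original)
rebuilt = cell_from_json(text)
print(text)
```

In the new environment, the rebuilt object can then be written back out with `pickle.dump`, so the resulting file no longer references Bio.Alphabet. JSON is chosen here only because it is plain text and easy to inspect; any parseable format would serve.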
