A problem when following your pipeline ss3_isoform.py #4

Open
loverlyday opened this issue Feb 3, 2021 · 19 comments

@loverlyday

loverlyday commented Feb 3, 2021

I have fixed the errors I asked about 16 days ago, but I get a new error during isoform reconstruction:

"""
Traceback (most recent call last):
File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/zhouw/xxxx/envs/trim/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/xxxx/temp/ERR3835349/pyModule/isoform_reconstruct.py", line 226, in _run_isoform
results = aligned.groupby(by='BC_UB').apply(_isoform_inference_of_single_molec, ref_iso_dict[gene])
File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 859, in apply
result = self._python_apply_general(f, self._selected_obj)
File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 892, in _python_apply_general
keys, values, mutated = self.grouper.apply(f, data, self.axis)
File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 220, in apply
res = f(group)
File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 843, in f
return func(g, *args, **kwargs)
File "/home/xxxx/temp/ERR3835349/pyModule/isoform_reconstruct.py", line 196, in _isoform_inference_of_single_molec
out = [aligned_reads_df[16].iloc[0], aligned_reads_df[17].iloc[0],
File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/site-packages/pandas/core/indexing.py", line 879, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/site-packages/pandas/core/indexing.py", line 1496, in _getitem_axis
self._validate_integer(key, axis)
File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/site-packages/pandas/core/indexing.py", line 1437, in _validate_integer
raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/xxxx/temp/ERR3835349/ss3_isofrom.py", line 109, in <module>
main()
File "/home/xxxx/temp/ERR3835349/ss3_isofrom.py", line 105, in main
get_isoforms(conf_data, out_path, ref)
File "/home/xxxx/temp/ERR3835349/pyModule/isoform_reconstruct.py", line 469, in get_isoforms
pool.map(func, remain_genes, chunksize=1)
File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
IndexError: single positional indexer is out-of-bounds

This error occurs on the last remaining gene files. When I run a single gene file with "results = aligned.groupby(by='BC_UB').apply(_isoform_inference_of_single_molec, ref_iso_dict[gene])", I find that the error occurs when _run_isoform is called.

What is the difference between _run_isoform and isoform_inference_correction_by_ass_v2? If _run_isoform is not essential, I will skip it.
Looking forward to your reply.

@PingChen-Angela
Contributor

The error looks like there might be no content in your input "aligned_reads_df". Is it empty?
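
For readers following along: the traceback ends in `.iloc[0]`, which raises this IndexError whenever the Series it is called on has no rows. Below is a minimal sketch of a guard, assuming the per-molecule frame can arrive empty. `first_values` is a hypothetical helper and not part of the ss3iso code; only the column labels 16 and 17 are taken from the traceback.

```python
import pandas as pd


def first_values(df, columns):
    """Return the first-row value of each column, or None if df has no rows.

    Hypothetical helper for illustration; not part of isoform_reconstruct.py.
    """
    if df.empty:
        # .iloc[0] on an empty Series raises IndexError, so bail out early.
        return None
    return [df[col].iloc[0] for col in columns]


empty = pd.DataFrame({16: [], 17: []})
filled = pd.DataFrame({16: ["cellA"], 17: ["umi1"]})

print(first_values(empty, [16, 17]))   # None instead of an IndexError
print(first_values(filled, [16, 17]))
```

Whether returning None is the right behaviour for an empty group depends on how the caller combines the per-molecule results, so treat this only as a way to confirm or rule out the empty-input explanation.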

@loverlyday
Author

No, the file is not empty. The error occurs on the last gene in the set, and which gene that is appears to be random. Maybe you can try with your own data when the first "remaining files" count is not zero; then you may see the error.

@PingChen-Angela
Contributor

Hi @loverlyday, I cannot see the error from my side. Can you send me the gene file where the error occurred? It is under the keptReads/chr folder with file name "[gene name]_aligned_reads.csv". That will help me find out the reason.

@yiyelinfeng

Hi @loverlyday, I ran into the same error. Have you solved it? Looking forward to your reply. Thanks!

@PingChen-Angela
Contributor

Hi, @yiyelinfeng! Can you send me one of the gene files under the keptReads/chr folder with the file name "[gene name]_aligned_reads.csv"? That will help me fix the issue. Thanks!

@yiyelinfeng

yiyelinfeng commented Aug 6, 2021 via email

@kwglam

kwglam commented Sep 5, 2021

Hi @yiyelinfeng and @PingChen-Angela,

After running the ss3_isoform.py program for almost two weeks, I got exactly the same error that you reported (please see the error message below). Have you figured out what the problem is? Would you kindly share how you fixed this bug? Thank you very much in advance for your reply.

Error message:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/gpfs/gsfs10/users/xxx/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 226, in _run_isoform
results = aligned.groupby(by='BC_UB').apply(_isoform_inference_of_single_molec, ref_iso_dict[gene])
File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1253, in apply
result = self._python_apply_general(f, self._selected_obj)
File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1287, in _python_apply_general
keys, values, mutated = self.grouper.apply(f, data, self.axis)
File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 820, in apply
res = f(group)
File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1237, in f
return func(g, *args, **kwargs)
File "/gpfs/gsfs10/users/xxx/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 196, in _isoform_inference_of_single_molec
out = [aligned_reads_df[16].iloc[0], aligned_reads_df[17].iloc[0],
File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/indexing.py", line 931, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/indexing.py", line 1566, in _getitem_axis
self._validate_integer(key, axis)
File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/indexing.py", line 1500, in _validate_integer
raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/xxx/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/ss3_isoform.py", line 109, in <module>
main()
File "/data/xxx/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/ss3_isoform.py", line 105, in main
get_isoforms(conf_data, out_path, ref)
File "/gpfs/gsfs10/users/xxx/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 469, in get_isoforms
pool.map(func, remain_genes, chunksize=1)
File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
IndexError: single positional indexer is out-of-bounds

@PingChen-Angela
Contributor

@yiyelinfeng and @kwglam, I have updated the code. Please try again.

@kwglam

kwglam commented Sep 7, 2021

Hi Angela,

Thanks for updating the script. I will try to run it again.

BTW, I am wondering how much walltime you usually need to run the ss3_isoform.py script with 50 processors. I previously used 10 processors to run the script and it took more than 12 days to hit the issue. Thanks!

@PingChen-Angela
Contributor

It might be a bit slow for a big dataset, but you don't need to rerun everything. You can just update the code in place and use the same output folder.

@kwglam

kwglam commented Sep 7, 2021

But I guess I still have to run the entire quantification part (-Q), right? I reckon that if I use the same output folder, it will skip genes whose output already exists. Is that what you mean?

@PingChen-Angela
Contributor

Yes.
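
The resume behaviour confirmed above can be pictured with a small sketch. It assumes one output file per gene named `<gene>.txt`; that naming and the helper `remaining_genes` are hypothetical and only illustrate the idea of skipping genes whose output already exists, not how ss3_isoform.py actually tracks progress.

```python
import tempfile
from pathlib import Path


def remaining_genes(genes, out_dir, suffix=".txt"):
    """Return the genes that still need processing.

    A gene counts as done only if its output file exists and is non-empty.
    The '<gene><suffix>' naming is an assumption for illustration.
    """
    out_dir = Path(out_dir)
    todo = []
    for gene in genes:
        out_file = out_dir / f"{gene}{suffix}"
        if out_file.is_file() and out_file.stat().st_size > 0:
            continue  # finished in an earlier run
        todo.append(gene)
    return todo


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "geneA.txt").write_text("some result\n")  # finished earlier
    (Path(tmp) / "geneB.txt").write_text("")               # left empty by a crash
    print(remaining_genes(["geneA", "geneB", "geneC"], tmp))
```

Treating zero-byte files as unfinished is a deliberate choice in this sketch, so that a file left behind by an interrupted run is redone rather than silently reused.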

@kwglam

kwglam commented Sep 8, 2021

Hi Angela,

I have run your script with the updated code, and it finally generated the 'assigned_isoforms' folder. However, it then hit another error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/gpfs/gsfs10/users/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 336, in isoform_inference_correction_by_ass_v2
ass_junc = get_junction(ass, trans_df)
File "/gpfs/gsfs10/users/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 260, in get_junction
ass_start_junc = tmp.apply(_get_junc_start, axis=1, trans_df=trans_df)
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/frame.py", line 8736, in apply
return op.apply()
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/apply.py", line 688, in apply
return self.apply_standard()
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/apply.py", line 812, in apply_standard
results, res_index = self.apply_series_generator()
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/apply.py", line 828, in apply_series_generator
results[i] = self.f(v)
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/apply.py", line 131, in f
return func(x, *args, **kwargs)
File "/gpfs/gsfs10/users/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 272, in _get_junc_start
row_idx = list(trans_df.query('Exon_Idx=="%s" and Transcripts=="%s"' %(x[3], x[2])).index)[0]
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/ss3_isoform.py", line 109, in <module>
main()
File "/data/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/ss3_isoform.py", line 105, in main
get_isoforms(conf_data, out_path, ref)
File "/gpfs/gsfs10/users/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 484, in get_isoforms
pool.map(func, infered_gene_paths, chunksize=1)
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
IndexError: list index out of range

Any insights on this? Thanks a lot!!

Gabriel
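
The failing line takes element `[0]` of the result of a query that can match nothing, which is one way to get `list index out of range`. Below is a hedged sketch of a lookup that reports the miss instead of raising. The helper `find_exon_row` and the toy table are hypothetical; only the column names `Exon_Idx` and `Transcripts` come from the traceback.

```python
import pandas as pd


def find_exon_row(trans_df, exon_idx, transcript):
    """Return the index label of the first matching row, or None if no row matches."""
    mask = (trans_df["Exon_Idx"] == exon_idx) & (trans_df["Transcripts"] == transcript)
    matches = trans_df.index[mask]
    if len(matches) == 0:
        return None
    return matches[0]


# Toy table for illustration only.
trans_df = pd.DataFrame(
    {"Exon_Idx": ["1", "2"], "Transcripts": ["ENST_A", "ENST_A"]}
)
print(find_exon_row(trans_df, "2", "ENST_A"))  # a matching row exists
print(find_exon_row(trans_df, "3", "ENST_A"))  # no match, so None
```

Logging the exon and transcript values whenever the helper returns None would show which combination is missing from the reference table for the failing gene.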

@PingChen-Angela
Contributor

Hi Gabriel,
Thanks for reporting this error. I will look into it soon. In the meantime, if you change the number of processes, do you still see the error?

@kwglam

kwglam commented Sep 9, 2021

Hi Angela,
I very much appreciate your time and effort. I used the default number of processes (8) on the command line. Do I have to specify the same number in the config file (nproc=8)? What is the appropriate number to use? Thanks!

@kwglam

kwglam commented Sep 10, 2021

Hi Angela,
I tried with nproc=8 in the config. The program terminated again with the same error, but at a different gene.
Meanwhile, I was running the program with another BAM file, and it ran into a different problem that had not appeared before.

ENSG00000141384
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/gpfs/gsfs10/users/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 300, in isoform_inference_correction_by_ass_v2
initial_infered = pd.read_table(gene_file, header=None, index_col=None, sep="\t")
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 683, in read_table
return _read(filepath_or_buffer, kwds)
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
self._engine = self._make_engine(self.engine)
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 69, in __init__
self._reader = parsers.TextReader(self.handles.handle, **kwds)
File "pandas/_libs/parsers.pyx", line 549, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/ss3_isoform.py", line 109, in <module>
main()
File "/data/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/ss3_isoform.py", line 105, in main
get_isoforms(conf_data, out_path, ref)
File "/gpfs/gsfs10/users/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 484, in get_isoforms
pool.map(func, infered_gene_paths, chunksize=1)
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
pandas.errors.EmptyDataError: No columns to parse from file

Do you think these problems were caused by the number of processes used or by specific genes? Thanks!
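
One way to separate the two explanations is to rerun only the failing step without a pool and record which genes raise. The sketch below assumes a per-gene worker function is available. `process_gene` is a hypothetical stand-in for it, not the pipeline's own function, and the gene names are made up.

```python
import traceback


def run_serially(worker, genes):
    """Call worker(gene) for each gene in one process and collect failures.

    Returns (results, failures): results maps gene -> return value,
    failures maps gene -> formatted traceback string.
    """
    results, failures = {}, {}
    for gene in genes:
        try:
            results[gene] = worker(gene)
        except Exception:
            failures[gene] = traceback.format_exc()
    return results, failures


def process_gene(gene):
    # Hypothetical stand-in worker: fails for one specific gene.
    if gene == "ENSG_BAD":
        raise IndexError("list index out of range")
    return len(gene)


results, failures = run_serially(process_gene, ["ENSG_OK", "ENSG_BAD"])
print(sorted(results))   # genes that completed
print(sorted(failures))  # genes that raised, with tracebacks in the dict values
```

If the same genes fail when run one at a time, the cause is more likely gene-specific. If nothing fails serially, concurrency becomes the more likely suspect.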

@PingChen-Angela
Contributor

Hi @kwglam, the issue came from parallelisation. Can you send me the gene file with name "ENSG00000141384" under .R1 in your output folder? Please send to my email address [email protected].
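
If parallel workers can leave empty or half-written per-gene files behind, a common mitigation is to write to a temporary file and rename it into place, so that a reader sees either the complete file or no file at all. This is a general sketch of that pattern, not the fix the maintainer applied. `atomic_write_text` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write_text(path, text):
    """Write text to path so that readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "gene.txt")
    atomic_write_text(target, "col1\tcol2\n")
    with open(target) as handle:
        print(repr(handle.read()))
```

Rename guarantees can differ on network or parallel filesystems, so this pattern should be tested on the storage actually used before relying on it.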

@xucaoling

Hi Angela,
When I use ss3_isoform.py, I get an error. The error message is:
Preprocessing on input BAM ...
[bam_sort_core] merging from 88 files and 8 in-memory blocks...
Collect informative reads per gene...
...for genes on chr1
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/data/vip55/miniconda3/envs/zUMIs-env/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/data/vip55/miniconda3/envs/zUMIs-env/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/data/vip55/software/Smart-seq3-master/ss3iso/pyModule/informative_reads.py", line 479, in _get_reads
report_gene = gobj.get_aligned_reads(n_read_limit, passed_cells)
File "/home/data/vip55/software/Smart-seq3-master/ss3iso/pyModule/informative_reads.py", line 84, in get_aligned_reads
samfile = pysam.AlignmentFile(self.in_bam_uniq, "rc")
File "pysam/libcalignmentfile.pyx", line 741, in pysam.libcalignmentfile.AlignmentFile.__cinit__
File "pysam/libcalignmentfile.pyx", line 990, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rc') - is it SAM/BAM format? Consider opening with check_sq=False
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/data/vip55/software/Smart-seq3-master/ss3iso/ss3_isoform.py", line 109, in <module>
main()
File "/home/data/vip55/software/Smart-seq3-master/ss3iso/ss3_isoform.py", line 99, in main
fetch_gene_reads(in_bam_uniq, in_bam_multi, conf_data, op.species, out_path)
File "/home/data/vip55/software/Smart-seq3-master/ss3iso/pyModule/informative_reads.py", line 550, in fetch_gene_reads
report_genes = pool.map(func, genes, chunksize=1)
File "/home/data/vip55/miniconda3/envs/zUMIs-env/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/data/vip55/miniconda3/envs/zUMIs-env/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
ValueError: file has no sequences defined (mode='rc') - is it SAM/BAM format? Consider opening with check_sq=False

My command is:
$ python /home/data/vip55/software/Smart-seq3-master/ss3iso/ss3_isoform.py -i smartseq3_mouse_fibroblast.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam -e smartseq3_mouse_fibroblast -o ss3 -p 8 -s mm10 -P -R -c ss3_isoform.conf

I don't know how to solve the problem. Will you help me out?
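
pysam reports "file has no sequences defined" when the header of the file it opens lists no reference sequences. A quick way to check a BAM file for this, using only the standard library and the header layout from the SAM/BAM specification (magic `BAM\1`, header text length, header text, then the reference count), is sketched below. `bam_reference_count` is a hypothetical helper name, and the demo file is a header-only stand-in, not real data.

```python
import gzip
import os
import struct
import tempfile


def bam_reference_count(path):
    """Return the number of reference sequences declared in a BAM header."""
    # BGZF is a series of gzip members, which the gzip module can read.
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file (or empty): %s" % path)
        (l_text,) = struct.unpack("<i", handle.read(4))  # length of header text
        handle.read(l_text)                              # skip the SAM header text
        (n_ref,) = struct.unpack("<i", handle.read(4))   # number of references
    return n_ref


# Demo on a header-only stand-in: plain gzip, no header text, zero references.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.bam")
    with open(demo, "wb") as out:
        out.write(gzip.compress(b"BAM\x01" + struct.pack("<i", 0) + struct.pack("<i", 0)))
    print(bam_reference_count(demo))
```

A count of 0, or a ValueError from the magic check, on the BAM that the pipeline opens would suggest that the preprocessing step produced a file without reference entries, in which case the input BAM and the preprocessing output are the first things to inspect.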

@jiangfuqing

Hi, have you fixed this error? Thanks.
