ValueError: need at least one array to concatenate #7

Closed
mibo1996 opened this issue Apr 13, 2021 · 11 comments · Fixed by #150

Comments

@mibo1996

Hi,

I am trying to use infercnvpy on my single-cell dataset, but when I run:
cnv.tl.infercnv(
    adata,
    reference_key="leiden_1.0",
    reference_cat='12',
)

I get the error message: ValueError: need at least one array to concatenate.

When I run:
adata.var.loc[:, ["gene_ids", "chromosome", "start", "end"]].head()
I can see that these columns were successfully added to adata.var

When I run:
adata.obs.head()
I can see that there is a column, 'leiden_1.0' with clusters numbered '0'-'16'.

I'm not sure what this error message means or what I can do differently to fix it.

Thank you

@suegrimes

I am also getting the same error, running:
infercnvpy.tl.infercnv(seu_epi, reference_key='seurat_clusters', reference_cat=['0','1'], window_size=200)

ValueError: need at least one array to concatenate

seu_epi
AnnData object with n_obs × n_vars = 5136 × 16784
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mito', 'nCount_SCT', 'nFeature_SCT', 'S.Score', 'G2M.Score', 'Phase', 'CC.Difference', 'SCT_snn_res.0.8', 'seurat_clusters', 'tree.ident', 'condition', 'patientID', 'pANN', 'DF', 'SCT_snn_res.1', 'SCT_snn_res.0.6'
var: 'sct.detection_rate', 'sct.gmean', 'sct.variance', 'sct.residual_mean', 'sct.residual_variance', 'sct.variable', 'chromosome', 'start', 'end', 'gene_id', 'gene_name'
uns: 'neighbors'
obsm: 'X_pca', 'X_umap'
varm: 'PCs'
obsp: 'distances'

seu_epi.X
array([[-0.12743472, -0.12882212, -0.06847287, ..., -0.03828714,
        -0.01745232, -0.03993058],
       [-0.10143117, -0.10215754, -0.0520843 , ..., -0.0263906 ,
        -0.00942162, -0.02850649],
       [-0.1182593 , -0.11941153, -0.06270597, ..., -0.03406237,
        -0.01457542, -0.03586897],
       ...,
       [-0.10151398, -0.10224243, -0.05213675, ..., -0.02642813,
        -0.00944661, -0.02854246],
       [-0.06104027, -0.06079169, -0.02616703, ..., -0.00829963,
         0.00237056, -0.01121682],
       [-0.07975152, -0.07994411, -0.03827017, ..., -0.01663157,
        -0.00299579, -0.01916628]])

seu_epi.obs.loc[:, ['seurat_clusters','condition']].head()
                   seurat_clusters       condition
AACTGGTCATTCACTT-1              17  tumor_appendix
ACGCCGATCTGAAAGA-1               3  tumor_appendix
ACTGATGAGCACCGTC-1               3  tumor_appendix
AGCATACTCTACTCAT-1              10  tumor_appendix
CAACCTCCACAACTGT-1               7  tumor_appendix

@grst
Member

grst commented Apr 25, 2021

Hi,

thanks for reporting, I'll look into it.
Could you please provide the full stack trace of the error? Along with the error message, it should show the line numbers in the code where the error occurs. That would be extremely helpful for debugging!

@suegrimes

suegrimes commented Apr 25, 2021

Hi Gregor - thanks for looking into this! Stacktrace below:

/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/anndata/compat/__init__.py:180: FutureWarning: Moving element from .uns['neighbors']['distances'] to .obsp['distances'].

This is where adjacency matrices should go now.
  warn(
INFO:root:Extracted GTF attributes: ['gene_id', 'gene_name']
WARNING: GTF file misses annotation for 13 genes in adata.
WARNING: Skipped 13 genes because of duplicate identifiers in GTF file.
... storing 'orig.ident' as categorical
... storing 'Phase' as categorical
... storing 'seurat_clusters' as categorical
... storing 'condition' as categorical
... storing 'patientID' as categorical
... storing 'DF' as categorical
... storing 'chromosome' as categorical
... storing 'gene_id' as categorical
... storing 'gene_name' as categorical
WARNING: Skipped 26 genes because they don't have a genomic position annotated.
/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/numpy/lib/arraysetops.py:583: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  mask |= (ar1 == a)

Traceback (most recent call last):
  File "run_infercnv.py", line 18, in <module>
    infercnvpy.tl.infercnv(seu_epi, reference_key='seurat_clusters', reference_cat=[0,1], window_size=200)
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/infercnvpy/tl/_infercnv.py", line 164, in infercnv
    reference = _get_reference(tmp_adata, reference_key, reference_cat, reference)
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/infercnvpy/tl/_infercnv.py", line 305, in _get_reference
    raise ValueError(
ValueError: The following reference categories were not found in adata.obs[reference_key]: [0 1]
(/venvs/anaconda/envs/groupenvs/infercnv-py) sgrimes@ikura:C03_210423_inferCNV_pyTest$ python run_infercnv.py
/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/anndata/compat/__init__.py:180: FutureWarning: Moving element from .uns['neighbors']['distances'] to .obsp['distances'].

This is where adjacency matrices should go now.
  warn(
INFO:root:Extracted GTF attributes: ['gene_id', 'gene_name']
WARNING: GTF file misses annotation for 13 genes in adata.
WARNING: Skipped 13 genes because of duplicate identifiers in GTF file.
... storing 'orig.ident' as categorical
... storing 'Phase' as categorical
... storing 'seurat_clusters' as categorical
... storing 'condition' as categorical
... storing 'patientID' as categorical
... storing 'DF' as categorical
... storing 'chromosome' as categorical
... storing 'gene_id' as categorical
... storing 'gene_name' as categorical
WARNING: Skipped 26 genes because they don't have a genomic position annotated.
  0%|                                                                                                          | 0/2 [00:11<?, ?it/s]
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/concurrent/futures/process.py", line 198, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/concurrent/futures/process.py", line 198, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/infercnvpy/tl/_infercnv.py", line 353, in _infercnv_chunk
    chr_pos, x_smoothed = _running_mean_by_chromosome(
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/infercnvpy/tl/_infercnv.py", line 273, in _running_mean_by_chromosome
    return chr_start_pos, np.hstack(running_means)
  File "<__array_function__ internals>", line 5, in hstack
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/numpy/core/shape_base.py", line 346, in hstack
    return _nx.concatenate(arrs, 1)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run_infercnv.py", line 18, in <module>
    infercnvpy.tl.infercnv(seu_epi, reference_key='seurat_clusters', reference_cat=['0','1'], window_size=200)
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/infercnvpy/tl/_infercnv.py", line 169, in infercnv
    *process_map(
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 130, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 76, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, **map_args), **kwargs))
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
ValueError: need at least one array to concatenate
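
A note on the first run in the log above: it stopped earlier, with "The following reference categories were not found", apparently because reference_cat=[0,1] was passed as integers while the cluster labels are stored as strings. The second run, with ['0','1'], got past that check. A minimal sketch of testing this before calling tl.infercnv follows; the helper name missing_categories and the example labels are made up for illustration and are not part of infercnvpy.

```python
def missing_categories(labels, reference_cat):
    """Return the requested reference categories that do not occur in labels."""
    present = set(labels)
    return [cat for cat in reference_cat if cat not in present]


# Stand-in for the values of adata.obs["seurat_clusters"]; note they are strings.
labels = ["17", "3", "3", "10", "7", "0", "1"]

# Integers never compare equal to the string labels, so both are reported missing.
missing_int = missing_categories(labels, [0, 1])

# The same categories given as strings are found.
missing_str = missing_categories(labels, ["0", "1"])

print(missing_int)
print(missing_str)
```

The same function should also accept a pandas column such as adata.obs[reference_key], since it only iterates over the values.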

@suegrimes

In case it factors into the troubleshooting, my AnnData object started out as a Seurat 3.x object, updated to Seurat 4.0, then converted to hd5, then to AnnData.

@grst
Member

grst commented Apr 26, 2021

Can you please post the result of

adata.obs["chromosome"].unique()

right before running tl.infercnv?

@suegrimes

I assume you mean adata.var["chromosome"].unique()? Results below. I see that there is a nan entry, probably due to the duplicate genes being removed. Is that the issue?

seu_epi.var["chromosome"].unique()
array(['1', nan, '2', '3', '4', '5', '6', '7', 'X', '8', '9', '11', '10',
       '12', '13', '14', '15', '16', '17', '18', '20', '19', 'Y', '22',
       '21', 'MT', 'GL000194.1', 'GL000195.1', 'GL000219.1', 'KI270734.1',
       'GL000218.1', 'KI270726.1', 'KI270711.1', 'KI270721.1'],
      dtype=object)

@grst
Member

grst commented Apr 26, 2021

Thanks! The problem is that I expected all chromosomes (as opposed to scaffolds) to start with chr.

chromosomes = _natural_sort(
    [x for x in var["chromosome"].unique() if x.startswith("chr") and x != "chrM"]
)

What GTF file are you using? I only tried with GENCODE I think.
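
To illustrate why this filter leads to the reported error, here is a small sketch. It applies the same condition as the snippet above (without _natural_sort) to Ensembl-style and GENCODE-style names, then calls np.hstack on one array per kept chromosome, as in the traceback. The function name select_chromosomes and the dummy arrays are assumptions for illustration only, not the actual infercnvpy code.

```python
import numpy as np


def select_chromosomes(names):
    # Same condition as the list comprehension quoted above.
    return [x for x in names if x.startswith("chr") and x != "chrM"]


# Abbreviated name lists; the Ensembl-style ones follow the output posted earlier.
ensembl_names = ["1", "2", "X", "MT", "GL000194.1"]
gencode_names = ["chr1", "chr2", "chrX", "chrM"]

kept_ensembl = select_chromosomes(ensembl_names)  # nothing starts with "chr"
kept_gencode = select_chromosomes(gencode_names)  # chrM is filtered out

# One dummy array per kept chromosome, standing in for the running means.
running_means = [np.zeros((3, 2)) for _ in kept_ensembl]

try:
    np.hstack(running_means)
    error_message = None
except ValueError as err:
    # With no chromosomes kept, hstack receives an empty list.
    error_message = str(err)

print(kept_ensembl, kept_gencode, error_message)
```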

@suegrimes

Ah, glad it's an easy explanation! I'm using the GTF file provided for cellranger by 10x Genomics. I think it is from Ensembl. The first few lines are below:

#!genome-build GRCh38.p12

#!genome-version GRCh38

#!genome-date 2013-12

#!genome-build-accession NCBI:GCA_000001405.27

#!genebuild-last-updated 2018-01

1 havana gene 29554 31109 . + . gene_id "ENSG00000243485"; gene_version "5"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"

1 havana transcript 29554 31097 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-202"; transcript_source "havana"; transcript_biotype "lincRNA"; tag "basic"; transcript_support_level "5"

1 havana exon 29554 30039 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; exon_number "1"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-202"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001947070"; exon_version "1"; tag "basic"; transcript_support_level "5"

1 havana exon 30564 30667 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; exon_number "2"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-202"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001922571"; exon_version "1"; tag "basic"; transcript_support_level "5"

1 havana exon 30976 31097 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; exon_number "3"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-202"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001827679"; exon_version "1"; tag "basic"; transcript_support_level "5"

1 havana transcript 30267 31109 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000469289"; transcript_version "1"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-201"; transcript_source "havana"; transcript_biotype "lincRNA"; tag "basic"; transcript_support_level "5"

1 havana exon 30267 30667 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000469289"; transcript_version "1"; exon_number "1"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-201"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001841699"; exon_version "1"; tag "basic"; transcript_support_level "5"

@grst
Member

grst commented Apr 26, 2021

OK, I'll look into improving the genomic_position_from_gtf function at some point so that it supports other GTF files as well.
Let's keep the issue open until then.

In the meantime, you could try a GENCODE file, or just modify the chromosome column in adata.var so that it has the chr prefix.

@suegrimes

ok got it. I will try one of those two things. Thanks for being so responsive looking into this!

@stefanpeidli

In the meantime, a quick and dirty trick:
adata.var['chromosome'] = ['chr'+str(i) for i in adata.var['chromosome']]

Then it worked for me.
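
One caveat with the one-liner above: it also turns missing values into the string 'chrnan', 'MT' into 'chrMT' (which the x != "chrM" check would not exclude), and scaffolds such as 'GL000194.1' into chr-prefixed names. A slightly more careful sketch is given below. The function name to_chr_style is hypothetical, and both the mapping of 'MT' to 'chrM' and the rule that only numeric, X and Y names get the prefix are assumptions that fit human Ensembl-style annotations.

```python
import math


def to_chr_style(name):
    """Add a 'chr' prefix to primary chromosome names and leave the rest alone."""
    # Pass missing values (None or float NaN) through unchanged,
    # so they do not become the string "chrnan".
    if name is None or (isinstance(name, float) and math.isnan(name)):
        return name
    name = str(name)
    if name.startswith("chr"):
        return name  # already in the expected style
    if name == "MT":
        return "chrM"  # assumption: Ensembl "MT" corresponds to "chrM"
    if name.isdigit() or name in ("X", "Y"):
        return "chr" + name
    return name  # scaffolds such as "GL000194.1" stay unprefixed


names = ["1", float("nan"), "X", "MT", "GL000194.1", "chr2"]
converted = [to_chr_style(n) for n in names]
print(converted)
```

Following the same pattern as the trick above, this could be applied as adata.var['chromosome'] = [to_chr_style(c) for c in adata.var['chromosome']].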


4 participants