Issue with running scenicplus grn_inference eGRN #532

Open
zhli12 opened this issue Jan 14, 2025 · 2 comments

zhli12 commented Jan 14, 2025

Hello,

When trying to run the command scenicplus grn_inference eGRN, I received an error saying

Traceback (most recent call last):
  File "/home/zli4/.conda/envs/scenicplus/bin/scenicplus", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/zli4/.conda/envs/scenicplus/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 1137, in main
    args.func(args)
  File "/home/zli4/.conda/envs/scenicplus/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 861, in eGRN
    infer_grn(
  File "/home/zli4/.conda/envs/scenicplus/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 885, in infer_grn
    eRegulons = build_grn(
                ^^^^^^^^^^
  File "/home/zli4/.conda/envs/scenicplus/lib/python3.11/site-packages/scenicplus/grn_builder/gsea_approach.py", line 143, in build_grn
    relevant_tfs, e_modules = create_emodules(
                              ^^^^^^^^^^^^^^^^
  File "/home/zli4/.conda/envs/scenicplus/lib/python3.11/site-packages/scenicplus/grn_builder/modules.py", line 592, in create_emodules
    for context, r2g_df in tqdm(r2g_iter, total=total_iter, disable=disable_tqdm):
  File "/home/zli4/.local/lib/python3.11/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/home/zli4/.conda/envs/scenicplus/lib/python3.11/site-packages/scenicplus/grn_builder/modules.py", line 514, in iter_thresholding
    grouped_adj_by_gene = Groupby(adj[TARGET_GENE_NAME].to_numpy())
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zli4/.conda/envs/scenicplus/lib/python3.11/site-packages/scenicplus/utils.py", line 278, in __init__
    self.n_keys = max(self.keys_as_int) + 1
                  ^^^^^^^^^^^^^^^^^^^^^
ValueError: max() arg is an empty sequence

This occurred right after the log message "GSEA INFO Thresholding region to gene relationships".
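For reference, the last frame of the traceback is plain Python behavior: the built-in max() raises ValueError when it is given an empty sequence. Below is a minimal sketch of that failure mode, assuming only that the grouping step calls max() on integer codes derived from the keys. The helpers codes_for and n_keys are hypothetical and are not the scenicplus Groupby implementation.

```python
import numpy as np


def codes_for(keys):
    """Hypothetical helper: map each key to an integer code (not scenicplus code)."""
    _, codes = np.unique(keys, return_inverse=True)
    return codes


def n_keys(keys):
    # Same shape as the failing line in the traceback: max(...) + 1
    return max(codes_for(keys)) + 1


# Works when at least one key is present.
print(n_keys(np.array(["GeneA", "GeneB", "GeneA"])))

# Fails in the same way as the traceback when no keys are present.
try:
    n_keys(np.array([], dtype=str))
except ValueError as err:
    print(f"ValueError: {err}")
```

So the error itself only says that something upstream produced zero rows; it does not say which input was at fault.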

The function worked fine when I ran the tutorial, so it's probably not an issue with package versions. For my purpose, however, I am using bulk ATAC-seq data, so I replaced the file passed to region_to_gene_adj_fname with one in which the region-gene relationships are based entirely on a distance cutoff, with correlation and importance set to a constant (screenshot attached below).
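To make the setup concrete, here is a rough sketch of how such a distance-only table could be put together with pandas. Everything in it is an assumption for illustration: the toy coordinates, the 10 kb cutoff, the constant value 1.0, and the column names target and region (only rho and importance appear elsewhere in this thread). The header of the tutorial's region-to-gene file is the reference for the real column names.

```python
import pandas as pd

# Toy inputs; all names and coordinates are made up for illustration.
regions = pd.DataFrame({
    "region": ["chr1:100-600", "chr1:5000-5500", "chr1:90000-90500"],
    "chrom": ["chr1", "chr1", "chr1"],
    "mid": [350, 5250, 90250],  # region midpoint
})
genes = pd.DataFrame({
    "target": ["GeneA", "GeneB"],
    "chrom": ["chr1", "chr1"],
    "tss": [1000, 6000],  # transcription start site
})

MAX_DISTANCE = 10_000  # hypothetical distance cutoff in bp

# Pair every region with every gene on the same chromosome,
# then keep only pairs within the distance cutoff.
pairs = regions.merge(genes, on="chrom")
pairs["distance"] = (pairs["mid"] - pairs["tss"]).abs()
pairs = pairs.loc[pairs["distance"] <= MAX_DISTANCE].copy()

# Constant placeholder scores, as described above (assumed value).
pairs["importance"] = 1.0
pairs["rho"] = 1.0

adj = pairs[["target", "region", "importance", "rho"]]
print(adj)
```

With a single constant score, ordering region-gene pairs by importance or by correlation carries no information, so any downstream step that ranks or thresholds on these columns is effectively working from distance alone.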

The full command I used is

scenicplus grn_inference eGRN \
    --TF_to_gene_adj_fname tf_to_gene_adj.tsv \
    --region_to_gene_adj_fname bulk_region_to_gene_adj2.tsv \
    --cistromes_fname pysicTopic_outs/cistromes_direct.h5ad \
    --ranking_db_fname 10x_10kPBMC_1kb_bg_with_mask.regions_vs_motifs.rankings.feather \
    --eRegulon_out_fname eRegulon_direct.tsv \
    --temp_dir tmp \
    --order_regions_to_genes_by importance \
    --order_TFs_to_genes_by importance \
    --gsea_n_perm 1000 \
    --quantiles 0.8 0.9 0.95 \
    --top_n_regionTogenes_per_gene 5 10 15 \
    --min_regions_per_gene 0 \
    --rho_threshold 0.001 \
    --min_target_genes 10 \
    --n_cpu 20
[Image: Screenshot 2025-01-13 at 9 43 07 PM]

I'm wondering whether anyone has tried something similar or has some idea on how to fix the error. Any help would be greatly appreciated!

@SeppeDeWinter
Collaborator

Hi @zhli12

This error would occur when an empty dataframe is passed to iter_thresholding (https://github.com/aertslab/scenicplus/blob/86fa25e6819919e2a824c14f7fd0d6e01e257ae9/src/scenicplus/grn_builder/modules.py#L513C9-L513C26).

Can you check whether either of these dataframes is empty?

import pandas as pd

region_to_gene = pd.read_table("bulk_region_to_gene_adj2.tsv")
CORRELATION_COEFFICIENT_NAME="rho"
rho_threshold=0.001

repressing_adj = region_to_gene.loc[
    region_to_gene[CORRELATION_COEFFICIENT_NAME] < -rho_threshold]
activating_adj = region_to_gene.loc[
    region_to_gene[CORRELATION_COEFFICIENT_NAME] > rho_threshold]

print(f"Repressing: {len(repressing_adj)}")
print(f"Activating: {len(activating_adj)}")

Best,

Seppe

@zhli12
Author

zhli12 commented Jan 20, 2025

Hi Seppe,

Thank you for your reply!

The output to the code you suggested is

Repressing: 0
Activating: 1528374
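This is what one would expect from a table in which every pair carries the same positive correlation: the rho < -rho_threshold filter cannot match any row. A minimal sketch with a toy table follows. The gene and region names, the target and region column names, and the constant value 1.0 are made up, and split_by_sign is a hypothetical helper, not part of scenicplus.

```python
import pandas as pd


def split_by_sign(adj, rho_col="rho", rho_threshold=0.001):
    """Hypothetical helper: split a table into repressing and activating rows."""
    repressing = adj.loc[adj[rho_col] < -rho_threshold]
    activating = adj.loc[adj[rho_col] > rho_threshold]
    return repressing, activating


# Toy table with a constant, positive correlation for every pair.
toy = pd.DataFrame({
    "target": ["GeneA", "GeneA", "GeneB"],
    "region": ["chr1:100-600", "chr1:5000-5500", "chr1:5000-5500"],
    "rho": [1.0, 1.0, 1.0],
})

repressing, activating = split_by_sign(toy)
print(f"Repressing: {len(repressing)}")
print(f"Activating: {len(activating)}")

# Flag any subset that would reach a later step with zero rows.
for name, subset in [("repressing", repressing), ("activating", activating)]:
    if subset.empty:
        print(f"Warning: the {name} subset is empty")
```

A check like this can be run on the real table before launching the full command, which is cheaper than waiting for the traceback.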

As this suggests that the issue might be caused by the inclusion of repressing enhancers, I modified the scenicplus parameters to

scenicplus grn_inference eGRN \
    --TF_to_gene_adj_fname /home/zli4/10kPBMC/pysicTopic_outs/tf_to_gene_adj.tsv \
    --region_to_gene_adj_fname /home/zli4/10kPBMC/bulkATAC/bulk_region_to_gene_adj2.tsv \
    --cistromes_fname /home/zli4/10kPBMC/pysicTopic_outs/cistromes_direct.h5ad \
    --ranking_db_fname /home/zli4/10kPBMC/10x_10kPBMC_1kb_bg_with_mask.regions_vs_motifs.rankings.feather \
    --eRegulon_out_fname /home/zli4/10kPBMC/bulkATAC/eRegulon_direct.tsv \
    --temp_dir /home/zli4/10kPBMC/bulkATAC/tmp \
    --order_regions_to_genes_by importance \
    --order_TFs_to_genes_by importance \
    --gsea_n_perm 1000 \
    --quantiles 0 \
    --top_n_regionTogenes_per_gene 10000 \
    --min_regions_per_gene 0 \
    --rho_threshold 0.001 \
    --min_target_genes 10 \
    --keep_only_activating_eRegulons \
    --n_cpu 20

where I added the additional argument --keep_only_activating_eRegulons.

However, even with this modification, I still get the same error. I'm not sure whether this is because --keep_only_activating_eRegulons is not the right way to exclude repressing enhancers, or because I'm not setting the option properly.
