XGBoost BUG: #93
Comments
Hey MmasterT, sorry for the major delay in responding to you. I've finally had some time to come back to this. The issue you're experiencing seems like it might be caused by SignalP3-NN failing. It's hard to tell without really delving into the run, but that's the issue I've encountered before that caused something like this. It's possibly related to another issue people have reported with SignalP4.
Hi @darcyabjones, have you made any progress with this issue? Alternatively, is it possible to run SignalP3 independently and then integrate the results into the Predector run? Thanks!
Hi,
Hi everyone, again, apologies for my lateness. I'll look at it again tomorrow while I'm updating the install scripts. Regarding running SignalP3 separately, see predector/modules/processes.nf, lines 496 to 500 in 3d2a591.
A+
Describe the bug
The final step of the pipeline (rank_results) fails with an XGBoost DataFrame dtype error, the same error described in this Stack Overflow question:
https://stackoverflow.com/questions/66491801/i-got-this-error-dataframe-dtypes-for-data-must-be-int-float-bool-or-categori
To Reproduce
I've cloned the repo and changed some of the configs to run in a Slurm context with no internet access. Everything is created and analyzed as expected except the final file.
sbatch -p ei-cb -J predector_test -o predector_test.%j.log -c 1 --mem 10G --wrap " source nextflow-22.04.0_CBG && nextflow run ~/singularity/predector/predector/main.nf --phibase /ei/cb/common/Databases/predector/phi-base_current.fas --pfam_hmm /ei/cb/common/Databases/predector/Pfam-A.hmm.gz --pfam_dat /ei/cb/common/Databases/predector/Pfam-A.hmm.dat.gz --dbcan /ei/cb/common/Databases/predector/dbCAN-HMMdb-V11.txt --effectordb /ei/cb/common/Databases/predector/effectordb.hmm.gz -profile test -with-singularity ~/singularity/predector/predector-1.2.7.sif -resume ~/singularity/predector/predector/ -c ~/singularity/predector/predector/nextflow.config -with-report"
Expected behavior
Expected to get the *rank_result.tsv file for the test set.
Error Log
Error executing process > 'rank_results (test_set)'
Caused by:
Process `rank_results (test_set)` terminated with an error exit status (2)
Command executed:
predutils load_db --mem "2" tmp.db results.ldjson
predutils rank --mem "2" --dbcan dbcan.txt --pfam pfam.txt --outfile "test_set-ranked.tsv" --secreted-weight "2" --sigpep-good-weight "0.003" --sigpep-ok-weight "0.0001" --single-transmembrane-weight "-0.7" --multiple-transmembrane-weight "-1.0" --deeploc-extracellular-weight "1.3" --deeploc-intracellular-weight "-1.3" --deeploc-membrane-weight "-0.25" --targetp-mitochondrial-weight "-0.5" --effectorp1-weight "0.5" --effectorp2-weight "2.5" --effectorp3-apoplastic-weight "0.5" --effectorp3-cytoplasmic-weight "0.5" --effectorp3-noneffector-weight "-2.5" --deepredeff-fungi-weight "0.1" --deepredeff-oomycete-weight "0.0" --effector-homology-weight "2" --virulence-homology-weight "0.5" --lethal-homology-weight "-2" --tmhmm-first-60-threshold "10" tmp.db
rm -f tmp.db
Command exit status:
2
Command output:
(empty)
DataFrame.dtypes for data must be int, float, bool or category. When categorical type is supplied, DMatrix parameter `enable_categorical` must be set to `True`. Invalid columns: signalp3_nn_d
Traceback (most recent call last):
File "/opt/conda/envs/predector/lib/python3.9/site-packages/predectorutils/main.py", line 253, in main
rank_runner(args)
File "/opt/conda/envs/predector/lib/python3.9/site-packages/predectorutils/subcommands/rank.py", line 1577, in runner
raise e
File "/opt/conda/envs/predector/lib/python3.9/site-packages/predectorutils/subcommands/rank.py", line 1575, in runner
inner(con, cur, args)
File "/opt/conda/envs/predector/lib/python3.9/site-packages/predectorutils/subcommands/rank.py", line 1561, in inner
df["effector_score"] = run_ltr(df)
File "/opt/conda/envs/predector/lib/python3.9/site-packages/predectorutils/subcommands/rank.py", line 1503, in run_ltr
dmat = xgb.DMatrix(df_features)
File "/opt/conda/envs/predector/lib/python3.9/site-packages/xgboost/core.py", line 532, in inner_f
return f(**kwargs)
File "/opt/conda/envs/predector/lib/python3.9/site-packages/xgboost/core.py", line 643, in __init__
handle, feature_names, feature_types = dispatch_data_backend(
File "/opt/conda/envs/predector/lib/python3.9/site-packages/xgboost/data.py", line 896, in dispatch_data_backend
return _from_pandas_df(data, enable_categorical, missing, threads,
File "/opt/conda/envs/predector/lib/python3.9/site-packages/xgboost/data.py", line 345, in _from_pandas_df
data, feature_names, feature_types = _transform_pandas_df(
File "/opt/conda/envs/predector/lib/python3.9/site-packages/xgboost/data.py", line 283, in _transform_pandas_df
_invalid_dataframe_dtype(data)
File "/opt/conda/envs/predector/lib/python3.9/site-packages/xgboost/data.py", line 247, in _invalid_dataframe_dtype
raise ValueError(msg)
ValueError: DataFrame.dtypes for data must be int, float, bool or category. When categorical type is supplied, DMatrix parameter `enable_categorical` must be set to `True`. Invalid columns: signalp3_nn_d
Operating system (please enter the following information as appropriate):
Additional context
I think changing xgb.DMatrix(df_features) to xgb.DMatrix(df_features, enable_categorical=True) should fix it.
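One caveat on that suggestion: the error message ties enable_categorical to columns of pandas category dtype. If signalp3_nn_d is instead an object column (for example, because every value is missing after a tool failed), the flag alone may not be enough. Below is a minimal pandas-only sketch of how one might check which columns have a dtype outside int, float, bool or category, and coerce them to numeric before building the DMatrix. The helper names find_invalid_columns and coerce_to_numeric and the toy df_features table are hypothetical and are not part of predectorutils; how Predector actually builds its feature table is not shown in this report.

```python
import pandas as pd
from pandas.api import types as ptypes


def find_invalid_columns(df: pd.DataFrame) -> list:
    """Return names of columns whose dtype is not int, float, bool or category."""
    bad = []
    for name, dtype in df.dtypes.items():
        ok = (
            ptypes.is_bool_dtype(dtype)
            or ptypes.is_numeric_dtype(dtype)
            or isinstance(dtype, pd.CategoricalDtype)
        )
        if not ok:
            bad.append(name)
    return bad


def coerce_to_numeric(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy with the given columns converted to numeric.

    Values that cannot be parsed become NaN.
    """
    out = df.copy()
    for name in columns:
        out[name] = pd.to_numeric(out[name], errors="coerce")
    return out


# Hypothetical feature table: signalp3_nn_d is entirely missing, so pandas
# stores it as an object column rather than a float column.
df_features = pd.DataFrame(
    {
        "signalp3_nn_d": [None, None, None],
        "effectorp2": [0.9, 0.1, 0.5],
    }
)

invalid = find_invalid_columns(df_features)
fixed = coerce_to_numeric(df_features, invalid)

print("invalid columns:", invalid)
print(fixed.dtypes)
```

After coercion the column holds NaN, which XGBoost treats as missing by default, so xgb.DMatrix(fixed) should accept the frame. Whether the resulting ranking is meaningful with an entire feature missing is a separate question, and finding out why SignalP3-NN produced no values is probably the more robust fix.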