The CellPhoneDB base data is derived from other biological databases.
You can use the tools scripts to update or recreate the CellPhoneDB base data.
When you have finished generating all the data, copy it into the cellphonedb/core/data
path and run the collect script to update the database.
CellPhoneDB database requires the following input tables:
- complex.csv
- gene.csv
- interaction.csv
- protein.csv
The order is important because they have data dependencies.
tools.py is the provided tool to preprocess the CellPhoneDB database input data. Result data is saved in tools/out and input data is stored in tools/data.
gene.csv is based on the Ensembl and Uniprot databases. These data files need to be downloaded from the respective source pages. In addition, users must provide:
- protein.csv: CellPhoneDB protein list (used to keep only the genes associated with CellPhoneDB proteins).
- remove_genes.csv: Gene list used to remove duplicated Ensembl entries from the imported data.
- hla_curated.csv: HLA gene list to attach to the input data.
- Merge the Ensembl and Uniprot databases on gene_name
- Add HLA genes
- Remove the genes in the provided list and run some checks to validate the final result
python3 tools.py generate_genes uniprot_db_filename ensembl_db_filename proteins_filename remove_genes_filename hla_genes_filename [--result_filename] [--result_path] [--gene_uniprot_ensembl_merged_result_filename] [--add_hla_result_filename]
python3 tools.py generate_genes uniprot-filtered-organism_20180625.tab ensembl_db_20180625.txt ../../cellphonedb/core/data/protein.csv genes_to_remove_20180701.csv hla_genes_20180100.txt
Results: the generate_genes script creates multiple files in the output directory:
- gene_uniprot_ensembl_merged.csv is the result of merging the Uniprot and Ensembl databases.
- gene_hla_added.csv is the result of adding the HLA genes to the gene_uniprot_ensembl_merged.csv list.
- gene.csv is the final result (ready to be used by the CellPhoneDB collector) after removing the provided genes.
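The three steps behind generate_genes (merge, add HLA genes, remove listed genes) can be sketched roughly as follows. This is only an illustration using plain Python dictionaries and is not the code in tools.py. The column names uniprot and ensembl, the function names, and the duplicate check are assumptions made for the example; only gene_name comes from the description above.

```python
def merge_on_gene_name(uniprot_rows, ensembl_rows):
    """Inner-join two lists of dict rows on their 'gene_name' value."""
    by_name = {}
    for row in uniprot_rows:
        by_name.setdefault(row["gene_name"], []).append(row)
    merged = []
    for ens in ensembl_rows:
        # Rows without a matching gene_name on the other side are dropped.
        for uni in by_name.get(ens["gene_name"], []):
            merged.append({**uni, **ens})
    return merged


def build_gene_table(uniprot_rows, ensembl_rows, hla_rows, remove_ensembl_ids):
    """Hypothetical pipeline: merge, append HLA genes, drop listed genes, validate."""
    merged = merge_on_gene_name(uniprot_rows, ensembl_rows)
    with_hla = merged + list(hla_rows)
    # Assumption: genes to remove are identified by their 'ensembl' value.
    kept = [row for row in with_hla if row["ensembl"] not in remove_ensembl_ids]
    # Example validation (an assumption): no Ensembl id may appear twice.
    seen = set()
    for row in kept:
        if row["ensembl"] in seen:
            raise ValueError("duplicated ensembl id: " + row["ensembl"])
        seen.add(row["ensembl"])
    return kept
```

In this sketch, a gene name that maps to two Ensembl ids yields two merged rows, and the removal list is what resolves the duplicate before validation.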
interaction.csv is based on the Imex database and the IUPHAR Guide To Pharmacology database.
In addition, you need to provide:
- protein.csv: CellPhoneDB protein list.
- gene.csv: CellPhoneDB gene list.
- complex.csv: CellPhoneDB complex list.
- remove_interactions.csv: List of interactions to remove
- Process Imex raw interactions
- Get Iuphar raw database (check if new version is available)
- Process Iuphar interactions
- Merge Imex/Iuphar interactions
- Remove Complex interactions
- Remove provided interactions
- Add curated interactions
imex_raw_filename: str, iuphar_raw_filename: str, database_proteins_filename: str, database_gene_filename: str, database_complex_filename: str, interaction_to_remove_filename: str, interaction_curated_filename: str
python3 tools.py generate_interactions imex_raw_filename iuphar_raw_filename proteins_filename gene_filename complex_filename remove_interactions_filename interactions_curated_filename
python3 tools.py generate_interactions interactionsMirjana.txt interaction_iuphar_guidetopharmacology__20180619.csv ../../cellphonedb/core/data/protein.csv ../../cellphonedb/core/data/gene.csv ../../cellphonedb/core/data/complex.csv remove_interactions_20180330.csv interaction_curated_20180729.csv
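The merge, remove, and add-curated steps listed above can be pictured with the following sketch. It is not the logic of tools.py: the function names are hypothetical, interactions are reduced to pairs of protein identifiers, and treating a pair as unordered is an assumption. The complex-removal step is left out because the description does not say how it works.

```python
def pair_key(protein_a, protein_b):
    """Order-independent key, so (A, B) and (B, A) count as the same interaction."""
    return tuple(sorted((protein_a, protein_b)))


def build_interactions(imex_pairs, iuphar_pairs, remove_pairs, curated_pairs):
    """Hypothetical pipeline returning {pair_key: source} for the final list."""
    merged = {}
    for source, pairs in (("imex", imex_pairs), ("iuphar", iuphar_pairs)):
        for a, b in pairs:
            # Keep the first source seen for a pair; later duplicates are ignored.
            merged.setdefault(pair_key(a, b), source)
    for a, b in remove_pairs:
        merged.pop(pair_key(a, b), None)
    # Curated interactions are added last, so they survive the removal step.
    for a, b in curated_pairs:
        merged[pair_key(a, b)] = "curated"
    return merged
```

Adding curated interactions after the removal step mirrors the order of the steps in the list, which means a curated entry takes precedence over the removal list in this sketch.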
Once the data is built, you need to move the interaction.csv file and the gene.csv file from the tools/out folder to cellphonedb/core/data, replacing the old files.
Afterwards, please upgrade the database using the following commands:
FLASK_APP=manage.py flask reset_db
FLASK_APP=manage.py flask collect all
These commands remove the current database (located by default at cellphonedb/code/cellphone.db, using SQLite) and create a new one with the updated data.
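If you prefer to script the final step, a minimal sketch could look like the one below. The function names and the project_dir argument (the directory that contains manage.py) are assumptions for illustration. The sketch copies the generated files instead of moving them, so the originals stay in tools/out, and the target directory is expected to exist already.

```python
import os
import shutil
import subprocess
from pathlib import Path


def install_generated_tables(out_dir, data_dir, names=("gene.csv", "interaction.csv")):
    """Copy the generated tables over the old ones and return the new paths."""
    copied = []
    for name in names:
        src = Path(out_dir) / name
        if not src.is_file():
            raise FileNotFoundError("missing generated file: " + str(src))
        dst = Path(data_dir) / name
        shutil.copyfile(src, dst)  # overwrites the old file if present
        copied.append(dst)
    return copied


def rebuild_database(project_dir):
    """Run the two flask commands from the text with FLASK_APP=manage.py."""
    env = dict(os.environ, FLASK_APP="manage.py")
    subprocess.run(["flask", "reset_db"], cwd=project_dir, env=env, check=True)
    subprocess.run(["flask", "collect", "all"], cwd=project_dir, env=env, check=True)
```

A typical call would be install_generated_tables("tools/out", "cellphonedb/core/data") followed by rebuild_database(".") when run from the directory containing manage.py. With check=True, a failing reset_db stops the script before collect runs.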