Bulk update graphite #282

Open
wants to merge 13 commits into
base: master

mandawilson
Contributor

@mandawilson mandawilson commented Jan 13, 2025

NOT TESTED

Given two CSV files (the oncotree as downloaded from Graphite, and a copy of that file modified with any changes needed to the tree), this script outputs an RDF file to be uploaded to Graphite. The script validates the changes, prints a description of them, and asks the user to confirm that all changes are intentional. Any nodes that need removing from Graphite must be removed manually; the script reminds the user to delete any oncotree nodes that have been removed.
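
As a rough illustration of the comparison step described above (not the code in this PR), the two CSV files could be indexed by internal id and compared with set operations. The column names `internal_id`, `code`, `label`, and `parent`, and the helpers `load_nodes` and `summarize_changes`, are hypothetical; the real Graphite export may use different headers.

```python
import csv
import io

# Hypothetical column names; the real Graphite export may differ.
ID_COL, CODE_COL, LABEL_COL, PARENT_COL = "internal_id", "code", "label", "parent"


def load_nodes(csv_text):
    """Index CSV rows by internal id."""
    return {row[ID_COL]: row for row in csv.DictReader(io.StringIO(csv_text))}


def summarize_changes(original, modified):
    """Classify differences between two {internal_id: row} mappings."""
    removed = sorted(original.keys() - modified.keys())
    added = sorted(modified.keys() - original.keys())
    relabeled, reparented = [], []
    # Only ids present in both files can have code/label or parent changes.
    for node_id in sorted(original.keys() & modified.keys()):
        old, new = original[node_id], modified[node_id]
        if (old[CODE_COL], old[LABEL_COL]) != (new[CODE_COL], new[LABEL_COL]):
            relabeled.append(node_id)
        if old[PARENT_COL] != new[PARENT_COL]:
            reparented.append(node_id)
    return {
        "removed": removed,
        "added": added,
        "relabeled": relabeled,
        "reparented": reparented,
    }


original_csv = """internal_id,code,label,parent
ONC000001,TISSUE,Tissue,
ONC000293,EYE,Eye,TISSUE
ONC000294,RBL,Retinoblastoma,EYE
ONC000251,CABC,Cervical Adenoid Basal Carcinoma,TISSUE
"""
modified_csv = """internal_id,code,label,parent
ONC000001,TISSUE,Tissue,
ONC000293,EYEX,Eye,TISSUE
ONC000294,RBL,Retinoblastoma,OM
ONC000999,TEST,My new cancer type,TISSUE
"""
changes = summarize_changes(load_nodes(original_csv), load_nodes(modified_csv))
```

Keying everything on the internal id is what lets a code or label change be reported separately from a removal plus an addition.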

An example run of the script looks like this:

% ./validate_new_tree_and_output_rdf.py -o original.csv -m modified.csv -t to_upload.rdf

Removed internal ids:
	ONC000251: Cervical Adenoid Basal Carcinoma (CABC)
	ONC000612: Uterine Perivascular Epithelioid Cell Tumor (UPECOMA)
	ONC000733: B-Lymphoblastic Leukemia/Lymphoma, NOS (BLLNOS)

New internal ids:
	ONC000612X: Uterine Perivascular Epithelioid Cell Tumor (UPECOMA) has parent ONC000607: Uterine Sarcoma/Mesenchymal (USARC)
	ONC000999: My new cancer type (TEST) has parent ONC000001: Tissue (TISSUE)
	ONC001000: Another new type (TEST2) has parent ONC000016: Gallbladder Cancer (GBC)

Precursors:
	None

Revocations:
	Removed -- ARE YOU SURE YOU MEANT TO DO THIS?
		'ONC000376' ('PTCL')'
			'ONC000379: Peripheral T-Cell lymphoma, NOS (PTCL)'

Oncotree code/label changes with no internal id change.  This is allowed as long as the new code/label covers the exact same set of cancer cases
	'ONC000293: Eye (EYE)' -> 'ONC000293: Eye (EYEX)'

Parent change
	child: 'ONC000294: Retinoblastoma (RBL)' parent: 'EYE' -> child: 'ONC000294: Retinoblastoma (RBL)' parent: 'OM'


Please confirm that all of the above changes are intentional.
Enter [y]es, [n]o if not: 
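
A confirmation prompt like the one at the end of the run above could be sketched as follows. This is an assumption about how such a prompt might be written, not the PR's code; `confirm_changes` and its `ask` parameter are hypothetical names, with `ask` defaulting to the built-in `input` so the function can be exercised without a terminal.

```python
def confirm_changes(ask=input):
    """Return True only if the user explicitly answers yes."""
    while True:
        answer = ask("Enter [y]es, [n]o if not: ").strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
        # Anything else is ambiguous, so ask again rather than guess.
        print("Please answer 'y' or 'n'.")
```

Re-prompting on unrecognized input avoids treating a typo as consent to write the RDF file.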

parent_is_defined_if_not_tissue(row)

# check these columns are unique
field_is_unique(row[graphite.CSV_RESOURCE_URI],
Collaborator

confused about this function, not sure I see how resource_uri_set is ever populated

also, the function itself doesn't seem to use the arguments being passed in. maybe misreading?

print(f"'{field_name}' is a required field in '{csv_file}'. It is empty for '{internal_id}'", file=sys.stderr)
sys.exit(1)

def field_is_unique(field, field_name, column_set, internal_id, csv_file):
Collaborator

remove internal_id - not used
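
One possible shape for the function that would address both review comments is sketched below. The body of `field_is_unique` is not shown in this thread, so everything here is an assumption, including the error message wording: the check records each value in `column_set` after testing it, and the unused `internal_id` parameter is dropped.

```python
import sys


def field_is_unique(field, field_name, column_set, csv_file):
    """Exit with an error if `field` was already seen; otherwise record it."""
    if field in column_set:
        # Hypothetical message, modeled on the required-field error above.
        print(
            f"'{field_name}' must be unique in '{csv_file}'. "
            f"'{field}' appears more than once",
            file=sys.stderr,
        )
        sys.exit(1)
    # Populate the set so that later rows are checked against this value.
    column_set.add(field)
```

With this shape the caller only needs to create one empty set per column before looping over the rows.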
