Validation of Biomarkers Predictive of Tumor Location in Coloadenocarcinoma, an Analysis of the TCGA COAD Dataset
Tumor localization correlates with prognosis in coloadenocarcinoma, with aboral tumors having better overall survival. This can be attributed to their better response to biologics such as the anti-EGFR (epidermal growth factor receptor) antibody cetuximab. It was hypothesized that, regardless of the causal relationships, this "sidedness" of coloadenocarcinomas could be reconstructed on a genomic and transcriptomic level.
Two machine learning models, logistic regression and random forest, were implemented in Python 3 using pandas and scikit-learn (sklearn), among other libraries.
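As a rough illustration of the modelling setup described above, the following minimal sketch fits both model types with scikit-learn. It is not the code from this repository: the data are synthetic, and the names `make_toy_data`, `fit_models`, the `gene_*` columns, and the label encoding (1 = left-sided, 0 = right-sided) are assumptions made only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_toy_data(n_samples=200, n_genes=20, seed=0):
    """Build a synthetic stand-in for an expression matrix (NOT TCGA data)."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame(
        rng.normal(size=(n_samples, n_genes)),
        columns=[f"gene_{i}" for i in range(n_genes)],
    )
    # Hypothetical label driven by two features plus noise.
    signal = X["gene_0"] + 0.5 * X["gene_1"]
    noise = rng.normal(scale=0.5, size=n_samples)
    y = (signal + noise > 0).astype(int)
    return X, y


def fit_models(X, y, seed=0):
    """Fit both classifiers on a train split and report held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    models = {
        # Scaling helps the logistic regression solver converge.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return models, scores


X, y = make_toy_data()
models, scores = fit_models(X, y)
```

Here `scores` maps each model name to its accuracy on the held-out split. With real data, the feature matrix would come from the TCGA COAD tables rather than from `make_toy_data`.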
There are two versions of the analysis: the original Jupyter notebook and a reworked script.
The script uses the harmonized GDC API, can be configured using config flags, and handles caching more elegantly, so it should be preferred. However, paper.pdf references the Jupyter notebook.
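The combination of config flags, a GDC API query, and on-disk caching could look roughly like the sketch below. This is an assumption-laden illustration, not the script shipped here: the flag names (`--project`, `--cache-dir`, `--refresh`), the helper names, and the cache file layout are all hypothetical. Only the public `https://api.gdc.cancer.gov/files` endpoint and its JSON `filters` parameter are taken from the GDC API itself.

```python
import argparse
import json
import urllib.parse
import urllib.request
from pathlib import Path

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_parser():
    """Hypothetical command-line flags; the real script's flags may differ."""
    parser = argparse.ArgumentParser(description="Cached GDC file query (sketch)")
    parser.add_argument("--project", default="TCGA-COAD")
    parser.add_argument("--cache-dir", default=".cache")
    parser.add_argument("--refresh", action="store_true",
                        help="ignore the cache and query the API again")
    return parser


def fetch_file_list(project, size=10):
    """Query the GDC files endpoint for one project (needs network access)."""
    filters = {
        "op": "in",
        "content": {"field": "cases.project.project_id", "value": [project]},
    }
    query = urllib.parse.urlencode({"filters": json.dumps(filters), "size": size})
    url = f"{GDC_FILES_ENDPOINT}?{query}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def cached_query(project, cache_dir, refresh=False, fetch=fetch_file_list):
    """Return the cached result if present, otherwise fetch and store it."""
    cache_path = Path(cache_dir) / f"{project}_files.json"
    if cache_path.exists() and not refresh:
        return json.loads(cache_path.read_text())
    result = fetch(project)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(result))
    return result
```

A caller would parse the flags with `build_parser().parse_args()` and pass `args.project`, `args.cache_dir`, and `args.refresh` to `cached_query`. Passing the fetch function as a parameter keeps the caching logic testable without network access.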