
PROJECT: Oncogenic Genetic Variation Classification

Updated: Nov 22, 2019


This project used data from the Kaggle competition "Personalized Medicine: Redefining Cancer Treatment: Predict the effect of Genetic Variants to enable Personalized Medicine."


The goal of this project was to build a model from scratch, without looking at other Kaggle kernels, to see what kind of score I could achieve on my own. (By contrast, I do draw on other kernels in my intracranial hemorrhage detection project.)


This is a multiclass classification problem with 9 classes. We're not given definitions of the 9 oncogenic classes, though we can hazard a guess that they relate to the odds of a mutation being carcinogenic, among other factors.
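The competition scored submissions with multi-class logarithmic loss: each submission supplies nine class probabilities per ID, and the score is the average negative log of the probability placed on the true class. Below is a minimal Python sketch of that metric. It assumes probabilities are clipped to [1e-15, 1 - 1e-15] and each row is renormalized to sum to 1, along the lines Kaggle describes for this metric; it is an illustration, not Kaggle's own scoring code.

```python
import math
from typing import Sequence


def multiclass_log_loss(
    y_true: Sequence[int],
    probs: Sequence[Sequence[float]],
    eps: float = 1e-15,
) -> float:
    """Mean negative log-probability assigned to the true class.

    y_true holds class labels 1..9 (as in the Class field); probs holds one
    row of nine predicted probabilities per sample.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip each probability away from 0 and 1 so log() stays finite.
        clipped = [min(max(p, eps), 1 - eps) for p in row]
        # Renormalize the row so it sums to 1 before scoring.
        row_sum = sum(clipped)
        p_true = clipped[label - 1] / row_sum
        total += -math.log(p_true)
    return total / len(y_true)
```

A uniform guess of 1/9 per class scores log(9), roughly 2.197, which makes a handy baseline: any useful model should land well below it.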


The data came as follows (from the Kaggle competition):


training_variants - a comma-separated file containing the descriptions of the genetic mutations used for training. Fields are ID (the ID of the row, used to link the mutation to the clinical evidence), Gene (the gene where this genetic mutation is located), Variation (the amino acid change for this mutation), and Class (1-9, the class into which this genetic mutation has been classified).
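A minimal sketch of reading this file with Python's standard csv module, assuming a header row of ID,Gene,Variation,Class. The function name load_variants and the two sample rows (including the gene names GENE_A and GENE_B) are hypothetical, made up for illustration.

```python
import csv
import io
from typing import Any, Dict, List, TextIO


def load_variants(handle: TextIO) -> List[Dict[str, Any]]:
    """Read a variants CSV (header: ID,Gene,Variation[,Class]) into dicts."""
    rows: List[Dict[str, Any]] = []
    for record in csv.DictReader(handle):
        row: Dict[str, Any] = {
            "ID": int(record["ID"]),
            "Gene": record["Gene"],
            "Variation": record["Variation"],
        }
        # Only a file with a Class column (the training file) carries labels 1-9.
        if record.get("Class"):
            row["Class"] = int(record["Class"])
        rows.append(row)
    return rows


# Hypothetical two-row sample laid out like training_variants.
sample = io.StringIO(
    "ID,Gene,Variation,Class\n"
    "0,GENE_A,Q249E,2\n"
    "1,GENE_B,Truncating Mutations,1\n"
)
variants = load_variants(sample)
```

On the real data, the same function could be pointed at `open("training_variants", newline="")`, assuming the extracted file sits in the working directory.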


training_text - a double-pipe (||) delimited file that contains the clinical evidence (text) used to classify the genetic mutations. Fields are ID (the ID of the row, used to link the clinical evidence to the genetic mutation) and Text (the clinical evidence used to classify the genetic mutation).
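Because the evidence is long free text, a standard CSV reader is awkward here. Below is a minimal sketch that splits each line on the first "||" only. It assumes the file starts with a single header line (assumed here to be "ID,Text") followed by one ID||Text record per line. The function name load_text and the sample records are hypothetical.

```python
import io
from typing import Dict, TextIO


def load_text(handle: TextIO) -> Dict[int, str]:
    """Map each ID to its clinical-evidence text from a ||-delimited file."""
    texts: Dict[int, str] = {}
    next(handle, None)  # skip the header line (assumed to be "ID,Text")
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        # Split only on the first "||" so any later pipes stay inside the text.
        row_id, text = line.split("||", 1)
        texts[int(row_id)] = text
    return texts


# Hypothetical sample laid out like training_text.
sample = io.StringIO(
    "ID,Text\n"
    "0||Hypothetical clinical evidence for the first variant.\n"
    "1||Another made-up abstract || with a stray double pipe.\n"
)
texts = load_text(sample)
```

Splitting with a maximum of one split is the key design choice: the text field itself is never broken apart, even if it happens to contain the delimiter.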


test_variants - a comma-separated file containing the descriptions of the genetic mutations used for testing. Fields are ID (the ID of the row, used to link the mutation to the clinical evidence), Gene (the gene where this genetic mutation is located), and Variation (the amino acid change for this mutation).
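The Variation field mixes simple amino acid substitutions with free-form labels. Below is a hypothetical feature-extraction helper, sketched under the assumption that substitutions use one-letter residue codes with an optional "*" for a stop codon (for example "Q249E"). Anything else, such as "Truncating Mutations", is returned as None rather than parsed. The name parse_substitution is made up for illustration.

```python
import re
from typing import Optional, Tuple

# Reference residue, position, new residue; "*" marks a stop codon.
_SUBSTITUTION = re.compile(
    r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY*])$"
)


def parse_substitution(variation: str) -> Optional[Tuple[str, int, str]]:
    """Return (reference residue, position, new residue) for strings like "Q249E".

    Any other notation (e.g. "Truncating Mutations") returns None.
    """
    match = _SUBSTITUTION.match(variation.strip())
    if match is None:
        return None
    ref, pos, alt = match.groups()
    return ref, int(pos), alt
```

Returning None instead of raising keeps the helper safe to map over every row. The non-substitution labels can then be handled separately, for example as their own categorical feature.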


test_text - a double-pipe (||) delimited file that contains the clinical evidence (text) used to classify the genetic mutations. Fields are ID (the ID of the row, used to link the clinical evidence to the genetic mutation) and Text (the clinical evidence used to classify the genetic mutation).
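Since each variants file and its text file share the ID field, a model needs them joined into one record per mutation. Below is a minimal sketch of that join. It assumes the variants are already parsed into dicts keyed by field name and the texts into an ID-to-string mapping. The function name join_on_id and the sample rows are hypothetical, and treating a missing text as an empty string is an assumption, not something the competition specifies.

```python
from typing import Any, Dict, List


def join_on_id(
    variants: List[Dict[str, Any]], texts: Dict[int, str]
) -> List[Dict[str, Any]]:
    """Attach each variant's clinical text, linking the two files by ID."""
    joined: List[Dict[str, Any]] = []
    for variant in variants:
        row = dict(variant)  # copy so the input rows are left unchanged
        # Assumption: a variant with no matching text gets an empty string.
        row["Text"] = texts.get(variant["ID"], "")
        joined.append(row)
    return joined


# Hypothetical rows standing in for parsed test_variants and test_text.
test_variants = [
    {"ID": 0, "Gene": "GENE_A", "Variation": "Q249E"},
    {"ID": 1, "Gene": "GENE_B", "Variation": "Amplification"},
]
test_texts = {0: "Made-up evidence for the first variant."}
examples = join_on_id(test_variants, test_texts)
```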


The remainder of this post is under construction.


©SEBASTIAN MONZON 2019