Predicting Phenotype from Genotype with Machine Learning


As more people choose to have their genomes sequenced and made available for research, more genomic data becomes available for analysis with machine learning. Single nucleotide polymorphisms (SNPs) are known to be a major factor influencing many physical traits, diseases, and other phenotypes. Using publicly available data and tools, we predict phenotype from genotype using SNP data (approximately 1 million SNPs). We rely only on data analysis and machine learning, with no domain knowledge, so that our automated approach can be applied generally to predict different phenotypes from genotype. In the first application of our method, we predicted eye color with 87% accuracy.
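
To make the idea of predicting a phenotype from SNP genotypes without domain knowledge concrete, the following is a minimal illustrative sketch, not the method used in this work. It assumes each sample is a tuple of genotype calls at the same SNP positions and uses a simple k-nearest-neighbor vote over Hamming distance, which we chose here only for brevity. The function names, the four SNP positions, and all genotype and label values are hypothetical toy data.

```python
from collections import Counter


def hamming_distance(a, b):
    """Count SNP positions where two genotype vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def predict_knn(train_genotypes, train_labels, query, k=3):
    """Predict a phenotype label by majority vote among the k nearest samples."""
    # Rank training samples by how many genotype calls differ from the query.
    ranked = sorted(
        range(len(train_genotypes)),
        key=lambda i: hamming_distance(train_genotypes[i], query),
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up toy data: each row holds one person's genotype calls at four
# hypothetical SNPs. No real individuals or real SNP associations are implied.
train_genotypes = [
    ("GG", "CT", "AA", "TT"),
    ("GG", "CC", "AA", "CT"),
    ("GG", "CT", "AG", "TT"),
    ("AA", "CT", "AG", "CC"),
    ("AG", "TT", "GG", "CC"),
    ("AA", "CC", "GG", "CT"),
]
train_labels = ["blue", "blue", "blue", "brown", "brown", "brown"]

query = ("GG", "CC", "AG", "TT")
prediction = predict_knn(train_genotypes, train_labels, query, k=3)
print(prediction)  # prints "blue" for this toy data
```

The classifier treats every SNP identically and uses no knowledge of which positions are biologically relevant, which mirrors the domain-agnostic framing above. With roughly 1 million SNPs per sample, a practical pipeline would likely need feature selection or a more scalable model.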


