STRUCTURAL CLASSIFICATION OF PROTEINS USING IMAGE BASED MACHINE LEARNING
thesis posted on 23.05.2021, 13:16 by Daniel Franklin
Classification of proteins is an important area of research that enables better grouping of proteins by their function, their evolutionary similarities, or their structural makeup. This thesis focuses on structural classification. We use visualizations of proteins to build a machine learning class prediction model that successfully classifies proteins using the Structural Classification of Proteins (SCOP) framework. SCOP is a well-researched classification scheme, and many existing approaches represent a protein's secondary structure as a linear chain of structural elements. This thesis takes a novel approach: it renders a three-dimensional visualization of the protein itself and then applies image-based machine learning to determine the protein's SCOP classification. The resulting convolutional neural network (CNN) method achieves average accuracies in the range 78-87% on the 25PDB dataset, which matches or exceeds existing methods.