Pattern Recognition Methods for Advanced Stochastic Protein Sequence Analysis using HMMs

Thomas Plötz and Gernot A. Fink
Pattern Recognition, Special Issue on Bioinformatics, 39, pages 2267-2280, 2006.

Abstract

Currently, Profile Hidden Markov Models (Profile HMMs) are the methodology of choice for probabilistic protein family modeling. Unfortunately, despite substantial progress, the general problem of remote homology analysis is still far from being solved. In this article we propose new approaches for robust protein family modeling by consistently exploiting general pattern recognition techniques. A new feature-based representation of amino acid sequences serves as the basis for semi-continuous protein family HMMs. Due to this paradigm shift in processing biological sequences, the complexity of family models can be reduced substantially, resulting in fewer parameters that need to be trained. This is especially favorable when only little training data is available, as in most current tasks of molecular biology research. In various experiments we demonstrate the superior performance of advanced stochastic protein family modeling for remote homology analysis, which is especially relevant for applications such as drug discovery.