Designing Algorithms for Peptide Discovery

Advances in technology have made it possible to generate ever-larger volumes of data. Analyzing and interpreting these data requires more advanced computational tools. In biology, the use of these tools is collectively known as bioinformatics. In our lab, we use bioinformatics and machine learning to analyze and describe peptide functions. Read below to find out more about our project.

Project Aims

Peptide Discovery & Machine Learning

Antimicrobial peptides are small proteins that protect against bacterial pathogens in humans, plants, insects and fungi. Antibiotics are chemicals that serve the same function, but antibiotic-resistant pathogens are emerging at an alarming rate, portending a potential “post-antibiotic era.”

In the Kearney Lab, we have developed computer algorithms that scan genomic sequence data and pull out antimicrobial peptide genes from among the thousands of other genes present. Genomes from across all of life can thus be searched for antimicrobial peptides that might be used as therapeutics (e.g., linaclotide), as food additives (e.g., nisin) or as transgenes that would protect crop plants or farm animals.

In collaboration with Erich Baker (Baylor Computer Sciences), we have developed and patented an SVM-derived algorithm to discover sequential tri-disulfide bonded peptides (“STPs”) using only genomic sequence (Islam et al., 2015). STPs are highly useful because of their unusual stability. Going further, we have developed a universal algorithm using natural language processing. This algorithm guides the development of derivative algorithms that can be used to rapidly discover a wide variety of peptides. We are now able to find antimicrobial and other modulatory peptides by their function, for instance by identifying all of the sodium or calcium channel blocker peptides in genomic sequence. A survey of such channel-blocking peptides could lead to the discovery of new insecticides that can be incorporated into the plant genome rather than sprayed, and that would have no residual effect on the environment.
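To give a feel for how natural language processing ideas carry over to peptide sequences, the sketch below counts plain n-grams and skip-grams in an amino acid string, treating residues the way a text model treats words. It is only a generic illustration, not the modified n-gram and skip-gram scheme used in our published work. The function names (`ngrams`, `skipgrams`, `feature_vector`) and the toy sequence are hypothetical and chosen for this example.

```python
from collections import Counter


def ngrams(seq, n):
    """Count every contiguous window of n residues in the sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def skipgrams(seq, k):
    """Count residue pairs separated by exactly k skipped positions.

    Each skipped position is written as "_", so "C_C" means two
    cysteines with one residue of any kind between them.
    """
    gap = k + 1
    return Counter(
        seq[i] + "_" * k + seq[i + gap] for i in range(len(seq) - gap)
    )


def feature_vector(seq, vocabulary):
    """Turn a sequence into relative frequencies over a fixed token list.

    Assumption for this sketch: features are 2-grams plus 1-skip-grams.
    A vector like this could be fed to a classifier such as an SVM.
    """
    counts = ngrams(seq, 2) + skipgrams(seq, 1)
    total = sum(counts.values()) or 1  # avoid dividing by zero
    return [counts[token] / total for token in vocabulary]


if __name__ == "__main__":
    peptide = "GCCSDPRCAWRC"  # hypothetical toy sequence, not real data
    print(ngrams(peptide, 2).most_common(3))
    print(skipgrams(peptide, 1).most_common(3))
    print(feature_vector(peptide, ["CC", "C_C", "RC"]))
```

Skip-grams are useful here because functionally important residues, such as the cysteines that form disulfide bonds, are often separated by variable stretches of other residues, and so are missed by contiguous windows alone.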


Islam, S.M.A., Heil, B.J., Kearney, C.M., and Baker, E.J. 2018 Protein classification using modified n‑grams and skip-grams. Bioinformatics 34(9):1481-1487. doi: 10.1093/bioinformatics/btx82

Islam, S.M.A., Kearney, C.M., and Baker, E.J. 2018 Assigning biological function using hidden signatures in cystine-stabilized peptide sequences. Sci Rep 8(1):9049. doi: 10.1038/s41598-018-27177-8

Islam, S.M.A., Kearney, C.M., and Baker, E.J. 2018 Classes, databases and prediction methods of pharmaceutically and commercially important cystine-stabilized peptides. Toxins 10(6). pii: E251. doi: 10.3390/toxins10060251

Islam, S.M.A., Sajed, T., Kearney, C.M., and Baker, E.J. 2015 PredSTP: a highly accurate SVM based model to predict sequential cystine stabilized peptides. BMC Bioinformatics 16:210. doi: 10.1186/s12859-015-0633-x (Editor’s Pick)