Graduated PhD 2018
Taxonomic and Environmental Annotation of Bacterial 16S rRNA gene sequences via Shannon Entropy and Database Metadata Terms
Next-generation sequencing (NGS) based metagenomic studies provide a comprehensive view of the evolution, lifestyle and diversity of a microbial community under natural and altered conditions. Such experiments typically produce several hundred gigabytes of valuable sequencing data in a single run. However, analysis of such large-scale data poses several technical and analytical challenges related to the speed and accuracy of results, particularly when multiple and diverse genomes are sequenced and analyzed together.
Historically, researchers have relied on BLAST, a general-purpose alignment tool, for the annotation of sequencing data. However, in the context of NGS it is slow, even on a large compute cluster. Other existing tools for the analysis of shotgun metagenomics data fall into two categories: web servers and stand-alone software. With web servers, researchers need to upload several gigabytes of sequence data to a remote server before beginning the analyses, and transferring such large files can be cumbersome. Existing tools have some or all of the following drawbacks: i) they are applicable only when the metagenomic data do not contain eukaryotic sequences (which is often not the case in environmental samples), ii) they are slow, iii) they are unsuitable for the analysis of complex microbial communities, and/or iv) they offer limited taxonomic resolution. Thus, there is a pressing need to develop bioinformatics approaches and tools for rapid and accurate taxonomic and functional annotation of NGS metagenomics data.
We hypothesize that a weighted approach utilizing local 16S rRNA evolutionary conservation information could improve phylogenetic resolution and taxonomic classification of shotgun metagenomics reads. For functional classification, a hybrid approach combining protein motifs, domains and active sites with homology mapping will improve the speed of analysis as well as predict novel genes.
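As an illustration of how local conservation in 16S rRNA sequences might be quantified, the sketch below computes the Shannon entropy of each column in a small multiple sequence alignment and converts it into a weight between 0 and 1. This is a minimal example under stated assumptions, not the method used in the project: the function names, the toy alignment, the choice to ignore gaps and ambiguous bases, and the normalisation by the maximum entropy of four bases are all hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column.

    Assumption: gaps and ambiguous bases are ignored; a column with
    no A/C/G/T characters is given an entropy of 0.0.
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    total = len(bases)
    counts = Counter(bases)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_weights(aligned_seqs):
    """Per-position weight in [0, 1]: 1 = fully conserved, 0 = maximally variable."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    max_entropy = math.log2(4)  # four possible nucleotides
    weights = []
    for i in range(length):
        column = "".join(s[i] for s in aligned_seqs)
        weights.append(1.0 - column_entropy(column) / max_entropy)
    return weights


# Hypothetical toy alignment, not real 16S rRNA data.
toy_alignment = ["ACGT", "ACGA", "ACCC", "ACTG"]
weights = conservation_weights(toy_alignment)
print(weights)
```

In this toy alignment the first two columns are identical across all sequences and receive a weight of 1, while the last column contains all four bases equally and receives a weight of 0. A weighted classifier could, for example, give more influence to matches at positions with particular weights, but how the weights are actually applied is a design choice the text above does not specify.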
Furthermore, there is a strong need to develop a robust pathway generation tool to study community metabolism. Available tools typically operate only at the metagenomic level and only for the dominant organism in the environmental sample. Combining pathway generation with information from metatranscriptomics will provide an effective approach towards understanding community metabolism. Finally, a machine-learning based tool will be developed to classify multiple metagenomic samples based on features such as pathway differences and gene abundance levels. This will provide a fast method to compare multiple metagenomic samples.
Research Project Supervisors
Professor Brajesh Singh, A/Professor Glenn Stone, Dr Ian Paulsen, Dr Christopher Quince and Dr Tom Jeffries
Jeffries TC, Rayu S, Nielsen UN, Lai KT, Ijaz A, Nazaries L, Singh BK (2018) 'Metagenomic Functional Potential Predicts Degradation Rates of a Model Organophosphorus Xenobiotic in Pesticide Contaminated Soils', Frontiers in Microbiology, vol. 9, article no. 147