New regulatory roles continue to emerge for natural and engineered non-coding RNAs, many of which must fold into specific structures to carry out their function. Structural analysis is thus fundamental to understanding key cellular mechanisms and to accelerating the discovery and engineering of new RNAs for biotechnology and therapeutic applications. However, the applicability and scalability of reliable structure prediction techniques are severely limited by technological and economic constraints, whereas popular computational methods are generally unreliable. These limitations underlie a growing need for methodologies that enable rapid, yet accurate, characterization of structural features.
Our group tackles this challenge by focusing on nascent technologies that couple experimental approaches to detecting RNA structure with next-generation DNA sequencing. Fully leveraging these technologies requires parallel advances in computational structural analysis. Projects in the lab address the need for computational infrastructure at three levels:
1). Informatics for transcriptome-wide experiments (Ribonomics): developing data analysis platforms for high-throughput RNA structure characterization experiments.
2). Computational structural biology: developing machine-learning algorithms for integrating these data with biophysical models of folding dynamics to improve structure prediction capabilities.
3). Molecular systems engineering: applying the developed algorithms to a range of biological systems to address bioengineering, biomedical, and scientific problems.
Ribonomics (RNA Genomics)
Recent breakthroughs in DNA sequencing technologies have dramatically reduced the cost and complexity of genome sequencing, thereby expanding our ability to study multiple organisms rapidly and efficiently. These advances have also transformed the field of molecular biology: a multitude of new experimental approaches continue to emerge that harness cheap DNA sequencing to obtain molecular measurements at the whole-cell level. These novel experiments allow us to study DNA, RNA, and protein-DNA/RNA interactions at unprecedented resolution and scope. At the same time, they generate massive datasets that carry complexities and statistical ambiguities we have not encountered before. These in turn obscure the desired information and warrant the development of statistical models and methods that can reconstruct it from such noisy molecular measurements.
We are working on analysis methods for such sequencing-based data, with particular emphasis on measurements of RNA structure, where biochemical methods are used to discriminate between structurally constrained and unconstrained nucleotides. Our work spans three aspects of data analysis: 1). Modeling the measurements and their relationship to structure, and linking them to statistical uncertainties in the data, 2). Developing efficient and robust statistical inference algorithms to faithfully recover the structural information, and 3). Analyzing these datasets within the context of other genomics datasets.
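As a rough illustration of what "modeling the measurements and their relationship to structure" can mean, the sketch below applies Bayes' rule to a single nucleotide's reactivity to estimate the probability that it is unconstrained. It is not the lab's model: the function name `posterior_unconstrained`, the exponential likelihoods, and every parameter value are assumptions chosen only to keep the example small.

```python
import math


def posterior_unconstrained(
    reactivity: float,
    prior_unconstrained: float = 0.5,   # assumed prior, purely illustrative
    mean_unconstrained: float = 1.0,    # assumed mean reactivity if unconstrained
    mean_constrained: float = 0.2,      # assumed mean reactivity if constrained
) -> float:
    """Posterior probability that a nucleotide is unconstrained.

    Toy two-state model: reactivities are assumed to follow an exponential
    distribution whose mean depends on the nucleotide's structural state.
    """
    if reactivity < 0:
        raise ValueError("reactivity must be non-negative")

    # Exponential likelihood of the observed reactivity under each state.
    like_u = math.exp(-reactivity / mean_unconstrained) / mean_unconstrained
    like_c = math.exp(-reactivity / mean_constrained) / mean_constrained

    # Bayes' rule: weigh each likelihood by its prior and normalize.
    num = prior_unconstrained * like_u
    den = num + (1.0 - prior_unconstrained) * like_c
    return num / den


if __name__ == "__main__":
    for r in (0.0, 0.3, 1.0, 2.0):
        print(f"reactivity={r:.1f} -> P(unconstrained)={posterior_unconstrained(r):.3f}")
```

Under these assumed distributions, low reactivities favor the constrained state and high reactivities favor the unconstrained state, but neither call is ever certain, which is the kind of statistical ambiguity described above.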
Computational Structural RNA Biology
While these emerging techniques lay the groundwork for affordable transcriptome-wide structural studies, the information they report is incomplete. This is primarily because the data consist of “soft” (as opposed to hard, or binary) indicators of a nucleotide’s base-pairing state and, furthermore, do not report the identity of each nucleotide’s pairing partner. This limits our ability to resolve structure from data and sequence information alone and necessitates complementing them with additional knowledge sources, such as thermodynamic theory, phylogenetics, and the underlying biology. We are developing algorithmic strategies for fusing these knowledge sources, with minimal user input, to enhance the currently limited power of computational RNA structure prediction methods.
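To make the idea of fusing soft data with a folding model concrete, here is a minimal sketch under strong simplifying assumptions. It uses a Nussinov-style dynamic program (a textbook base-pair-maximization scheme, swapped in for a real thermodynamic model) and subtracts a penalty from each candidate pair in proportion to the reactivities of its two nucleotides. The function names, the scoring rule, and the `weight` parameter are hypothetical and are not the algorithms developed in the lab.

```python
from typing import List, Sequence, Tuple

# Watson-Crick and G-U wobble pairs allowed in this toy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fold_with_reactivities(
    seq: str,
    reactivity: Sequence[float],
    weight: float = 1.0,   # hypothetical strength of the data-driven penalty
    min_loop: int = 3,     # minimum number of unpaired bases in a hairpin loop
) -> List[Tuple[int, int]]:
    """Return base pairs maximizing a toy score that mixes pairing and data."""
    n = len(seq)
    if len(reactivity) != n:
        raise ValueError("need one reactivity value per nucleotide")

    def pair_score(i: int, j: int) -> float:
        # Assumed rule: +1 per pair, penalized when reactive bases are paired.
        return 1.0 - weight * (reactivity[i] + reactivity[j])

    def can_pair(i: int, j: int) -> bool:
        return (seq[i], seq[j]) in PAIRS

    # score[i][j] holds the best score for the subsequence i..j (inclusive).
    score = [[0.0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(score[i + 1][j], score[i][j - 1])  # leave i or j unpaired
            if can_pair(i, j):
                best = max(best, score[i + 1][j - 1] + pair_score(i, j))
            for k in range(i + 1, j):                     # split into two parts
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best

    # Traceback: re-evaluate the same expressions to find which case won.
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if score[i][j] == score[i + 1][j]:
            stack.append((i + 1, j))
        elif score[i][j] == score[i][j - 1]:
            stack.append((i, j - 1))
        elif can_pair(i, j) and score[i][j] == score[i + 1][j - 1] + pair_score(i, j):
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if score[i][j] == score[i][k] + score[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return sorted(pairs)


def dot_bracket(n: int, pairs: Sequence[Tuple[int, int]]) -> str:
    """Render a list of base pairs as a dot-bracket string of length n."""
    chars = ["."] * n
    for i, j in pairs:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)


if __name__ == "__main__":
    seq = "GGGAAAACCC"
    flat = [0.0] * len(seq)                      # no data: pure pair counting
    ends_reactive = [1.0] + [0.0] * 8 + [1.0]    # both end bases look unpaired
    print(dot_bracket(len(seq), fold_with_reactivities(seq, flat)))
    print(dot_bracket(len(seq), fold_with_reactivities(seq, ends_reactive)))
```

In this toy example the data never dictate a structure on their own. They only shift the score, so a strongly reactive nucleotide is left unpaired when an alternative of comparable score exists, while the pairing partners are still chosen by the folding model.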
Molecular Systems Engineering
To ground theory in practice, we are interested in applying our methodologies to a range of biological problems. The goal of this work is twofold: 1). Demonstrate the power of the methods to elucidate molecular RNA-based mechanisms, and 2). Validate, improve, and refine the algorithms we are designing. We will therefore soon operate a small molecular biology wet lab for exploratory validation of computational predictions and models. We are especially interested in applications in synthetic biology and biotechnology, such as the design of RNA aptamers and riboswitches (RNA-based switches), in gene regulation via RNA:RNA interactions, and in RNA virology.