We combine modern experimental and computational technologies to understand complex human diseases at a molecular level.
We are predominantly focused on metastatic progression in multiple cancers and neurodegenerative diseases as the biggest challenges to human health in the 21st century.
Advances in artificial intelligence (AI) and deep learning have fundamentally revolutionized many aspects of our lives, and research and technology are no exception. Application of AI models to a variety of problems in life sciences is a rapidly growing field. As pioneers in this field, we have a long history of developing neural network models to answer key questions in genomics (Goodarzi et al, 2004, Biosystems and Marashi et al, 2006, Comp Biol Chem). Novel neural network architectures, as well as access to the computational resources required to deploy them, have been a boon for computational genomics. From interpretable models for studying long-range combinatorial interactions in DNA/RNA to large language models and foundation models in chemistry and genomics, we have enjoyed a resurgence in the promise of AI/ML in biology.
It should be noted, however, that many of these advances have been community driven. Early on, we took advantage of convolutional neural networks to model RBP-RNA interactions (Yu et al, 2020, Cancer Disc). We also leveraged graph convolutional neural networks and contributions from the DeepChem community to carry out in silico functional drug screening (Samuel et al, 2020, Cell Stem Cell). We have continued to build neural network models customized for various applications in genomics; for example, we recently introduced: (i) exoGRU, for prediction of small RNA secretion (Zirak et al, 2023), (ii) DM2D, for identification of non-coding regulatory regions that drive human cancers using whole-genome sequencing data (Woo et al, 2023), and (iii) Ribostrike, for functional drug screening across regulatory RNAs (Arshadi et al, 2023). Building on these earlier works, we are now using large language models to build specialized and biology-inspired models for applications in functional genomics, transcriptomics, and single-cell biology.
Over the past decade, cancer progression has emerged as a complex evolutionary process with many dynamic forces at play at every step. The resulting widespread reprogramming of the gene expression landscape in cancer cells is a hallmark of cancer development. While the focus of cancer biologists has been on the key signaling pathways and regulatory programs that are hijacked by cancer cells, my group has been interested in the possibility of emergent regulatory modules that are engineered by cancer cells and fall outside of existing regulatory networks. This question led us to the discovery of orphan non-coding RNAs (oncRNAs), a class of small non-coding RNAs that are generally not expressed in normal tissue. We have demonstrated that cancer cells can adopt oncRNAs to carry out new regulatory functions that promote metastatic progression.
An evolutionary view of cancer progression also requires capturing the population dynamics and identifying the “attractor” regulatory states that are most metastatic. We are taking advantage of mouse models of metastasis, in conjunction with cutting-edge technologies that are built on single-cell profiling, to reveal the complex and intricate evolutionary trajectories that lead to metastatic dissemination. We have observed that the notion of “tumor heterogeneity” is most likely the result of multiple parallel (and possibly competing) survival strategies (Nguyen et al, 2016, Nature Comm). More importantly, these strategies are specific to the distal organ microenvironment as well as interactions with stromal and immune cells.
Complex human pathologies, such as cancer and neurodegenerative diseases, are accompanied by widespread dysregulation of the regulatory programs that govern gene expression dynamics. A major component of my research is focused on unbiased and systematic platforms that enable the discovery of mechanistically novel post-transcriptional regulatory pathways that contribute to disease progression. For example, by focusing on small non-coding RNAs that are induced under stress, we identified a novel class of tRNA-derived tRNA fragments (tiRNAs) that act as suppressors of breast cancer metastasis (Goodarzi et al, 2015, Cell). We also reported the discovery of a post-transcriptional regulatory pathway that is not only mechanistically novel, but also directly promotes breast cancer metastasis (Goodarzi et al, 2014, Nature). These discoveries were made possible by the development of integrated strategies that combine modern experimental and computational technologies. This interdisciplinary approach, which taps into my background as a computational and experimental biologist, is crucial for tackling complex phenotypes in human disease.
The brain differs extensively from other tissues with respect to its transcriptome profile, with large sets of genes specifically expressed or inhibited in the brain. Among the regulators of brain-specific gene expression programs, however, factors that determine the stability and decay of mRNAs are surprisingly understudied. MicroRNAs (miRNAs) and RNA-binding proteins (RBPs) are the largest classes of such factors, which often function through binding to specific sequences within the 3’ untranslated region (UTR) of their target mRNAs: binding of miRNAs often leads to mRNA destabilization, and the binding of RBPs can either stabilize or destabilize the mRNA. There is increasing evidence that these factors play a pivotal role in shaping the transcriptome and identity of brain cells, and that deficits in their function are linked to various neurodevelopmental and neurodegenerative disorders, including Alzheimer’s disease (AD).
In a collaborative effort involving the Fattahi (UCSF) and Najafabadi (McGill) labs, we are developing novel computational frameworks as well as robust experimental models to study the contribution of RNA-binding proteins and other post-transcriptional regulators to neuronal development and neurodegenerative disease.
A systematic approach to cis-regulatory element discovery in RNA requires capturing the information provided by both the structure and the underlying sequence. The inability of motif discovery methodologies to seamlessly incorporate structural information as part of their search algorithms significantly hinders the identification of structural elements in RNA. To address the challenge outlined above, we have implemented, and continue to expand, a computational framework for discovering structural RNA elements that govern the behavior of RNA in the cell. In this approach, named TEISER (tool for eliciting informative structural elements in RNA), the large space of small structural seeds is systematically explored to identify elements that are significantly informative of transcriptomic measurements (Goodarzi et al, 2012, Nature). Using this approach, we have identified a number of structural elements that play a direct role in gene expression regulation and disease (e.g. Goodarzi et al, 2014, Nature).
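To make the idea of a small structural seed concrete, the following is a minimal sketch, not TEISER's actual seed representation or search algorithm. It assumes a seed is a single hairpin with a fixed stem length, a fixed loop length, and an optional exact loop sequence; the function names `can_form_stem_loop` and `find_stem_loops` are hypothetical and introduced only for illustration.

```python
# Toy illustration of scanning an RNA sequence for a simple stem-loop "seed".
# Assumption: Watson-Crick and G-U wobble pairs are allowed; no bulges or
# mismatches are modeled. This is not the TEISER implementation.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_form_stem_loop(window, stem_len, loop_len):
    """Return True if `window` can fold into one perfect hairpin.

    The window must be exactly 2 * stem_len + loop_len nucleotides long:
    a 5' arm, an unpaired loop, and a 3' arm that pairs with the 5' arm.
    """
    if len(window) != 2 * stem_len + loop_len:
        return False
    five_prime = window[:stem_len]
    three_prime = window[stem_len + loop_len:]
    # The first base of the 5' arm pairs with the last base of the 3' arm.
    return all((a, b) in PAIRS for a, b in zip(five_prime, reversed(three_prime)))


def find_stem_loops(seq, stem_len, loop_len, loop_motif=None):
    """Return start positions where the hairpin seed matches `seq`.

    If `loop_motif` is given, the loop sequence must equal it exactly.
    """
    size = 2 * stem_len + loop_len
    hits = []
    for i in range(len(seq) - size + 1):
        window = seq[i:i + size]
        loop = window[stem_len:stem_len + loop_len]
        if loop_motif is not None and loop != loop_motif:
            continue
        if can_form_stem_loop(window, stem_len, loop_len):
            hits.append(i)
    return hits
```

For example, `find_stem_loops("AAGGCGAAAGCCUU", stem_len=3, loop_len=4)` reports a single match starting at index 2, where `GGC` pairs with `GCC` around the loop `GAAA`. In a full analysis, the presence or absence of such a match in each transcript would then be scored against the transcriptomic measurements.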
Deciphering the noncoding regulatory genome is a formidable challenge. Despite the wealth of available gene expression data, broadly applicable methods for characterizing the regulatory elements that shape the underlying dynamics have been in short supply. To address this, we have developed a suite of integrated computational and experimental techniques that overcome the major obstacles in revealing the regulatory logic underlying RNA dynamics in the cell under normal and pathologic conditions. Our computational frameworks for detecting linear and structural regulatory DNA and RNA motifs rely on directly assessing the mutual information between sequence and transcriptome-wide measurements. Our approach makes minimal assumptions about the background sequence model and the mechanisms by which elements affect gene expression. In parallel, we have developed a series of experimental strategies, based on whole-genome observations, to validate and functionally probe these regulatory interactions in vivo. While our findings provide an encyclopedic snapshot of regulatory interactions in the cell, our knowledge of the regulatory genome is still in its infancy. Applying these strategies to other experimental models is a crucial step towards a more comprehensive understanding of the regulatory genome.
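As a rough illustration of what assessing mutual information between sequence and expression can look like, here is a minimal sketch. It assumes the simplest possible setup, which is not necessarily the one used in our published frameworks: motif presence is an exact substring match, expression values are discretized into equally populated bins, and the helper names (`mutual_information`, `quantile_bins`, `motif_information`) are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def quantile_bins(values, n_bins):
    """Assign each value to one of `n_bins` equally populated bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def motif_information(sequences, expression, motif, n_bins=2):
    """Mutual information between motif presence and binned expression.

    Assumption: a motif is "present" if it occurs as an exact substring.
    """
    presence = [motif in s for s in sequences]
    return mutual_information(presence, quantile_bins(expression, n_bins))
```

With four transcripts, two of which carry the motif and have the lowest expression values, `motif_information` returns 1 bit, the maximum for a binary presence vector; a motif distributed evenly across the expression bins scores 0. A real analysis would also need to assess statistical significance, for example by comparing the observed value against values obtained after shuffling the expression labels, which is not shown here.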
N6-methyladenosine (m6A) has been recently identified as an epitranscriptomic modification of mRNAs in eukaryotes, but its regulatory consequences and functional role in the cell are largely uncharacterized. In a series of studies, we have depicted a pivotal role for m6A modifications in miRNA processing. Using computational tools and focused experimental techniques, we have demonstrated that this modification marks the sites of primary miRNAs and helps recruit the miRNA machinery. We successfully identified the RNA-binding protein HNRNPA2B1 as one nuclear reader of this modification, which initiates the processing by interacting with DGCR8. In our view, this is but one example of RNA editing regulating key RNA processing events in the cell. As such, we are interested in understanding how RNA methylation is initiated, what its impact is on the target RNA molecule, and how this effect is brought about.
Do you have ideas that you think may help us do better science? Then...
Contact us