Projects

Research Interests

TFBS prediction in early Drosophila melanogaster embryos

Abstract:

Numerous transcription factors (TFs) collaboratively shape cellular processes within gene regulatory networks by binding to specific sequences, often clustered in cis-regulatory modules (CRMs). Accurately predicting TF binding sites (TFBS) across the genome will uncover crucial protein-DNA interactions encoding these networks. Approaches like ChIP-seq are limited to assaying one TF at a time and yield false positives, an issue also common in computational motif scanning. We use DNA accessibility data to simultaneously predict the TFBS of 14 TFs essential for developmental patterning in early Drosophila melanogaster embryos. We process early embryonic DNA accessibility data and TF binding motifs with TOBIAS, a tool that identifies genuine TF binding through TF "footprints," reducing false positives. Analysis with the BEDTools genomics suite reveals that most TFBS lie outside of CRMs, with intergenic and intronic regions hosting the highest TFBS counts. However, TFBS density is highest in CRMs, followed unexpectedly by 5’UTRs. To understand the biological significance of the large number of TF binding sites outside of known CRMs, we propose two hypotheses: 1) Some belong to uncharacterized CRMs. 2) They are evidence of a dosage-based activation mechanism. To address the first hypothesis, we pinpoint high-density TFBS clusters within regions carrying active enhancer histone marks as potential novel CRMs. Further validation requires computational sequence conservation analysis and reporter assays. Future research will investigate the high TFBS density in 5’UTRs to explore the dosage-based activation mechanism. This study can inform CRM prediction algorithms and further characterizes the TFBS landscape.
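The difference between TFBS count and TFBS density in the abstract can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the coordinates, the category labels, and the function name `count_and_density` are hypothetical toy values, not data or code from the study, and each site is assigned to a region by its midpoint, which is a simplification.

```python
from collections import defaultdict

# Toy annotation: (chrom, start, end, category). All values are made up.
REGIONS = [
    ("chr2L", 0, 10000, "intergenic"),
    ("chr2L", 10000, 11000, "CRM"),
    ("chr2L", 11000, 11500, "5'UTR"),
]

# Toy predicted binding sites: (chrom, start, end), each 10 bp wide.
TFBS = [("chr2L", s, s + 10) for s in
        (100, 1500, 3000, 4500, 6000, 8000,   # intergenic
         10100, 10300, 10600, 10900,          # CRM
         11200)]                              # 5'UTR


def count_and_density(tfbs, regions):
    """Return {category: (site_count, sites_per_kb)}.

    A site is assigned to the first region containing its midpoint.
    """
    total_bp = defaultdict(int)
    counts = defaultdict(int)
    for _chrom, start, end, category in regions:
        total_bp[category] += end - start
    for chrom, start, end in tfbs:
        midpoint = (start + end) // 2
        for r_chrom, r_start, r_end, category in regions:
            if r_chrom == chrom and r_start <= midpoint < r_end:
                counts[category] += 1
                break
    return {cat: (counts[cat], 1000 * counts[cat] / total_bp[cat])
            for cat in total_bp}


if __name__ == "__main__":
    for category, (n, per_kb) in count_and_density(TFBS, REGIONS).items():
        print(f"{category}: {n} sites, {per_kb:.1f} sites/kb")
```

In this toy example the intergenic region holds the most sites (6) but has the lowest density (0.6 per kb), while the much shorter CRM has the highest density (4.0 per kb). The study itself used BEDTools for this kind of interval analysis; the pure-Python version only shows the arithmetic.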

BioBuilder Outreach Experience

This summer I had the pleasure of helping teach high school students basic wet lab skills through the BioBuilder program. We introduced the essentials of synthetic biology and highlighted the distinction between treating an organism as a manufacturing unit and treating it as a machine. The project centered on evaluating promoters of varying strengths driving the lacZ gene, which results in the production of β-galactosidase in E. coli. We guided students in using micropipettes and interpreting spectrophotometer data.

To ensure the proper function and development of an organism, the genome encodes complex regulatory networks that consist of elements acting in cis and in trans. My research projects have focused on analyzing different components of these gene regulatory networks.

One class of these elements is transcription factors (TFs), proteins that bind to specific sequences in the genome and contribute to transcriptional regulation. Mapping the genome-wide binding landscape of these proteins has been hampered by low-throughput techniques such as chromatin immunoprecipitation sequencing (ChIP-seq) and by the high false-positive rates of TF motif scanning. My goal is to test a novel way to computationally predict TF binding sites (TFBS), focusing on early-stage (blastoderm) D. melanogaster embryos. This work is mainly computational: high-resolution ATAC-seq data are analyzed with the software TOBIAS, which locates TFBS through the ‘footprints’ that TFs leave when bound to the genome. With this technique we can observe the binding landscape of chosen TFs, which may lead to the identification of previously unknown regulatory sequences and untested functional regions of the genome. The results of this study could encourage the development of machine learning algorithms that better predict unknown cis-regulatory regions in other organisms and conditions.
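The footprint idea can be illustrated with a deliberately simplified sketch: a bound TF protects its site from cutting, so accessibility signal dips at the site relative to its flanks. The Python below is not TOBIAS's scoring method, which applies its own bias correction and scoring; the function name `footprint_depth`, the flank width, and the cut counts are all hypothetical choices made for illustration.

```python
def footprint_depth(cuts, start, end, flank=10):
    """Mean cut count in the flanks minus mean cut count in the site [start, end).

    `cuts` is a list of per-base cut counts. A larger positive value means a
    deeper dip at the site, as expected when a protein protects the DNA.
    """
    left = cuts[max(0, start - flank):start]
    right = cuts[end:end + flank]
    flanks = left + right
    center = cuts[start:end]
    if not flanks or not center:
        raise ValueError("site and flanks must overlap the signal")
    return sum(flanks) / len(flanks) - sum(center) / len(center)


# Toy signals (made up): an 8 bp site at positions 10-17 with 10 bp flanks.
PROTECTED = [8] * 10 + [1] * 8 + [8] * 10   # clear dip at the site
FLAT = [8] * 28                             # accessible but no dip

if __name__ == "__main__":
    print(footprint_depth(PROTECTED, 10, 18))  # 7.0
    print(footprint_depth(FLAT, 10, 18))       # 0.0
```

A motif match in accessible chromatin with no dip, like the flat signal here, is the kind of candidate that footprinting is meant to filter out relative to motif scanning alone.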

Another set of elements that aid in transcriptional regulation are specific sequences known as enhancers, to which many TFs bind. Some enhancers have multiple ‘versions’ that regulate the same gene and appear, on the surface, to be redundant. These have been dubbed shadow enhancers, and they are present in many organisms. While they have been shown to act as important buffers when an organism is subjected to stressful conditions, much of their structure and grammar remains untested. I am interested in learning more about the requirements for shadow enhancer activity and whether perturbing their endogenous structure changes their expression patterns. Also using early-stage D. melanogaster embryos, this project focuses on building synthetic ‘super-enhancers’ by removing the endogenous sequence separating the shadow enhancer pairs of the well-studied gap genes giant, Krüppel, and nubbin. Results from this study will add to the body of knowledge on the structure of shadow enhancers and how they respond to modification. These findings may be applicable to human shadow enhancer systems and to diseases that arise from mutations in their structure.