Developing a ChIP-Seq pipeline to determine the function of chromatin remodeling and transcription-related complexes in a model organism
In eukaryotes, chromatin is the complex of DNA and proteins that packages genomic DNA into chromosomes. Chromatin remodeling refers to the regulation of DNA accessibility within chromatin, which is required for transcription to occur. Cellular health depends on processes such as chromatin remodeling and transcription occurring in a timely and precise manner. The protein complexes that perform chromatin remodeling and transcription are poorly understood. The ciliate protozoan Tetrahymena thermophila has a unique nuclear biology, with structurally and functionally distinct nuclei in a common cytoplasm. Recently we used a proteomic approach to characterize the protein complement of several Tetrahymena chromatin remodeling complexes that have in common a bromodomain-containing protein we have named Ibd1, in addition to a divergent version of the Mediator complex, which functions in transcriptional co-activation in yeast and humans. In order to understand the function of these protein complexes, I developed ChIP-Seq for use in Tetrahymena. Chromatin Immunoprecipitation followed by Next Generation Sequencing (ChIP-Seq) is a molecular method widely used in yeast and human cells to investigate the function of chromatin-related proteins by identifying their associated DNA sequences on a genomic scale. ChIP-Seq generates large quantities of data that are difficult to process and analyze, particularly for organisms with contig-based genome assemblies. Whereas the human and yeast genomes have advanced genomic annotations, such as promoter and enhancer regions and transcription factor binding sites, contig-based genomes are typically poorly annotated or have only minimal annotation of their associated gene sets. These annotations consist primarily of gene coordinates predicted by gene-finding programs. Poorly annotated genome sequences, such as that of Tetrahymena, make comprehensive analysis of ChIP-Seq data difficult, and as a result standardized analysis pipelines are lacking.
In this thesis I designed a ChIP-Seq pipeline for contig-based genomes to complement current proteomic approaches and to determine the function of two proteins associated with chromatin remodeling and transcription in Tetrahymena, Ibd1 and Med31. The bromodomain-containing Ibd1 co-purifies in Tetrahymena with four protein complexes that function in chromatin remodeling in yeast and humans. Med31 is a conserved member of the Mediator complex in yeast, humans and Tetrahymena. ChIP-Seq analysis of Ibd1 suggests a role for the protein and its associated complexes in transcription, specifically in coordinating the high-level transcription of highly expressed genes in Tetrahymena thermophila. Furthermore, Med31 ChIP-Seq analysis suggested a global role for the Mediator complex in transcriptional regulation. Reduced levels of Med31 lead to ectopic expression of developmental genes important for programmed DNA rearrangements and irreversible gene silencing. The ChIP-Seq computational pipeline presented in this thesis is an efficient and reliable tool for analyzing genome-wide raw ChIP-Seq data generated in model organisms with poorly annotated, contig-based genome sequences.