France
Country of destination:
United States of America
High-throughput technologies now enable deep, multi-faceted studies of the biological variability of living organisms at a variety of levels, including the genome, transcriptome, and epigenome. Finding an appropriate way to jointly exploit and model this growing accumulation of heterogeneous, large-scale 'omics data is a major obstacle and an important area of current biostatistical research. During this research mobility, I am specifically addressing the question of targeted multi-omics integration in the context of precision medicine, with the aim of improving the power to detect associations involving rare genetic variants. In this work, I am focusing on an extensive set of data collected through a grant funded by the National Heart, Lung, and Blood Institute's (NHLBI) Trans-Omics for Precision Medicine (TOPMed) Program, whose aim is to identify a comprehensive set of low-frequency or rare coding and functional noncoding variants underlying complex, multifactorial diseases. This dataset includes nearly 60,000 individuals from 26 studies, all of whom have been profiled using whole genome sequencing (WGS). However, the analysis of these WGS data poses many statistical challenges, in particular due to their ultra-high dimensionality. During my AgreenSkills research mobility, I am investigating one promising approach to address this issue: the model-based integration of publicly available transcriptomic and epigenomic data (e.g., from the GTEx, ENCODE, and Roadmap Epigenomics Mapping projects) to better target functionally relevant genetic variants and substantially reduce the dimensionality, and thus the multiple testing burden. In particular, I am currently focusing on a novel penalized approach in which the sparsity penalty applied to each variant is structured according to its corresponding transcriptomic and epigenomic profiles.
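To make the idea of variant-specific sparsity penalties more concrete, here is a rough illustration. It is not the method under development, only a generic weighted lasso fitted by coordinate descent, in which each variant's penalty factor is derived from a hypothetical functional annotation score in [0, 1]. The function names `annotation_weights` and `weighted_lasso`, the exponential mapping from scores to penalties, and the simulated genotype-like data are all assumptions made for illustration. The sketch also assumes roughly centered predictors and response, so no intercept is fitted.

```python
import numpy as np


def soft_threshold(x: float, t: float) -> float:
    """Soft-thresholding operator S(x, t) = sign(x) * max(|x| - t, 0)."""
    return float(np.sign(x) * max(abs(x) - t, 0.0))


def annotation_weights(scores, alpha: float = 3.0) -> np.ndarray:
    """Hypothetical mapping from annotation scores in [0, 1] (e.g. a summary of
    expression or chromatin evidence) to penalty factors: better-supported
    variants receive a smaller penalty. This mapping is an assumption."""
    return np.exp(-alpha * np.asarray(scores, dtype=float))


def weighted_lasso(X, y, lam, weights, n_iter=200, tol=1e-8):
    """Coordinate descent for the weighted lasso objective
    (1 / 2n) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|.
    Assumes centered X and y (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.astype(float).copy()        # equals y - X @ beta while beta == 0
    col_sq = (X ** 2).sum(axis=0) / n        # per-column scale x_j^T x_j / n
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of column j with the partial residual (variant j added back).
            rho = X[:, j] @ residual / n + col_sq[j] * old
            # Variant-specific threshold: lam scaled by this variant's penalty factor.
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            if new != old:
                residual -= X[:, j] * (new - old)   # keep residual in sync with beta
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


# Simulated example: only variant 0 affects the outcome and only it is "annotated".
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(n)
scores = np.zeros(p)
scores[0] = 1.0
beta = weighted_lasso(X, y, lam=1.0, weights=annotation_weights(scores))
print(np.round(beta, 3))
```

Lowering a variant's penalty factor lets it enter the model at a larger value of the overall penalty, which is one simple way for external functional evidence to influence which variants are selected. In a real analysis, the mapping from transcriptomic and epigenomic profiles to penalties would itself need to be justified or estimated.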
Dr. Andrea Rau is a research scientist at the French National Institute for Agricultural Research (INRA) in the Animal Genetics and Integrative Biology research unit (Populations, Statistics, and Genome team). She received her Ph.D. in 2010 from Purdue University for the development of statistical methods to infer gene regulatory networks from time-course microarray data. She then completed a one-year post-doctoral fellowship at Inria Saclay, focusing on clustering RNA-seq data to identify groups of co-expressed genes. Since 2011, she has worked at INRA in close collaboration with biologists on interdisciplinary problems at the interface of statistics and biology. Dr. Rau's research interests focus on the development of appropriate statistical methodology for the analysis of high-dimensional genomic and transcriptomic data, and the implementation of these methods in open-source software packages. Today, her work centers primarily on the inference of causal regulatory networks from gene knock-out and knock-down experiments, differential and co-expression analyses of RNA-seq data, and integrative analyses of large-scale multi-omics data.
Godichon-Baggioni, A., Maugis-Rabusseau, C., and Rau, A., 2018. Clustering transformed compositional data using K-means, with applications in gene expression and bicycle sharing system data. Journal of Applied Statistics. Doi: 10.1080/02664763.2018.1454894.
Rau, A. and Maugis-Rabusseau, C., 2017. Transformation and model choice for RNA-seq co-expression analysis. Briefings in Bioinformatics, bbw128. Doi: 10.1093/bib/bbw128.
Rigaill, G., Balzergue, S., Brunaud, V., Blondet, E., Rau, A., Rogier, O., Caius, J., Maugis-Rabusseau, C., Soubigou-Taconnat, L., Aubourg, S., Lurin, C., Martin-Magniette, M.-L., and Delannoy, E., 2016. Synthetic datasets for the identification of key ingredients for RNA-seq differential analysis. Briefings in Bioinformatics, bbw092. Doi: 10.1093/bib/bbw092.
Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., and Celeux, G., 2015. Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics, 31(9): 1420-1427.
Gallopin, M., Celeux, G., Jaffrézic, F., and Rau, A., 2015. A model selection criterion for model-based clustering of annotated gene expression data. Statistical Applications in Genetics and Molecular Biology, 14(5): 413-428.