Seminar: 2pm, 24 Feb

Valid inference after clustering with application to single-cell RNA-sequencing data

Speaker: Lucy Gao, University of British Columbia
Zoom link: https://uni-sydney.zoom.us/j/8695451737

Abstract: In single-cell RNA-sequencing studies, researchers often model the variation between cells with a latent variable, such as cell type or pseudotime, and investigate associations between the genes and the latent variable. As the latent variable is unobserved, a two-step procedure seems natural: first estimate the latent variable, then test the genes for association with the estimated latent variable. However, if the same data are used for both of these steps, then standard methods for computing p-values in the second step will fail to control the type I error rate.
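A small simulation can make this failure concrete. The sketch below is illustrative only and is not taken from the talk: it assumes a single Gaussian "gene" with no true clusters, uses a median split as a stand-in for a clustering step, and uses a normal approximation for the two-sample test. The function names `two_sample_p_value` and `double_dipping_rejection_rate` are hypothetical.

```python
import math
import random
import statistics


def two_sample_p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def double_dipping_rejection_rate(n_cells=100, n_sims=200, alpha=0.05, seed=0):
    """Fraction of null datasets in which the naive two-step test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Null data: one homogeneous population, so there are no true clusters.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]
        # Step 1 (assumed stand-in for clustering): split the cells at the median.
        cutoff = statistics.median(x)
        low = [v for v in x if v <= cutoff]
        high = [v for v in x if v > cutoff]
        # Step 2: test for a difference in means using the SAME data.
        if two_sample_p_value(low, high) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    rate = double_dipping_rejection_rate()
    print(f"Rejection rate under the null: {rate:.2f} (nominal level 0.05)")
```

Because the groups were chosen to be as different as possible on the very data being tested, the rejection rate under the null ends up far above the nominal 5% level.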

In my talk, I will introduce two approaches to this problem. First, I will apply ideas from selective inference to develop a valid test for a difference in means between clusters obtained by hierarchical clustering. Then, I will introduce count splitting: a flexible framework that enables valid inference after latent variable estimation in count-valued data, for virtually any latent variable estimation technique and inference approach.
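To give a rough sense of what splitting count data can look like, here is a minimal sketch. It assumes the counts are Poisson distributed, in which case binomial thinning of each count yields two independent Poisson parts; the abstract does not spell out the construction, so this should be read as one plausible instance rather than the method presented in the talk. The function name `count_split` and the parameter `epsilon` are hypothetical.

```python
import random


def count_split(counts, epsilon=0.5, seed=0):
    """Split a cells-by-genes count matrix into train and test parts.

    Each count c is thinned: train ~ Binomial(c, epsilon), test = c - train.
    Under the ASSUMED Poisson model, train and test are independent.
    """
    rng = random.Random(seed)
    train, test = [], []
    for row in counts:
        # Binomial draw as a sum of Bernoulli(epsilon) trials, one per count unit.
        train_row = [sum(1 for _ in range(c) if rng.random() < epsilon) for c in row]
        test_row = [c - t for c, t in zip(row, train_row)]
        train.append(train_row)
        test.append(test_row)
    return train, test


if __name__ == "__main__":
    counts = [[3, 0, 7], [1, 12, 4]]  # toy data: 2 cells, 3 genes
    train, test = count_split(counts)
    print("train:", train)
    print("test: ", test)
```

In such a scheme, the latent variable would be estimated on `train` and the genes tested for association on `test`, so the two steps no longer reuse the same data.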

This talk is based on joint work with Jacob Bien (University of Southern California), Daniela Witten and Anna Neufeld (University of Washington), and Alexis Battle and Joshua Popp (Johns Hopkins University).