Multimodal Single-Cell Integration Across Time, Individuals, and Batches

A NeurIPS Competition (2022)


Scaling from a dozen cells a decade ago to millions of cells today, single-cell measurement technologies are driving a revolution in the life sciences. Recent advances make it possible to measure multiple high-dimensional modalities (e.g. DNA accessibility, RNA, and proteins) simultaneously in the same cell. Such data provides, for the first time, a direct and comprehensive view into the layers of gene regulation that drive biological diversity and disease.

In 2021, we organized the first single-cell analysis competition at NeurIPS, bringing together 280 participants to compete on an atlas-scale dataset of human bone marrow cells from 12 donors generated across 4 sites globally.

In this competition for NeurIPS 2022, we are extending the challenge to drive innovation in modeling temporal single-cell data measured in multiple modalities at multiple time points. For this year's competition, we generated a 300,000-cell time course dataset of CD34+ hematopoietic stem and progenitor cells (HSPCs) from four human donors at five time points. HSPCs are stem cells that give rise to all other cells in the blood throughout adult life, and a 10-day time course captures important biology in CD34+ HSPCs.

In the test set, competitors will be provided with one modality and tasked with predicting a paired modality measured in the same cell. The added challenge of this competition is that the test data will come from a later time point than any time point in the training data.
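To make the task setup concrete, here is a minimal sketch of how one might mimic it locally: hold out the latest available time point for validation, fit a model from one modality to the other on the earlier time points, and score predictions on the held-out cells. Everything in it is an assumption for illustration only: the data are synthetic, the day labels are hypothetical, ridge regression is a stand-in baseline, and per-cell Pearson correlation is an assumed score rather than the official metric (see Kaggle for the actual data format and evaluation).

```python
import numpy as np

def split_by_latest_day(days):
    """Return boolean masks (train, holdout); holdout is the latest day."""
    days = np.asarray(days)
    holdout = days == days.max()
    return ~holdout, holdout

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression mapping X -> Y (no intercept)."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)

def mean_rowwise_pearson(Y_true, Y_pred):
    """Average Pearson correlation computed per cell (row)."""
    yt = Y_true - Y_true.mean(axis=1, keepdims=True)
    yp = Y_pred - Y_pred.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(yt, axis=1) * np.linalg.norm(yp, axis=1)
    return float(np.mean((yt * yp).sum(axis=1) / denom))

# Synthetic stand-in data (assumption): X is the provided modality,
# Y is the paired modality to predict, one row per cell.
rng = np.random.default_rng(0)
n_cells, n_in, n_out = 400, 20, 5
days = rng.choice([2, 3, 4, 7], size=n_cells)  # hypothetical day labels
X = rng.normal(size=(n_cells, n_in))
W_true = rng.normal(size=(n_in, n_out))
Y = X @ W_true + 0.1 * rng.normal(size=(n_cells, n_out))

# Train only on earlier days, evaluate on the latest day.
train, holdout = split_by_latest_day(days)
W = fit_ridge(X[train], Y[train])
score = mean_rowwise_pearson(Y[holdout], X[holdout] @ W)
print(f"held-out day {days.max()}: mean per-cell Pearson = {score:.3f}")
```

Splitting by time point rather than at random gives a validation score that reflects the competition's setting more closely, since a random split would let the model see cells from every day.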

To read the full details and compete for $25,000 in prizes, sign up on Kaggle!



Cellarity is redefining drug discovery by targeting cell behaviors as opposed to individual proteins.
Learn more.

CZI leverages technology and collaboration to accelerate progress in science, education and community work.
Learn more.



Organizing Team

In alphabetical order

  • Daniel Burkhardt (Primary contact) is a Machine Learning Scientist at Cellarity. He is a core organizer of the Open Problems in Single-Cell Analysis. Daniel completed his PhD in Genetics at Yale University with a specialization in machine learning under the supervision of Smita Krishnaswamy.

  • Jonathan M. Bloom leads Perturbation Biology and Machine Learning at Cellarity. As an Institute Scientist at the Broad Institute, he cofounded the Hail team, Learning Meaningful Representations of Life, and Models, Inference & Algorithms, while contributing to sequencing benchmarks, statistical genetics, ML theory, and neuroscience efforts. Jonathan completed his PhD at Columbia and held a Moore Instructorship and an NSF postdoc at MIT, conducting research on algebraic topology and geometry while reimagining the teaching of statistics.

  • Robrecht Cannoodt is a postdoctoral researcher in the Saeys lab at VIB-UGent and a computer science consultant at Data Intuitive. During his PhD, his research focused mainly on unsupervised learning in single-cell omics, more specifically on trajectory inference. Robrecht oversees infrastructure development for building collaborative, scalable and reproducible pipelines using NextFlow and Viash.

  • Peter Holderrieth is a Machine Learning Scientist at Cellarity where he works on hit prediction tools based on single-cell data. Previously, he was an Associate Scientist at Genomics plc. Originally trained as a mathematician, he holds an MSc in Statistics and an MSc in Neuroscience from the University of Oxford.

  • Smita Krishnaswamy is an associate professor of Computer Science and Genetics at Yale University. Her research focuses on unsupervised learning, using data geometry, topology and deep learning methods for big biomedical data. She is a faculty advisor for Open Problems in Single-Cell Analysis.

  • Christopher Lance is a PhD candidate in the Machine Learning Group of Prof. Fabian Theis at the Helmholtz Center Munich where he works on establishing best practices for analysing multimodal single-cell data and the integration of multiple modalities.

  • Malte Lücken is a senior postdoctoral researcher and team leader for integrative single-cell analysis in the Machine Learning Group of Prof. Fabian Theis. His research focuses on evaluating computational methods for single-cell analysis and investigating how environmental stimuli and natural variation manifest at the level of single cells.

  • Angela Pisco is the Associate Director of Bioinformatics at the Chan Zuckerberg Biohub. Her group is responsible for generating and annotating atlas-scale single-cell datasets across biological systems.

  • Fabian Theis is the Professor of Mathematical Modeling of Biological Systems at the Department of Mathematics of the Technical University of Munich; the director of the Institute of Computational Biology, Helmholtz Zentrum München; the scientific director of the Helmholtz Artificial Intelligence Cooperation Unit; and a faculty member of the Wellcome Trust Sanger Institute, Cambridge, UK. His lab develops innovative methods for single-cell analysis.