Multimodal Single-Cell Data Integration

A NeurIPS Competition (2021)


Scaling from a dozen cells a decade ago to millions of cells today, single-cell measurement technologies are driving a revolution in the life sciences. Recent advances make it possible to measure multiple high-dimensional modalities (e.g. DNA accessibility, RNA, and proteins) simultaneously in the same cell. Such data provides, for the first time, a direct and comprehensive view into the layers of gene regulation that drive biological diversity and disease.

In this competition, we present three critical tasks on multimodal single-cell data using public datasets and a first-of-its-kind multi-omics benchmarking dataset. Teams will predict one modality from another and learn representations of multiple modalities measured in the same cells. Progress will reveal how a common genetic blueprint gives rise to distinct cell types and processes, as a foundation for improving human health.


The competition will focus on three tasks:

  1. Predicting one modality from another: e.g. given chromatin accessibility measured in tens of thousands of cells, predict the corresponding RNA expression for each cell
  2. Learning a joint embedding: learn a meaningful embedding of multimodal data measured in the same cells that preserves expert-annotated data organization
  3. Modality alignment: re-align multimodal data measured in the same cells when we hide the correspondences between the two measurements
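
To make the third task concrete, here is a minimal sketch of how hiding and recovering cell correspondences could be framed. It is an illustration only, not the competition's data format or official metric: the function names, the toy 2-D coordinates, the nearest-neighbor matcher, and the accuracy score are all assumptions, and the sketch assumes both modalities have already been mapped into a shared space.

```python
import random


def hide_correspondence(mod2_rows, seed=0):
    """Shuffle the rows of the second modality.

    Returns the shuffled rows and the hidden permutation `order`,
    where shuffled[j] was measured in original cell order[j].
    """
    rng = random.Random(seed)
    order = list(range(len(mod2_rows)))
    rng.shuffle(order)
    shuffled = [mod2_rows[i] for i in order]
    return shuffled, order


def match_by_nearest(mod1_embed, mod2_embed):
    """Toy matcher (assumption): pair each modality-1 cell with the
    closest modality-2 row by squared Euclidean distance."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        min(range(len(mod2_embed)), key=lambda j: sqdist(x, mod2_embed[j]))
        for x in mod1_embed
    ]


def matching_accuracy(predicted, order):
    """Hypothetical score: fraction of cells i matched to a row j
    that truly came from cell i (order[j] == i)."""
    hits = sum(1 for i, j in enumerate(predicted) if order[j] == i)
    return hits / len(predicted)


# Five toy cells, assumed to be embedded in a shared 2-D space already.
mod1 = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
# The second modality sees the same cells with a small offset.
mod2 = [[x + 0.05, y - 0.05] for x, y in mod1]

shuffled, order = hide_correspondence(mod2, seed=0)
predicted = match_by_nearest(mod1, shuffled)
accuracy = matching_accuracy(predicted, order)
print(f"matching accuracy: {accuracy:.2f}")
```

The toy cells are well separated, so the nearest-neighbor matcher recovers every pair. On real data the hard part is learning the shared space in the first place, and a greedy matcher like this one does not enforce a one-to-one pairing between cells.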

We will provide:

  • A starter kit with minimal example submissions
  • Public data curated to match the format of test datasets
  • Purpose-generated training data matching held-out test datasets

We will launch the competition in June 2021 with the starter kits and full descriptions of the tasks. We will release specialized training data in July 2021 and collect submissions in September 2021.

For now, sign up below to be the first to hear when we launch the competition.



Organizing Team

In alphabetical order

  • Daniel Burkhardt (Primary contact) is a Machine Learning Scientist at Cellarity. He is a core organizer of Open Problems in Single-Cell Analysis. Daniel completed his PhD in Genetics at Yale University with a specialization in machine learning under the supervision of Smita Krishnaswamy.

  • Smita Krishnaswamy is an associate professor of Computer Science and Genetics at Yale University. Her research focuses on unsupervised learning, using data geometry, topology and deep learning methods for big biomedical data. She is a faculty advisor for Open Problems in Single-Cell Analysis.

  • Malte Lücken is a senior postdoctoral researcher and team leader for integrative single-cell analysis in the Machine Learning Group of Prof. Fabian Theis. His research focuses on evaluating computational methods for single-cell analysis and investigating how environmental stimuli and natural variation manifest at the level of single cells.

  • Debora Marks is an Associate Professor in the Department of Systems Biology at Harvard Medical School. Over the past five years, her lab has developed methods that accelerate structural biology, with applications to cryoEM, crystallography, and protein design, and has computed 3D structures of thousands of proteins with unknown folds, protein complexes, and RNA interactions, as well as flexible, dynamic, and even disordered protein ensembles.

  • Angela Pisco is the Associate Director of Bioinformatics at the Chan Zuckerberg Biohub. Her group is responsible for generating and annotating atlas-scale single-cell datasets across biological systems.

  • Bastian Rieck is a senior assistant in the Machine Learning and Computational Biology Lab of Prof. Dr. Karsten Borgwardt at ETH Zurich. His main research interests are manifold learning approaches based on concepts from topological data analysis, with a focus on personalized medicine and computational biology.

  • Jian Tang is currently an assistant professor at HEC Montreal (the business school of the University of Montreal) and a core faculty member at Mila-Quebec AI Institute. His research focuses on graph representation learning, graph neural networks, drug discovery, and knowledge graphs.

  • Fabian Theis is the Professor of Mathematical Modeling of Biological Systems at the Department of Mathematics of the Technical University of Munich; the director of the Institute of Computational Biology, Helmholtz Zentrum München; the scientific director of the Helmholtz Artificial Intelligence Cooperation Unit; and a faculty member of the Wellcome Trust Sanger Institute, Cambridge, UK. His lab develops innovative methods for single-cell analysis.

  • Alexander Tong is a PhD candidate in Computer Science in the lab of Prof. Smita Krishnaswamy at Yale University. His research focuses on manifold learning and optimal transport with a focus on single-cell data. Alexander will be responsible for supporting the comparison infrastructure and evaluating submissions.

  • Guy Wolf is an assistant professor in the Department of Mathematics and Statistics at the Université de Montréal and a Core Member of Mila, the Québec AI Institute. His research focuses on manifold learning, representation learning, and geometric deep learning for exploratory data analysis, including methods for dimensionality reduction, visualization, denoising, data augmentation, and coarse graining, with a particular focus on applications in biomedical data exploration.