Multimodal Single-Cell Data Integration

A NeurIPS Competition (2021)


Scaling from a dozen cells a decade ago to millions of cells today, single-cell measurement technologies are driving a revolution in the life sciences. Recent advances make it possible to measure multiple high-dimensional modalities (e.g. DNA accessibility, RNA, and proteins) simultaneously in the same cell. Such data provides, for the first time, a direct and comprehensive view into the layers of gene regulation that drive biological diversity and disease.

In this competition for NeurIPS 2021, we present three tasks on multimodal single-cell data using public datasets and a first-of-its-kind multi-omics benchmarking dataset. Teams will predict one modality from another and learn representations of multiple modalities measured in the same cells. Progress will reveal how a common genetic blueprint gives rise to distinct cell types and processes, as a foundation for improving human health.

To learn more, watch the introduction to the competition presented at the Broad Institute’s Models, Inference and Algorithms meeting, or download the slides.


The competition will focus on three tasks:

  1. Predicting one modality from another - Given chromatin accessibility measured in tens of thousands of cells, predict corresponding RNA expression for each cell.
  2. Modality alignment - Re-align multimodal data measured in the same cells when we hide the correspondences between the two measurements.
  3. Learning a joint embedding - Learn a meaningful embedding of multimodal data measured in the same cells that preserves expert-annotated data organization.
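As a rough illustration of the first two tasks, the sketch below uses small synthetic matrices in place of real chromatin accessibility and RNA measurements. Everything in it is an assumption made for illustration: the synthetic data generator, the ridge regression baseline, the RMSE score, and the nearest-neighbour matching are not the competition's data format, reference methods, or official metrics.

```python
import numpy as np

def make_synthetic_pair(n_cells=200, n_peaks=50, n_genes=20, seed=0):
    """Toy stand-in for paired data: accessibility counts and linked expression."""
    rng = np.random.default_rng(seed)
    atac = rng.poisson(1.0, size=(n_cells, n_peaks)).astype(float)
    weights = rng.normal(size=(n_peaks, n_genes))
    rna = atac @ weights + 0.1 * rng.normal(size=(n_cells, n_genes))
    return atac, rna

def fit_ridge(x, y, alpha=1.0):
    """Closed-form ridge regression: solve (X^T X + alpha I) W = X^T Y."""
    n_features = x.shape[1]
    return np.linalg.solve(x.T @ x + alpha * np.eye(n_features), x.T @ y)

def rmse(a, b):
    """Root mean squared error over all cells and genes."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

atac, rna = make_synthetic_pair()
train, test = slice(0, 150), slice(150, 200)

# Task 1 (sketch): predict RNA from accessibility for held-out cells.
w = fit_ridge(atac[train], rna[train])
pred = atac[test] @ w
baseline = np.tile(rna[train].mean(axis=0), (pred.shape[0], 1))
model_rmse = rmse(pred, rna[test])
baseline_rmse = rmse(baseline, rna[test])

# Task 2 (sketch): hide the cell correspondence by shuffling the RNA rows,
# then match each predicted profile to its nearest shuffled profile.
perm = np.random.default_rng(1).permutation(pred.shape[0])
shuffled_rna = rna[test][perm]
dists = ((pred[:, None, :] - shuffled_rna[None, :, :]) ** 2).sum(axis=2)
matched = dists.argmin(axis=1)
true_position = np.argsort(perm)  # row of shuffled_rna holding each test cell
match_accuracy = float(np.mean(matched == true_position))

print(f"ridge RMSE: {model_rmse:.3f}, mean-baseline RMSE: {baseline_rmse:.3f}")
print(f"fraction of cells correctly re-aligned: {match_accuracy:.2f}")
```

On real data the relationship between modalities is far noisier and not linear, so a baseline like this would mainly serve as a sanity check before trying stronger models.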

To get started:

  1. Read about the tasks and the submission quickstart on our competition website
  2. View the starter kit contents
  3. Explore the data and prototype methods for free on Saturn Cloud (Optional)
  4. Implement your method and generate a submission!

Full details about all the above can be found in the Competition Documentation.

If you ever have any questions, please feel free to reach out on the Open Problems Discord Server.

If you’d like to get updates, please fill out the interest list below:



Cellarity is redefining drug discovery by targeting cell behaviors as opposed to individual proteins.
Learn more.

CZI leverages technology and collaboration to accelerate progress in science, education and community work.
Learn more.



Organizing Team

In alphabetical order

  • Daniel Burkhardt (Primary contact) is a Machine Learning Scientist at Cellarity. He is a core organizer of Open Problems in Single-Cell Analysis. Daniel completed his PhD in Genetics at Yale University with a specialization in machine learning under the supervision of Smita Krishnaswamy.

  • Robrecht Cannoodt is a postdoctoral researcher in the Saeys lab at VIB-UGent and a computer science consultant at Data Intuitive. His PhD research focused mainly on unsupervised learning in single-cell omics, more specifically on trajectory inference. Robrecht oversees infrastructure development for building collaborative, scalable and reproducible pipelines using Nextflow and Viash.

  • Smita Krishnaswamy is an associate professor of Computer Science and Genetics at Yale University. Her research focuses on unsupervised learning, using data geometry, topology and deep learning methods for big biomedical data. She is a faculty advisor for Open Problems in Single-Cell Analysis.

  • Malte Lücken is a senior postdoctoral researcher and team leader for integrative single-cell analysis in the Machine Learning Group of Prof. Fabian Theis. His research focuses on evaluating computational methods for single-cell analysis and investigating how environmental stimuli and natural variation manifest at the level of single cells.

  • Debora Marks is an Associate Professor in the Department of Systems Biology at Harvard Medical School. Over the past five years, her lab has developed methods that accelerate structural biology with applications to cryoEM, crystallography, protein design and computed 3D structures of thousands of proteins with unknown folds, protein complexes and RNA interactions, as well as flexible, dynamic and even disordered protein ensembles.

  • Angela Pisco is the Associate Director of Bioinformatics at the Chan Zuckerberg Biohub. Her group is responsible for generating and annotating atlas-scale single-cell datasets across biological systems.

  • Bastian Rieck is a senior assistant in the Machine Learning and Computational Biology Lab of Prof. Dr. Karsten Borgwardt at ETH Zurich. His main research interests are manifold learning approaches based on concepts from topological data analysis, with a focus on personalized medicine and computational biology.

  • Jian Tang is an assistant professor at HEC Montreal (the business school of the University of Montreal) and a core faculty member at Mila-Quebec AI Institute. His research focuses on graph representation learning, graph neural networks, drug discovery, and knowledge graphs.

  • Fabian Theis is the Professor of Mathematical Modeling of Biological Systems at the Department of Mathematics of the Technical University of Munich; the director of the Institute of Computational Biology, Helmholtz Zentrum München; the scientific director of the Helmholtz Artificial Intelligence Cooperation Unit; and a faculty member of the Wellcome Trust Sanger Institute, Cambridge, UK. His lab develops innovative methods for single-cell analysis.

  • Alexander Tong is a PhD candidate in Computer Science in the lab of Prof. Smita Krishnaswamy at Yale University. His research applies manifold learning and optimal transport to single-cell data. Alexander will be responsible for supporting the comparison infrastructure and evaluating submissions.

  • Guy Wolf is an assistant professor in the Department of Mathematics and Statistics at the Université de Montréal and a Core Member of Mila—the Québec AI Institute. His research focuses on manifold learning, representation learning, and geometric deep learning for exploratory data analysis, including methods for dimensionality reduction, visualization, denoising, data augmentation, and coarse graining, with particular focus on applications in biomedical data exploration.