Multimodal Single-Cell Data Integration

A NeurIPS Competition (2021)


Scaling from a dozen cells a decade ago to millions of cells today, single-cell measurement technologies are driving a revolution in the life sciences. Recent advances make it possible to measure multiple high-dimensional modalities (e.g. DNA accessibility, RNA, and proteins) simultaneously in the same cell. Such data provides, for the first time, a direct and comprehensive view into the layers of gene regulation that drive biological diversity and disease.

In this competition for NeurIPS 2021, we present three tasks on multimodal single-cell data using a first-of-its-kind multi-omics benchmarking dataset. Teams will predict one modality from another and learn representations of multiple modalities measured in the same cells. Progress will reveal how a common genetic blueprint gives rise to distinct cell types and processes, as a foundation for improving human health.

To learn more, you can watch a lecture presented at the Broad Institute’s Models, Inference & Algorithms meeting (slides).


We’re proud to announce the winners of our 2021 NeurIPS competition!

Task 1 - Modality Prediction

Given one modality, predict the other.

GEX→ATAC: Living Systems Lab

KAUST, code
Aidyn Ubingazhibov, Sumeer Khan, Robert Lehman, Xabier Martinez De Morentin, Minxing Pang, David Gomez-Cabrero, Narsis Kiani, Jesper Tegner


Francis Crick Institute, code
Anna Laddach, Roman Laddach, Michael Shapiro

GEX→ADT: Dengkw

University of Michigan, code
Kaiwen Deng

ADT→GEX: Novel

Novel Software Systems, code
Nikolay Russkikh, Gleb Ryazantsev, Igor I

Overall: DANCE

Michigan State University, code
Hongzhi Wen, Jiayuan Ding, Wei Jin, Xiaoyan Li, Zhaoheng Li, Yiqi Wang, Haoyu Han, Yanyi Ding, Xiaochun Ni, Yu Lei, Yuying Xie, Jiliang Tang

Task 2 - Match Modality

Given collections of cells in each modality, match profiles that originated from the same cell.

Winner in all categories: CLUE

Peking University, University of Washington, code
Zhi-Jie Cao, Xinming Tu, Chen-Rui Xia

Task 3 - Joint Embedding

Learn a low-dimensional embedding of both modalities that preserves biology and removes batch effects.

Multiome, pre-trained & CITE, pre-trained: Amateur

Stanford University, Tsinghua University, code
Qiao Liu, Wanwen Zeng, Chencheng Xu

Multiome, online: Living Systems Lab

KAUST, code
Sumeer Khan, Robert Lehman, Xabier Martinez De Morentin, Minxing Pang, Aidyn Ubingazhibov, David Gomez-Cabrero, Narsis Kiani, Jesper Tegner

CITE, online: Dengkw

University of Michigan, code
Kaiwen Deng


The competition will focus on three tasks:

  1. Predicting one modality from another - Given one modality, predict the other.
  2. Modality alignment - Re-align multimodal data measured in the same cells when we hide the correspondences between the two measurements.
  3. Learning a joint embedding - Learn a meaningful embedding of multimodal data measured in the same cells that preserves expert-annotated data organization.
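
To make the first two tasks concrete, here is a minimal sketch on toy data. Everything in it is an assumption made for illustration: the simulated cells, the k-nearest-neighbour predictor, the greedy matching rule, and the function names are hypothetical and are not the competition's baselines, data, or official metrics. It predicts a second modality from the first (task 1), then shuffles the measured second modality and tries to recover which profile belongs to which cell (task 2).

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query_x, k=3):
    """Task 1 sketch: predict each query cell's second modality as the mean
    second-modality profile of its k nearest training cells."""
    n_features = len(train_y[0])
    predictions = []
    for q in query_x:
        nearest = sorted(range(len(train_x)),
                         key=lambda i: euclidean(train_x[i], q))[:k]
        predictions.append([
            sum(train_y[i][j] for i in nearest) / len(nearest)
            for j in range(n_features)
        ])
    return predictions


def rmse(pred, truth):
    """Root mean squared error over all cells and features."""
    squared = [(p - t) ** 2
               for p_row, t_row in zip(pred, truth)
               for p, t in zip(p_row, t_row)]
    return math.sqrt(sum(squared) / len(squared))


def greedy_match(pred_y, measured_y):
    """Task 2 sketch: pair predicted and measured profiles one-to-one,
    taking the closest remaining pair first. Returns match[i] = j."""
    pairs = sorted(
        (euclidean(p, m), i, j)
        for i, p in enumerate(pred_y)
        for j, m in enumerate(measured_y)
    )
    match = [None] * len(pred_y)
    used = set()
    for _, i, j in pairs:
        if match[i] is None and j not in used:
            match[i] = j
            used.add(j)
    return match


def simulate_cells(n, rng, noise=0.01):
    """Hypothetical toy data: two modalities driven by a shared 2-D latent state."""
    mod1, mod2 = [], []
    for _ in range(n):
        z0, z1 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        mod1.append([z0 + rng.gauss(0, noise),
                     z1 + rng.gauss(0, noise),
                     z0 + z1 + rng.gauss(0, noise)])
        mod2.append([2 * z0 + rng.gauss(0, noise),
                     -z1 + rng.gauss(0, noise)])
    return mod1, mod2


def demo(seed=0):
    rng = random.Random(seed)
    train_x, train_y = simulate_cells(200, rng)
    test_x, test_y = simulate_cells(20, rng)

    # Task 1: predict modality 2 from modality 1; compare with a mean baseline.
    pred_y = knn_predict(train_x, train_y, test_x)
    n_features = len(train_y[0])
    mean_profile = [sum(row[j] for row in train_y) / len(train_y)
                    for j in range(n_features)]
    baseline = [mean_profile] * len(test_y)

    # Task 2: hide the correspondence by shuffling, then try to recover it.
    order = list(range(len(test_y)))
    rng.shuffle(order)
    shuffled_y = [test_y[i] for i in order]
    match = greedy_match(pred_y, shuffled_y)
    # shuffled_y[j] came from test cell order[j], so match[i] = j is correct
    # exactly when order[j] == i.
    accuracy = sum(order[j] == i for i, j in enumerate(match)) / len(match)

    return {
        "rmse_knn": rmse(pred_y, test_y),
        "rmse_mean": rmse(baseline, test_y),
        "match": match,
        "match_accuracy": accuracy,
    }


if __name__ == "__main__":
    print(demo())
```

The matching step reuses the task 1 predictions, which is only one possible design. Real submissions work with far higher-dimensional, sparser measurements, where distances in raw feature space are much less informative than in this toy setting.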

To get started:

  1. Read about the tasks and the submission quickstart on our competition website
  2. View the starter kit contents
  3. Explore the data and prototype methods for free on Saturn Cloud (Optional)
  4. Implement your method and generate a submission!

Full details about all the above can be found in the Competition Documentation.

If you ever have any questions, please feel free to reach out on the Open Problems Discord Server.




Cellarity is redefining drug discovery by targeting cell behaviors as opposed to individual proteins.
Learn more.

CZI leverages technology and collaboration to accelerate progress in science, education and community work.
Learn more.



Organizing Team

In alphabetical order

  • Daniel Burkhardt (Primary contact) is a Machine Learning Scientist at Cellarity. He is a core organizer of Open Problems in Single-Cell Analysis. Daniel completed his PhD in Genetics at Yale University with a specialization in machine learning under the supervision of Smita Krishnaswamy.

  • Jonathan M. Bloom leads Perturbation Biology and Machine Learning at Cellarity. As an Institute Scientist at the Broad Institute, he cofounded the Hail team, Learning Meaningful Representations of Life, and Models, Inference & Algorithms, while contributing to sequencing benchmarks, statistical genetics, ML theory, and neuroscience efforts. Jonathan completed his PhD at Columbia, followed by a Moore Instructorship and an NSF postdoc at MIT, conducting research on algebraic topology and geometry while reimagining the teaching of statistics.

  • Robrecht Cannoodt is a postdoctoral researcher in the Saeys lab at VIB-UGent and a computer science consultant at Data Intuitive. His PhD research focused mainly on unsupervised learning in single-cell omics, specifically trajectory inference. Robrecht oversees infrastructure development for building collaborative, scalable and reproducible pipelines using Nextflow and Viash.

  • Smita Krishnaswamy is an associate professor of Computer Science and Genetics at Yale University. Her research focuses on unsupervised learning, using data geometry, topology and deep learning methods for big biomedical data. She is a faculty advisor for Open Problems in Single-Cell Analysis.

  • Malte Lücken is a senior postdoctoral researcher and team leader for integrative single-cell analysis in the Machine Learning Group of Prof. Fabian Theis. His research focuses on evaluating computational methods for single-cell analysis and investigating how environmental stimuli and natural variation manifest at the level of single cells.

  • Debora Marks is an Associate Professor in the Department of Systems Biology at Harvard Medical School. Over the past five years, her lab has developed methods that accelerate structural biology with applications to cryoEM, crystallography, protein design and computed 3D structures of thousands of proteins with unknown folds, protein complexes and RNA interactions, as well as flexible, dynamic and even disordered protein ensembles.

  • Angela Pisco is the Associate Director of Bioinformatics at the Chan Zuckerberg Biohub. Her group is responsible for generating and annotating atlas-scale single-cell datasets across biological systems.

  • Bastian Rieck is a senior assistant in the Machine Learning and Computational Biology Lab of Prof. Dr. Karsten Borgwardt at ETH Zurich. His main research interests are manifold learning approaches based on concepts from topological data analysis, with a focus on personalized medicine and computational biology.

  • Jian Tang is an assistant professor at HEC Montreal (the business school of the University of Montreal) and a core faculty member at Mila-Quebec AI Institute. His research focuses on graph representation learning, graph neural networks, drug discovery, and knowledge graphs.

  • Fabian Theis is the Professor of Mathematical Modeling of Biological Systems at the Department of Mathematics of the Technical University of Munich; the director of the Institute of Computational Biology, Helmholtz Zentrum München; the scientific director of the Helmholtz Artificial Intelligence Cooperation Unit; and a faculty member of the Wellcome Trust Sanger Institute, Cambridge, UK. His lab develops innovative methods for single-cell analysis.

  • Alexander Tong is a PhD candidate in Computer Science in the lab of Prof. Smita Krishnaswamy at Yale University. His research focuses on manifold learning and optimal transport with a focus on single-cell data. Alexander will be responsible for supporting the comparison infrastructure and evaluating submissions.

  • Guy Wolf is an assistant professor in the Department of Mathematics and Statistics at the Université de Montréal and a Core Member of Mila—the Québec AI Institute. His research focuses on manifold learning, representation learning, and geometric deep learning for exploratory data analysis, including methods for dimensionality reduction, visualization, denoising, data augmentation, and coarse graining, with particular focus on applications in biomedical data exploration.