Several recently described technologies allow for simultaneous measurement of different aspects of cell state. For example, sci-CAR jointly profiles RNA expression and chromatin accessibility in the same cell, and CITE-seq measures surface protein abundance and RNA expression from each cell. However, these joint profiling methods come with several tradeoffs compared to unimodal measurements.
Joint methods can be more expensive, lower throughput, or noisier than measuring a single modality at a time. Therefore, it is useful to develop methods capable of integrating measurements of the same biological system that were obtained using different technologies.
Here the goal is to learn a latent space in which observations from the same cell, acquired using different modalities, lie close together. In a perfect result, each pair of observations shares the same coordinates in the latent space.
Metrics for multimodal data integration aim to characterize how well the aligned datasets correspond to the ground truth.
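As an illustration of what such a metric might look like, the following is a minimal sketch, not the metrics actually used by any particular benchmark. It assumes two aligned embeddings stored as NumPy arrays in which row i of each array comes from the same cell. The function names `paired_mse` and `match_in_knn_fraction` and the parameter `k` are hypothetical and introduced only for this example.

```python
import numpy as np


def paired_mse(latent_a, latent_b):
    """Mean squared Euclidean distance between paired rows.

    Assumes row i of latent_a and row i of latent_b are the same cell.
    Returns 0.0 when every pair shares identical coordinates.
    """
    latent_a = np.asarray(latent_a, dtype=float)
    latent_b = np.asarray(latent_b, dtype=float)
    if latent_a.shape != latent_b.shape:
        raise ValueError("embeddings must have the same shape")
    return float(np.mean(np.sum((latent_a - latent_b) ** 2, axis=1)))


def match_in_knn_fraction(latent_a, latent_b, k=5):
    """Fraction of cells whose true partner is among the k nearest
    neighbors in the other modality (hypothetical helper)."""
    latent_a = np.asarray(latent_a, dtype=float)
    latent_b = np.asarray(latent_b, dtype=float)
    # Pairwise squared distances: dist[i, j] = ||a_i - b_j||^2.
    diff = latent_a[:, None, :] - latent_b[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    # Distance from each cell in A to its true partner in B.
    true_dist = np.diag(dist)
    # Number of cells in B strictly closer than the true partner.
    rank = np.sum(dist < true_dist[:, None], axis=1)
    return float(np.mean(rank < k))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(100, 10))
    b = a + rng.normal(scale=0.1, size=a.shape)  # noisy copy of a
    print("paired MSE:", paired_mse(a, b))
    print("match in 5-NN:", match_in_knn_fraction(a, b, k=5))
```

The first function rewards exact coordinate agreement, while the second depends only on neighbor ordering and is therefore insensitive to the overall scale of the latent space. Which of these properties matters more depends on how the embedding is used downstream.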