Batch integration feature

The task

This is a sub-task of the overall batch integration task. Batch (or data) integration integrates datasets across batches that arise from various biological and technical sources. Methods that integrate batches typically produce one or more of three types of output: a corrected feature matrix, a joint embedding across batches, and/or an integrated cell-cell similarity graph (e.g., a kNN graph). This sub-task covers all methods that can output a corrected feature matrix. Other sub-tasks for batch integration cover joint embeddings (batch integration embed) and integrated graphs (batch integration graph).

This sub-task was taken from a benchmarking study of data integration methods.

The metrics

Metrics for batch integration (feature) measure how well feature-level information is batch corrected. In this sub-task, this is assessed only through biological variance conservation. Further metrics for batch correction and biological variance conservation, computed on lower-dimensional feature spaces derived from the corrected feature outputs, can be found in the batch integration embed and graph tasks.

  • HVG conservation: For each batch, this metric compares the highly variable genes (HVGs) selected before integration with those selected after integration, and reports the percentage of overlap averaged across batches.
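The per-batch overlap described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the benchmark's reference implementation: the helper names `top_variable_genes` and `hvg_conservation` are hypothetical, genes are ranked by raw per-gene variance as a stand-in for a proper HVG selection method, and the unintegrated and integrated matrices are assumed to share the same cells (rows) and genes (columns).

```python
import numpy as np


def top_variable_genes(X, n_top):
    """Return column indices of the n_top genes with the highest variance.

    Assumption: raw variance is used as a simple stand-in for a real
    HVG selection method (e.g. a dispersion-based one).
    """
    variances = X.var(axis=0)
    order = np.argsort(variances, kind="stable")  # ascending by variance
    return set(order[-n_top:].tolist())


def hvg_conservation(X_pre, X_post, batches, n_top=2):
    """Mean per-batch fraction of top-variable genes shared before/after.

    Hypothetical helper: returns a value in [0, 1]; multiply by 100
    to express it as a percentage.
    """
    X_pre = np.asarray(X_pre, dtype=float)
    X_post = np.asarray(X_post, dtype=float)
    batches = np.asarray(batches)
    k = min(n_top, X_pre.shape[1])
    overlaps = []
    for b in np.unique(batches):
        mask = batches == b
        # Select HVGs separately within each batch, before and after integration
        hvg_pre = top_variable_genes(X_pre[mask], k)
        hvg_post = top_variable_genes(X_post[mask], k)
        overlaps.append(len(hvg_pre & hvg_post) / k)
    # Average the per-batch overlap fractions
    return float(np.mean(overlaps))


if __name__ == "__main__":
    # Toy data: 6 cells x 3 genes, two batches of 3 cells each
    X = np.array([[0, 0, 1], [5, 1, 1], [10, 2, 1],
                  [1, 0, 0], [1, 3, 0], [1, 6, 1]], dtype=float)
    batches = ["a", "a", "a", "b", "b", "b"]
    # "Integration" that changes which genes vary in batch "a"
    X_post = X.copy()
    X_post[:3] = [[0, 0, 0], [5, 1, 4], [10, 0, 8]]
    print(hvg_conservation(X, X_post, batches))  # 0.75
```

In the toy example, batch "a" keeps one of its two top-variable genes (0.5) and batch "b" keeps both (1.0), giving a mean of 0.75, i.e. 75% HVG conservation.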
Dataset               Best method
Immune (by batch)     SCALEX (hvg)
Pancreas (by batch)   Combat (hvg/unscaled)