# Dimensionality reduction for visualisation

## The task

Dimensionality reduction is one of the key challenges in single-cell data representation. Routine single-cell RNA sequencing (scRNA-seq) experiments measure cells in roughly 20,000-30,000 dimensions (i.e., features, which are mostly gene transcripts but also include other functional elements encoded in mRNA, such as lncRNAs). Since the technique's inception, scRNA-seq experiments have grown steadily in the number of cells measured. Originally, cutting-edge SmartSeq experiments would yield a few hundred cells at best. Now, it is not uncommon to see experiments that yield over 100,000 cells, or even more than 1 million cells.

Each *feature* in a dataset functions as a single dimension. While each of the ~30,000 dimensions measured in each cell contributes to an underlying data structure, the overall structure of the data is challenging to display in few dimensions due to data sparsity and the *“curse of dimensionality”* (distances in high-dimensional data don’t distinguish data points well). Thus, we need to find a way to reduce the dimensionality of the data for visualization and interpretation.
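The effect of the curse of dimensionality on distances can be illustrated with a small experiment. The sketch below is only an illustration under simplifying assumptions: it uses uniform random points rather than real scRNA-seq data, and the helper name `relative_contrast` is hypothetical. It measures how much the farthest point from a query differs from the nearest one, relative to the nearest distance.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (max - min) / min over distances from one random query point
    to n_points other points, all drawn uniformly from the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = [
        math.dist(query, [rng.random() for _ in range(n_dims)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# Compare a low-dimensional space with a high-dimensional one.
low = relative_contrast(n_points=200, n_dims=2)
high = relative_contrast(n_points=200, n_dims=2000)
print(f"relative contrast in 2 dimensions:    {low:.2f}")
print(f"relative contrast in 2000 dimensions: {high:.2f}")
```

In the high-dimensional case the contrast shrinks towards zero: the nearest and farthest points end up at almost the same distance, which is why raw distances separate data points poorly.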

## The metrics

- **Distance correlation**: the Spearman correlation between ground truth distances in the high-dimensional data and Euclidean distances in the dimension-reduced data, invariant to scalar multiplication. *Distance correlation* computes high-dimensional distances in Euclidean space, while *Distance correlation (spectral)* computes diffusion distances (i.e. Euclidean distances on the Laplacian Eigenmap).
- **Trustworthiness**: a measurement of similarity between the rank of each point’s nearest neighbors in the high-dimensional data and the reduced data (Venna & Kaski, 2001).
- **Density preservation**: similarity between local densities in the high-dimensional data and the reduced data (Narayan, Berger & Cho, 2020).
- **NN Ranking**: a set of metrics from pyDRMetrics relating to the preservation of nearest neighbors in the high-dimensional data and the reduced data.
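As a rough illustration of the Euclidean variant of distance correlation, the sketch below computes the Spearman correlation between pairwise distances before and after reduction. It is a minimal standard-library version written for this explanation, not the benchmark's actual implementation; the function names and the toy data are assumptions.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def distance_correlation(high_dim, low_dim):
    """Spearman correlation = Pearson correlation of the distance ranks."""
    return pearson(
        average_ranks(pairwise_distances(high_dim)),
        average_ranks(pairwise_distances(low_dim)),
    )


# Toy data: five 3-D points and a 2-D "embedding" that drops the last axis.
high_dim = [(0, 0, 0), (1, 0, 0), (0, 3, 0), (4, 3, 1), (5, 1, 4)]
low_dim = [(x, y) for x, y, _ in high_dim]
scaled_low_dim = [(10 * x, 10 * y) for x, y in low_dim]

score = distance_correlation(high_dim, low_dim)
scaled_score = distance_correlation(high_dim, scaled_low_dim)
print(f"distance correlation:          {score:.3f}")
print(f"after scaling the embedding:   {scaled_score:.3f}")
```

Because only the ranks of the distances enter the score, multiplying the embedding by a constant leaves it unchanged, which is the scalar invariance mentioned above. For trustworthiness, scikit-learn provides `sklearn.manifold.trustworthiness`; whether the benchmark uses that exact function is not stated here.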

Dataset | Best Method | Paper | Code |
---|---|---|---|
5k Peripheral blood mononuclear cells | densMAP PCA (logCP10k) | v0.5.3 | |
Mouse hematopoietic stem cell differentiation | densMAP (logCP10k) | v0.5.3 | |
Mouse myeloid lineage differentiation | densMAP (logCP10k) | v0.5.3 | |