OpenProblems NeurIPS2021 Multiome

Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors.

Info

openproblems_neurips2021/bmmc_multiome
Luecken et al. (2021)
7.78 GiB
14-02-2024
69249 × 13431

Used in

No related benchmarks found.

Description

Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X Multiome Gene Expression and Chromatin Accessibility kit. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout: some donors were measured at multiple sites, while others were measured at a single site.
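The nested batch layout can be inspected by splitting each batch label into a site part and a donor part. The sketch below assumes labels of the form `s<site>d<donor>` (for example `s1d2`); that naming scheme, the example labels, and the helper `split_batch` are illustrative assumptions and are not taken from this page.

```python
import re
from collections import defaultdict

# Hypothetical batch labels; the "s<site>d<donor>" pattern is an assumption.
batches = ["s1d1", "s1d2", "s2d1", "s2d4", "s3d1", "s3d6", "s4d8"]

def split_batch(label):
    """Split a label such as 's1d2' into (site, donor) strings."""
    match = re.fullmatch(r"s(\d+)d(\d+)", label)
    if match is None:
        raise ValueError(f"unexpected batch label: {label!r}")
    return match.group(1), match.group(2)

# Map each donor to the set of sites at which it was measured.
sites_per_donor = defaultdict(set)
for label in batches:
    site, donor = split_batch(label)
    sites_per_donor[donor].add(site)

# Donors seen at several sites help separate site effects from donor effects.
multi_site = sorted(d for d, s in sites_per_donor.items() if len(s) > 1)
single_site = sorted(d for d, s in sites_per_donor.items() if len(s) == 1)
print("multi-site donors:", multi_site)
print("single-site donors:", single_site)
```

With real data, the labels would come from the `batch` column in `obs` rather than from a hard-coded list.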

Preview

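dataset_mod1 is an AnnData object with n_obs × n_vars = 69249 × 13431 with slots:

obs: batch, cell_type, size_factors
var: feature_id, feature_name, hvg, hvg_score
obsm: X_svd
layers: counts, normalized
uns: dataset_description, dataset_id, dataset_name, dataset_organism, dataset_reference, dataset_summary, dataset_url, normalization_id

dataset_mod2 is an AnnData object with n_obs × n_vars = 69249 × 116490 with slots:

obs: batch, cell_type, size_factors
var: feature_id, feature_name, hvg, hvg_score
obsm: X_svd
layers: counts, normalized
uns: dataset_description, dataset_id, dataset_name, dataset_organism, dataset_reference, dataset_summary, dataset_url, normalization_id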

Reference

Dataset mod1

| Slot | Name | Description | Type | Data type | Size |
| --- | --- | --- | --- | --- | --- |
| obs | batch | A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. | vector | category | 69249 |
| obs | cell_type | Classification of the cell type based on its characteristics and function within the tissue or organism. | vector | category | 69249 |
| obs | size_factors | The size factors created by the normalisation method, if any. | vector | float32 | 69249 |
| var | feature_id | Unique identifier for the feature, usually an Ensembl gene ID. | vector | object | 13431 |
| var | feature_name | A human-readable name for the feature, usually a gene symbol. | vector | object | 13431 |
| var | hvg | Whether or not the feature is considered to be a ‘highly variable gene’. | vector | bool | 13431 |
| var | hvg_score | A ranking of the features by hvg. | vector | float64 | 13431 |
| obsm | X_svd | The resulting SVD embedding. | densematrix | float32 | 69249 × 100 |
| layers | counts | Raw counts. | sparsematrix | float32 | 69249 × 13431 |
| layers | normalized | Normalised expression values. | sparsematrix | float32 | 69249 × 13431 |
| uns | dataset_description | Long description of the dataset. | atomic | str | 1 |
| uns | dataset_id | A unique identifier for the dataset. This is different from the obs.dataset_id field, which is the identifier for the dataset from which the cell data is derived. | atomic | str | 1 |
| uns | dataset_name | A human-readable name for the dataset. | atomic | str | 1 |
| uns | dataset_organism | The organism of the sample in the dataset. | atomic | str | 1 |
| uns | dataset_reference | BibTeX reference of the paper in which the dataset was published. | atomic | str | 1 |
| uns | dataset_summary | Short description of the dataset. | atomic | str | 1 |
| uns | dataset_url | Link to the original source of the dataset. | atomic | str | 1 |
| uns | normalization_id | Which normalization was used. | atomic | str | 1 |
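The `counts` layer, the `size_factors` column, and the `normalized` layer are related through the normalisation step. This page does not say which method was applied (it is recorded under `normalization_id`), so the sketch below only illustrates one common choice: scaling each cell to a fixed total and applying `log1p`. The target of 10,000 counts per cell and the definition of the size factor as "total counts divided by the target" are assumptions.

```python
import numpy as np

# Toy raw counts: 3 cells × 4 features (the real layer is a sparse 69249 × 13431 matrix).
counts = np.array(
    [[1, 0, 3, 0],
     [0, 2, 2, 4],
     [5, 5, 0, 0]],
    dtype=np.float32,
)

target_sum = 1e4  # assumed scaling target, not taken from this page

# One size factor per cell: the cell's total count relative to the target.
size_factors = counts.sum(axis=1) / target_sum

# Divide each cell by its size factor, then apply log1p to compress the range.
normalized = np.log1p(counts / size_factors[:, None])

print(size_factors)
print(normalized.shape)
```

Zero counts stay zero under this transform, which is why the normalised layer can remain sparse.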

Dataset mod2

| Slot | Name | Description | Type | Data type | Size |
| --- | --- | --- | --- | --- | --- |
| obs | batch | A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. | vector | category | 69249 |
| obs | cell_type | Classification of the cell type based on its characteristics and function within the tissue or organism. | vector | category | 69249 |
| obs | size_factors | The size factors created by the normalisation method, if any. | vector | float32 | 69249 |
| var | feature_id | Unique identifier for the feature, usually an Ensembl gene ID. | vector | object | 116490 |
| var | feature_name | A human-readable name for the feature, usually a gene symbol. | vector | object | 116490 |
| var | hvg | Whether or not the feature is considered to be a ‘highly variable gene’. | vector | bool | 116490 |
| var | hvg_score | A ranking of the features by hvg. | vector | float64 | 116490 |
| obsm | X_svd | The resulting SVD embedding. | densematrix | float32 | 69249 × 100 |
| layers | counts | Raw counts. | sparsematrix | float32 | 69249 × 116490 |
| layers | normalized | Normalised expression values. | sparsematrix | float32 | 69249 × 116490 |
| uns | dataset_description | Long description of the dataset. | atomic | str | 1 |
| uns | dataset_id | A unique identifier for the dataset. This is different from the obs.dataset_id field, which is the identifier for the dataset from which the cell data is derived. | atomic | str | 1 |
| uns | dataset_name | A human-readable name for the dataset. | atomic | str | 1 |
| uns | dataset_organism | The organism of the sample in the dataset. | atomic | str | 1 |
| uns | dataset_reference | BibTeX reference of the paper in which the dataset was published. | atomic | str | 1 |
| uns | dataset_summary | Short description of the dataset. | atomic | str | 1 |
| uns | dataset_url | Link to the original source of the dataset. | atomic | str | 1 |
| uns | normalization_id | Which normalization was used. | atomic | str | 1 |
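The `X_svd` entry stores a 100-dimensional embedding per cell. As a rough illustration of how such an embedding can be derived, the sketch below applies a truncated SVD to a small random stand-in for the `normalized` layer. The toy matrix, the choice of 3 components, and the absence of centring or scaling are assumptions; this page does not describe the exact procedure used for the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
normalized = rng.random((20, 12)).astype(np.float32)  # toy stand-in: 20 cells × 12 features
n_components = 3  # the dataset stores 100 components per cell

# Thin SVD: normalized = U @ diag(S) @ Vt, singular values in descending order.
U, S, Vt = np.linalg.svd(normalized, full_matrices=False)

# Keep the leading components; cell coordinates are U scaled by the singular values.
X_svd = U[:, :n_components] * S[:n_components]

# Equivalent view: project the cells onto the leading right singular vectors.
projected = normalized @ Vt[:n_components].T

print(X_svd.shape)
```

A dense SVD is only practical for toy inputs. For matrices of the size listed here, a sparse truncated solver such as `scipy.sparse.linalg.svds` would be a more realistic choice.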

References

Luecken, Malte, Daniel Burkhardt, Robrecht Cannoodt, Christopher Lance, Aditi Agrawal, Hananeh Aliee, Ann Chen, et al. 2021. “A Sandbox for Prediction and Integration of DNA, RNA, and Proteins in Single Cells.” In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, edited by J. Vanschoren and S. Yeung. Vol. 1. Curran. https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/file/158f3069a435b314a80bdcb024f8e422-Paper-round2.pdf.