OpenProblems NeurIPS2021 CITE-Seq

Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors.

openproblems_neurips2021

Info

ID: openproblems_neurips2021/bmmc_cite
Reference: Luecken et al. (2021)
Size: 2.2 GiB
Date: 14-02-2024
Dimensions (n_obs × n_vars): 90261 × 13953

Used in

No related benchmarks found.

Description

Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10x 3' Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data were then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout: some donor samples were measured at multiple sites, while others were measured at a single site.
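The nested batch layout can be illustrated with a small sketch. It assumes, hypothetically, that each batch label encodes a site and a donor as `s<site>d<donor>` (for example `s1d2`). Both this label format and the example labels are illustrative assumptions, not taken from the description above.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) label format: "s<site>d<donor>", e.g. "s1d2".
BATCH_PATTERN = re.compile(r"^s(\d+)d(\d+)$")


def sites_per_donor(batch_labels):
    """Map each donor number to the sorted list of sites where it was measured."""
    donor_sites = defaultdict(set)
    for label in set(batch_labels):
        match = BATCH_PATTERN.match(label)
        if match is None:
            raise ValueError(f"unexpected batch label: {label!r}")
        site, donor = int(match.group(1)), int(match.group(2))
        donor_sites[donor].add(site)
    return {donor: sorted(sites) for donor, sites in sorted(donor_sites.items())}


# Made-up labels for illustration only; they are not the dataset's real batches.
example = ["s1d1", "s2d1", "s3d1", "s1d2", "s2d4", "s2d4"]
layout = sites_per_donor(example)
shared = [donor for donor, sites in layout.items() if len(sites) > 1]
single = [donor for donor, sites in layout.items() if len(sites) == 1]
print("donors measured at multiple sites:", shared)
print("donors measured at a single site:", single)
```

With the real data, the `batch` column of `obs` could be passed to `sites_per_donor`, provided its labels follow the assumed format.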

Preview

dataset_mod1 is an AnnData object with n_obs × n_vars = 90261 × 13953. Its slots are listed under Reference.

dataset_mod2 is an AnnData object with n_obs × n_vars = 90261 × 134. Its slots are listed under Reference.
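Both objects have the same number of observations, which suggests that they describe the same cells. This page does not state that the cells are stored in the same order, so the sketch below checks it rather than assuming it. The file names in `main` are hypothetical, and `main` needs the third-party `anndata` package; `check_paired` itself only relies on the `n_obs` and `obs_names` attributes.

```python
def check_paired(mod1, mod2):
    """Return a list of problems found when comparing two AnnData-like objects.

    Only the attributes `n_obs` and `obs_names` are used, so any object
    exposing them (such as anndata.AnnData) can be passed in.
    """
    problems = []
    if mod1.n_obs != mod2.n_obs:
        problems.append(f"n_obs differs: {mod1.n_obs} vs {mod2.n_obs}")
    elif list(mod1.obs_names) != list(mod2.obs_names):
        problems.append("obs_names differ in content or order")
    return problems


def main(path_mod1="dataset_mod1.h5ad", path_mod2="dataset_mod2.h5ad"):
    # Hypothetical file names; adjust to wherever the files were downloaded.
    import anndata as ad  # third-party dependency, imported only when needed

    mod1 = ad.read_h5ad(path_mod1)
    mod2 = ad.read_h5ad(path_mod2)
    print(mod1)  # summary of dimensions and populated slots
    print(mod2)
    print(check_paired(mod1, mod2) or "modalities look paired")
```

Calling `main()` after downloading both files prints the two object summaries followed by either the detected problems or a confirmation message.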

Reference

Dataset mod1

| Slot | Name | Description | Type | Data type | Size |
|---|---|---|---|---|---|
| obs | batch | A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. | vector | category | 90261 |
| obs | cell_type | Classification of the cell type based on its characteristics and function within the tissue or organism. | vector | category | 90261 |
| obs | size_factors | The size factors created by the normalisation method, if any. | vector | float32 | 90261 |
| var | feature_id | Unique identifier for the feature, usually an Ensembl gene ID. | vector | object | 13953 |
| var | feature_name | A human-readable name for the feature, usually a gene symbol. | vector | object | 13953 |
| var | hvg | Whether or not the feature is considered to be a ‘highly variable gene’. | vector | bool | 13953 |
| var | hvg_score | A ranking of the features by hvg. | vector | float64 | 13953 |
| obsm | X_svd | The resulting SVD embedding. | densematrix | float32 | 90261 × 100 |
| layers | counts | Raw counts. | sparsematrix | float32 | 90261 × 13953 |
| layers | normalized | Normalised expression values. | sparsematrix | float32 | 90261 × 13953 |
| uns | dataset_description | Long description of the dataset. | atomic | str | 1 |
| uns | dataset_id | A unique identifier for the dataset. This is different from the obs.dataset_id field, which is the identifier for the dataset from which the cell data is derived. | atomic | str | 1 |
| uns | dataset_name | A human-readable name for the dataset. | atomic | str | 1 |
| uns | dataset_organism | The organism of the sample in the dataset. | atomic | str | 1 |
| uns | dataset_reference | BibTeX reference of the paper in which the dataset was published. | atomic | str | 1 |
| uns | dataset_summary | Short description of the dataset. | atomic | str | 1 |
| uns | dataset_url | Link to the original source of the dataset. | atomic | str | 1 |
| uns | normalization_id | Which normalization was used. | atomic | str | 1 |
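The field listing above can double as a lightweight schema. The following is a minimal sketch, not part of any OpenProblems tooling, that reports which of the listed fields are absent from an AnnData-like object. The names `EXPECTED_SLOTS` and `missing_fields` are hypothetical helpers introduced here for illustration.

```python
# Expected fields per slot, transcribed from the reference listing above.
EXPECTED_SLOTS = {
    "obs": ["batch", "cell_type", "size_factors"],
    "var": ["feature_id", "feature_name", "hvg", "hvg_score"],
    "obsm": ["X_svd"],
    "layers": ["counts", "normalized"],
    "uns": [
        "dataset_description", "dataset_id", "dataset_name",
        "dataset_organism", "dataset_reference", "dataset_summary",
        "dataset_url", "normalization_id",
    ],
}


def missing_fields(adata, expected=EXPECTED_SLOTS):
    """Return {slot: [missing names]} for an AnnData-like object.

    An empty dict means every expected field is present.
    """
    missing = {}
    for slot, names in expected.items():
        container = getattr(adata, slot)
        # obs and var are data frames (use .columns); the other slots are mappings.
        if hasattr(container, "columns"):
            present = set(container.columns)
        else:
            present = set(container.keys())
        absent = [name for name in names if name not in present]
        if absent:
            missing[slot] = absent
    return missing
```

Because both modalities list the same field names, the same check should apply to dataset_mod1 and dataset_mod2 alike.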

Dataset mod2

| Slot | Name | Description | Type | Data type | Size |
|---|---|---|---|---|---|
| obs | batch | A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. | vector | category | 90261 |
| obs | cell_type | Classification of the cell type based on its characteristics and function within the tissue or organism. | vector | category | 90261 |
| obs | size_factors | The size factors created by the normalisation method, if any. | vector | float32 | 90261 |
| var | feature_id | Unique identifier for the feature, usually an Ensembl gene ID. | vector | object | 134 |
| var | feature_name | A human-readable name for the feature, usually a gene symbol. | vector | object | 134 |
| var | hvg | Whether or not the feature is considered to be a ‘highly variable gene’. | vector | bool | 134 |
| var | hvg_score | A ranking of the features by hvg. | vector | float64 | 134 |
| obsm | X_svd | The resulting SVD embedding. | densematrix | float32 | 90261 × 100 |
| layers | counts | Raw counts. | sparsematrix | float32 | 90261 × 134 |
| layers | normalized | Normalised expression values. | sparsematrix | float32 | 90261 × 134 |
| uns | dataset_description | Long description of the dataset. | atomic | str | 1 |
| uns | dataset_id | A unique identifier for the dataset. This is different from the obs.dataset_id field, which is the identifier for the dataset from which the cell data is derived. | atomic | str | 1 |
| uns | dataset_name | A human-readable name for the dataset. | atomic | str | 1 |
| uns | dataset_organism | The organism of the sample in the dataset. | atomic | str | 1 |
| uns | dataset_reference | BibTeX reference of the paper in which the dataset was published. | atomic | str | 1 |
| uns | dataset_summary | Short description of the dataset. | atomic | str | 1 |
| uns | dataset_url | Link to the original source of the dataset. | atomic | str | 1 |
| uns | normalization_id | Which normalization was used. | atomic | str | 1 |
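The listed sizes and data types allow a rough estimate of how much memory each matrix would need if held densely. The sketch below is plain arithmetic on the shapes given above (float32 takes 4 bytes per value). It says nothing about the actual on-disk or sparse in-memory size, which depends on sparsity and compression.

```python
N_OBS = 90261
BYTES_FLOAT32 = 4


def dense_bytes(n_rows, n_cols, itemsize=BYTES_FLOAT32):
    """Bytes needed to hold an n_rows × n_cols matrix densely."""
    return n_rows * n_cols * itemsize


def as_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


# Shapes taken from the Size column of the two reference listings.
shapes = {
    "mod1 layer": (N_OBS, 13953),
    "mod2 layer": (N_OBS, 134),
    "X_svd": (N_OBS, 100),
}
for name, (rows, cols) in shapes.items():
    gib = as_gib(dense_bytes(rows, cols))
    print(f"{name}: {gib:.2f} GiB if stored densely")
```

A single dense mod1 layer would already be larger than the 2.2 GiB listed for the whole dataset, so converting `counts` or `normalized` to a dense array is best avoided unless enough memory is available.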

References

Luecken, Malte, Daniel Burkhardt, Robrecht Cannoodt, Christopher Lance, Aditi Agrawal, Hananeh Aliee, Ann Chen, et al. 2021. “A Sandbox for Prediction and Integration of DNA, RNA, and Proteins in Single Cells.” In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, edited by J. Vanschoren and S. Yeung. Vol. 1. Curran. https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/file/158f3069a435b314a80bdcb024f8e422-Paper-round2.pdf.