graph LR
  classDef component fill:#decbe4,stroke:#333
  classDef anndata fill:#d9d9d9,stroke:#333
  classDef group fill:#f6f6f6,stroke:#333
  normalization:::group
  dataset_processors:::group
  raw_dataset["Raw dataset"]:::anndata
  common_dataset[Common<br/>dataset]:::anndata
  dataset_loader[/Dataset<br/>loader/]:::component
  subgraph normalization [Normalization methods]
    log_cpm[/"Log CPM"/]:::component
    l1_sqrt[/"L1 sqrt"/]:::component
    log_scran_pooling[/"Log scran<br/>pooling"/]:::component
    sqrt_cpm[/Sqrt CPM/]:::component
  end
  subgraph dataset_processors[Dataset processors]
    pca[/PCA/]:::component
    hvg[/HVG/]:::component
    knn[/KNN/]:::component
  end
  dataset_loader --> raw_dataset --> log_cpm & l1_sqrt & log_scran_pooling & sqrt_cpm --> pca --> hvg --> knn --> common_dataset
src/datasets/
The dataset processing pipeline uses dataset loaders to create raw dataset files (Figure 1). Each raw dataset file is then normalized and passed through the dataset processors (PCA, HVG, KNN) to generate common dataset files. Common dataset files are used in one or more tasks.
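The flow in Figure 1 can be sketched as a chain of steps. This is a minimal illustration in plain Python, not the repository's implementation: the `Dataset` dictionary, `load_dataset`, `make_step`, and `run_pipeline` are hypothetical names, and each step only records that it ran instead of doing real work. The step identifiers are taken from the diagram, and the sketch assumes, as the diagram suggests, that each normalization method yields its own common dataset.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a dataset file; not the real file format.
Dataset = Dict[str, list]


def load_dataset() -> Dataset:
    """Hypothetical loader: returns a raw dataset with a tiny counts matrix."""
    return {"counts": [[1, 0, 3], [0, 2, 2]], "steps": ["loader"]}


def make_step(name: str) -> Callable[[Dataset], Dataset]:
    """Build a placeholder component that only records that it ran."""
    def step(ds: Dataset) -> Dataset:
        return {**ds, "steps": ds["steps"] + [name]}
    return step


def run_pipeline(normalization: str) -> Dataset:
    """Raw dataset -> one normalization -> PCA -> HVG -> KNN -> common dataset."""
    ds = load_dataset()
    for name in [normalization, "pca", "hvg", "knn"]:
        ds = make_step(name)(ds)
    return ds


# Normalization identifiers as they appear in the diagram.
NORMALIZATIONS: List[str] = ["log_cpm", "l1_sqrt", "log_scran_pooling", "sqrt_cpm"]

# One common dataset per normalization method.
common_datasets = {name: run_pipeline(name) for name in NORMALIZATIONS}
```

Looking up `common_datasets["sqrt_cpm"]["steps"]` shows the order in which the placeholder steps were applied to that variant.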
- Dataset loader (src/datasets/loaders): This folder contains components that load and format datasets from various sources.
- Dataset normalization (src/datasets/normalization): This folder contains the dataset normalization methods.
- Dataset processors (src/datasets/processors): This folder contains components for processing datasets, such as computing a PCA, highly variable genes (HVG), or a k-nearest-neighbor (KNN) graph, or subsetting the data.
- Dataset file and component formats (src/datasets/api): This folder contains specifications for dataset file formats and component interfaces.
- Resource generation scripts (src/common/resources_scripts): This folder contains scripts for generating the datasets using the dataset loaders, normalization methods and processors.
- Test resource generation scripts (src/common/resources_test_scripts): This folder contains scripts for generating test resources.
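To make the normalization folder more concrete, here is a rough sketch of what two of the methods named in the diagram, Log CPM and Sqrt CPM, conventionally compute. It assumes the usual meaning of CPM (counts per million: each cell's counts rescaled to sum to one million) followed by `log1p` or a square root. The function names and the list-of-lists matrix are illustrative only; the actual components may differ in details such as the target sum or the data structures used.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are cells, columns are genes (assumed layout)


def cpm(counts: Matrix) -> Matrix:
    """Rescale each row so that its counts sum to one million."""
    scaled = []
    for row in counts:
        total = sum(row)
        # Rows with no counts are left as zeros to avoid dividing by zero.
        scale = 1e6 / total if total > 0 else 0.0
        scaled.append([value * scale for value in row])
    return scaled


def log_cpm(counts: Matrix) -> Matrix:
    """CPM followed by log(1 + x)."""
    return [[math.log1p(value) for value in row] for row in cpm(counts)]


def sqrt_cpm(counts: Matrix) -> Matrix:
    """CPM followed by a square root."""
    return [[math.sqrt(value) for value in row] for row in cpm(counts)]


example_counts = [[1, 3], [0, 0]]
example_log = log_cpm(example_counts)
example_sqrt = sqrt_cpm(example_counts)
```

Both transforms keep zeros at zero, so an empty cell stays empty after normalization.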