src/datasets/

The dataset processing pipeline uses dataset loaders to create raw dataset files (Figure 1). The raw dataset files are then processed to generate common dataset files. Common dataset files are used in one or more tasks.

graph LR
  classDef component fill:#decbe4,stroke:#333
  classDef anndata fill:#d9d9d9,stroke:#333
  classDef group fill:#f6f6f6,stroke:#333
  normalization:::group
  dataset_processors:::group
  raw_dataset["Raw dataset"]:::anndata
  common_dataset[Common<br/>dataset]:::anndata
  dataset_loader[/Dataset<br/>loader/]:::component
  subgraph normalization [Normalization methods]
    log_cpm[/"Log CPM"/]:::component
    l1_sqrt[/"L1 sqrt"/]:::component
    log_scran_pooling[/"Log scran<br/>pooling"/]:::component
    sqrt_cpm[/Sqrt CPM/]:::component
  end
  subgraph dataset_processors[Dataset processors]
    pca[/PCA/]:::component
    hvg[/HVG/]:::component
    knn[/KNN/]:::component
  end
  dataset_loader --> raw_dataset --> log_cpm & l1_sqrt & log_scran_pooling & sqrt_cpm --> pca --> hvg --> knn --> common_dataset
Figure 1: Overview of the dataset processing workflow. Legend: Grey rectangles are AnnData .h5ad files, purple rhomboids are Viash components.