graph LR
  classDef component fill:#decbe4,stroke:#333
  classDef anndata fill:#d9d9d9,stroke:#333
  loader[/Dataset<br/>loader/]:::component
  dataset[Dataset]:::anndata
  method[/Method/]:::component
  output[Output]:::anndata
  metric[/Metric/]:::component
  score[Score]:::anndata
  loader --> dataset --- method --> output --- metric --> score
Concepts
Every component in OpenProblems, including dataset loaders, dataset processors, methods, and metrics, is a Viash component. So that these components can be assembled into flexible benchmarking pipelines, AnnData serves as the standard file format for both the input and the output files of every component.
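The flow in the diagram above (dataset loader, dataset, method, output, metric, score) can be sketched in plain Python. This is only an illustration of how components hand data to one another: the function names, the dictionary standing in for an AnnData file, and the toy method are all hypothetical and are not part of OpenProblems or Viash.

```python
# Hypothetical stand-ins: in OpenProblems each step is a Viash component and
# each hand-off is an AnnData (.h5ad) file, not an in-memory dict.

def load_dataset():
    """Dataset loader: produce a dataset with inputs and ground truth."""
    return {"x": [1.0, 2.0, 3.0, 4.0], "truth": [2.0, 4.0, 6.0, 8.0]}

def run_method(dataset):
    """Method: read the dataset and return it with predictions added."""
    output = dict(dataset)
    # Toy method: predict twice the input value.
    output["prediction"] = [2.0 * value for value in dataset["x"]]
    return output

def run_metric(output):
    """Metric: compare predictions with the ground truth and return a score."""
    squared_errors = [
        (pred - true) ** 2
        for pred, true in zip(output["prediction"], output["truth"])
    ]
    return {"mean_squared_error": sum(squared_errors) / len(squared_errors)}

dataset = load_dataset()      # loader --> dataset
output = run_method(dataset)  # dataset --- method --> output
score = run_metric(output)    # output --- metric --> score
print(score)
```

Because every step only depends on the shape of the data it receives, any method or metric that respects that shape can be swapped in without touching the others; this is the role the shared AnnData format plays between real components.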
AnnData file format
AnnData, short for “Annotated Data”, is a file format for handling annotated, high-dimensional biological data (Virshup et al. 2021). In the context of OpenProblems, AnnData is used as the standard data format for both input and output files of components. This ensures a consistent and seamless exchange of data between different components of the benchmarking pipelines, allowing developers to focus on the core functionality of their components without worrying about data format compatibility.
An AnnData object contains a data matrix (X, e.g. gene expression values), annotations of observations (obs, e.g. cell metadata), annotations of variables (var, e.g. gene metadata), and unstructured annotations (uns). This organization makes it easy to work with complex datasets while maintaining data integrity and ensuring a standardized structure across different components.

Files with the .h5ad extension represent AnnData objects stored in an HDF5 file. AnnData objects can be opened in Python using the anndata.read_h5ad() function, and in R using the anndata::read_h5ad() function. Because the underlying container is HDF5, the files can in principle be read from any language that has an HDF5 library.
Viash component
A Viash component is a combination of a code block or script and a small amount of metadata that makes it easy to generate pipeline modules, facilitating the separation of component functionality from the pipeline workflow. This enables developers to create reusable, modular, and robust components for OpenProblems, focusing on the specific functionality without having to worry about the chosen pipeline framework.
A Viash component consists of three main parts: a Viash config, a script, and one or more unit tests [1] (Figure 3). Check out the Viash cheat sheet for more information on how to interact with Viash components.
References
Footnotes
[1] Of course, you can choose not to write any tests at all, though we highly encourage you to add tests to your component.