Philosophy

The main philosophy behind the OpenProblems project and its goals.

OpenProblems aims to revolutionize benchmarking in single-cell omics by addressing the poor extensibility and maintainability seen in most benchmarking studies (Sonrel et al. 2023). Our approach prioritizes four key areas to raise the standards and practices within the field: extensibility, reusability, community-driven development, and inclusiveness.

Sonrel, Anthony, Almut Luetge, Charlotte Soneson, Izaskun Mallona, Pierre-Luc Germain, Sergey Knyazev, Jeroen Gilis, et al. 2023. “Meta-Analysis of (Single-Cell Method) Benchmarks Reveals the Need for Extensibility and Interoperability.” Genome Biology 24 (1). https://doi.org/10.1186/s13059-023-02962-5.

By focusing on these key areas, OpenProblems aspires not just to develop new best practices in single-cell omics but to foster a more dynamic, inclusive, and collaborative scientific community.

Extensibility

To define benchmarking tasks, OpenProblems incorporates the principles of the Common Task Framework as outlined by Donoho (2017). As such, each benchmarking task consists of three main elements (Datasets, Methods and Metrics), which are linked together in a larger workflow (Figure 1). Each node in this workflow represents either a file or a computational component. The frameworks used for storing data structures (AnnData) and for defining computational components (Viash) were chosen specifically to improve extensibility within the OpenProblems framework.

Donoho, David. 2017. “50 Years of Data Science.” Journal of Computational and Graphical Statistics 26 (4): 745–66. https://doi.org/10.1080/10618600.2017.1384734.
graph LR
  loader[/Dataset<br/>loader/]:::component
  dataset[Dataset]:::anndata
  method[/Method/]:::component
  output[Output]:::anndata
  metric[/Metric/]:::component
  score[Score]:::anndata
  loader --> dataset --- method --> output --- metric --> score
Figure 1: The structure of an OpenProblems task. Legend: Grey rectangles are HDF5-backed AnnData (.h5ad) files. Purple rhomboids are Viash components.
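
The structure in Figure 1 can be illustrated with a minimal Python sketch. This is not the OpenProblems implementation: plain dictionaries stand in for the AnnData (.h5ad) files, plain functions stand in for the Viash components, and every name used here (`Task`, `load_toy_dataset`, `double`, `identity`, `mean_absolute_error`) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: plain dictionaries stand in for the AnnData (.h5ad) files.
Data = Dict[str, List[float]]


@dataclass
class Task:
    """Hypothetical task: one dataset loader plus named methods and metrics."""

    loader: Callable[[], Data]
    methods: Dict[str, Callable[[Data], Data]]
    metrics: Dict[str, Callable[[Data, Data], float]]

    def run(self) -> Dict[str, Dict[str, float]]:
        dataset = self.loader()  # Dataset loader -> Dataset
        scores: Dict[str, Dict[str, float]] = {}
        for method_name, method in self.methods.items():
            output = method(dataset)  # Dataset -> Method -> Output
            scores[method_name] = {
                metric_name: metric(dataset, output)  # Output -> Metric -> Score
                for metric_name, metric in self.metrics.items()
            }
        return scores


def load_toy_dataset() -> Data:
    # Toy data: the ground truth is twice the input.
    return {"input": [1.0, 2.0, 3.0], "truth": [2.0, 4.0, 6.0]}


def double(dataset: Data) -> Data:
    # Toy method that predicts twice the input.
    return {"prediction": [2 * value for value in dataset["input"]]}


def identity(dataset: Data) -> Data:
    # Toy baseline method that returns the input unchanged.
    return {"prediction": list(dataset["input"])}


def mean_absolute_error(dataset: Data, output: Data) -> float:
    # Toy metric comparing a method's prediction with the ground truth.
    errors = [abs(t - p) for t, p in zip(dataset["truth"], output["prediction"])]
    return sum(errors) / len(errors)


task = Task(
    loader=load_toy_dataset,
    methods={"double": double, "identity": identity},
    metrics={"mae": mean_absolute_error},
)
scores = task.run()
print(scores)
```

In this sketch, adding a method or a metric only requires registering one more function with the same signature; the loader, the other methods, and the other metrics stay untouched. That is the kind of extensibility the shared file format and component interface are meant to provide.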

AnnData, short for “Annotated Data”, is a data structure with an accompanying HDF5-based file format (.h5ad), designed for handling annotated, high-dimensional biological data (Virshup et al. 2021). In OpenProblems, AnnData serves as the standard data format for both the input and output files of components, ensuring a consistent exchange of data between the components of the benchmarking pipelines.

Virshup, Isaac, Sergei Rybakov, Fabian J. Theis, Philipp Angerer, and F. Alexander Wolf. 2021. “Anndata: Annotated Data.” https://doi.org/10.1101/2021.12.16.473007.

Viash is a meta-framework for generating modular Nextflow components and workflows from Python and R scripts (Cannoodt et al. 2021). Viash components are used in OpenProblems for creating dataset loaders, dataset processors, methods, and metrics, enabling developers to focus on the core functionality of their components without worrying about the chosen pipeline framework (i.e., Nextflow).

Cannoodt, Robrecht, Hendrik Cannoodt, Eric Van de Kerckhove, Andy Boschmans, Dries De Maeyer, and Toni Verbeiren. 2021. “Viash: From Scripts to Pipelines.” arXiv. https://doi.org/10.48550/ARXIV.2110.11494.

Specifications for the format of the AnnData files and the interfaces of the Viash components can be found in the api directory of each task, and are rendered in the readme of each task.

Note that this task structure is an oversimplification – both Dataset and Task workflows contain additional steps to safeguard against common pitfalls.

Reusability

We want contributors to be able to concentrate on the functionality of benchmarks rather than on the complexities of computational infrastructure. To this end, the technologies used in OpenProblems are chosen to maximize reproducibility, reusability and interoperability, while minimizing the amount of boilerplate code required to create new benchmarks. These technologies include, among others, Viash for defining computational components, AnnData for storing data, Nextflow for defining workflows, Docker for containerization, and standardized file formats and component interfaces for interoperability.

By leveraging these frameworks and tools, OpenProblems sets a high standard for reusability and efficiency in computational benchmarking, enabling developers to focus on innovation rather than busywork.

Community-driven

OpenProblems is deeply committed to being a community-driven platform, and we aim to create a dynamic feedback loop that propels the field of single-cell omics forward. Community members can actively contribute to the development and refinement of single-cell omics benchmarks, which are then run on our computational infrastructure. Once the results are available on our website, community members can use them to derive new best practices and to steer future research into new computational methods.

To help foster community participation and collaborative developments, we organise hackathons, competitions and weekly working meetings. See the Events page for more information on upcoming events.

Inclusiveness

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.

Our full Code of Conduct is adapted from the Contributor Covenant, version 2.1.