Galaxy (computational biology)

Galaxy[2] is an open-source scientific workflow system designed to make research accessible, reproducible, and transparent. Originally developed for computational biology, Galaxy has evolved into a domain-agnostic framework used across scientific disciplines, including data science,[3] microbiology,[4] medical research,[5] neuroscience,[6] virology[7] and outbreak detection,[8] food safety,[9] wastewater tracking and antibiotic resistance,[10] long-read[11] and high-throughput[12] genomic sequencing, and bioinformatics.[13]

Galaxy accommodates computational biologists at every level, from newcomers to professionals. It supports code-free workflow development, graphical workflow visualization, command-line interface access, scheduled jobs, and cloud infrastructure management, along with data persistence and data publishing to facilitate collaboration. The freely hosted UseGalaxy services (United States, EU, and Australia) support a global community of over 500,000 registered users. The Galaxy Hub serves this community with events, an annual conference, and hundreds of free online tutorials at the Galaxy Training Network.

Use

Use Areas

Galaxy was originally written for biological data analysis, particularly genomics. Tools on the platform are used for gene expression, genome assembly, epigenomics, transcriptomics, and a host of other disciplines in the life sciences. Because the platform is domain agnostic, the set of available tools has expanded greatly over the years, and Galaxy can be applied to any scientific domain as a general workflow management system.[14] For example, Galaxy servers and tools exist for image analysis,[15] machine learning and AI,[16] computational chemistry[17] and drug design,[18] spaceflight[19] and astronomy, cheminformatics,[20] proteomics, social science,[21] and linguistics.

Use Cases

Here is a selection of a few recent use cases:

In 2021, members of the Galaxy team published a paper in Nature Biotechnology[22] detailing a method for tracking COVID-19 variants using Galaxy's scheduled jobs feature, Planemo, which is capable of processing and monitoring hundreds of thousands of samples.

In 2021, Galaxy partnered with the Vertebrate Genomes Project (VGP) which "aims to generate near error-free reference genome assemblies"[23] for approximately 70,000 vertebrate species.

In 2022, the Goecks Lab introduced MCMICRO, a scalable and modular pipeline for processing multiplexed imaging data, which is critical for analyzing complex tissue in cancer research and for improving precision oncology.[24]

See the Galaxy Google Scholar page and the Galaxy Zotero Group for additional key papers and citations.

Project Goals

Galaxy is "an open, web-based platform for performing accessible, reproducible, and transparent genomic science."[25]

Accessibility

Computational biology is a specialized domain that often requires knowledge of computer programming. Galaxy gives biomedical researchers access to computational biology without requiring programming expertise.[26][27] To achieve this, Galaxy prioritizes a user-friendly interface[28] over the flexibility to construct highly complex workflows. This design choice makes it relatively easy to build typical analyses, but more difficult to build complex workflows that include, for example, looping constructs. (See Apache Taverna for an example of a data-driven workflow system that supports looping.[29])

Reproducibility

Reproducibility is fundamental to science: when scientific results are published, they should include sufficient information for others to replicate the experiment and obtain the same results. In recent years, significant efforts have been made to extend this standard beyond traditional laboratory experiments (the "wet lab") to computational research (the "dry lab"). However, achieving reproducibility in computational experiments has proven more challenging than initially anticipated.[30]

Galaxy supports reproducibility by systematically capturing all essential details of a computational analysis, ensuring that it can be precisely replicated at any point in the future. This includes recording all input, intermediate, and final datasets, as well as the parameters used and the exact sequence of analytical steps.

Transparency

Transparency is essential in science, as it enables verification, fosters collaboration, and accelerates discoveries by allowing others to build upon existing work. Galaxy promotes transparency in scientific research by allowing researchers to share their Galaxy Objects either publicly or with specific individuals. Shared items can be thoroughly examined, rerun as needed, and copied or modified to explore new hypotheses.

Features

Tools

Galaxy is extensible: new command-line tools can be integrated and shared through the Galaxy ToolShed.[31] An example of extending Galaxy is Galaxy-P from the University of Minnesota Supercomputing Institute, which is customized as a data analysis platform for mass spectrometry-based proteomics.[32]

Galaxy provides a web interface to many text manipulation tools, enabling researchers to do their own custom reformatting and manipulation without knowing computer programming or shell scripting. Galaxy also includes interval manipulation tools for performing set-theoretic operations (e.g. intersection, union, ...) on intervals. Many biological file formats include genomic interval data (a frame of reference, e.g., chromosome or contig name, and start and stop positions), which allows data from these formats to be integrated.
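The set-theoretic interval operations mentioned above can be illustrated with a minimal Python sketch. This is not Galaxy's implementation: the `Interval` type, the function names, and the choice of half-open `[start, end)` coordinates are assumptions made purely for illustration.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """Hypothetical genomic interval: frame of reference plus [start, end)."""
    chrom: str
    start: int
    end: int


def merge(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or touching intervals on the same chromosome."""
    merged: List[Interval] = []
    for iv in sorted(intervals):  # sorts by chrom, then start, then end
        if merged and merged[-1].chrom == iv.chrom and iv.start <= merged[-1].end:
            last = merged[-1]
            merged[-1] = Interval(last.chrom, last.start, max(last.end, iv.end))
        else:
            merged.append(iv)
    return merged


def union(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions covered by at least one of the two interval sets."""
    return merge(a + b)


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions covered by both interval sets."""
    result = []
    for x in merge(a):
        for y in merge(b):
            if x.chrom == y.chrom:
                start, end = max(x.start, y.start), min(x.end, y.end)
                if start < end:  # keep only non-empty overlaps
                    result.append(Interval(x.chrom, start, end))
    return sorted(result)


# Toy data: two interval sets sharing a frame of reference.
genes = [Interval("chr1", 100, 200), Interval("chr1", 150, 300), Interval("chr2", 50, 80)]
peaks = [Interval("chr1", 180, 250), Interval("chr2", 70, 120)]

overlap = intersect(genes, peaks)
combined = union(genes, peaks)
```

Because every interval carries its chromosome name, intervals from different source files can be compared directly, which is the sense in which a shared frame of reference makes the data integrable. The nested loop in `intersect` is quadratic and chosen for readability; a production tool would more likely use a sweep over sorted intervals.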

Galaxy Objects: Datasets, Workflows, Histories, and Pages

Galaxy objects are anything that can be saved, persisted, and shared in Galaxy:

Datasets

Datasets include any input, intermediate, or output dataset used or produced in an analysis. Galaxy's data integration platform supports file uploads from the user's computer, by URL, and directly from many online external resources (such as the UCSC Genome Browser, BioMart and InterMine). Galaxy supports a range of widely used biological data formats, translation between those formats, and data conversions (see Tools).

Workflows

Workflows are computational analyses that specify all the steps (and parameters) in the analysis, but none of the data. Workflows are used to run the same analysis against multiple sets of input data.

Galaxy is a scientific workflow system. These systems provide a means to build multi-step computational analyses akin to a recipe. They typically provide a graphical user interface[28] for specifying what data to operate on, what steps to take, and what order to do them in.

Histories

Histories are computational analyses (recipes) run with specified input datasets, computational steps and parameters. Histories include all intermediate and output datasets as well.
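The distinction between a workflow (steps and parameters, no data) and a history (a run of those steps on specific data, with every intermediate result kept) can be sketched in Python. All names here (`Step`, `Workflow`, `History`, and the two toy tools) are hypothetical and do not correspond to Galaxy's internal classes or its API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class History:
    """Hypothetical record of one run: the inputs plus every step's output."""
    inputs: List[Any]
    records: List[Tuple[str, Dict[str, Any], Any]] = field(default_factory=list)

    @property
    def output(self) -> Any:
        """Final dataset, or None if no step has run."""
        return self.records[-1][2] if self.records else None


@dataclass
class Step:
    """One analysis step: a tool and its parameters, but no data."""
    name: str
    tool: Callable[..., Any]
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Workflow:
    """An ordered recipe of steps that can be applied to any input dataset."""
    steps: List[Step]

    def run(self, dataset: Any) -> History:
        history = History(inputs=[dataset])
        current = dataset
        for step in self.steps:
            current = step.tool(current, **step.params)
            # Keep step name, parameters, and the intermediate dataset.
            history.records.append((step.name, dict(step.params), current))
        return history


# Two toy "tools" standing in for real analysis programs.
def filter_min_length(seqs: List[str], min_len: int) -> List[str]:
    return [s for s in seqs if len(s) >= min_len]


def to_upper(seqs: List[str]) -> List[str]:
    return [s.upper() for s in seqs]


workflow = Workflow([
    Step("filter", filter_min_length, {"min_len": 4}),
    Step("uppercase", to_upper),
])

# The same workflow run against two different input datasets.
history_a = workflow.run(["acgt", "ac", "ggcata"])
history_b = workflow.run(["tt", "ttaacc"])
```

In this sketch the workflow object never stores data, so it can be reused unchanged, while each history holds enough information (inputs, parameters, step order, intermediates) to inspect or repeat that particular run.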

Pages

Pages enable the creation of a virtual paper that describes the how and why of the overall experiment. Histories, workflows, and datasets can carry user-provided annotations, and the tight integration of Pages with these objects supports this goal.

Availability

Galaxy is available:

  1. As a free public web server,[33] supported by the Galaxy Project.[34] This server includes many bioinformatics tools that are widely useful in many areas of genomics research. Users can create logins, and save histories, workflows, and datasets on the server. These saved items can also be shared with others.
  2. As open-source software that can be downloaded, installed and customized to address specific needs.[35] Galaxy can be installed locally or using a computing cloud.[36]
  3. As public web servers hosted by other organizations.[14] Several organizations with their own Galaxy installations have also opted to make those servers available to others.

Implementation

Galaxy is open-source software implemented using the Python programming language. It is developed by the Galaxy team[37] at Penn State, Johns Hopkins University, Oregon Health & Science University, Moffitt Cancer Center, Cleveland Clinic, University of Freiburg (Galaxy EU), and Galaxy Australia[38][39] with community contributions from around the world.

Community

The Galaxy community is a global, interdisciplinary network of researchers, educators, and developers dedicated to making bioinformatics accessible, reproducible, and collaborative. With contributors from academia, government, and industry, the community actively develops new tools, maintains public Galaxy servers, and fosters open science initiatives. The Galaxy project has mailing lists,[40] a community hub,[34] and annual meetings.[41]

A key resource within this ecosystem is the Galaxy Training Network (GTN),[42] which provides comprehensive, open-access training materials for bioinformatics workflows and computational biology. The GTN offers hands-on tutorials covering a wide range of topics, from sequencing data analysis to machine learning applications in genomics. These materials are regularly updated and include step-by-step instructions for using Galaxy along with interactive Jupyter-based lessons, making them valuable for both self-guided learning and structured coursework. Through events such as the annual Galaxy Community Conference (GCC), the annual Galaxy Training Academy, hackathons, and collaborative projects, the Galaxy community continues to expand its impact and to keep bioinformatics tools accessible to researchers regardless of their computational expertise. See the Galaxy events page for more information.
