UCSC Genome Browser

The UCSC Genome Browser is an online and downloadable genome browser hosted by the University of California, Santa Cruz (UCSC).^[2]^[3]^[4] It is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. The Browser is a graphical viewer optimized to support fast interactive performance and is an open-source, web-based tool suite built on top of a MySQL database for rapid visualization, examination, and querying of the data at many levels. The Genome Browser Database, browsing tools, downloadable data files, and documentation can all be found on the UCSC Genome Bioinformatics website.


The UCSC Genome Browser

Content
Description	The UCSC Genome Browser
Contact
Research center	University of California, Santa Cruz
Laboratory	Center for Biomolecular Science and Engineering, Baskin School of Engineering
Primary citation	Navarro Gonzalez et al. (2021)^[1]
Access
Website	genome.ucsc.edu


History


Origins and early development (2000–2003)

The UCSC Genome Browser was developed in 2000 by graduate student Jim Kent and Professor David Haussler at the University of California, Santa Cruz (UCSC), to provide public access to the draft human genome sequence produced by the Human Genome Project.^[5] On July 7, 2000, UCSC released the first working draft of the human genome online, accompanied by an initial version of the Genome Browser.^[5] This release enabled researchers worldwide to access and explore the genome data interactively. The project received early funding from the Howard Hughes Medical Institute (HHMI) and the National Human Genome Research Institute (NHGRI).^[6] In 2002, the team published a detailed description of the Genome Browser in Genome Research, outlining its MySQL-based database and web interface.^[7] The browser featured various aligned annotation tracks, including gene predictions, mRNA/EST alignments, and SNP markers, all presented in a scrollable view.^[7] Users could also add custom tracks to visualize their data alongside official annotations. In that same year, the browser expanded to include the mouse genome, facilitating comparative genomics studies. Tools like BLAT (BLAST-like alignment tool) and LiftOver were introduced to enhance sequence alignment and coordinate conversion between different genome assemblies.^[8]

Expansion and feature enhancements (2004–2010)

Between 2004 and 2010, the UCSC Genome Browser incorporated numerous additional genomes, including those of rat, chicken, dog, and chimpanzee, among others.^[9] The development of chain and net alignment algorithms allowed for whole-genome alignments between species, and the Conservation track visualized evolutionarily conserved elements.^[10] To accommodate the influx of data from new genomic technologies, UCSC introduced Genome Graphs in 2007–2008, enabling users to plot genome-wide datasets, such as association study p-values, across entire genomes.^[11] The browser also implemented the BigBed and BigWig binary data formats in 2010, facilitating efficient visualization of large-scale sequencing datasets.^[12]

Further integration with major genomic projects (2011–2015)

In 2011, UCSC launched Track Data Hubs, allowing external researchers to integrate their annotation tracks into the Genome Browser via remote URLs.^[13] This feature significantly changed how researchers could interact with and visualize large-scale genomic datasets. UCSC has also played a pivotal role in the ENCODE (Encyclopedia of DNA Elements) project since its launch in 2003, and the browser hosted a vast array of functional genomics data generated by ENCODE, including ChIP-seq, RNA-seq, and DNase hypersensitivity assays.^[14] The browser also integrated data from the 1000 Genomes Project, providing comprehensive access to human genetic variation data.^[15] In 2013, UCSC partnered with the GENCODE project to adopt its high-quality gene annotations. In 2015, the GENCODE gene set (GRCh38/hg38 assembly) replaced UCSC's in-house track as the default gene set of the human genome browser.^[16]

Recent developments and recognition (2016–present)

Beginning in 2016, the UCSC Genome Browser expanded its capabilities by integrating clinical and variant datasets, including those from ClinVar and various cancer genomics resources.^[17] In 2017, UCSC launched the UCSC Cell Browser, a companion platform designed to handle single-cell sequencing datasets and spatial transcriptomics.^[18] The browser has also integrated data from the Genotype-Tissue Expression (GTEx) project, providing visualization resources for gene expression across various human tissues.^[19] The browser now hosts over 180 genome assemblies from more than 100 species, including the complete telomere-to-telomere human genome assembly (T2T-CHM13) released by the T2T Consortium in 2022.^[20] Funding for the UCSC Genome Browser has transitioned to rely exclusively on NIH grants, with continued support from the NHGRI. In 2022, the browser was recognized as one of the inaugural Global Core Biodata Resources, highlighting its critical role in life science research and ensuring prioritized long-term funding.^[5] As of 2025, the UCSC Genome Browser continues to serve as an essential, freely accessible tool for researchers worldwide, accommodating daily usage by tens of thousands and regularly updating with new genomic data and functionalities.^[5]


Genomes


In the years since its inception, the UCSC Browser has expanded to accommodate genome sequences of all vertebrate species and selected invertebrates for which high-coverage genomic sequences are available,^[21] now including 108 species. High coverage is necessary because overlap between sequence reads guides the construction of larger contiguous regions. Genomic sequences with less coverage are included in multiple-alignment tracks on some browsers, but their fragmented assemblies make them unsuitable for building full-featured browsers (multiple-alignment tracks are described further below). The species hosted with full-featured genome browsers are shown in the table.^[22] New additions depend on genome releases from sequencing centers, which explains the two-year gap between the two most recent genome additions.


Species with full-featured genome browsers
Group	Species
apes	bonobo, chimpanzee, gibbon, gorilla, human, orangutan
non-ape primates	baboon, bushbaby, golden snub-nosed monkey, green monkey, marmoset, mouse lemur, proboscis monkey, rhesus macaque, squirrel monkey, tarsier
non-primate mammals	alpaca, armadillo, bison, cat, Chinese hamster, Chinese pangolin, cow, dog, dolphin, elephant, ferret, guinea pig, Hawaiian monk seal, hedgehog, horse, kangaroo rat, little brown bat, Malayan flying lemur, manatee, megabat, minke whale, mouse, naked mole-rat, opossum, panda, pig, pika, platypus, rabbit, rat, rock hyrax, sheep, shrew, sloth, squirrel, Tasmanian devil, tenrec, tree shrew, wallaby, white rhinoceros
non-mammal chordates	African clawed frog, American alligator, Atlantic cod, brown kiwi, budgerigar, chicken, coelacanth, elephant shark, Fugu, garter snake, golden eagle, lamprey, lizard, medaka, medium ground finch, Nile tilapia, painted turtle, stickleback, Tetraodon, Nanorana parkeri, turkey, Xenopus tropicalis, zebra finch, zebrafish
invertebrates	Anopheles gambiae, Apis mellifera, Caenorhabditis spp. (5), California sea hare, Ciona intestinalis, Drosophila spp. (11), lancelet, Pristionchus pacificus, sea squirt, sea urchin, yeast
viruses	Ebolavirus, SARS-CoV-2 coronavirus

Apart from these 108 species and their assemblies, the UCSC Genome Browser also offers Assembly Hubs, web-accessible directories of genomic data that can be viewed on the browser and include assemblies that are not hosted natively on it. There, users can load and annotate unique assemblies for which UCSC does not provide an annotation database. A full list of species and their assemblies can be viewed in the GenArk Portal, including 2,589 assemblies hosted by both UCSC Genome Browser database and Assembly Hubs. An example can be seen in the Vertebrate Genomes Project assembly hub. Below is a snippet of what users can find when they use the assembly hub:


Assembly Hub
hub gateway	description
primates	NCBI primate genomes (226 assemblies)
mammals	NCBI mammal genomes (679 assemblies)
birds	NCBI bird genomes (432 assemblies)
fishes	NCBI fish genomes (488 assemblies)
vertebrate	NCBI other vertebrate genomes (311 assemblies)
invertebrate	NCBI invertebrate genomes (1158 assemblies)
fungi	NCBI fungi genomes (921 assemblies)
plants	NCBI plant genomes (313 assemblies)
viral	NCBI virus genomes (421 assemblies)
bacteria	NCBI bacteria genomes (123 assemblies)

Subsets of Assemblies
hub gateway	description
VGP	Vertebrate Genomes Project collection (1213 assemblies)
CCGP	The California Conservation Genomics Project (126 assemblies)
BRC	BRC Analytics
HPRC	Human Pangenome Reference Consortium (96 assemblies)
globalReference	Global Human Reference genomes, January 2020 (10 assemblies)
mouseStrains	16 mouse strain assembly and track hub, May 2017
legacy	NCBI genomes legacy/superseded by newer versions (582 assemblies)

Limitations

Although the UCSC Genome Browser is a useful tool for analyzing genomic sequences and data, it has limitations, including a legacy web interface. A website used by thousands of students and researchers worldwide might be expected to offer a modern, user-friendly interface that is easy to navigate, but that is not the case with the UCSC Genome Browser.

Another limitation is that the UCSC Genome Browser is primarily a visualization tool for displaying sequences and their annotations; for further analysis, such as multiple sequence alignment, users must turn to external tools such as MAFFT, T-Coffee, or MUSCLE.


Browser functionality


The large amount of data about biological systems that is accumulating in the literature makes it necessary to collect and digest information using the tools of bioinformatics. The UCSC Genome Browser presents a diverse collection of annotation datasets (known as "tracks" and presented graphically), including mRNA alignments, mappings of DNA repeat elements, gene predictions, gene-expression data, disease-association data (representing the relationships of genes to diseases), and mappings of commercially available gene chips (e.g., Illumina and Agilent). The basic paradigm of display is to show the genome sequence in the horizontal dimension, and show graphical representations of the locations of the mRNAs, gene predictions, etc. Blocks of color along the coordinate axis show the locations of the alignments of the various data types. The ability to show this large variety of data types on a single coordinate axis makes the browser a handy tool for the vertical integration of the data.^[23]

To find a specific gene or genomic region, the user may type in the gene name, a DNA sequence, an accession number for an RNA, the name of a cytogenetic band (e.g., 20p13 for band 13 on the short arm of chr20), or a chromosomal position (e.g., chr17:38,450,000-38,531,000 for the region around the BRCA1 gene on the older hg18 assembly).
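The position box accepts strings of the form chrom:start-end, optionally with thousands separators. As a minimal sketch, assuming the typed position is 1-based and inclusive (as the browser displays it), the hypothetical helpers `parse_position` and `browser_url` below convert such a string into the 0-based, half-open coordinates used internally and in BED files, and build an hgTracks link that opens the browser at a given position.

```python
import re
from urllib.parse import urlencode

# chrom:start-end, e.g. "chr17:38,450,000-38,531,000"; commas are optional.
POSITION_RE = re.compile(r"^(chr[\w.]+):([\d,]+)-([\d,]+)$")


def parse_position(text):
    """Parse a 1-based, inclusive browser position into 0-based, half-open (chrom, start, end)."""
    match = POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid range in {text!r}")
    # Shift only the start: 1-based inclusive [s, e] equals 0-based half-open [s-1, e).
    return chrom, start - 1, end


def browser_url(db, position):
    """Build a Genome Browser link for an assembly (db) and a position string."""
    query = urlencode({"db": db, "position": position})
    return f"https://genome.ucsc.edu/cgi-bin/hgTracks?{query}"
```

For example, `parse_position("chr17:38,450,000-38,531,000")` returns `("chr17", 38449999, 38531000)`. Only the start changes between the two conventions, which is the usual source of off-by-one errors when moving between the browser display and BED files.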

Because the data are presented graphically, each annotation can link to detailed information about it. The gene details page of the UCSC Genes track, for example, provides a large number of links to more specific information about the gene in other data resources, such as Online Mendelian Inheritance in Man (OMIM) and Swiss-Prot.

Designed for the presentation of complex and voluminous data, the UCSC Browser is optimized for speed. By pre-aligning millions of RNA sequences from GenBank to each of the 244 genome assemblies (many of the 108 species have more than one assembly), the browser allows instant access to the alignments of any RNA to any of the hosted species.

The juxtaposition of many types of data allows researchers to display exactly the combination of data that answers a specific question. A PDF/PostScript export function produces camera-ready images for publication in academic journals.

One feature that distinguishes the UCSC Browser from other genome browsers is its continuously variable display. A region of any size can be displayed with full annotation tracks, from a single DNA base up to an entire chromosome (human chr1 is about 249 million bases, Mb, in the hg38 assembly). Researchers can display a single gene, a single exon, or an entire chromosome band showing dozens or hundreds of genes, with any combination of the many annotations. A drag-and-zoom feature lets the user select any region in the genome image and expand it to fill the full screen.

Researchers may also use the browser to display their own data via the Custom Tracks tool. This feature allows users to upload a file of their own data and view the data in the context of the reference genome assembly. Users may also use the data hosted by UCSC, creating subsets of the data of their choosing with the Table Browser tool (such as only the SNPs that change the amino acid sequence of a protein) and display this specific subset of the data in the browser as a Custom Track.^[24]

Any browser view created by a user, including those containing Custom Tracks, may be shared with other users via the Saved Sessions tool.

Custom Tracks support multiple file formats, including BED, WIG, GFF, GTF, PSL, and big* formats such as bigBed and bigWig. Users may input data via direct paste, file upload, or by referencing a URL pointing to the remote data. Tracks are temporary and those not associated with a saved session are removed after 48 hours.^[24]

Users can configure tracks with track lines to specify attributes such as name, description, visibility, color, and link targets. Optional browser lines may be included to define initial display coordinates and browser settings. Uploaded tracks can be managed, updated, or deleted through the “Manage Custom Tracks” interface.^[24]
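The track-line and browser-line conventions above can be illustrated with a small generator. This is a sketch: `make_custom_track` is a hypothetical helper, it emits only the `name`, `description`, `visibility`, and `color` track attributes described above, and it assumes BED4 rows (chrom, start, end, name) in 0-based, half-open coordinates.

```python
def make_custom_track(name, description, features, *, color=(0, 0, 255),
                      visibility="pack", position=None):
    """Return custom-track text: an optional browser line, one track line, then BED4 rows.

    features: iterable of (chrom, start, end, label) in 0-based, half-open coordinates.
    """
    lines = []
    if position is not None:
        # Optional browser line that sets the initial display coordinates.
        lines.append(f"browser position {position}")
    r, g, b = color
    lines.append(
        f'track name="{name}" description="{description}" '
        f"visibility={visibility} color={r},{g},{b}"
    )
    for chrom, start, end, label in features:
        if not 0 <= start < end:
            raise ValueError(f"bad interval {chrom}:{start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"
```

The returned text could be pasted into the Custom Tracks page or saved to a file and uploaded, as described above.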

For larger or more persistent data hosting, users may use Track Hubs, which provide a scalable system for remote data integration and advanced configuration.
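A basic track hub is a set of small text files served from a web-accessible directory: a `hub.txt` naming the hub, a `genomes.txt` mapping each assembly to a `trackDb.txt`, and a `trackDb.txt` describing each track and pointing to its binary data (for example a bigWig file). The sketch below only generates the file contents for one assembly and one track; `make_track_hub`, the labels, and the file names are hypothetical placeholders, and the hub's own documentation lists many more settings.

```python
def make_track_hub(hub_name, genome, track_name, big_data_url, email, track_type="bigWig"):
    """Return {relative_path: file_text} for a minimal three-file track hub."""
    hub_txt = "\n".join([
        f"hub {hub_name}",
        f"shortLabel {hub_name}",
        f"longLabel {hub_name} example hub",
        "genomesFile genomes.txt",
        f"email {email}",
    ]) + "\n"
    # One stanza per assembly; trackDb paths are relative to genomes.txt.
    genomes_txt = f"genome {genome}\ntrackDb {genome}/trackDb.txt\n"
    # bigDataUrl may be absolute or relative to the trackDb.txt location.
    trackdb_txt = "\n".join([
        f"track {track_name}",
        f"bigDataUrl {big_data_url}",
        f"shortLabel {track_name}",
        f"longLabel {track_name} signal",
        f"type {track_type}",
    ]) + "\n"
    return {
        "hub.txt": hub_txt,
        "genomes.txt": genomes_txt,
        f"{genome}/trackDb.txt": trackdb_txt,
    }
```

Writing these strings to a public web directory and giving the browser the URL of `hub.txt` is the general workflow; the binary data file itself stays on the user's server.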

Tracks

Below the main browser image, additional tracks, grouped into categories, can be selected and displayed alongside the original data. Researchers can choose the tracks that best fit their query, so that the data displayed suit the type and depth of their research. The categories are as follows:


Categories
Category	Description	Examples of tracks
Mapping and Sequencing	Controls how sequence-level information is displayed (e.g., genomic coordinates, sequence, gaps). It can also display percentage-based tracks that show whether a particular genetic element is more prevalent in the specified region.	Base Position, Mappability, Gap
Genes and Gene Predictions	Offers gene-prediction tracks and known-gene tracks from a choice of databases. The different tracks display gene models, protein-coding regions, non-coding RNAs, etc., so users can quickly compare their query with pre-selected gene sets to look for correlations.	GENCODE v24, Geneid Genes, Pfam in UCSC Gene
Phenotype and Literature	Databases containing specific types of phenotype data. These tracks are intended primarily for physicians and other professionals concerned with genetic disorders (e.g., genetics researchers, students in science and medicine). Users can display a track that shows the genomic positions of natural and artificial amino acid variants. Recent additions include AlphaMissense, which uses deep learning to predict the pathogenicity of missense variants, and VarChat, which applies large language models to summarize scientific literature related to specific genomic variants.^[25]	OMIM Alleles, Cancer Gene Expr Super-track, AlphaMissense, enGenome VarChat
COVID-19	It shows data from Genome-Wide Association Studies (GWAS) and variant calling experiments to identify genetic variants associated with severity and susceptibility to COVID-19 disease.	COVID GWAS v3, COVID GWAS v4, Rare Harmful Vars
Single Cell RNA-Seq	It offers RNA expression data at single cell level (scRNA-Seq) from different human tissues (e.g., kidney, colon, heart, muscle, placenta, peripheral blood mononuclear cells etc.)	Blood (PBMC), Heart Cell Atlas, Colon Wang
mRNA and EST	Shows Expressed Sequence Tags (ESTs) and messenger RNA (mRNA) alignments. ESTs are single-read sequences, typically about 500 bases long, that usually represent fragments of transcribed genes. The mRNA tracks display mRNA alignments for humans as well as other species. Other tracks allow comparison with ESTs that show signs of splicing when aligned with the genome.	Human ESTs, Other ESTs, Other mRNAs
Expression	Shows gene expression data across tissues, allowing users to discover whether a particular gene or sequence is expressed in various tissues throughout the body. The expression tracks can also display consensus data about the tissues that express the query region.	GTEx Gene, Affy U133
Regulation	Information relevant to regulation of transcription from different studies. Users can adjust the regulation tracks to add a display graph to the genome browser. These displays allow for more detail about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants, haplotypes, and other regulatory elements.	ENCODE Regulation Super-track Settings, ORegAnno
Comparative Genomics	Shows sequence conservation data for primates, vertebrates, mammals, and other groups. The comparative alignments give a graphical view of the evolutionary relationships among species. This makes it useful both for researchers, who can visualize regions of conservation among a group of species and make predictions about functional elements in uncharacterized DNA regions, and in the classroom as a tool to illustrate one of the most compelling arguments for the evolution of species. The Conservation track on the human assembly (which includes 100 species) clearly shows that the farther back one goes in evolutionary time, the less sequence homology remains, but functionally important regions of the genome (e.g., exons and control elements, but typically not introns) remain conserved much farther back.	Conservation, Cactus 241-way, Cons 30 Primates
Variation	It compares the searched sequence with known variations. For example, the entire contents of each release of the dbSNP database from NCBI are mapped to human, mouse and other genomes. This includes the fruits of the 1000 Genomes Project, as soon as they are released in dbSNP. Other types of variation data include copy-number variation data (CNV) and human population allele frequencies from the HapMap project.	Common SNPs(150), All SNPs(146), Flagged SNPs(144)
Human Pangenome – HPRC	It provides access to tracks from the Human Pangenome Reference Consortium assembly hub, including short variant calls, multi-assembly multiple alignments, chain/net alignments, and structural variant summaries derived from 47 high-quality diploid assemblies aligned to the hg38 reference. It enables researchers to explore sequence and structural diversity across a genetically diverse cohort, compare allele distributions, visualize conservation patterns among multiple human assemblies, and inspect large-scale rearrangements in the human pangenome.^[26]	Multiple Alignment, Pairwise Alignments, Rearrangements, Short Variants
Repeats	Allows tracking of different kinds of repeated sequences in the query. Users can quickly see if their specified search contains large amounts of repeated sequences at a glance and adjust their search or track displays accordingly.	RepeatMasker, Microsatellite, WM + SDust


Analysis tools


Overview

The UCSC site hosts a set of genome analysis tools. Together, they allow users to find, create, and modify sequences and to search for similar sequences or patterns. These tools are generally free to use for academic and nonprofit purposes and for individuals with a personal interest in genomics.

Tools developed by UCSC include the Genome Browser, BLAT, In-Silico PCR, Table Browser, LiftOver, REST API, Variant Annotation Integrator, Gene Sorter, Genome Graphs, Data Integrator, UShER, Gene Interactions, VisiGene, DNA Duster, Protein Duster, and Phylogenetic Tree PNG Maker. Source code for BLAT, LiftOver, and the Genome Browser is available for download from the UCSC website.

Other useful tools that work with UCSC file formats include BEDOPS, bedtools, bwtool, CrossMap, CruzDb, G-OnRamp, libBigWig, MakeHub, rtracklayer, trackhub, twobitreader, ucsc-genomes-download, and WiggleTools.

BLAT

BLAT^[27] is a sequence alignment tool that accepts FASTA-formatted queries and quickly finds matching sequences in any of the featured genomes, even one as large as the human genome (3.23 billion bases, Gb). Users can paste a sequence into the text box or upload a file containing it, and can customize the search by choosing the genome and assembly, the query type, the sort order, and the output type. On DNA, BLAT finds sequences of 95% or greater similarity that are at least 25 bases long; it keeps an in-memory index of the genome consisting of all overlapping 11-mers stepping by 5, except for those in repeats. On protein, BLAT finds sequences of 80% or greater similarity that are at least 20 amino acids long, using an in-memory index of 4-mers stepping by 5, again except for those in repeats. BLAT was written by Jim Kent; more information about the software can be found on his website.
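The indexing idea described above can be illustrated with a toy seed-and-cluster search. This is a simplified sketch, not BLAT's implementation: it indexes 11-mers that start every 5 bases, drops any k-mer seen more than a hypothetical `max_occurrences` times as a crude stand-in for repeat exclusion, looks up every overlapping 11-mer of the query, and reports the diagonal (genome offset minus query offset) with the most seed hits. Real BLAT goes on to extend and stitch these hits into gapped alignments.

```python
import random
from collections import Counter, defaultdict


def build_index(genome, k=11, step=5, max_occurrences=4):
    """Index the k-mers that start every `step` bases of `genome`.

    K-mers seen more than `max_occurrences` times are dropped, a crude
    stand-in for excluding repeats (the threshold is hypothetical).
    """
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, step):
        index[genome[pos:pos + k]].append(pos)
    return {kmer: hits for kmer, hits in index.items() if len(hits) <= max_occurrences}


def best_diagonal(query, index, k=11):
    """Look up every overlapping k-mer of `query` and cluster hits by diagonal.

    Returns (diagonal, seed_count) for the best-supported diagonal, or None.
    """
    diagonals = Counter()
    for qpos in range(len(query) - k + 1):
        for tpos in index.get(query[qpos:qpos + k], ()):
            diagonals[tpos - qpos] += 1
    return diagonals.most_common(1)[0] if diagonals else None


# Toy usage: a random 5 kb "genome" and a 40-base query copied from offset 1200.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(5000))
query = genome[1200:1240]
print(best_diagonal(query, build_index(genome)))  # diagonal should be 1200
```

Stepping the index by 5 instead of 1 shrinks it roughly fivefold, and because every query offset is looked up, any exact match of at least k + step - 1 = 15 bases still contains at least one indexed 11-mer.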

Genome graphs

The Genome Graphs tool allows users to view all chromosomes at once and display the results of genome-wide association studies (GWAS). Users can choose the clade of the organism, the genome and assembly, the graph colors, and the significance threshold. They can also upload their own data or import data from the database, and configure the graph layout, graph style, and chromosome layout. A more detailed guide to using all of the features is available in the Genome Graphs User's Guide.
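GWAS results are usually plotted as -log10(p) so that stronger associations appear as taller points, and a significance threshold marks which points stand out. The sketch below prepares such values from (chromosome, position, p-value) tuples. It is illustrative only: the 5e-8 default is the conventional genome-wide significance level rather than a Genome Graphs setting, the helper names and column headers are hypothetical, and the exact upload format (and whether the tool expects raw or transformed p-values) should be checked against the Genome Graphs User's Guide.

```python
import math

# Conventional genome-wide significance level for GWAS; an assumed default here.
GENOME_WIDE_SIGNIFICANCE = 5e-8


def to_graph_rows(results, threshold=GENOME_WIDE_SIGNIFICANCE):
    """Convert (chrom, position, p_value) tuples into plotting rows.

    Each row is (chrom, position, -log10(p) rounded to 3 places, passes_threshold).
    """
    rows = []
    for chrom, position, p_value in results:
        if not 0 < p_value <= 1:
            raise ValueError(f"invalid p-value {p_value!r} at {chrom}:{position}")
        score = round(-math.log10(p_value), 3)
        rows.append((chrom, position, score, p_value <= threshold))
    return rows


def to_tsv(rows):
    """Render rows as tab-separated text; the column names are hypothetical."""
    lines = ["chromosome\tposition\tminusLog10P"]
    lines += [f"{chrom}\t{position}\t{score}" for chrom, position, score, _ in rows]
    return "\n".join(lines) + "\n"
```

On this scale, a p-value of 5e-8 corresponds to a height of about 7.3.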

LiftOver

The LiftOver^[28] tool uses whole-genome alignments to convert coordinates from one assembly to another or between species. A user can enter genome coordinates and annotations into the text box or upload a file. The original genome and assembly are selected first, followed by the new genome and assembly into which the coordinates will be converted. Two input categories can be customized: regions defined by chrom:start-end (BED 4 to BED 6) and regions with an exon-intron structure (usually transcripts, BED 12). For regions defined by chrom:start-end, users can allow multiple output regions, set the minimum hit size in the query, and set the minimum chain size in the target. For regions with an exon-intron structure, users can set the minimum ratio of alignment blocks or exons that must map and, if an exon is not mapped, choose to use the closest mapped base instead.
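The whole-genome alignments behind LiftOver are distributed as chain files: a header line giving the two sequences, strands, and start/end coordinates, followed by lines of `size dt dq` (an ungapped block of `size` bases, then gaps of `dt` bases on the old assembly and `dq` on the new one) and a final line holding only the last block size. The sketch below parses a single plus-strand chain and lifts individual 0-based positions. It is a simplification, not the LiftOver program: it ignores minus-strand chains, multiple overlapping chains, and the minimum-match settings described above, and the example chain is made up.

```python
def parse_chain(text):
    """Parse one plus-strand chain into (old_name, new_name, blocks).

    Each block is (old_start, new_start, size): old bases old_start..old_start+size-1
    align without gaps to new bases new_start..new_start+size-1 (0-based).
    """
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    if header[0] != "chain" or len(header) < 12:
        raise ValueError("expected a chain header line")
    if header[4] != "+" or header[9] != "+":
        raise NotImplementedError("this sketch handles plus-strand chains only")
    old_name, old_pos = header[2], int(header[5])
    new_name, new_pos = header[7], int(header[10])
    blocks = []
    for line in lines[1:]:
        fields = [int(x) for x in line.split()]
        size = fields[0]
        blocks.append((old_pos, new_pos, size))
        old_pos += size
        new_pos += size
        if len(fields) == 3:  # gap before the next block: dt on old, dq on new
            old_pos += fields[1]
            new_pos += fields[2]
    return old_name, new_name, blocks


def lift_position(pos, blocks):
    """Map a 0-based position; None if it falls in a gap or outside the chain."""
    for old_start, new_start, size in blocks:
        if old_start <= pos < old_start + size:
            return new_start + (pos - old_start)
    return None


# Made-up example: a 50-base block, a 10/5-base gap, then a 140-base block.
EXAMPLE_CHAIN = """\
chain 1000 chr1 1000 + 100 300 chr1 1000 + 0 195 1
50 10 5
140
"""
```

Positions inside a gap have no counterpart on the other assembly, which is one reason some input regions fail to lift over or map only partially.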


Python APIs


The UCSC Genome Browser provides Python-compatible interfaces that allow researchers to programmatically access genomic data and annotations. These APIs support automation, integration into computational workflows, and large-scale analysis tasks, enhancing accessibility beyond the graphical browser.

Overview

The UCSC REST API is the primary method for programmatic interaction. It allows users to send HTTP requests to retrieve genomic sequences, annotation tracks, and gene-related information. While the API itself is language-agnostic, Python developers can easily integrate it using libraries such as requests. Community-developed wrappers and tools further simplify API usage in Python-based bioinformatics environments.

Functionality and use cases

Common uses of the UCSC REST API in Python include:

Sequence Retrieval – Downloading nucleotide sequences from specific genome coordinates
Gene Annotation Access – Accessing curated data from RefSeq, GENCODE, and other gene tables
Variation Data Queries – Obtaining information about SNPs, insertions, or structural variants in defined regions
Track Information Extraction – Listing available tracks and metadata for a given genome build
Pipeline Integration – Automating queries in larger workflows for comparative or functional genomics

These capabilities make the API useful for custom dashboards, automated annotation pipelines, and downstream analysis in tools like Jupyter Notebooks or Snakemake.
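Most of these use cases reduce to building an endpoint URL with query parameters and decoding the JSON response. The sketch below uses only the Python standard library, so it runs without `requests`. `ucsc_api_url` and `ucsc_get` are hypothetical helper names; the endpoint paths mentioned here (`/list/ucscGenomes`, `/list/tracks`, `/getData/track`, `/getData/sequence`) are taken from the UCSC REST API documentation, and the structure of each response should be checked there.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.genome.ucsc.edu"


def ucsc_api_url(endpoint, **params):
    """Build a REST API URL, e.g. ucsc_api_url("list/tracks", genome="hg38")."""
    path = endpoint.strip("/")
    query = urlencode(params)
    return f"{API_ROOT}/{path}?{query}" if query else f"{API_ROOT}/{path}"


def ucsc_get(endpoint, timeout=30.0, **params):
    """Fetch an endpoint and decode its JSON body (performs a network request)."""
    with urlopen(ucsc_api_url(endpoint, **params), timeout=timeout) as response:
        return json.load(response)
```

For instance, `ucsc_get("list/tracks", genome="hg38")` would request the track list for hg38, and `ucsc_get("getData/sequence", genome="hg38", chrom="chr1", start=1000000, end=1000100)` is equivalent to the `requests` example that follows.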

Example: Fetching genomic sequence with Python

import requests

# Define the endpoint and query parameters (0-based start, half-open end)
url = "https://api.genome.ucsc.edu/getData/sequence"
params = {
    "genome": "hg38",
    "chrom": "chr1",
    "start": 1000000,
    "end": 1000100,
}

# Make the request, failing fast on network timeouts or HTTP errors
response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
data = response.json()

# Display the DNA sequence
print("Retrieved sequence:", data["dna"])

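This snippet requests 100 bases of chromosome 1 on the hg38 human genome assembly and prints the raw DNA sequence. The API uses UCSC's internal 0-based, half-open coordinates, so start=1,000,000 and end=1,000,100 correspond to the bases displayed as positions 1,000,001 to 1,000,100 in the browser. It illustrates how researchers can access genome content without downloading entire datasets.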

Comparison to other access methods


Feature	UCSC REST API	Table Browser	MySQL Access
Requires graphical UI	No	Yes	No
Suitable for automation	Yes	Limited	Yes
Supports Python scripts	Yes	No	Yes (via connectors)
Returns JSON	Yes	No	No

This flexibility makes the REST API ideal for rapid, scriptable access to UCSC’s genomic resources.

Limitations

While the UCSC REST API is highly accessible, it is limited by:

Rate limits and request size constraints
Lack of complex filtering (compared to MySQL or Table Browser advanced queries)
No built-in authentication for sensitive data (e.g., private tracks)

For large datasets or bulk analysis, users may still prefer downloading entire tracks or working with the UCSC Genome Browser database locally.
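When a region is too large for a single request, one common pattern is to split it into fixed-size windows and space the requests out. The sketch below does that. `split_region` and `fetch_in_windows` are hypothetical helpers, and their `max_len` and `min_interval` defaults are placeholders, not documented UCSC limits; the actual limits should be taken from the REST API documentation. `fetch` is any caller-supplied function, for example one wrapping the `getData/sequence` request shown above.

```python
import time


def split_region(start, end, max_len):
    """Yield 0-based, half-open (start, end) windows of at most max_len bases."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if end < start:
        raise ValueError("end must not precede start")
    for window_start in range(start, end, max_len):
        yield window_start, min(window_start + max_len, end)


def fetch_in_windows(fetch, chrom, start, end, max_len=100_000, min_interval=1.0):
    """Call fetch(chrom, s, e) once per window, spacing calls >= min_interval seconds apart.

    max_len and min_interval are hypothetical defaults, not documented UCSC limits.
    """
    results = []
    last_call = None
    for window_start, window_end in split_region(start, end, max_len):
        if last_call is not None:
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        results.append(fetch(chrom, window_start, window_end))
    return results
```

Because the windows are half-open and adjacent, concatenating the per-window sequences reproduces the full region without overlaps or gaps.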

Resources and documentation

UCSC Genome Browser REST API Documentation


Open source / mirrors

The UCSC Browser code base is open-source for non-commercial use, and is mirrored locally by many research groups, allowing private display of data in the context of the public data. The UCSC Browser is mirrored at several locations worldwide, as shown in the table.


Official mirror sites
European mirror — maintained by UCSC at University of Bielefeld, Germany
Asian mirror — maintained by UCSC at RIKEN, Yokohama, Japan

The Browser code is also used in separate installations by the UCSC Malaria Genome Browser and the Archaea Browser.


References


External links

