A graphical toolkit for visualizing genome data

The world of genomics does not lack visualization tools. But as new methods and new types of data emerge, existing techniques may struggle to cope. Now, a tool known as Gosling allows bioinformaticians to create applications that can display genomic information with the same level of flexibility that developers expect from other graphical programming tools.

First published in 2020 by bioinformatician Nils Gehlenborg and his team at Harvard Medical School in Boston, Massachusetts, Gosling stands for “grammar of scalable linked interactive nucleotide graphics”1. But the name is also a nod to structural biologist Raymond Gosling, who together with Rosalind Franklin captured the famous “Photograph 51”, which revealed the structure of DNA.

Gosling is what is called a grammar. It is implemented in programming libraries that provide a flexible syntax for describing genomic regions and interactions and how they should be laid out on a web page. Researchers and bioinformaticians can use these libraries to create interactive and scalable visualizations that they can share with their colleagues and to create custom genetic analysis tools.

Views created by Gosling can be linked, so that selecting a region in one panel highlights the same region in another. They can also be panned, manipulated and zoomed from the chromosomal level down to single nucleotides. “The visual representation adapts to the zoom level,” says Gehlenborg. This feature is called semantic zoom. An online test environment provides example visualizations that users can extend to create and export their own graphs. And libraries for Python (Gos) and JavaScript (gosling.js) allow bioinformaticians to program the images directly into Jupyter computational notebooks and other applications. An alpha R version was released in July.

Such grammars are used to systematically link data sets to their visualizations, says Tamara Munzner, a computer scientist at the University of British Columbia in Vancouver, Canada. Popular libraries such as ggplot2 and Vega-Lite use a “grammar of graphics” to define their visualizations. But these tools can be used for any type of graph, whereas Gosling is specifically designed for genomic visualizations. “It’s like Vega-Lite for genomics,” says Munzner.
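To make the idea of a grammar concrete, here is a minimal Python sketch of the general principle: a declarative description states which data field drives which visual channel, and a small interpreter turns records into drawable marks. It is a toy written for this article, not code from Gosling, ggplot2 or Vega-Lite; the function names, the keys in the description and the coverage values are all invented for illustration.

```python
# Toy illustration only: this is NOT Gosling, ggplot2 or Vega-Lite code.
# A "spec" declares which data field drives which visual channel, and a
# small interpreter turns data records into drawable marks.

def scale_linear(value, domain, pixel_range):
    """Map a value from a data domain onto a pixel range."""
    d0, d1 = domain
    r0, r1 = pixel_range
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0)


def render(spec, records):
    """Apply a declarative spec to records and return a list of marks."""
    marks = []
    for record in records:
        mark = {"shape": spec["mark"]}
        for channel, enc in spec["encoding"].items():
            mark[channel] = scale_linear(
                record[enc["field"]], enc["domain"], enc["range"]
            )
        marks.append(mark)
    return marks


# Hypothetical data: coverage values at three genomic positions.
records = [
    {"position": 0, "coverage": 10},
    {"position": 500, "coverage": 40},
    {"position": 1000, "coverage": 20},
]

# The description says *what* to show, not *how* to draw it.
spec = {
    "mark": "point",
    "encoding": {
        "x": {"field": "position", "domain": (0, 1000), "range": (0, 200)},
        # Pixel y grows downwards, so the range is flipped.
        "y": {"field": "coverage", "domain": (0, 40), "range": (100, 0)},
    },
}

marks = render(spec, records)
for mark in marks:
    print(mark)
```

Changing the chart means editing the description rather than the drawing code, which is the property that grammar-based libraries share.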

Bridging the gap

Programming tools for visualizations range from template-based functions that use a single line of code to create a standard type of chart to those that assemble visualizations piece by piece from lines and geometric shapes, such as the JavaScript library D3.js. The template-based approach is easy to use but relatively rigid; the other offers far more customization but is laborious to use.

“Gosling is really filling that gap by making it easier to build new tools with visualization components,” says Maria Nattestad, a software engineer at Google in Mountain View, California. As part of her doctoral research in 2015, Nattestad developed a tool called SplitThreader, which presents the genome in a circular layout known as a Circos plot, with sequencing reads drawn as arcs to highlight structural variations. With no other options, she drew these elements from scratch, using D3.js to specify the location and dimensions of each line, rectangle and circle. “It was such a learning curve,” she says. “It took me a long time to build SplitThreader.” She adds that it probably could have been built much faster with Gosling.

Gehlenborg says Gosling arose from a 2019 literature review2, during which his team studied the landscape of genome visualization and built a taxonomy of the tools and their capabilities. From there, the researchers developed a syntax to systematically describe the visualizations these tools could produce. Gosling, explains Gehlenborg, “is a fundamental approach to assembling genomic visualizations using this same taxonomy.”

Gosling encodes the data using a plain text format called JavaScript Object Notation (JSON) and uses a genomics-specific language to supplement more general terms used in standard graphics libraries. Gosling.js, Gos and g(R)osling then use this encoding to generate files in their respective programming languages. The final visualization is drawn in a web browser using a rendering engine and file formatting tools developed by Gehlenborg’s team to visualize chromosomal data from a technique called Hi-C3. Visualizations on gosling-lang.org provide starting points for Circos plots, gene annotation, chromatin conformation heatmaps, evolutionary conservation and more.
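As a rough illustration of what such a JSON description can look like, the Python sketch below builds a small specification as a dictionary and serializes it with the standard library. The overall shape (a list of tracks, each with a data source, a mark and field-to-channel encodings, including a genomics-specific "genomic" field type) follows the general pattern of published Gosling examples, but the exact keys should be checked against the official documentation, and the data URL and field names here are placeholders rather than real resources.

```python
import json

# Hedged sketch: keys mirror the general shape of Gosling-style specs, but
# verify them against the official documentation before relying on them.
# The URL and the field names "position" and "coverage" are placeholders.
spec = {
    "title": "Example coverage track",
    "layout": "linear",
    "tracks": [
        {
            "data": {"url": "https://example.org/coverage.csv", "type": "csv"},
            "mark": "bar",
            # Genomics-specific term: the x axis is a genomic coordinate.
            "x": {"field": "position", "type": "genomic"},
            # General term shared with other graphics grammars.
            "y": {"field": "coverage", "type": "quantitative"},
            "width": 600,
            "height": 120,
        }
    ],
}

# Serialize to plain-text JSON, the form in which a spec can be shared.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Because the description is plain text, it can be stored, compared and passed between languages without modification.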

Postdoc Sehi L’Yi, who led the development of Gosling, says what sets Gosling apart from other visualization tools is its expressiveness. With most tools, he says, the graphics that can be created and what they will look like are predefined. “It’s really not easy to customize visualizations as a user.” But with Gosling, users can, for example, specify the color, dimensions and location of the symbol used to represent a centromere or genomic interval, then overlay it on an ideogram of a chromosome to highlight a region of interest.
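The kind of customization described above can be sketched in Python as a helper that stacks a highlight on top of an ideogram. This is only an illustration of the idea: the helper name, the dictionary keys and the coordinates are assumptions made for this example and are not taken from the Gosling API or from real annotation data.

```python
import json


def highlight_overlay(chrom, start, end, color="red", opacity=0.4, height=30):
    """Describe an ideogram with a highlighted interval drawn on top.

    Hypothetical helper: the keys are illustrative, not Gosling's schema.
    """
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    ideogram = {"role": "ideogram", "mark": "rect", "chromosome": chrom}
    highlight = {
        "role": "highlight",
        "mark": "rect",
        "chromosome": chrom,
        "start": start,
        "end": end,
        "color": color,      # user-chosen color of the symbol
        "opacity": opacity,  # lets the ideogram show through
    }
    # Later tracks are drawn over earlier ones, so the highlight goes last.
    return {"alignment": "overlay", "height": height, "tracks": [ideogram, highlight]}


# Arbitrary interval chosen for illustration, not a real centromere position.
view = highlight_overlay("chr1", 1_000_000, 2_000_000, color="#d62728")
print(json.dumps(view, indent=2))
```

Color, size and position are ordinary parameters here, which is the point L’Yi makes: the user, not the tool, decides how the region of interest looks.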

An interesting space

A team of master’s students from the University of British Columbia decided to use Gosling to create their final project in a data visualization course. “One of my teammates heard about it at a conference last year,” says team member Armita Safa. “Even for someone with no coding background, Gosling is relatively easier to work with than most other things that are used for visualization,” she says. That said, she notes that they initially struggled to extract the data they needed to allow users to click on regions and create new visualizations.

Dominic Girardi, product manager at data visualization company Datavisyn in Linz, Austria, also experimented with Gosling to create an interactive playground that lets users filter an array of genes by genomic region. The company, which Gehlenborg co-founded, is now using Gosling to build visualization tools for its enterprise clients, although that work isn’t finished yet, Girardi says.

Gosling isn’t the only visualization library for genomic data; other examples include ggbio, gggenomes and gggenes, all of which are extensions of the ggplot2 graphics library. But most of these tools create static images rather than interactive visualizations, says Gehlenborg. He says future plans for Gosling include giving it a graphical interface, so researchers can create visualizations by dragging and dropping widgets onto a virtual canvas rather than having to program them.

Robert Buels, who leads the development of a genome browser at the University of California, Berkeley, says Gosling “takes up a really interesting space” in the genomic visualization toolkit. “You can get a lot more customization with Gosling,” he says. But users don’t need to write as much code as for tools like D3.js.

“It’s a really interesting niche between the two things,” he says, “which I think is a really good addition to the field.”
