A graphical toolkit for visualizing genome data
The world of genomics does not lack visualization tools. But as new methods and new types of data emerge, existing techniques may struggle to cope. Now, a tool known as Gosling allows bioinformaticians to create applications that can display genomic information with the same level of flexibility that developers expect from other graphical programming tools.
First published in 2020 by bioinformatician Nils Gehlenborg and his team at Harvard Medical School in Boston, Massachusetts, Gosling stands for “grammar of scalable linked interactive nucleotide graphics”1. But the name is also a nod to structural biologist Raymond Gosling, who together with Rosalind Franklin captured the famous “Photograph 51”, which helped to reveal the structure of DNA.
Gosling is what is known as a visualization grammar. It is implemented in programming libraries that provide a flexible syntax for describing genomic regions and interactions, and for specifying how they should be laid out on a web page. Researchers and bioinformaticians can use these libraries to build interactive, scalable visualizations to share with their colleagues, as well as custom genetic analysis tools.
Bridging the gap
“Gosling is really filling that gap by making it easier to build new tools with visualization components,” says Maria Nattestad, a software engineer at Google in Mountain View, California. As part of her doctoral research in 2015, Nattestad developed a tool called SplitThreader, which presents the genome in a circular layout known as a Circos plot, with sequencing reads drawn as arcs to highlight structural variations. With no other options, she drew these elements from scratch, using D3.js to specify the location and dimensions of each line, rectangle, and circle. “It was such a learning curve,” she says. “It took me a long time to build SplitThreader.” But, she adds, it probably could have been built much faster with Gosling.
Gehlenborg says Gosling arose from a 2019 literature review2, during which his team surveyed the landscape of genome visualization and built a taxonomy of the tools and their capabilities. From there, the researchers developed a syntax to systematically describe the visualizations these tools could produce. Gosling, explains Gehlenborg, “is a fundamental approach to assembling genomic visualizations using this same taxonomy.”
Postdoc Sehi L’Yi, who led the development of Gosling, says that what sets it apart from other visualization tools is its expressiveness. With most tools, he says, the kinds of graphic that can be created, and what they look like, are predefined. “It’s really not easy to customize visualizations as a user.” But with Gosling, users can, for example, specify the color, dimensions, and location of the symbol used to represent a centromere or a genomic interval, and then overlay it on an ideogram of a chromosome to highlight a region of interest.
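To make the idea of a grammar more concrete, the sketch below writes out, as a TypeScript object, the kind of declarative specification L’Yi describes: one track draws a toy chromosome ideogram, and a second, semi-transparent rectangle in a chosen color is overlaid on it to mark a region of interest. The property names (`tracks`, `mark`, `x`, `xe`, `color`, `xDomain`, `alignment: "overlay"`, and the `json` data type with `chromosomeField` and `genomicFields`) are based on the public Gosling documentation but may differ between versions, so treat them as assumptions to check against the current schema. The hand-written types, the `highlightRegion` helper, and the band coordinates are hypothetical and for illustration only.

```typescript
// Minimal hand-written types covering only what this sketch uses.
// Hypothetical: these are NOT the official gosling.js type definitions.
type GenomicChannel = { field: string; type: "genomic" };
type ValueChannel = { value: string | number };
type NominalColorChannel = {
  field: string;
  type: "nominal";
  domain: string[];
  range: string[];
};

interface JsonData {
  type: "json";
  values: Record<string, string | number>[];
  chromosomeField: string;
  genomicFields: string[];
}

interface RectTrack {
  data: JsonData;
  mark: "rect";
  x: GenomicChannel; // start coordinate of each rectangle
  xe: GenomicChannel; // end coordinate of each rectangle
  color: ValueChannel | NominalColorChannel;
  opacity?: ValueChannel;
  stroke?: ValueChannel;
  strokeWidth?: ValueChannel;
}

interface OverlaidView {
  title: string;
  xDomain: { chromosome: string; interval: [number, number] };
  alignment: "overlay"; // draw all tracks on top of one another
  width: number;
  height: number;
  tracks: RectTrack[];
}

// Toy ideogram: band coordinates are made up for illustration,
// not real cytoband boundaries.
const ideogram: RectTrack = {
  data: {
    type: "json",
    values: [
      { chrom: "chr1", start: 1, end: 10_000_000, stain: "gneg" },
      { chrom: "chr1", start: 10_000_000, end: 20_000_000, stain: "gpos50" },
      { chrom: "chr1", start: 20_000_000, end: 30_000_000, stain: "gneg" },
    ],
    chromosomeField: "chrom",
    genomicFields: ["start", "end"],
  },
  mark: "rect",
  x: { field: "start", type: "genomic" },
  xe: { field: "end", type: "genomic" },
  color: {
    field: "stain",
    type: "nominal",
    domain: ["gneg", "gpos50"],
    range: ["white", "#979797"],
  },
  stroke: { value: "black" },
  strokeWidth: { value: 0.5 },
};

// Hypothetical helper: builds a semi-transparent rectangle track that
// marks one genomic interval in a user-chosen color.
function highlightRegion(chrom: string, start: number, end: number, color: string): RectTrack {
  if (end <= start) {
    throw new RangeError("end must be greater than start");
  }
  return {
    data: {
      type: "json",
      values: [{ chrom, start, end }],
      chromosomeField: "chrom",
      genomicFields: ["start", "end"],
    },
    mark: "rect",
    x: { field: "start", type: "genomic" },
    xe: { field: "end", type: "genomic" },
    color: { value: color },
    opacity: { value: 0.4 },
  };
}

const spec: OverlaidView = {
  title: "Region of interest on a toy chromosome 1 ideogram",
  xDomain: { chromosome: "chr1", interval: [1, 30_000_000] },
  alignment: "overlay",
  width: 800,
  height: 40,
  // The highlight is listed after the ideogram, on the assumption that
  // overlaid tracks are drawn in order, so it should appear on top.
  tracks: [ideogram, highlightRegion("chr1", 12_000_000, 15_000_000, "#E5484D")],
};

// The result is plain JSON, the form in which Gosling specifications are written.
console.log(JSON.stringify(spec, null, 2));
```

Changing the highlighted interval, its color, or its opacity means editing a few values in the specification rather than redrawing shapes by hand, which is the contrast with the D3.js approach described above. Rendering the object in a browser would require passing it to a Gosling renderer, a step not shown here.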
An interesting space
A team of master’s students from the University of British Columbia decided to use Gosling for their final project in a data visualization course. “One of my teammates heard about it at a conference last year,” says team member Armita Safa. “Even for someone with no coding background, Gosling is relatively easier to work with than most other things that are used for visualization,” she says. That said, she notes that the team initially struggled to extract the data needed to let users click on regions and generate new visualizations.
Dominic Girardi, product manager at data visualization company Datavisyn in Linz, Austria, has also experimented with Gosling, creating an interactive playground that lets users filter a set of genes by genomic region. The company, which Gehlenborg co-founded, is now using Gosling to build visualization tools for its enterprise clients, although that work is not yet finished, Girardi says.
Gosling isn’t the only visualization library for genomic data; others include ggbio, gggenomes, and gggenes, all of which extend the R graphics library ggplot2. But most of these tools produce static images rather than interactive visualizations, says Gehlenborg. Future plans for Gosling, he says, include a graphical interface, so that researchers can create visualizations by dragging and dropping widgets onto a virtual canvas instead of having to program them.
Robert Buels, who leads the development of a genome browser at the University of California, Berkeley, says Gosling “takes up a really interesting space” in the genomic visualization toolkit. “You can get a lot more customization with Gosling,” he says, yet users don’t need to write as much code as they would with tools such as D3.js.
“It’s a really interesting niche between the two things,” he says, “which I think is a really good addition to the field.”