Machine learning and genetic research insights show how cells work – and may help develop new drugs for COVID-19 and other diseases
The Research Brief is a short take on interesting academic work.
The big idea
We combined a machine learning algorithm with knowledge gleaned from hundreds of biological experiments to develop a technique that helps biomedical researchers work out the functions of transcription factors, the proteins that turn genes on and off in cells. This knowledge could facilitate the development of drugs for a wide range of diseases.
At the start of the COVID-19 pandemic, scientists who worked out the genetic code of the RNA molecules in cells of the lungs and intestines found that only a small group of cells in these organs was most vulnerable to infection with the SARS-CoV-2 virus. That finding allowed researchers to focus on blocking the virus's ability to enter these cells. Our technique could make it easier for researchers to find this kind of information.
The biological knowledge we work with comes from this type of RNA sequencing, known as single-cell RNA sequencing, which gives researchers a view of the hundreds of thousands of RNA molecules in a cell as they are being translated into proteins. A widely used machine learning tool, the Seurat analysis platform, has helped researchers around the world discover new cell populations in healthy and diseased organs. It processes data from single-cell RNA sequencing without any advance information about how genes function and relate to each other.
Our technique takes a different approach by adding knowledge about certain genes and cell types to find clues about the distinct roles of cells. There has been over a decade of research identifying all of the potential targets for transcription factors.
Armed with this knowledge, we used a mathematical approach called Bayesian inference. In this technique, prior knowledge is converted into probabilities that can be calculated on a computer. In our case, the probability in question is the probability that a given gene is regulated by a given transcription factor. We then used a machine learning algorithm to determine the function of the transcription factors in each of the thousands of cells we analyzed.
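To make the idea of converting prior knowledge into probabilities concrete, here is a toy calculation in Python. It is not the published model. It is a minimal sketch of Bayes' rule for a single gene and a single transcription factor, under assumptions made up for illustration: the gene names, the prior values, the "observed" expression scores and the choice of a normal distribution are all hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_regulated(prior: float, observed: float,
                        mean_if_regulated: float = 1.0,
                        mean_if_not: float = 0.0,
                        sd: float = 0.5) -> float:
    """Bayes' rule for two hypotheses: 'regulated' versus 'not regulated'.

    prior    -- probability, before seeing data, that the gene is a target
    observed -- a made-up expression score for the gene in one cell
    The two means and sd are illustrative assumptions, not fitted values.
    """
    like_reg = normal_pdf(observed, mean_if_regulated, sd)
    like_not = normal_pdf(observed, mean_if_not, sd)
    evidence = prior * like_reg + (1.0 - prior) * like_not
    return prior * like_reg / evidence


# Hypothetical priors: gene_A has strong earlier evidence of being a target
# of the transcription factor, gene_B has very little.
priors = {"gene_A": 0.8, "gene_B": 0.1}
# Hypothetical observations: both genes show the same expression score.
observed = {"gene_A": 0.9, "gene_B": 0.9}

posteriors = {g: posterior_regulated(priors[g], observed[g]) for g in priors}
for gene, p in posteriors.items():
    print(f"{gene}: prior={priors[gene]:.2f} -> posterior={p:.2f}")
```

Both hypothetical genes show the same measurement, yet the gene with stronger prior evidence ends up with the higher posterior probability. That is the sense in which earlier experiments shape the conclusion drawn from new single-cell data.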
We published our technique, called Bayesian Inference Transcription Factor Activity Model, in the journal Genome Research and also made the software available for free so that other researchers can test and use it.
Why it matters
Our approach works on a wide range of cell and organ types and could be used to develop treatments for conditions like COVID-19 or Alzheimer’s disease. Drugs for these difficult-to-treat diseases work best when they target the cells that cause the disease and avoid collateral damage to other cells. Our technique makes it easier for researchers to home in on these targets.
What other research is in progress
Single-cell RNA sequencing has revealed that each organ can have 10, 20 or even more specialized cell subtypes, each with distinct functions. An exciting new development is the emergence of spatial transcriptomics, in which RNA sequencing is performed on a spatial grid so that researchers can study the RNA of cells at specific locations in an organ.
A recent paper used a Bayesian statistical approach similar to ours to determine the distinct roles of cells while taking into account their proximity to each other. Another research group combined spatial data with single-cell RNA sequencing data and studied the distinct functions of neighboring cells.
We plan to work with colleagues to use our new technique to study complex diseases such as Alzheimer’s disease and COVID-19, work that could lead to new drugs for these diseases. We also want to work with colleagues to better understand the complexity of interactions between cells.