AI Discovers Causes of Autism in Uncharted DNA

Using artificial intelligence (AI), a research team has uncovered novel genetic mutations associated with autism in noncoding regions of DNA. The scientists used deep learning to analyze these so-called ‘junk’ regions of the genome, which may not affect what certain genes produce, but rather how much of it they make. The work was published on May 27 in Nature Genetics.

Deep Learning AI Applied to Human Genome

The deep learning technique was applied to the genomes of 1,790 families in which one child has autism and the rest of the family does not. In other words, the participants with autism did not come from families with a history of the condition, so their autism is more likely to stem from random mutations than from inheritance. By limiting their analysis to such cases, the researchers could, in theory, isolate mutations that were unique to the individuals with autism.

The team analyzed 120,000 mutations to identify those tied to changes in gene behavior in the autistic individuals. The results did not show the exact causes of autism, but they did reveal possible noncoding genetic contributors that were unique to those with the condition.

The researchers note that until recently, it was not possible to analyze a full genome for noncoding elements that regulate genes and predict how mutations in these regions could play into disease development. Though their work was strictly with noncoding mutations contributing to autism, the team emphasizes that this approach has broader implications.

“This method provides a framework for doing this analysis with any disease,” said senior author Olga Troyanskaya, professor of computer science and genomics. “The approach could be particularly helpful for neurological disorders, cancer, heart disease and many other conditions that have eluded efforts to identify genetic causes. This transforms the way we need to think about the possible causes of those diseases.” Troyanskaya is also genomic director at the Simons Foundation’s Flatiron Institute in New York.

The deep learning algorithm used in this study finds patterns that would be extremely challenging to identify by other means. It located the relevant regions of DNA in the analyzed genomes, then predicted which sections affect the 2,000+ protein interactions that regulate genes. The system also predicted whether a mutation in a single DNA base pair could significantly alter these protein interactions.

Analyzing each individual pair of DNA subunits, the algorithm scans the genome for all mutations and predicts the effect of a mutation at each location. The end result is a list of DNA sequences predicted to regulate genes, along with the mutations likely to impair that regulation. Before such deep learning technology, researchers had to analyze each sequence in a lab and predict the effects of each possible mutation within it. Doing this in a lab without AI would require millions of experiments across different tissue and cell types, which is not a scalable way to repeat the research. Earlier machine learning methods have been applied to sections of DNA, but not at the level of this deep learning approach.
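For readers who want a feel for what scanning every position looks like in practice, the following is a minimal sketch in Python, not the study's method. The scoring function `toy_regulatory_score`, the "TATA" motif it counts, and the example sequence are all hypothetical stand-ins for a trained deep learning model and real genomic data. The sketch only illustrates the general idea of trying every possible single-letter change and recording how much a predicted regulatory score shifts.

```python
from typing import Callable, List, Tuple

BASES = "ACGT"


def toy_regulatory_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model.

    Counts occurrences of a made-up motif; a real system would use a
    deep neural network trained on experimental data.
    """
    motif = "TATA"
    hits = 0
    for i in range(len(seq) - len(motif) + 1):
        if seq[i:i + len(motif)] == motif:
            hits += 1
    return float(hits)


def scan_mutations(
    seq: str, score: Callable[[str], float]
) -> List[Tuple[int, str, str, float]]:
    """Try every single-letter substitution and record the score change.

    Returns (position, original base, new base, score change) tuples.
    """
    reference_score = score(seq)
    effects = []
    for pos, ref_base in enumerate(seq):
        for alt_base in BASES:
            if alt_base == ref_base:
                continue
            mutated = seq[:pos] + alt_base + seq[pos + 1:]
            effects.append((pos, ref_base, alt_base, score(mutated) - reference_score))
    return effects


# Hypothetical example sequence containing one copy of the toy motif.
effects = scan_mutations("GGTATAGG", toy_regulatory_score)
disruptive = [e for e in effects if e[3] < 0]
for pos, ref_base, alt_base, delta in disruptive:
    print(f"position {pos}: {ref_base}>{alt_base} changes score by {delta}")
```

In this toy case, only changes inside the motif lower the score, which mirrors the article's point: the output is a list of positions where a mutation is predicted to impair regulation.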

To test whether their algorithm made accurate predictions about these noncoding mutations, the researchers implanted the genetic segments into cells and observed how gene expression changed. The changes observed in these cells confirmed the accuracy of the AI’s predictions.

Troyanskaya and her team plan to improve upon this system, aiming to enhance genome analysis and the management of diseases.

“Right now, 98 percent of the genome is usually being thrown away. Our work allows you to think about what we can do with the 98 percent.”

Sources: Princeton Engineering, Simons Foundation