Integration of population genome data and 3D protein models for the improved interpretation of coding variants


General Background
Next Generation Sequencing (NGS) technologies have enabled large-scale DNA sequencing projects of patient cohorts. The very large number of rare variants present in every individual's genome has made the interpretation of genomic variation the main obstacle in identifying the genetic cause of disease. Current bioinformatic tools that predict the effect of variants are not sensitive enough to identify a single cause of disease amongst all of these rare variants.

Objective
The overall aim is to improve the interpretation of coding variants. For this proposal we intend to focus on two outcomes:
  • Improving the understanding of protein structures through knowledge about their mutability and, vice versa, improving the annotation of genomic variation through protein structure information.

  • Practical application. Improved interpretation of genomic variation should facilitate:

    • The genetic diagnosis of common Mendelian disorders. Exome sequencing has become a standard for the diagnosis of genetically heterogeneous disorders, and the time and effort required to interpret all potentially relevant variants has therefore increased significantly. This holds especially for missense mutations, which represent the majority of all mutations occurring in the coding region. Currently, most diagnostic requests concern genetic forms of blindness, deafness, movement disorders, metabolic disorders and intellectual disability (ID). We will provide single-nucleotide pathogenicity predictions for the genes that are most commonly mutated in these disorders, thereby improving the interpretation of novel variants and facilitating diagnostic decision-making.
    • The identification of novel ID genes. ID is a major research focus within the department of genetics, and several large-scale studies are ongoing to identify novel genes for ID. We estimate that less than half of all ID genes are known to date. Better prediction of the pathogenicity of missense mutations in genes not yet related to ID will allow us to prioritize genes more effectively for functional follow-up studies.

Project description
The interpretation of genomic variation is one of the largest challenges in modern-day genetics, now that the identification of these variants is becoming increasingly routine thanks to ever-improving next-generation sequencing technologies. Within diagnostic laboratories in particular, reports are often inconclusive for novel mutations, even in known disease genes, until functional work has been done or more patients with the same mutation have been described. Methods that address this problem are in great demand. We intend to model population variation onto protein 3D structures and use this 3D-level information to assess pathogenicity at the genome level. By modeling benign genomic variation that occurs within the normal population onto protein 3D structures, we can identify protein substructures (domains) that are tolerant to genetic variation and are therefore likely to be less important for normal protein function.
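The idea of mapping population variation onto a 3D structure can be illustrated with a minimal sketch. It is not the project's method: the function name `spatial_tolerance`, the 10 Å neighbourhood radius, the use of one coordinate per residue and the toy data are all assumptions made for illustration. For each residue, the sketch reports the fraction of spatially neighbouring residues that carry at least one variant observed in the general population, so that high values suggest a variation-tolerant region and low values a depleted one.

```python
import math

def spatial_tolerance(coords, variant_counts, radius=10.0):
    """Hypothetical tolerance score per residue.

    coords         -- one (x, y, z) tuple per residue, in angstroms
                      (assumed here to be C-alpha positions)
    variant_counts -- number of benign population variants per residue
    radius         -- neighbourhood radius in angstroms (assumed value)

    Returns, for each residue, the fraction of residues within `radius`
    (the residue itself included) that carry at least one variant.
    """
    if len(coords) != len(variant_counts):
        raise ValueError("coords and variant_counts must have equal length")

    scores = []
    for centre in coords:
        # Indices of all residues within the sphere around this residue.
        neighbours = [
            j for j, other in enumerate(coords)
            if math.dist(centre, other) <= radius
        ]
        # Count neighbours that are variable in the population.
        variable = sum(1 for j in neighbours if variant_counts[j] > 0)
        scores.append(variable / len(neighbours))
    return scores

# Toy data (invented): two spatially separated clusters of residues.
coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),  # no population variants
    (30.0, 0.0, 0.0), (33.8, 0.0, 0.0),                 # variants observed
]
variant_counts = [0, 0, 0, 2, 1]

scores = spatial_tolerance(coords, variant_counts)
print(scores)  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

In this toy example the first cluster is fully depleted of population variation and the second is fully variable. A real analysis would also need to account for sequence coverage, mutation rate and allele frequency, which this sketch ignores.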
