Systematic analysis and prediction of genes associated with monogenic disorders on human chromosome X
Scala M.; Striano P.
2022-01-01
Abstract
Disease gene discovery on chromosome (chr) X is challenging owing to its unique modes of inheritance. We undertook a systematic analysis of human chrX genes. We observe a higher proportion of disorder-associated genes and an enrichment of genes involved in cognition, language, and seizures on chrX compared to autosomes. We analyze gene constraints, exon and promoter conservation, expression, and paralogues, and report 127 genes sharing one or more attributes with known chrX disorder genes. Using machine learning classifiers trained to distinguish disease-associated from dispensable genes, we classify 247 genes, including 115 of the 127, as having high probability of being disease-associated. We provide evidence of an excess of variants in predicted genes in existing databases. Finally, we report damaging variants in CDK16 and TRPC5 in patients with intellectual disability or autism spectrum disorders. This study predicts large-scale gene-disease associations that could be used for prioritization of X-linked pathogenic variants.

Discovering disease genes on the X chromosome can be particularly challenging. Here, the authors use features of known disease genes and machine learning to predict genes that remain to be associated with disorders on this chromosome.