The availability of high-throughput, low-cost sequencing has transformed the landscape of biomedical research by dramatically expanding our capacity to interrogate the sequence of the human genome. Consequently, there has been an explosion of biomedical literature describing the role of specific genomic variants and their impact on human diseases. The six knowledgebases of the VICC were created independently to curate the biomedical literature for these interpretations. However, the vast majority of the papers cited by any one knowledgebase are cited by that knowledgebase alone within the collective. This limited overlap illustrates the enormity of the task of curating the biomedical literature.
The knowledgebase integration project is built upon the GA4GH Genotype-to-Phenotype framework. The project aims to leverage the collective knowledge of the disparate existing resources of the VICC to improve the comprehensiveness of clinical interpretation of genomic variation. An ongoing goal is to provide and improve upon standards and guidelines by which other groups with clinical interpretation data can make it accessible and visible to the public. We have released a preprint discussing our initial harmonization effort and the disparities we observed in the structure and content of variant interpretations.
We are actively developing a prototype cross-knowledgebase query interface, available at https://search.cancervariants.org/.