WP3

Data sharing will be done within the Working group for data sharing of the genetic laboratory society VKGL. This group consists of laboratory specialists and bioinformaticians of each centre. Of note, part of the implementation and deployment of the databases to be used is already financed by BBMRI and the centres themselves.

OBJECTIVES
To efficiently interpret the clinical relevance of the enormous number of variants generated by WGS in this project, easy access to information about previous observations of variants and their frequencies is necessary. The first step in diagnostic interpretation is filtering the variants observed in the patient against known variants from, e.g., 1000Genomes, ExAC and dbSNP. Experience shows that these databases do not contain sufficient information to efficiently filter population-specific benign variants. Moreover, the time to reach a diagnosis will be considerably shortened by sharing rare, potentially pathogenic variants: if a clinically similar patient with a variant in the same gene is identified, it becomes more likely that this variant is indeed the cause of the disorder. This WP will deliver infrastructure for storage and data sharing by using bioinformatics tools like VARDA and MOLGENIS for storing frequency data (VCF) and by making high-quality clinical classifications accessible (Figure 4). To minimize cost, however, we will not implement centralized variant calling, potentially sacrificing the opportunity to standardize this aspect across all Dutch genetic laboratories. The ultimate goal is to make all anonymized data publicly accessible.
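
As an illustration of the filtering step described above, the following minimal sketch keeps only those patient variants that are rare or absent in a reference frequency table. The function name, the variant representation and the 1% cut-off are assumptions made for this example; they are not part of VARDA, MOLGENIS or any laboratory pipeline.

```python
from typing import Dict, List, Tuple

# A variant is identified here by (chromosome, position, reference allele, alternate allele).
# This representation is an assumption for the example only.
Variant = Tuple[str, int, str, str]


def filter_rare_variants(
    patient_variants: List[Variant],
    known_frequencies: Dict[Variant, float],
    max_frequency: float = 0.01,  # hypothetical cut-off, not taken from the proposal
) -> List[Variant]:
    """Keep variants that are absent from the table or at or below the cut-off."""
    rare = []
    for variant in patient_variants:
        # Variants never observed before are treated as frequency 0.0 and are kept.
        frequency = known_frequencies.get(variant, 0.0)
        if frequency <= max_frequency:
            rare.append(variant)
    return rare
```

A population-specific frequency table would simply be passed as `known_frequencies`; the more complete that table is for the local population, the more benign variants drop out at this step.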

TASKS AND DESCRIPTION OF WORK

  • 3.1: Deliver a frequency calculation pipeline 'plug-in', to be executed by the laboratories on gVCF data from the WGS-generating centres, that derives comparable frequencies and uniform reference data for benchmarking and research, together with a central server that stores the frequencies using the VARDA database software.
  • 3.2: Depositing and processing the variant classifications in an accessible database using MOLGENIS software, with links back to individual labs where permitted.
  • 3.3: Development of interfaces to integrate with local diagnostics software, to avoid manual work in data sharing.
  • 3.4: Making genotype frequencies and variant classifications publicly accessible via BBMRI-NL in interoperable formats with consideration of ELSI constraints (WP4). In addition, data will be disseminated via LOVD, ClinVar, dbSNP, GA4GH, EGA and others if applicable. All data, including raw sequence data and per-individual genotypes, will be deposited in the European Genome-phenome Archive (EGA) for long-term archiving. Of note, this archive has controlled access via a designated data access committee.
  • 3.5: Seamless querying of frequency and classification data by integrating MOLGENIS and VARDA from a user perspective using microservices and adding advanced user interfaces for direct use of the data (complementing Task 3.4).
  • 3.6: Generating and analysing data on bioinformatics and IT cost of large-scale WGS analysis with evaluation of sustainability options (e.g. cloud).
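
To illustrate how decentrally calculated frequencies (Task 3.1) could be combined on a central server, the sketch below pools per-laboratory allele counts into one frequency per variant. It assumes that each laboratory reports, per variant, the number of alternate alleles observed and the number of alleles called; this report format and the function name are hypothetical and do not describe the actual VARDA interface.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]
# Per-laboratory report: variant -> (alternate allele count, number of alleles called).
# This format is an assumption for the example only.
LabReport = Dict[Variant, Tuple[int, int]]


def pool_allele_frequencies(lab_reports: List[LabReport]) -> Dict[Variant, float]:
    """Sum allele counts over laboratories and return pooled allele frequencies."""
    totals = defaultdict(lambda: [0, 0])  # variant -> [alt count, alleles called]
    for report in lab_reports:
        for variant, (alt_count, allele_number) in report.items():
            totals[variant][0] += alt_count
            totals[variant][1] += allele_number
    # Variants without any called alleles are skipped to avoid division by zero.
    return {
        variant: alt_count / allele_number
        for variant, (alt_count, allele_number) in totals.items()
        if allele_number > 0
    }
```

Pooling counts rather than averaging per-laboratory frequencies weights each centre by the number of alleles it actually called, and only aggregate counts need to leave the laboratory.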

DELIVERABLES OF WP3

  • D3.1 A decentralized variant (frequency) calculation tool and classification sharing protocol (M6)
  • D3.2 Deposition and processing of gVCF data into VARDA (M36)
  • D3.2 Deposition and processing of variant classifications into MOLGENIS (M36)
  • D3.3 Data interfaces to diagnostic applications and operational support (M36)
  • D3.4 Public data accessibility via graphical user interfaces and programmatic interfaces to public repositories (M36)
  • D3.5 Evaluation of IT cost and sustainability options for WGS analysis (M36)