Population name | van Blokland |
Genome | GRCh38 |
Consortium | Various research groups and universities |
Super population | EUR, AMR |
Population description | Individuals from Generation Scotland, the Helix cohort (Helix DNA Discovery Project) in the United States, the Lifelines COVID-19 cohort (Lifelines population cohort and
the Lifelines NEXT birth cohort in the northern part of the Netherlands) and the Netherlands Twin Register (NTR).
D1: predicted self-reported COVID-19; B2: hospitalised COVID-19; C1: COVID-19 vs. lab/self-reported negative; C2: COVID-19 vs. population |
Population origin | Northern and Western Europe, and the USA |
Case population size | 1865 |
Control population size | 29174 |
Comorbidities | Not specified |
Mean / median age | 47.5 |
Sex | 26% male |
Severity | Severe |
Sample source | Nasopharyngeal swab / Whole blood |
Method | The Menni COVID-19 prediction model was used to identify COVID-19 cases. GWAS was performed on predicted COVID-19 cases and controls. The top 20 SNPs from the COVID-19 HGI D1 cohort were replicated in the C1, C2 and B1 analyses to compare predicted COVID-19 with other cohorts. Genotype data were generated with various arrays across the four cohorts, as follows: HumanCytoSNP Infinium (Global Screening Assay (GSA)/GSA MultiEthnic Disease Version); Perlegen-Affymetrix; Affymetrix (6.0/Axiom); Illumina (Human Quad Bead 660/Omni 1M GSA/OmniExpres |
Bioinformatics | Cohort data in SAIGE format were processed with WDL workflows available at https://github.com/covid19-hg/META_ANALYSIS.
Meta-analysis used inverse-variance weighting of effects, accounting for strand differences and allele flips in individual studies. All build 37 statistics were lifted over to build 38, and allele harmonization was performed using gnomAD 3.0 genomes before the meta-analysis. |
Imputation details | Generation Scotland: phasing with SHAPEIT v2.r873 and duoHMM, imputation with the HRC.r1-1 panel. Helix: imputation with 1000 Genomes Phase 3 data. Lifelines: Haplotype Reference Consortium (HRC) panel v1.1 on the Sanger Imputation Server. NTR: data were phased with Eagle and then imputed to 1000 Genomes and TOPMed using Minimac |
Limitations | 1. The training data for the COVID-19 prediction model might not represent the full spectrum of COVID-19. 2. Symptoms may overlap with those of other diseases, so predicted cases may be falsely identified. 3. COVID-19 prevalence may differ among populations and cohorts. |
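
The Bioinformatics entry above mentions inverse-variance weighting with handling of allele flips. The following is a minimal sketch of that general technique, not the code of the META_ANALYSIS workflows. The function names `align_to_reference` and `ivw_meta`, and the two example studies, are hypothetical and chosen only for illustration. Strand (complement) handling is deliberately left out to keep the sketch short.

```python
import math


def align_to_reference(beta, effect_allele, other_allele, ref_effect, ref_other):
    """Return beta expressed relative to the reference effect allele.

    Hypothetical helper: only exact matches and simple allele flips are
    handled; strand complements are not resolved here.
    """
    if (effect_allele, other_allele) == (ref_effect, ref_other):
        return beta
    if (effect_allele, other_allele) == (ref_other, ref_effect):
        # Allele flip: same variant, opposite effect allele, so the sign flips.
        return -beta
    raise ValueError("alleles do not match the reference variant")


def ivw_meta(betas, standard_errors):
    """Fixed-effect inverse-variance-weighted pooled estimate and its SE."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Two made-up studies reporting the same variant; the second one reports
# the effect for the opposite allele.
studies = [
    {"beta": 0.10, "se": 0.05, "ea": "A", "oa": "G"},
    {"beta": -0.20, "se": 0.10, "ea": "G", "oa": "A"},
]
aligned = [align_to_reference(s["beta"], s["ea"], s["oa"], "A", "G") for s in studies]
pooled_beta, pooled_se = ivw_meta(aligned, [s["se"] for s in studies])
print(f"pooled beta = {pooled_beta:.4f}, pooled SE = {pooled_se:.4f}")
```

Each study is weighted by the inverse of its squared standard error, so the more precise first study contributes four times the weight of the second.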