Population

Population name van Blokland
Genome GRCh38
Consortium Various research groups and universities
Super population EUR, AMR
Population description Individuals from Generation Scotland, the Helix cohort (Helix DNA Discovery Project) in the United States, the Lifelines COVID-19 cohort (Lifelines population cohort and the Lifelines NEXT birth cohort in the northern part of the Netherlands) and the Netherlands Twin Register (NTR). Analyses: D1, predicted self-reported COVID-19; B2, hospitalised COVID-19; C1, COVID-19 vs. lab/self-reported negative; C2, COVID-19 vs. population
Population origin Northern and Western Europe, and USA
Case population size 1865
Control population size 29174
Comorbidities Not specified
Mean / median age 47.5
Sex 26% male
Severity Severe
Sample source Nasopharyngeal swab / Whole blood
Method The Menni COVID-19 prediction model was used to identify COVID-19 cases. A GWAS was performed on predicted COVID-19 cases and controls. The top 20 SNPs from the COVID-19 HGI D1 cohort were replicated in the C1, C2 and B1 analyses to compare predicted COVID-19 with other cohorts. Data were generated with various methods across the four cohorts, as follows: HumanCytoSNP Infinium (Global Screening Assay (GSA)/GSA MultiEthnic Disease Version); Perlegen-Affymetrix; Affymetrix (6.0/Axiom); Illumina (Human Quad Bead 660/Omni 1M GSA/OmniExpres
Bioinformatics Cohort data in SAIGE format were processed with WDL workflows available at https://github.com/covid19-hg/META_ANALYSIS. Effects were combined by inverse-variance weighting, accounting for strand differences and allele flips in individual studies. All build 37 statistics were lifted over to build 38, and alleles were harmonized against gnomAD 3.0 genomes before the meta-analysis.
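The inverse-variance weighting mentioned above can be illustrated with a minimal sketch. It is not the consortium's workflow (that lives in the linked repository); the function names `align_effect` and `ivw_meta` are hypothetical, the numbers are invented for illustration, and only allele flips are handled here, not strand differences.

```python
import math

def align_effect(beta, effect_allele, other_allele, ref_effect, ref_other):
    """Orient a study's effect size to the reference effect allele.

    Returns None when the allele pair does not match the reference at all.
    Strand (complement) differences are deliberately not handled in this sketch.
    """
    if (effect_allele, other_allele) == (ref_effect, ref_other):
        return beta
    if (effect_allele, other_allele) == (ref_other, ref_effect):
        return -beta  # allele flip: same variant, opposite effect allele
    return None

def ivw_meta(estimates):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    estimates: list of (beta, standard_error) pairs, one per study,
    all oriented to the same effect allele.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return beta, se

# Invented example: two studies report the same variant, one with flipped alleles.
study_a = (align_effect(0.2, "A", "G", "A", "G"), 0.1)
study_b = (align_effect(-0.4, "G", "A", "A", "G"), 0.1)  # flipped, so becomes +0.4
pooled_beta, pooled_se = ivw_meta([study_a, study_b])
print(round(pooled_beta, 4), round(pooled_se, 4))
```

Studies with smaller standard errors receive larger weights, so the pooled estimate is pulled toward the more precise cohorts.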
Imputation details Generation Scotland: phasing with SHAPEIT v2.r873 and duoHMM, imputation with the HRC.r1-1 panel. Helix: imputation with 1000 Genomes Phase 3 data. Lifelines: Haplotype Reference Consortium (HRC) panel v1.1 at the Sanger Imputation Server. NTR: data were phased using Eagle and then imputed to 1000 Genomes and TOPMed using Minimac.
Limitations 1. The training data for the COVID-19 prediction model might not be fully representative of the whole spectrum of COVID-19. 2. Symptoms may overlap with those of other diseases, so predicted cases may be falsely identified. 3. The prevalence of COVID-19 might differ among populations and cohorts.