Population

Population name	van Blokland
Genome	GRCh38
Consortium	Various research groups and universities
Super population	EUR, AMR
Population description	Individuals from Generation Scotland, the Helix cohort (Helix DNA Discovery project) in the United States, the Lifelines COVID-19 cohort (the Lifelines population cohort and the Lifelines NEXT birth cohort in the northern part of the Netherlands) and the Netherlands Twin Register (NTR). Analyses: D1, predicted self-reported COVID-19; B2, hospitalised COVID-19; C1, COVID-19 vs. lab/self-reported negative; C2, COVID-19 vs. population
Population origin	Northern and Western Europe, and the USA
Case population size	1865
Control population size	29174
Comorbidities	Not specified
Mean / median age	47.5
Sex	26% male
Severity	Severe
Sample source	Nasopharyngeal swab / Whole blood
Method	The Menni COVID-19 prediction model was used to identify COVID-19 cases, and a GWAS was performed on the predicted COVID-19 cases and controls. The top 20 SNPs from the COVID-19 HGI D1 cohort were replicated in the C1, C2 and B1 analyses to compare predicted COVID-19 with the other cohorts. Data were generated with various methods across the four cohorts, as follows: HumanCytoSNP Infinium (Global Screening Assay (GSA)/GSA MultiEthnic Disease Version); Perlegen-Affymetrix; Affymetrix (6.0/Axiom); Illumina (Human Quad Bead 660/Omni 1M GSA/OmniExpres
Bioinformatics	Cohort data in SAIGE format were processed in WDL workflows made available at https://github.com/covid19-hg/META_ANALYSIS. Effects were combined by inverse variance weighting, accounting for strand differences and allele flips in the individual studies. All build 37 statistics were lifted over to build 38, and allele harmonization was performed using gnomAD 3.0 genomes before the meta-analysis.
Imputation details	Generation Scotland: phasing with SHAPEIT v2.r873 and duoHMM, imputation with the HRC.r1-1 panel. Helix: imputation with 1000 Genomes Phase 3 data. Lifelines: imputation with the Haplotype Reference Consortium (HRC) panel v1.1 at the Sanger Imputation Server. NTR: data were phased using Eagle and then imputed to 1000 Genomes and TOPMed using Minimac
Limitations	1. The training data for the COVID-19 prediction model might not be fully representative of the whole spectrum of COVID-19. 2. Symptoms may overlap with those of other diseases, so predicted cases may be falsely identified. 3. The prevalence of COVID-19 might differ among populations and cohorts
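
The inverse variance weighting named in the Bioinformatics field can be illustrated with a minimal sketch. This is not the code from the META_ANALYSIS repository. It assumes a fixed-effect model, and the function names `align_beta` and `ivw_meta` and all numeric values are hypothetical. Allele flips are handled by negating the effect size when a study reports the swapped allele pair. Strand differences (complemented alleles) are deliberately not handled here.

```python
import math


def align_beta(beta, effect_allele, other_allele, ref_effect, ref_other):
    """Return beta oriented to the reference effect allele (hypothetical helper).

    Only exact matches and simple allele flips are handled; strand
    differences (complemented alleles) are out of scope for this sketch.
    """
    if (effect_allele, other_allele) == (ref_effect, ref_other):
        return beta
    if (effect_allele, other_allele) == (ref_other, ref_effect):
        return -beta  # allele flip: the effect is reported for the other allele
    raise ValueError("alleles do not match the reference pair")


def ivw_meta(estimates):
    """Fixed-effect inverse variance weighted meta-analysis.

    estimates: list of (beta, se) pairs, one per study, already aligned
    to the same effect allele. Returns the pooled (beta, se).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled_beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se


# Made-up per-study estimates for one variant (not from the source data).
study_a = (align_beta(0.2, "A", "G", "A", "G"), 0.1)
study_b = (align_beta(-0.4, "G", "A", "A", "G"), 0.1)  # flipped, becomes +0.4
pooled_beta, pooled_se = ivw_meta([study_a, study_b])
```

Each study is weighted by the inverse of its squared standard error, so more precise studies contribute more to the pooled estimate. With equal standard errors, as in the made-up example, the pooled effect is the simple mean of the aligned effects.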

COHG-SA

Population