Pre-Imputation Quality Control (QC)

Detecting and correcting issues such as genotyping errors, sample-handling errors and population stratification is important in GWAS. The preimp_qc module addresses these issues by applying a series of quality control (QC) filters to your data. Below is a flow diagram of the filters applied when QC’ing input data:

Examples

You can run pre-imputation QC with the preimp_qc module in two ways: (1) from the command line, or (2) inside a Python script.

Command line

preimp_qc --dirname data/ --basename sim_sim2a_eur_sa_merge.miss --input-type plink
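If you call the command-line tool from a larger Python workflow (for example, to QC several datasets in turn), it can help to build the argument list programmatically. The sketch below is hypothetical: `build_preimp_qc_command` is not part of GWASpy, the output directory `qc_out/` is a made-up example, and the helper simply turns keyword names into flags by replacing underscores with hyphens. It assumes every extra option takes a value; switch-style options such as `--withpna` may be passed differently, so check the tool's help output first.

```python
import shlex
import subprocess

def build_preimp_qc_command(dirname, basename, input_type, **options):
    """Build the preimp_qc argument list (hypothetical helper, not part of GWASpy).

    Keyword names become flags by swapping underscores for hyphens,
    e.g. out_dir -> --out-dir, hwe_th_con -> --hwe-th-con.
    Options whose value is None are skipped.
    """
    cmd = ["preimp_qc", "--dirname", dirname, "--basename", basename,
           "--input-type", input_type]
    for name, value in options.items():
        if value is None:
            continue
        cmd += ["--" + name.replace("_", "-"), str(value)]
    return cmd

cmd = build_preimp_qc_command("data/", "sim_sim2a_eur_sa_merge.miss", "plink",
                              export_type="plink", out_dir="qc_out/")
print(shlex.join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to run the QC (requires GWASpy installed)
```

Passing a list rather than a single string to `subprocess.run` avoids shell-quoting problems; the call is left commented out so the sketch runs without GWASpy installed.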

Inside a Python script

import gwaspy.preimp_qc as qc
qc.preimp_qc.preimp_qc(input_type="plink", dirname="data/", basename="sim_sim2a_eur_sa_merge.miss")

Arguments and options

Argument	Description
`--dirname`	Path to the directory containing the input data
`--basename`	Basename (file prefix) of the input data
`--input-type`	Input data type. Options: [`hail`, `plink`, `vcf`]
`--export-type`	Output data type. Options: [`hail`, `plink`, `vcf`]
`--out-dir`	Path to the directory where output files will be saved
`--annotations`	Annotations file used to annotate samples with information such as sex and phenotype
`--reference`	Reference genome build. Options: [`GRCh37`, `GRCh38`]. Default: GRCh38
`--report`	Whether to generate a PDF QC report. Default: True
`--pre-geno`	Include only SNPs with missing rate < NUM, applied before the sample (ID) missingness filter; important when data from multiple genotyping platforms have been merged
`--mind`	Include only IDs with missing rate < NUM
`--fhet-aut`	Include only IDs whose autosomal heterozygosity F statistic (FHET) satisfies NUM < FHET < NUM
`--fstat-y`	Include only female IDs with fhet < NUM
`--fstat-x`	Include only male IDs with fhet > NUM
`--geno`	Include only SNPs with missing rate < NUM
`--midi`	Include only SNPs whose missing-rate difference between cases and controls is < NUM
`--withpna`	Include monomorphic (invariant) SNPs
`--maf`	Include only SNPs with MAF >= NUM
`--hwe-th-con`	Hardy-Weinberg equilibrium (HWE) p-value threshold for controls: HWE_controls < NUM
`--hwe-th-cas`	HWE p-value threshold for cases: HWE_cases < NUM
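The thresholds above act on simple per-SNP and per-sample statistics. The following toy sketch shows how `--pre-geno`, `--mind`, `--geno`, `--midi`, `--maf` and the two HWE thresholds could be applied to a small genotype matrix. It is not GWASpy's implementation: `toy_preimp_qc` and its helpers are hypothetical, the HWE p-value uses a 1-df chi-square approximation rather than an exact test, and it assumes SNPs whose HWE p-value falls below the threshold are removed. `--pre-geno` is evaluated on all samples before `--mind`, matching the "before ID filter" note, but the remaining SNP filters are applied in a single pass, which may differ from the tool's own ordering. The sex-check (`--fstat-x`, `--fstat-y`), `--fhet-aut` and `--withpna` options are not modelled.

```python
import math

def missing_rate(calls):
    """Fraction of calls that are missing (None); 0.0 for an empty list."""
    return sum(g is None for g in calls) / len(calls) if calls else 0.0

def minor_allele_freq(calls):
    """MAF from alternate-allele counts (0, 1, 2), ignoring missing calls."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)

def hwe_chisq_p(calls):
    """HWE p-value from a 1-df chi-square test (an approximation, not an exact test)."""
    called = [g for g in calls if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    obs = [called.count(0), called.count(1), called.count(2)]
    p = (obs[1] + 2 * obs[2]) / (2 * n)  # alternate-allele frequency
    q = 1 - p
    exp = [n * q * q, 2 * n * p * q, n * p * p]
    if min(exp) == 0:  # monomorphic: the test is undefined, so treat it as passing
        return 1.0
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

def toy_preimp_qc(genotypes, is_case, pre_geno, mind, geno, midi, maf, hwe_con, hwe_cas):
    """Return (kept sample indices, kept SNP indices) for a samples x SNPs matrix."""
    def col(j, rows):
        return [genotypes[i][j] for i in rows]

    everyone = range(len(genotypes))
    # --pre-geno: SNP missingness on all samples, before any sample is dropped
    snps = [j for j in range(len(genotypes[0])) if missing_rate(col(j, everyone)) < pre_geno]
    # --mind: sample missingness over the SNPs that passed --pre-geno
    samples = [i for i in everyone if missing_rate([genotypes[i][j] for j in snps]) < mind]
    cases = [i for i in samples if is_case[i]]
    controls = [i for i in samples if not is_case[i]]

    kept = []
    for j in snps:
        if missing_rate(col(j, samples)) >= geno:  # --geno
            continue
        if abs(missing_rate(col(j, cases)) - missing_rate(col(j, controls))) >= midi:  # --midi
            continue
        if minor_allele_freq(col(j, samples)) < maf:  # --maf
            continue
        # assumed convention: drop SNPs whose HWE p-value is below the threshold
        if hwe_chisq_p(col(j, controls)) < hwe_con or hwe_chisq_p(col(j, cases)) < hwe_cas:
            continue
        kept.append(j)
    return samples, kept

# Toy data: rows are samples, columns are SNPs; values are alternate-allele counts.
genotypes = [
    [0, 1, 0, 1],              # sample 0, case
    [1, None, 0, 1],           # sample 1, case
    [2, None, 0, 0],           # sample 2, case
    [1, 0, 0, 2],              # sample 3, control
    [0, 1, 0, 1],              # sample 4, control
    [None, None, None, None],  # sample 5, control (no calls)
]
is_case = [True, True, True, False, False, False]
samples, snps = toy_preimp_qc(genotypes, is_case, pre_geno=0.95, mind=0.5, geno=0.2,
                              midi=0.2, maf=0.01, hwe_con=1e-6, hwe_cas=1e-6)
print(samples, snps)  # [0, 1, 2, 3, 4] [0, 3]
```

In this toy run, sample 5 fails `--mind` (all calls missing), SNP 1 fails `--geno` (2 of 5 retained calls missing) and SNP 2 fails `--maf` (monomorphic). The threshold values are illustrative only, not GWASpy defaults.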

Output(s)

QC’ed file(s), i.e. the input data with all variants and/or samples that fail the QC filters removed
A detailed PDF QC report, including pre- and post-QC variant/sample counts and figures such as Manhattan and QQ plots
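When the output is a PLINK binary fileset (.bed/.bim/.fam), one quick cross-check of the counts in the report is to count lines in the .bim file (one variant per line) and the .fam file (one sample per line) before and after QC. The helper below is a hypothetical sketch, not part of GWASpy, and the names of the files written to `--out-dir` are not documented here, so substitute the actual prefix of your QC'ed fileset.

```python
import tempfile
from pathlib import Path

def plink_counts(prefix):
    """Return (n_variants, n_samples) for a PLINK fileset at the given path prefix.

    Each non-blank .bim line is one variant and each non-blank .fam line is one sample.
    The extension is appended to the full prefix instead of using Path.with_suffix,
    because basenames such as "sim_sim2a_eur_sa_merge.miss" contain a dot.
    """
    def count_lines(path):
        with open(path) as fh:
            return sum(1 for line in fh if line.strip())

    prefix = str(prefix)
    return count_lines(prefix + ".bim"), count_lines(prefix + ".fam")

# Demo on a tiny made-up fileset (two variants, one sample).
with tempfile.TemporaryDirectory() as tmp:
    demo_prefix = Path(tmp) / "toy.miss"
    Path(f"{demo_prefix}.bim").write_text("1\trs1\t0\t100\tA\tG\n1\trs2\t0\t200\tC\tT\n")
    Path(f"{demo_prefix}.fam").write_text("FAM1 IND1 0 0 1 -9\n")
    demo_counts = plink_counts(demo_prefix)

print(demo_counts)  # (2, 1)
```

For the example above, `plink_counts("data/sim_sim2a_eur_sa_merge.miss")` gives the pre-QC counts, which can then be compared with the counts for the QC'ed fileset.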