Principal Component Analysis
Principal component analysis (PCA) can be used to detect and quantify the genetic structure of populations.
In GWASpy, the pca module can be run in three different ways: (1) normal PCA without a reference panel; (2) joint PCA; or (3) projection PCA.
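To make the difference between the three modes concrete, here is a minimal NumPy sketch on toy genotype dosages. It is not GWASpy's implementation: the helper names `fit_pca` and `project`, the random data, and the choice of 5 PCs are all assumptions for illustration. It assumes that normal PCA uses the study data alone, joint PCA fits on study and reference samples together, and projection PCA fits on the reference and then projects the study samples onto those axes.

```python
import numpy as np


def fit_pca(genotypes, n_pcs):
    """Fit PCA on a samples x variants matrix; return (mean, loadings, scores)."""
    mean = genotypes.mean(axis=0)
    centered = genotypes - mean
    # Rows of vt are the principal axes of the centered matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_pcs].T           # variants x n_pcs
    scores = centered @ loadings      # samples x n_pcs
    return mean, loadings, scores


def project(genotypes, mean, loadings):
    """Project new samples onto axes that were fitted on other samples."""
    return (genotypes - mean) @ loadings


rng = np.random.default_rng(0)
# Toy dosages (0, 1 or 2 alternate alleles); hypothetical data, not real genotypes.
study = rng.integers(0, 3, size=(30, 200)).astype(float)
reference = rng.integers(0, 3, size=(50, 200)).astype(float)
n_pcs = 5

# (1) Normal PCA: the study data alone, no reference panel.
_, _, normal_scores = fit_pca(study, n_pcs)

# (2) Joint PCA: fit once on reference and study samples stacked together.
_, _, joint_scores = fit_pca(np.vstack([reference, study]), n_pcs)

# (3) Projection PCA: fit on the reference, then project the study samples.
ref_mean, ref_loadings, ref_scores = fit_pca(reference, n_pcs)
projected_scores = project(study, ref_mean, ref_loadings)
```

In the projection case the axes are determined only by the reference panel, so the study samples do not influence the PCs they are placed on.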
Arguments and options
| Argument | Description |
|---|---|
| | Path to the reference data |
| | Reference basename |
| | Path to reference information. Tab-delimited file with sample IDs and their SuperPop labels |
| | Genome reference build. Default is GRCh38. Options: [ |
| | Type of PCA to run. Default is normal. Options: [ |
| | Path to the input data |
| | Data basename |
| | Data input type. Options: [ |
| | Include only SNPs with MAF >= NUM in PCA. Default is 0.05 |
| | Include only SNPs with HWE p-value >= NUM in PCA. Default is 1e-03 |
| | Include only SNPs with call rate > NUM. Default is 0.98 |
| | Squared correlation threshold (exclusive upper bound). Must be in the range [0.0, 1.0]. Default is 0.2 |
| | Window size in base pairs (inclusive upper bound). Default is 250000 |
| | Number of PCs to use. Default is 20 |
| | Method to use for the inference of relatedness. Default is pc_relate. Options: [ |
| | Threshold value to use in relatedness checks. Default is 0.98 |
| | Minimum probability of belonging to a given population for the population to be set. Default is 0.8 |
| | Path where output files will be saved |
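The variant filters in the table can be sketched in plain Python as follows. This is an illustration under stated assumptions, not GWASpy's code: the function names are hypothetical, the HWE threshold is assumed to apply to a p-value, and a simple one-degree-of-freedom chi-square test stands in for whatever HWE test the module actually uses. The thresholds are the defaults listed above.

```python
import math

# Default thresholds taken from the table above.
MAF_MIN = 0.05
HWE_MIN = 1e-3
CALL_RATE_MIN = 0.98


def variant_stats(genotypes):
    """Return (call_rate, maf, hwe_p) for one variant.

    genotypes: list of alternate-allele counts (0, 1, 2), or None if missing.
    """
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    n = len(called)
    if n == 0:
        return call_rate, 0.0, 0.0
    q = sum(called) / (2 * n)          # alternate allele frequency
    p = 1 - q
    maf = min(p, q)
    if maf == 0:
        return call_rate, maf, 1.0     # monomorphic: HWE test is not informative
    # Chi-square goodness-of-fit test against Hardy-Weinberg proportions (assumed test).
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [called.count(0), called.count(1), called.count(2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    hwe_p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return call_rate, maf, hwe_p


def passes_filters(genotypes):
    """Apply the call-rate (>), MAF (>=) and HWE (>=) filters with the defaults."""
    call_rate, maf, hwe_p = variant_stats(genotypes)
    return call_rate > CALL_RATE_MIN and maf >= MAF_MIN and hwe_p >= HWE_MIN


common = [0] * 49 + [1] * 42 + [2] * 9   # MAF 0.3, in HWE proportions
rare = [0] * 99 + [1]                    # MAF 0.005
no_hets = [0] * 50 + [2] * 50            # strong departure from HWE
print(passes_filters(common), passes_filters(rare), passes_filters(no_hets))
```

Note that the call-rate comparison is strict while the MAF and HWE comparisons are inclusive, matching the operators shown in the table.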
Output
The module writes a tab-delimited file containing the first 20 computed principal components (PCs) and generates graphical visualizations of the PCs.
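A tab-delimited scores file can be loaded for downstream use with the standard library. The sketch below is only an assumption about the layout: the column names `sample_id`, `PC1` and `PC2` and the example values are hypothetical, so check the header of the file GWASpy actually produces before adapting it.

```python
import csv
import io

# Hypothetical file contents; the real header and values may differ.
example_tsv = "sample_id\tPC1\tPC2\nS1\t0.012\t-0.034\nS2\t-0.051\t0.020\n"


def read_pc_scores(handle, id_column="sample_id"):
    """Return (pc_columns, {sample_id: [scores...]}) from a tab-delimited file."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Treat every column whose name starts with "PC" as a principal component.
    pc_columns = [name for name in reader.fieldnames if name.startswith("PC")]
    scores = {}
    for row in reader:
        scores[row[id_column]] = [float(row[name]) for name in pc_columns]
    return pc_columns, scores


pc_columns, scores = read_pc_scores(io.StringIO(example_tsv))
```

For a file on disk, pass `open(path, newline="")` instead of the in-memory `io.StringIO` object.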