  • Protocol
Illumina human exome genotyping array clustering and quality control

Abstract

With the rise of high-throughput sequencing, traditional genotyping arrays are gradually being replaced. Against this trend, Illumina has introduced an exome genotyping array that provides an alternative to sequencing and is especially suited to large-scale genome-wide association studies (GWASs). The exome genotyping array targets the exome plus rare single-nucleotide polymorphisms (SNPs), a feature that makes it substantially more challenging to process than previous genotyping arrays, which targeted common SNPs. Researchers have struggled to generate a reliable protocol for processing exome genotyping array data. The Vanderbilt Epidemiology Center, in cooperation with Vanderbilt Technologies for Advanced Genomics Analysis and Research Design (VANGARD), has developed a thorough exome chip–processing protocol. The protocol was developed during the processing of several large exome genotyping array–based studies, which together included more than 60,000 participants. The protocol described herein contains detailed clustering techniques and robust quality control procedures, and it can benefit future exome genotyping array–based GWASs.

Figure 1: Examples of low-quality clusters for the haploid genomes described in Steps 7–22.
Figure 2: Example clusters relevant to Steps 23–25.
Figure 3: General distribution of the basic quality control (QC) parameters.
Figure 4: Example clusters relevant to Steps 26 and 27.
Figure 5: Example clusters relevant to Steps 28–30.
Figure 6: Example clusters relevant to Step 34A.
Figure 7: Example clusters relevant to Steps 38 and 39 of the PROCEDURE.
Figure 8: Example clusters relevant to Steps 40–43.
Figure 9: Distribution of HWE and heterozygosity rates relevant to Steps 46–50.
Figure 10: Example clusters relevant to the PLINK-related Steps 53–56.
Figure 11: Example clusters relevant to the PLINK-related Steps 53–56.
Figure 12: Example clusters relevant to the PLINK-related Step 57.
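
The two per-SNP quantities named in the Figure 9 caption, the heterozygosity rate and the departure from Hardy-Weinberg equilibrium (HWE), can be illustrated with a minimal sketch. This is not one of the protocol's supplementary scripts; the function names and the genotype-count inputs are hypothetical, and a one-degree-of-freedom chi-square test is used here only for simplicity.

```python
from math import erfc, sqrt


def heterozygosity_rate(n_aa, n_ab, n_bb):
    """Fraction of called genotypes that are heterozygous (AB)."""
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("no called genotypes")
    return n_ab / total


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of observed genotype counts against HWE.

    Returns (chi2, p_value). Illustrative only: with the very low
    minor-allele counts typical of rare variants, the chi-square
    approximation is poor and an exact test is generally preferred.
    """
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * total)  # frequency of allele A
    q = 1.0 - p                          # frequency of allele B
    expected = (p * p * total, 2 * p * q * total, q * q * total)
    if min(expected) == 0:
        # Monomorphic SNP: HWE cannot be tested, so report no deviation.
        return 0.0, 1.0
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts for a single SNP.
    print(heterozygosity_rate(25, 50, 25))
    print(hwe_chi_square(25, 50, 25))
```

A SNP with a strong heterozygote deficit or excess yields a large chi-square value and a small p-value, which is the kind of marker a QC step would flag for review of its cluster plot.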

Acknowledgements

Development of this protocol is supported by Cancer Center Support Grant (CCSG) P30 CA068485 and by grant R01CA158473. We thank M. Bjoring for editorial support.

Author information

Contributions

Y.G. wrote the manuscript and designed the protocol with J.L.; S.Z., J.H., H.W., Q.S. and X.Z. contributed to script writing and generated the figures and tables; Y.S. and D.C.S. provided intellectual contributions to the overall design of the protocol.

Corresponding author

Correspondence to Yan Guo.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1: Example clusters relevant to Step 34A of the PROCEDURE.

Cartesian plot of the same two SNPs shown in Figure 6. The x axis denotes the normalized intensity of the A allele; the y axis denotes the normalized intensity of the B allele.
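
For readers comparing this Cartesian view with a polar cluster plot, the relation between the two coordinate systems can be sketched as follows. The sketch assumes the normalized theta and R convention described in Illumina's genotyping documentation (R as the sum of the two normalized intensities, theta as the angle rescaled to the range 0 to 1); whether Figure 6 uses that polar view is an assumption here, and the function name is hypothetical.

```python
from math import atan2, pi


def cartesian_to_polar(x, y):
    """Convert normalized A (x) and B (y) intensities to (theta, r).

    Assumed convention: r = x + y and theta = (2 / pi) * arctan(y / x),
    so theta is near 0 for AA, near 0.5 for AB and near 1 for BB.
    """
    if x < 0 or y < 0:
        raise ValueError("normalized intensities must be non-negative")
    r = x + y
    # atan2 handles x == 0 without a division-by-zero error.
    theta = (2.0 / pi) * atan2(y, x)
    return theta, r


if __name__ == "__main__":
    # Hypothetical normalized intensities for one sample at one SNP.
    print(cartesian_to_polar(1.0, 1.0))
```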

Supplementary information

Supplementary Figure 1

Example clusters relevant to Step 34A of the PROCEDURE. (PDF 965 kb)

Supplementary Table 1

Marker distributions of major SNP arrays. (XLSX 9 kb)

Supplementary Table 2

SNPs that do not match the HG19 plus strand after performing Steps 24 and 25. (XLSX 12 kb)

Supplementary Table 3

HapMap trio samples used in the example study. (XLSX 15 kb)

Supplementary Table 4

Supplementary script list. (XLSX 10 kb)

Supplementary Table 5

Resource files used in the protocol. (XLSX 9 kb)

Supplementary Table 6

AIMs on the exome chip. (XLSX 40 kb)

Supplementary Table 7

Identical and triallelic SNPs on the exome chip. (XLSX 57 kb)

Supplementary Table 8

SNPs with different alleles between the exome chip and the 1000 Genomes Project. (XLSX 21 kb)

About this article

Cite this article

Guo, Y., He, J., Zhao, S. et al. Illumina human exome genotyping array clustering and quality control. Nat Protoc 9, 2643–2662 (2014). https://doi.org/10.1038/nprot.2014.174

  • DOI: https://doi.org/10.1038/nprot.2014.174
