IJBBB 2014 Vol.4(2): 116-120 ISSN: 2010-3638
DOI: 10.7763/IJBBB.2014.V4.322
A Novel Method for Detecting Contaminated Sample Based on Illumina Sequencing Data
Zheng Huang, Qibin Li, Wei Jin, Qijun Liao, and Xiao Sun
Abstract—The Illumina sequencing platform is widely used in genetics research. Because library construction and DNA sequencing are complex and lengthy, samples can be contaminated from various sources, which can lead to false-positive SNP calls. To identify contaminated samples, we built a mappability score model that quantitatively measures the accessibility of different parts of the human genome. By characterizing the genomic regions with a high probability of uniqueness and counting the reads in those unique regions that are discordant with the genotypes, we could detect contaminated samples as outliers at population scale. To test the effectiveness of our method, we manually mixed the sequencing reads of two clean samples. With prior knowledge of the mixture rate, we concluded that our method is quite sensitive for female samples contaminated even slightly by male samples, accurate for male samples moderately contaminated by female samples, and powerful for severe cross-individual contamination within the same gender. The method is easy to understand yet fairly effective for population-scale sample quality control.
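The outlier-detection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the statistic or the threshold, so the per-sample discordant-read rate, the median plus k times MAD cutoff, the default `k`, and the names `discordant_rate` and `flag_outliers` are all assumptions made for illustration. The input is assumed to be already restricted to homozygous genotype sites inside unique (high-mappability) regions.

```python
from statistics import median


def discordant_rate(sites):
    """Fraction of read bases that disagree with the called genotype.

    `sites` is an iterable of (genotype_allele, read_bases) pairs, assumed
    to cover only homozygous sites inside unique regions (hypothetical input).
    """
    discordant = 0
    total = 0
    for allele, bases in sites:
        total += len(bases)
        discordant += sum(1 for base in bases if base != allele)
    return discordant / total if total else 0.0


def flag_outliers(rates, k=5.0):
    """Return sample names whose rate exceeds median + k * MAD.

    The robust cutoff and the default k are illustrative assumptions,
    not values taken from the paper.
    """
    values = list(rates.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    threshold = med + k * mad
    return sorted(name for name, rate in rates.items() if rate > threshold)


# Hypothetical population: four clean samples and one with an elevated rate.
rates = {"S1": 0.010, "S2": 0.012, "S3": 0.011, "S4": 0.009, "S5": 0.080}
suspects = flag_outliers(rates)
```

A median-based cutoff is used here because a few heavily contaminated samples would inflate a mean and standard deviation; any other population-level outlier test could be substituted.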
Index Terms—Contamination, mappability score, sample quality control, unique region.
Zheng Huang and Xiao Sun are with the State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, CO 210096, China (e-mail: huangzheng@genomics.cn, xsun@seu.edu.cn).
Qibin Li, Wei Jin, and Qijun Liao are with BGI-Shenzhen, Shenzhen, CO 518083, China (e-mail: liqb@genomics.cn, jinwei@genomics.cn, liaoqijun@genomics.cn).
Cite: Zheng Huang, Qibin Li, Wei Jin, Qijun Liao, and Xiao Sun, "A Novel Method for Detecting Contaminated Sample Based on Illumina Sequencing Data," International Journal of Bioscience, Biochemistry and Bioinformatics, vol. 4, no. 2, pp. 116-120, 2014.