IJBBB 2014 Vol.4(5): 355-360 ISSN: 2010-3638
DOI: 10.7763/IJBBB.2014.V4.370
Identification of Mislabeled Samples and Sample Mix-ups in Genotype Data Using Barcode Genotypes
Christian Theil Have, Emil Vincent Appel, Niels Grarup, Torben Hansen, and Jette Bork-Jensen
Abstract—Undetected mislabeled samples may affect the results of genotype studies, particularly when rare genetic variants are investigated. Mislabeled samples are often not detected during quality control, and when they are detected, they are normally discarded because there is no reliable method to recover the correct labels.
Here we describe a statistical method which, given a few extra independent genotypes (barcode genotypes), detects mislabeled samples and recovers the correct labels for sample mix-ups. We have implemented the method in a program named Wunderbar, and we evaluate its reliability on simulated data. We find that even with only a small number of barcode genotypes, Wunderbar identifies mislabeled samples and sample mix-ups with high sensitivity and specificity, even with a high genotyping error rate and in the presence of dependency between the individual barcode genotypes.
To detect mislabeled samples, we calculate the probability that the discordance between the genotypes in the data and the independent barcode genotypes can be attributed to random (non-mislabeling) genotyping errors. To identify mix-ups, we calculate the probability that the set of identical genotypes shared by sample x and sample y would be observed by chance. Based on this, we calculate a mix-up confidence score that includes a penalty for mismatches introduced by the proposed new label and an adjustment for dependency among the genotypes. This confidence score is used to identify probable mix-ups.
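The two probabilities described above can be illustrated with a minimal sketch. It is not the Wunderbar implementation: the function names are hypothetical, the discordance probability assumes independent genotyping errors at a fixed rate (a binomial model), and the chance-match probability assumes biallelic markers in Hardy-Weinberg equilibrium that are independent of each other. The penalty and dependency adjustment that enter the confidence score are not specified in the abstract and are omitted here.

```python
from math import comb, prod


def discordance_p_value(n_barcodes, n_mismatches, error_rate):
    """Probability of seeing at least n_mismatches discordant genotypes
    among n_barcodes if the label is correct and each genotype is miscalled
    independently with probability error_rate (assumed binomial model)."""
    return sum(
        comb(n_barcodes, k) * error_rate**k * (1 - error_rate) ** (n_barcodes - k)
        for k in range(n_mismatches, n_barcodes + 1)
    )


def chance_match_probability(allele_freqs):
    """Probability that two unrelated samples carry identical genotypes at
    every marker in allele_freqs purely by chance. Each entry is the
    frequency p of one allele at a biallelic marker; Hardy-Weinberg
    proportions and independence between markers are assumptions."""

    def per_marker(p):
        q = 1 - p
        # Genotype frequencies are p^2, 2pq and q^2; two samples match
        # when both draw the same genotype.
        return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2

    return prod(per_marker(p) for p in allele_freqs)


if __name__ == "__main__":
    # A sample with 6 of 20 barcode genotypes discordant at a 1% error rate:
    # a very small value suggests the discordance is not just genotyping noise.
    print(discordance_p_value(20, 6, 0.01))
    # Chance that sample x and sample y agree on 20 markers with p = 0.3.
    print(chance_match_probability([0.3] * 20))
```

A small discordance probability flags a sample as likely mislabeled, and a small chance-match probability between sample x and the barcode of sample y supports relabeling x as y.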
Index Terms—Barcoding, genetics, quality control.
The authors are with the Novo Nordisk Foundation Center for Basic Metabolic Research, Section of Metabolic Genetics, Copenhagen University, Denmark (e-mail: c.have@sund.ku.dk, vincent@sund.ku.dk, ngrp@sund.ku.dk, torben.hansen@sund.ku.dk, jbj@sund.ku.dk).
Cite: Christian Theil Have, Emil Vincent Appel, Niels Grarup, Torben Hansen, and Jette Bork-Jensen, "Identification of Mislabeled Samples and Sample Mix-ups in Genotype Data Using Barcode Genotypes," International Journal of Bioscience, Biochemistry and Bioinformatics vol. 4, no. 5, pp. 355-360, 2014.