Document Type
Article
Source of Publication
BMC Medical Genomics
Publication Date
7-26-2017
Abstract
© 2017 The Author(s). Background: Edit distance is a well established metric to quantify how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. It is utilized in the domain of human genomic sequence similarity as it captures the requirements and leads to a better diagnosis of diseases. However, in addition to the computational complexity due to the large genomic sequence length, the privacy of these sequences are highly important. As these genomic sequences are unique and can identify an individual, these cannot be shared in a plaintext. Methods: In this paper, we propose two different approximation methods to securely compute the edit distance among genomic sequences. We use shingling, private set intersection methods, the banded alignment algorithm, and garbled circuits to implement these methods. We experimentally evaluate these methods and discuss both advantages and limitations. Results: Experimental results show that our first approximation method is fast and achieves similar accuracy compared to existing techniques. However, for longer genomic sequences, both the existing techniques and our proposed first method are unable to achieve a good accuracy. On the other hand, our second approximation method is able to achieve higher accuracy on such datasets. However, the second method is relatively slower than the first proposed method. Conclusion: The proposed algorithms are generally accurate, time-efficient and can be applied individually and jointly as they have complimentary properties (runtime vs. accuracy) on different types of datasets.
DOI Link
ISSN
Publisher
BioMed Central Ltd.
Volume
10
First Page
55
Last Page
67
Disciplines
Computer Sciences | Medicine and Health Sciences
Keywords
Edit distance approximation on genomic data, Genomic sequence similarity, Privacy of genomic data, Secure edit distance, Secure genomic sequence similarity
Scopus ID
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Al Aziz, Md Momin; Alhadidi, Dima; and Mohammed, Noman, "Secure approximation of edit distance on genomic data" (2017). All Works. 3041.
https://zuscholars.zu.ac.ae/works/3041
Indexed in Scopus
yes
Open Access
yes
Open Access Type
Gold: This publication is openly available in an open access journal/series