Protecting File Transfers Against Silent Data Corruption with Robust End-to-End Integrity Verification

Authors

Charyyev, Batyr

Issue Date

2019

Type

Thesis

Keywords

Data transfer, End-to-end integrity verification, High performance computing, Silent data corruptions, Undetected disk errors

Abstract

Scientific applications generate large volumes of data that often need to be moved between geographically distributed sites, which has led to a significant increase in data transfer rates. As a growing number of scientific applications become sensitive to silent data corruption, end-to-end integrity verification has been proposed. End-to-end integrity verification minimizes the likelihood of silent data corruption by comparing checksums of files at the source and the destination, computed with cryptographic hash algorithms such as MD5 and SHA-1. However, existing implementations of end-to-end integrity verification for file transfers compute the checksum from the in-memory copy of the file (i.e., the page cache), and thus fail to detect silent disk errors that occur while cached data is written to disk. In this thesis, we examine the robustness of existing end-to-end integrity verification approaches against silent data corruption and propose the Robust Integrity Verification Algorithm (RIVA) to enhance data integrity. Extensive experiments show that, unlike existing solutions, RIVA is able to detect silent disk corruption by invalidating file contents in the page cache and reading them directly from disk. Because RIVA clears the page cache and reads file contents directly from disk, it adds to execution time. However, by running transfer, cache invalidation, and checksum operations concurrently, RIVA keeps its overhead below 15% in most cases compared to state-of-the-art solutions, in exchange for more robust file transfers. We also introduce a novel fault injection mechanism that assesses the robustness of RIVA against undetected disk errors by altering file content on disk. Finally, we present dynamic parallelism, which adjusts the number of transfer and checksum threads to overcome performance bottlenecks. The results show that dynamic parallelism leads to a more than 5x increase in RIVA's speed.
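
The idea of invalidating the page cache before computing a checksum can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `checksum_from_disk` and `verify_destination` are hypothetical, and the use of `fsync` followed by `posix_fadvise` with `POSIX_FADV_DONTNEED` is an assumption about one way to ask a Linux kernel to drop a file's cached pages. The hint is advisory, so the kernel may still serve some pages from memory.

```python
import hashlib
import os


def checksum_from_disk(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file after asking the kernel to drop its cached pages.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    digest = hashlib.new(algorithm)
    fd = os.open(path, os.O_RDONLY)
    try:
        # Flush dirty pages first: DONTNEED does not evict pages that
        # have not yet been written back to disk.
        os.fsync(fd)
        # Advisory request to drop this file's pages from the page cache
        # (offset 0, length 0 means the whole file). Unix only.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        # Read the file in chunks and feed each chunk to the hash.
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    finally:
        os.close(fd)
    return digest.hexdigest()


def verify_destination(source_checksum, dest_path, algorithm="md5"):
    """Compare the sender's checksum with one recomputed at the destination."""
    return checksum_from_disk(dest_path, algorithm) == source_checksum
```

In this sketch the receiver would call `verify_destination` with the checksum reported by the sender once the file has been written. A mismatch indicates that the bytes read back differ from what was sent.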
