A runtime fault survival method for deployed software during production runs

Jooyoung Seo, Jihyun Park, Byoungju Choi

Research output: Contribution to journal › Article › peer-review

Abstract

Runtime memory faults during production runs should be addressed more thoroughly because they severely affect system availability. This paper proposes a method for mitigating memory faults during production runs of deployed software, thereby ensuring normal system operation until patches that fix the faults are delivered. The method also improves debugging efficiency by providing accurate on-site fault information that developers can use to release timely patches. The core of the method is information tagging, which identifies runtime faults, and a fault survival algorithm, which provides differentiated fault mitigation according to the runtime state. We implemented RemOte runtime Protection for High-risk Error (ROPHE) on a Linux 2.6 platform and conducted an empirical study of representative Linux applications. The results show that the applications' own fault-handling rate averages 35.75%, whereas ROPHE raises this capacity to an average of 91.94%. Specifically, the fault-handling rates of the applications ranged widely from 7.32% to 62.96%, while ROPHE provided fault-survival rates in the relatively narrow range of 82.35% to 97.44%. The experimental results show that the proposed method guarantees the same level of reliability for all applications regardless of their individual fault-handling capacity.

Original language: English
Pages (from-to): 97-119
Number of pages: 23
Journal: Journal of Software: Evolution and Process
Volume: 28
Issue number: 2
State: Published - 1 Feb 2016

Keywords

  • deployed software reliability
  • fault mitigation
  • fault survival
  • runtime memory fault
