Abstract

Automatic failure-path inference (AFPI) is an application-generic, automatic technique for dynamically discovering the failure dependency graphs of componentized Internet applications. AFPI's first phase is invasive and relies on controlled fault injection to determine failure propagation; this phase requires no a priori knowledge of the application and takes on the order of hours to run. Once the system is deployed in production, the second, noninvasive phase of AFPI passively monitors the system and updates the dependency graph as new failures are observed. This process is a good match for the perpetually evolving software found in Internet systems; since no performance overhead is introduced, AFPI is feasible for live systems. We applied AFPI to J2EE and tested it by injecting Java exceptions into an e-commerce application and an online auction service. The resulting graphs of exception propagation are more detailed and accurate than what could be derived by time-consuming manual inspection or analysis of readily available static application descriptions.
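The two phases described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the AFPI implementation: the component names, the `CALLS` table, and the functions `run_request`, `infer_failure_paths`, and `observe_failure` are all hypothetical, and Python stands in for the J2EE setting. The sketch also assumes that the list of deployed components is available and that a failure in a callee always propagates to its caller.

```python
# Hypothetical toy application: caller -> callees. This table is hidden from
# the inference step; it only drives the simulated behavior of the system.
CALLS = {
    "web": ["cart", "catalog"],
    "cart": ["db"],
    "catalog": ["db"],
    "db": [],
}


def run_request(component, faulty, failed):
    """Simulate invoking a component; return True if it succeeds.

    Assumption: a component fails if it is the injection target or if any
    of its callees fails (no error handling masks the fault).
    """
    ok = component != faulty
    for callee in CALLS[component]:
        if not run_request(callee, faulty, failed):
            ok = False
    if not ok:
        failed.add(component)
    return ok


def infer_failure_paths(components, entry="web"):
    """Phase 1 (invasive): inject a fault into each component in turn and
    record which other components are observed to fail."""
    graph = {}
    for target in components:
        failed = set()
        run_request(entry, target, failed)
        graph[target] = failed - {target}
    return graph


def observe_failure(graph, origin, affected):
    """Phase 2 (noninvasive): merge a failure observed in production into
    the existing graph without injecting anything."""
    graph.setdefault(origin, set()).update(affected)


graph = infer_failure_paths(list(CALLS))
observe_failure(graph, "cart", {"checkout"})  # hypothetical new observation
```

In this toy run, a fault injected into `db` is recorded as propagating to `cart`, `catalog`, and `web`, while a fault in `web` affects nothing else. The later passive observation adds an edge from `cart` to a `checkout` component that the injection phase never saw.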
