Constraint Based System-Level Diagnosis of Multiprocessors

Altmann, J.; Pataricza, A.; Bartha, T.; Urbán, P.

doi:10.1007/3-540-61772-8_51

1996

Abstract

Massively parallel multiprocessors impose new requirements on system-level fault diagnosis, such as handling a huge number of processing elements in an inhomogeneous system. Traditional diagnostic models (PMC, BGM, etc.) cannot fulfill all of these requirements. This paper presents a novel modelling technique based on a special area of artificial intelligence (AI) methods: constraint satisfaction (CS). The constraint-based approach is able to handle functional faults in a similar way to the Russell-Kime model. Moreover, it can use multiple-valued logic to deal with system components having multiple fault modes. The resolution of the produced models can be adjusted to fit the actual diagnostic goal. Consequently, constraint-based methods are applicable to a much wider range of multiprocessor architectures than earlier models. The basic problem of system-level diagnosis, syndrome decoding, can be easily transformed into a constraint satisfaction problem (CSP). Thus, the diagnosis algorithm can be derived from the related constraint solving algorithm. Different abstraction levels can be used for the various diagnosis resolutions, employing the same methodology. As examples, two algorithms are described in the paper; both are intended for the Parsytec GCel massively parallel system. The centralized method uses a more elaborate system model and provides detailed diagnostic information, suitable for off-line evaluation. The distributed method makes fast decisions for reconfiguration control, using a simplified model.

Keywords: system-level self-diagnosis, massively parallel computing systems, constraint satisfaction, diagnostic models, centralized and distributed diagnostic algorithms.
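
The idea of treating syndrome decoding as a constraint satisfaction problem can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the classical PMC test invalidation rule (a fault-free tester reports the true state of the tested unit, a faulty tester may report anything), two-valued unit states rather than multiple fault modes, and plain enumeration in place of a real constraint solver. The function names, the 5-unit ring topology, and the syndrome values are all hypothetical.

```python
from itertools import product

def consistent(states, syndrome):
    """Check every test constraint under the PMC invalidation assumption."""
    for (tester, tested), result in syndrome.items():
        # A fault-free tester (state 0) must report the true state of the
        # tested unit; a faulty tester imposes no constraint at all.
        if states[tester] == 0 and result != states[tested]:
            return False
    return True

def decode(n_units, syndrome, max_faults):
    """Return all fault patterns with at most max_faults faulty units
    that satisfy every test constraint (brute-force CSP search)."""
    solutions = []
    for states in product((0, 1), repeat=n_units):
        if sum(states) <= max_faults and consistent(states, syndrome):
            solutions.append(states)
    return solutions

# Hypothetical ring: unit i tests unit (i + 1) % 5; result 1 means "failed".
syndrome = {(0, 1): 0, (1, 2): 1, (2, 3): 0, (3, 4): 0, (4, 0): 0}
print(decode(5, syndrome, max_faults=1))  # [(0, 0, 1, 0, 0)]
```

Here each unit state is a CSP variable and each test outcome is one constraint; the single solution marks unit 2 as faulty. A practical solver would prune the search with constraint propagation instead of enumerating all assignments.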

Details

Title Constraint Based System-Level Diagnosis of Multiprocessors

Author(s) Altmann, J. ; Pataricza, A. ; Bartha, T. ; Urbán, P.

Published in Dependable Computing — EDCC-2

Editor(s) Hlawiczka, A. ; Silva, J.G.S. ; Simoncini, L.

Series Lecture Notes in Computer Science, 1150

Pages 403-414

Conference 2nd IEEE European Dependable Computing Conference (EDCC-2), Taormina, Italy, October 2–4, 1996

Date 1996

Publisher Springer-Verlag

DOI https://doi.org/10.1007/3-540-61772-8_51

Laboratories LSR

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IC Archives > LSR - Distributed Systems Laboratory
Conference Papers
Work produced at EPFL
Published

Record creation date 2005-05-20
