Message-passing applications on distributed computers require tools that integrate state saving and rollback in order to support dynamic program reconfiguration, fault tolerance, and related capabilities. This paper presents the results of integrating two independently developed tools that combine flexibility and portability. User-Triggered CheckPointing (UTCP) provides checkpointing and recovery while relying on the programmer to indicate the position of the recovery line and the contents of the checkpoint. PVMsnap is an extension to PVM that obtains a consistent cut of the message-passing application. Combining the two tools yields a portable and flexible fault-tolerance solution that can be adapted to the applications' needs.
A flexible state-saving library for message-passing systems
GIANUZZI, VITTORIA
1998-01-01