System health monitoring and evaluation using a cooperative checkpointing mechanism
Ozor Godwin O, Oleka Chioma V, Nwobodo Lois O
Incessant failures have been recorded in medium- and large-scale systems in the past. Most of the damage occurred at runtime, with part of it traceable to design. The problem is largely attributed to poor mechanisms for checking and evaluating the health of the system in both design and operation. To mitigate these challenges, an effective, simple but rugged fault tolerance mechanism is proposed. A cooperative checkpointing fault tolerance mechanism was investigated and modelled to report the health status of the system and to recommend a real-time response to the results of the evaluation. The theoretical results were simulated in MATLAB.