AN EFFICIENT COORDINATED CHECKPOINTING APPROACH FOR DISTRIBUTED COMPUTING SYSTEMS WITH RELIABLE CHANNELS

Lalit K. Awasthi, Manoj Misra, and Ramesh C. Joshi

References

  1. [1] J. Tsai, S.Y. Kuo, and Y.M. Wang, Theoretical analysis for communication-induced checkpointing protocols with rollback-dependency trackability, IEEE Transactions on Parallel and Distributed Systems, 9(10), 1998, 963–971.
  2. [2] Y.M. Wang, Consistent global checkpoints that contain a given set of local checkpoints, IEEE Transactions on Computers, 46(4), 1997, 456–458.
  3. [3] D.L. Russell, State restoration in systems of communicating processes, IEEE Transactions on Software Engineering, 6(2), 1980, 183–194.
  4. [4] E.N. Elnozahy, L. Alvisi, Y.M. Wang, and D.B. Johnson, A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, 34(3), 2002, 375–408.
  5. [5] E.N. Elnozahy, D.B. Johnson, and W. Zwaenepoel, The performance of consistent checkpointing, Proc. 1992 Symposium on Reliable Distributed Systems, IEEE Computer Society, 1992, 39–47.
  6. [6] R. Koo and S. Toueg, Checkpointing and rollback recovery for distributed systems, IEEE Transactions on Software Engineering, 13(1), 1987, 23–31.
  7. [7] G. Cao and M. Singhal, On the impossibility of min-process non-blocking checkpointing and an efficient checkpointing algorithm for mobile computing systems, Proc. 1998 Int. Conf. on Parallel Processing, IEEE Computer Society, 1998, 37–44.
  8. [8] G. Cao and M. Singhal, Mutable checkpoints: A new checkpointing approach for mobile computing systems, IEEE Transactions on Parallel and Distributed Systems, 12(2), 2001, 157–172.
  9. [9] E. Gendelman, L.F. Bic, and M.B. Dillencourt, Efficient checkpointing algorithm for distributed systems implementing reliable communication channels, Proc. Symposium on Reliable Distributed Systems, IEEE Computer Society, 1999, 290–291.
  10. [10] K.M. Chandy and L. Lamport, Distributed snapshots: Determining global states of distributed systems, ACM Transactions on Computer Systems, 3(1), 1985, 63–75.
  11. [11] T.H. Lai and T.H. Yang, On distributed snapshots, Information Processing Letters, 25(3), 1987, 153–158.
  12. [12] K. Li, J.F. Naughton, and J.S. Plank, Checkpointing multicomputer applications, Proc. Symposium on Reliable Distributed Systems, IEEE Computer Society, 1991, 2–11.
  13. [13] L.M. Silva and J.G. Silva, Global checkpointing for distributed programs, Proc. Symposium on Reliable Distributed Systems, IEEE Computer Society, 1992, 155–162.
  14. [14] A. Acharya and B.R. Badrinath, Checkpointing distributed applications on mobile computers, Proc. 1994 Int. Conf. on Parallel and Distributed Information Systems, IEEE Computer Society, 1994, 73–80.
  15. [15] R. Baldoni, J.M. Hélary, A. Mostefaoui, and M. Raynal, A communication-induced checkpointing protocol that ensures rollback-dependency trackability, Proc. 1997 Int. Symposium on Fault-Tolerant Computing Systems, IEEE Computer Society, 1997, 68–77.
  16. [16] D. Briatico, A. Ciuffoletti, and L. Simoncini, A distributed domino-effect free recovery algorithm, Proc. 1984 Int. Symposium on Reliable Distributed Software and Database, IEEE Computer Society, 1984, 207–215.
  17. [17] J.M. Hélary, A. Mostefaoui, and M. Raynal, Communication-induced determination of consistent snapshots, Proc. Int. Symposium on Fault-Tolerant Computing, IEEE Computer Society, 1998, 208–217.
  18. [18] L.K. Awasthi and P. Kumar, A synchronous checkpointing protocol for mobile distributed systems: Probabilistic approach, International Journal of Information and Computer Security, 1(3), 2007, 298–314.
  19. [19] L.K. Awasthi, M. Misra, and R.C. Joshi, A weighted checkpointing protocol for mobile distributed system, International Journal of Ad Hoc and Ubiquitous Computing, 5(3), 2010, 137–149.
  20. [20] L. Kumar, M. Misra, and R.C. Joshi, Low overhead optimal checkpointing for mobile distributed systems, Proc. IEEE Int. Conf. on Data Engineering (ICDE 03), March 2003, 686–688.
  21. [21] R. Baldoni, F. Quaglia, and P. Fornara, An index-based checkpointing algorithm for autonomous distributed systems, IEEE Transactions on Parallel and Distributed Systems, 10(2), 1999, 181–192.
  22. [22] J. Tsai, An efficient index-based checkpointing protocol with constant-size control information on messages, IEEE Transactions on Dependable and Secure Computing, 2(4), 2005, 287–296.
  23. [23] E.N. Elnozahy and J.S. Plank, Checkpointing for peta-scale systems: A look into the future of practical rollback-recovery, IEEE Transactions on Dependable and Secure Computing, 1(2), 2004, 97–108.
