FAULT RESOLUTION SYSTEM FOR INTER-CLOUD ENVIRONMENT

Authors

  • HA MANH TRAN, Computer Science and Engineering, International University-Vietnam National University, Ho Chi Minh City, Vietnam
  • SYNH VIET UYEN HA, Computer Science and Engineering, International University-Vietnam National University, Ho Chi Minh City, Vietnam
  • HUYNH TU DANG, Computer Science and Engineering, International University-Vietnam National University, Ho Chi Minh City, Vietnam
  • KHOA VAN HUYNH, Network and Services Management, VNPT Dong Thap-VNPT Group, Dong Thap, Vietnam

Keywords:

Cloud Computing, Fault Resolution, Peer-to-Peer Network, Inter-Cloud Environment, Bug Tracking System

Abstract

Fault resolution in communication networks and distributed systems is a complicated process that requires system administrators and supporting systems to monitor, diagnose, resolve and record faults. The process becomes more challenging in an inter-cloud environment, where multiple cloud systems coordinate to provision applications and services. In this context, we propose a fault resolution system that assists system administrators in resolving faults in an inter-cloud environment. The proposed system is characterized by its ability to share and search fault knowledge resources among cloud systems. It uses a peer-to-peer network of fault managers that provide facilities to monitor faults occurring in a cloud system and to search for similar faults, together with their solutions, that have occurred in other cloud systems. We have implemented several components of the proposed system, including the fault monitor, fault searcher and fault updater. We have also evaluated the prototype system experimentally on fault databases obtained from several fault sources, such as bug tracking systems, online discussion forums and vendor knowledge bases.
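The search step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names (`Fault`, `FaultManager`), the TTL-limited flooding over neighbors, the word-overlap similarity measure and the sample fault record are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Fault:
    description: str
    solution: str


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (an illustrative measure only)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


@dataclass
class FaultManager:
    """Hypothetical peer holding the fault knowledge of one cloud system."""
    name: str
    faults: list = field(default_factory=list)                   # local fault records
    neighbors: list = field(default_factory=list, repr=False)    # other fault managers

    def search(self, query, ttl=2, threshold=0.3, seen=None):
        """Collect (score, manager name, fault) matches locally and from neighbors."""
        seen = set() if seen is None else seen
        if self.name in seen:          # avoid revisiting a peer in a cyclic overlay
            return []
        seen.add(self.name)
        hits = [(similarity(query, f.description), self.name, f) for f in self.faults]
        hits = [h for h in hits if h[0] >= threshold]
        if ttl > 0:                    # forward the query while the hop budget lasts
            for peer in self.neighbors:
                hits.extend(peer.search(query, ttl - 1, threshold, seen))
        return sorted(hits, key=lambda h: h[0], reverse=True)


# Two cloud systems, each with its own fault manager (sample data is invented).
cloud_a = FaultManager("cloud-a")
cloud_b = FaultManager(
    "cloud-b",
    faults=[Fault("datanode fails to start after disk full",
                  "free disk space and restart the datanode")],
)
cloud_a.neighbors.append(cloud_b)
cloud_b.neighbors.append(cloud_a)

results = cloud_a.search("datanode fails to start")
for score, owner, fault in results:
    print(f"{owner}: {fault.solution} (similarity {score:.2f})")
```

In this sketch a fault observed in one cloud system is matched against fault records held by another, which is the sharing behaviour the abstract describes; a real system would likely use a richer fault representation and similarity function than plain word overlap.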

 




Published

2014-02-26

How to Cite

TRAN, H. M., UYEN HA, S. V., DANG, H. T., & HUYNH, K. V. (2014). FAULT RESOLUTION SYSTEM FOR INTER-CLOUD ENVIRONMENT. Journal of Mobile Multimedia, 10(1-2), 016–029. Retrieved from https://journals.riverpublishers.com/index.php/JMM/article/view/4591
