Monitoring of Distributed Systems (Monitoring von verteilten Systemen)

This work was completed in June 2000.

Advances in computer systems and networking technologies have led to an increasing decentralization of services and data. This implies greater use of shared resources (e.g. networks) and more parallelization of processes in application systems. One advantage of such systems is that they can be better adapted to physical, organizational, and software-engineering requirements.

During construction, developers of distributed systems face problems that do not arise, or arise only in a restricted form, in the development of non-distributed systems. Examples include mapping distributed resources and/or processes onto a set of computers, parallelizing process steps, and identifying and handling faults. Monitoring and dynamic program analysis are intended to support the software development of distributed systems in such a way that system requirements can be met, the problems mentioned above can be solved, and distributed systems can be observed.

The work is divided into seven chapters. Besides the introduction and the conclusion, five chapters cover the following topics: terms of distributed systems; requirements of distributed systems and problems during their development; monitoring concepts; existing monitoring systems; and the monitoring system Orwell.

The second chapter describes important terms of distributed systems. Starting with the term »system«, it characterizes distributed systems and discusses their important properties. Based on these properties and terms, it then discusses the concepts and mechanisms that play an important role in the construction of distributed systems.

Building on these concepts and mechanisms, the third chapter describes important aspects of distributed systems from a software engineering perspective: reliability, fault tolerance, efficiency, scalability, and security. It then discusses problems in the development process of distributed systems, as well as object-oriented systems. Finally, it covers time and causality, which are important for monitoring distributed systems.

The fourth chapter discusses basic requirements for monitoring systems. Its main part covers the general monitoring model of Mansouri-Samani [95] and its concepts, activities, and strategies for the observation and analysis of distributed systems.

The fifth chapter presents selected monitoring research approaches and tools.

The sixth chapter describes the monitoring system Orwell. It discusses the architecture and functionality of this distributed, object-oriented analysis environment. Finally, it compares Orwell with other approaches and evaluates it against them.

The work concludes with a summary of results and a discussion of future developments and possible research directions.