Carbon Management
Problem
Most WSO2 services (WSAS, ESB) provide management interfaces, both JMX based and a web-based management UI. However, in real-life deployments (e.g. with 10 WSAS instances) this approach does not scale, because administrators have to log in to each service instance separately. Therefore, the problem is to design a management dashboard that allows users to monitor and control the WSO2 SOA stack through a single interface.
Bare minimum: Monitor the state of the system as a single entity, visualize the status, and generate alerts. Management tasks are performed manually.
Complete Solution: Detailed monitoring and control. The management UI can also be used to bring up the system, and the system supports automated management actions. In this setting, the focus of the management system is to present, monitor, and control the whole system as a single entity rather than as separate components, aiding system-level control and decisions. A few possibilities are listed in section 3.
Scale: The dashboard should support managing about 100 services.
Current Status
Both WSAS and ESB expose their management functionality via JMX, and that functionality can be grouped under monitoring and control. Similar functionality will be available with the Carbon platform, making those features available to all services.
Outline (System Management)
In theory, a management system comprises a management loop that includes sensors (which monitor), a decision framework, and actuators (which provide control). There is a lot of existing work on system management and autonomic computing. The decision framework can be human, semi-automatic, or automatic. As long as it completes the control loop (even with a human in the loop), all of those implementations are useful.
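The loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name `ManagementLoop`, the metric name `failed_requests`, and the action name `restart` are hypothetical and do not correspond to any existing WSAS, ESB, or Carbon API. The decision function could equally be a human approving an action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# A reading is a flat map of metric name -> value (hypothetical shape).
Reading = Dict[str, float]


@dataclass
class ManagementLoop:
    """One sensor, one decision framework, and a set of named actuators."""
    sensor: Callable[[], Reading]               # monitors the system
    decide: Callable[[Reading], Optional[str]]  # returns an action name or None
    actuators: Dict[str, Callable[[], None]]    # action name -> control operation

    def step(self) -> Optional[str]:
        """Run one pass of the loop: sense, decide, then act if needed."""
        reading = self.sensor()
        action = self.decide(reading)
        if action is not None:
            self.actuators[action]()
        return action


def restart_when_failing(reading: Reading) -> Optional[str]:
    """Hypothetical decision rule: restart once failures exceed 10."""
    return "restart" if reading.get("failed_requests", 0.0) > 10 else None
```

Calling `step()` periodically completes the control loop. Replacing `restart_when_failing` with a function that asks an administrator for confirmation gives the semi-automatic variant without changing the sensor or the actuators.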
Monitoring
- Transport statistics - how many requests were received, how many failed, how many were rejected, and transport-specific parameters (e.g. number of active connections, or thread pool size).
- Service statistics - how many requests were completed, successful, failed, or pending, and the average, minimum, and maximum service times.
- JVM statistics - CPU usage, memory usage, number of threads
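As a rough illustration of the service statistics listed above, the following sketch accumulates request counts and service times. The class and field names are hypothetical and are not the attribute names exposed by the WSAS or ESB JMX beans.

```python
import math
from dataclasses import dataclass


@dataclass
class ServiceStats:
    """Running counters for one service (illustrative names only)."""
    successful: int = 0
    failed: int = 0
    pending: int = 0
    total_time_ms: float = 0.0
    min_time_ms: float = math.inf   # stays inf until the first request completes
    max_time_ms: float = 0.0

    def request_started(self) -> None:
        self.pending += 1

    def request_finished(self, duration_ms: float, ok: bool) -> None:
        """Record one completed request and its service time."""
        self.pending -= 1
        if ok:
            self.successful += 1
        else:
            self.failed += 1
        self.total_time_ms += duration_ms
        self.min_time_ms = min(self.min_time_ms, duration_ms)
        self.max_time_ms = max(self.max_time_ms, duration_ms)

    @property
    def completed(self) -> int:
        return self.successful + self.failed

    @property
    def average_time_ms(self) -> float:
        """Mean service time over completed requests, 0.0 if there are none."""
        return self.total_time_ms / self.completed if self.completed else 0.0
```

A dashboard would poll a snapshot of these counters from each service and aggregate them per group of services.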
Control
- Edit configurations
- Shutdown - graceful, kill, or maintenance mode
- Start service - if the service was previously stopped. This does not cover deploying and starting a new service.
- Restart a service
- Migrate a service - migrate to a different machine
Both WSAS and ESB have been integrated with Hyperic HQ, and they can be managed with JMX consoles like MC4J. Hyperic HQ provides extensive support for service discovery, and it also supports triggers that generate alerts. We are still exploring its support for automatic management actions. The feedback I got about Hyperic HQ was mixed (it seems slow, and it depends on the JBoss application server). However, I think it meets the minimal requirements for the bare minimum solution.
Decision Framework
There are many ways to do this, and for simple cases, providing a simple way to perform management actions and relying on human decisions suffices. However, a few ways to implement automatic control are given below.
- Using rules for decisions (e.g. JBoss Drools or any business rule engine).
- Using complex event processing (e.g. Esper); this approach is called ECA (Event Condition Action) rules.
- Using decision tables (this is basically a lookup).
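The decision-table option is the simplest of the three to sketch: observed metrics are classified into a small number of states, and the action is looked up by state. The thresholds, state names, and action names below are assumptions chosen for illustration only. They are not taken from Drools, Esper, or any WSO2 product.

```python
from typing import Dict, Tuple

# Hypothetical thresholds; real values would come from configuration.
FAILURE_RATE_LIMIT = 0.05
CPU_USAGE_LIMIT = 0.80

# (health, load) -> management action. This is the whole "decision framework".
DECISION_TABLE: Dict[Tuple[str, str], str] = {
    ("healthy", "normal"): "none",
    ("healthy", "overloaded"): "alert",
    ("failing", "normal"): "restart",
    ("failing", "overloaded"): "migrate",
}


def classify(failure_rate: float, cpu_usage: float) -> Tuple[str, str]:
    """Map raw metrics onto the discrete states used as table keys."""
    health = "failing" if failure_rate > FAILURE_RATE_LIMIT else "healthy"
    load = "overloaded" if cpu_usage > CPU_USAGE_LIMIT else "normal"
    return health, load


def decide(failure_rate: float, cpu_usage: float) -> str:
    """Look up the action for the current state."""
    return DECISION_TABLE[classify(failure_rate, cpu_usage)]
```

A rule engine or an ECA approach would replace the table with conditions evaluated over facts or event streams, but the inputs (metrics) and outputs (management actions) stay the same.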
Use cases and Requirements
Among the following cases, I consider the first three essential; the others should be viewed as nice (cool) features, but not essential.
- The management system must discover resources in the system automatically, because asking users to enter the service endpoints does not scale. One solution is to use a registry. Services may register themselves in the registry using a soft-state protocol, and the management code can use that for service discovery.
- Furthermore, it is interesting to allow services to discover each other (e.g. via the registry), which would allow the complete stack to be brought up automatically and alternative services to be discovered in case of a failure. Resources need to be grouped for query purposes, and administrators should be able to control them collectively. Figuring out the right user interaction medium is an interesting problem. For example, one possibility is to support a management shell that enables users to query resources (like ls in a UNIX shell) and pipe them to management actions.
- Instrumentation enhancements - a) Given a host machine, deploy WSAS, then deploy and start a new service on that machine. This will enable starting up the complete system using a script and performing recovery tasks. These actions need to be done remotely (which can be supported using a Java SSH tool). b) Enable host-level monitoring that captures the information missing from JVM stats. c) Service migration - initially for stateless services only.
- Expose the clustering, load balancing, and failover functionality of ESB, WSAS, and Axis2 via the management dashboard. This essentially means exposing the configurations to the management UI and making them editable. Here are a few use cases. a) A user defines a cluster and edits the size of the cluster. b) A user decides to create a failover service for a given service.
- Allow users to edit the system structure (that is, if service A talks to service B, and B to C, the service structure is defined by A knowing the address of B and B knowing the address of C). This is done by editing the configuration of services (e.g. via JMX), and administrators can do this via a management script or a GUI that supports editing a wire diagram. a) A user adds a proxy service to an existing service. b) Set up a master-slave setting.
- Support a distributed TCP monitor that allows administrators to selectively track message flow through the system. This can be implemented by writing a handler that intercepts messages and sends them to a single service, and allowing users to connect to that service and view the messages. Administrators should be able to monitor selected connections by turning the handler on or off as appropriate.
- The management system can be used to control WSO2 SOA stack deployments in VMs or in a compute cloud like EC2.
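The soft-state registration suggested in the first use case above can be sketched as follows. In a soft-state protocol, a registration is only valid for a limited time and must be renewed periodically, so a crashed service disappears from the registry without any explicit deregistration. The class name, the TTL parameter, and the endpoint URLs are hypothetical; this is not the API of any existing WSO2 registry.

```python
import time
from typing import Callable, Dict, List


class SoftStateRegistry:
    """Endpoint registry in which entries expire unless they are renewed."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock                  # injectable so tests can fake time
        self._expiry: Dict[str, float] = {}  # endpoint -> expiry timestamp

    def register(self, endpoint: str) -> None:
        """Register a service, or renew it if it is already registered."""
        self._expiry[endpoint] = self._clock() + self._ttl

    def live_endpoints(self) -> List[str]:
        """Drop expired entries and return the remaining endpoints, sorted."""
        now = self._clock()
        self._expiry = {e: t for e, t in self._expiry.items() if t > now}
        return sorted(self._expiry)
```

Each service would call `register` on a timer shorter than the TTL, and the management dashboard would call `live_endpoints` to discover what to monitor.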