Pentaho For Dummies
In a previous blog entry I discussed business intelligence and related concepts at some length, so today I thought it would be worthwhile to look at a specific business intelligence system. I have been playing with the Pentaho business intelligence platform for the last couple of weeks for a project. The Pentaho BI project is an ongoing effort by the open source community to develop a feature-rich business intelligence system that is also capable of enterprise reporting, data analysis, dashboarding, data mining and workflow management. That seems like a little too much for a business intelligence system, which is probably why we should refer to it as a business intelligence platform rather than just a BI system. To make a long story short, it supports all kinds of BI-related tasks and is the number one ranked open source BI system.
The Pentaho BI platform is process-centric. The central controller of the platform is a workflow engine, which uses process definitions to execute various BI processes. A BI process is made up of a number of actions. For example, we may define a simple BI process with three actions as follows.
Gather Data > Prepare a 50 page report > E-Mail the report to a mailing list
In the Pentaho BI platform, existing BI processes can be easily customized and new processes can be added without too much effort. In fact, using the Pentaho BI platform is all about defining BI processes and adding them to the Pentaho solution repository. The platform comes with many components, each capable of performing some action in a BI process. Process definitions are stored in an XML-based format.
Architecture
The workhorse of the Pentaho BI platform is the Pentaho solution engine. It is connected to a solution repository where the process definitions are stored. As mentioned above, a process in Pentaho is a sequence of actions. An action is the smallest unit of work that the Pentaho BI platform can perform. The platform has a number of components, each capable of performing an action or a set of actions. If we consider the action sequence given in the previous section, Pentaho has individual components to retrieve the data, generate the report and then e-mail the report. One component may depend on other components.
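To make the idea of an engine, a repository and components a bit more concrete, here is a minimal sketch in Python. It is not Pentaho code and does not use any Pentaho API. The component functions, the `SOLUTION_REPOSITORY` dictionary and the `run_process` function are all names I made up for illustration, and the sales figures are dummy data.

```python
from typing import Callable, Dict, List

# A "component" here is just a function that takes the current context
# and returns an updated one. This is an assumption of the sketch,
# not how Pentaho components are actually implemented.
Component = Callable[[dict], dict]


def gather_data(ctx: dict) -> dict:
    # Stand-in for a database query; the rows are dummy values.
    return {**ctx, "rows": [("north", 120), ("south", 95)]}


def prepare_report(ctx: dict) -> dict:
    lines = [f"{region}: {total}" for region, total in ctx["rows"]]
    return {**ctx, "report": "\n".join(lines)}


def email_report(ctx: dict) -> dict:
    # Stand-in for sending mail: only record who would receive the report.
    return {**ctx, "sent_to": list(ctx["mailing_list"])}


# Registry of available components, keyed by action name.
COMPONENTS: Dict[str, Component] = {
    "gather_data": gather_data,
    "prepare_report": prepare_report,
    "email_report": email_report,
}

# Hypothetical solution repository: process name -> ordered list of actions.
SOLUTION_REPOSITORY: Dict[str, List[str]] = {
    "sales_report": ["gather_data", "prepare_report", "email_report"],
}


def run_process(name: str, ctx: dict) -> dict:
    """Look up a process definition and run its actions in order."""
    for action in SOLUTION_REPOSITORY[name]:
        ctx = COMPONENTS[action](ctx)
    return ctx


result = run_process("sales_report", {"mailing_list": ["team@example.com"]})
```

Adding a new process in this toy model is just a matter of adding another entry to the repository, which is roughly the spirit of what the platform lets us do with process definitions.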
When a user makes a request to execute a BI process, the solution engine looks in the solution repository for the relevant process definition, or action sequence file, which is nothing more than a plain XML file, typically with a .xaction extension. Once the definition is located, the engine calls the necessary components in the order specified in the action sequence definition. Some components need parameters to be passed in before they can work. For example, a report generation component has to be given the data that will appear in the report as a parameter. There are several ways a component may get the required parameter values. In the example of gathering data and creating a report, we may configure the action sequence so that the output of the data gathering component is directed to the reporting component as an input. Components may also get input parameters from the user session, from the component's runtime, from global variables and from user-specified values.
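The parameter lookup described above can be sketched roughly as follows. Again, this is only an illustration in Python and not Pentaho's implementation. The `resolve_inputs` function, the source dictionaries and the lookup order (earlier outputs first, then request, session, runtime and global values) are assumptions I chose for the example.

```python
from collections import ChainMap


def resolve_inputs(names, *sources):
    """Return a value for each requested input name.

    Sources are searched left to right and the first match wins.
    The order is an assumption of this sketch.
    """
    lookup = ChainMap(*sources)
    missing = [name for name in names if name not in lookup]
    if missing:
        raise KeyError(f"missing inputs: {missing}")
    return {name: lookup[name] for name in names}


# Hypothetical parameter sources with dummy values.
global_params = {"company": "Acme"}
runtime = {"run_id": 1}
session = {"user": "alice"}
request = {"region": "north"}   # user-specified values
outputs = {}                    # filled in as actions complete

# Action 1: data gathering takes "region" from the user's request.
data_inputs = resolve_inputs(["region"], outputs, request, session, runtime, global_params)
all_rows = [("north", 120), ("south", 95)]
outputs["rows"] = [row for row in all_rows if row[0] == data_inputs["region"]]

# Action 2: reporting takes "rows" from the previous action's output,
# and the remaining inputs from the session and the global parameters.
report_inputs = resolve_inputs(
    ["rows", "company", "user"], outputs, request, session, runtime, global_params
)
```

The point to notice is that the reporting step does not care where each value came from. It simply asks for its inputs by name, and the wiring decides which source supplies them.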
Pentaho Reporting
There are a variety of tools for developing and deploying Pentaho reports easily. The Pentaho report generation wizard, which is available as a standalone application as well as an Eclipse plugin, is a good way to quickly get up and running with Pentaho reports. Pentaho Report Designer is a more feature-rich tool for generating complex reports. Pentaho also supports a concept called subreports: we can embed any number of reports within another report.
When developing simple Pentaho reports we basically have to create two definition files. One is of course the action sequence definition file (.xaction), which describes how to get the data and how to use it in the report. The other is the report specification file (.xreportspec), which defines the visual appearance of the report. The action sequence file makes a reference to the report specification file; the report specification is actually an input parameter to the reporting component. The above-mentioned tools from Pentaho can be used to generate these files quickly through a user-friendly GUI.
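To illustrate how one file can point at the other, here is a small Python sketch that reads a simplified action sequence and pulls out the report specification it refers to. Please note that the XML layout below is invented for this example and is much simpler than a real .xaction file. The element and attribute names, as well as the file name sales.xreportspec, are assumptions and not the actual Pentaho schema.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical action sequence. A real .xaction file has a
# richer structure; the names used here are for illustration only.
ACTION_SEQUENCE = """
<action-sequence>
  <resources>
    <report-definition>sales.xreportspec</report-definition>
  </resources>
  <actions>
    <action component="data-gathering" output="rows"/>
    <action component="reporting" input="rows" output="report"/>
  </actions>
</action-sequence>
"""

root = ET.fromstring(ACTION_SEQUENCE.strip())

# The report specification is referenced by name from the action sequence.
report_spec = root.findtext("resources/report-definition")

# The actions in document order, with the data each one consumes and produces.
steps = [
    (action.get("component"), action.get("input"), action.get("output"))
    for action in root.findall("actions/action")
]
```

Here the data gathering action produces "rows", the reporting action consumes them, and the report specification file supplies the layout, which mirrors the division of labour between the two definition files.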
(More on Pentaho will follow...)