[Wsf-general] Messaging project concept
James Clark
james at wso2.com
Tue Mar 27 03:21:40 PDT 2007
> I don't want to bet you to a race but I'd be surprised if you could write
> this whole thing in Python faster than I can package up Axis2 (Java, in my
> case) bits into multiple processes ;-).
You've convinced me that you could package up Axis2/Java in a
multi-process architecture, and I'm sure that you could do it faster than I
could do the whole thing in Python. However, I suspect that JVM startup
time and process size would make this not a viable option.
An option that might work is to use gcj to compile it, being careful to
include just the bits you need. This would also eliminate a runtime
requirement for JRE. That actually might be quite a workable approach.
(Has anybody tried compiling Axis2 with gcj?)
But for Axis2/C I think the balance is very different: first, the very
nature of C makes the kind of repackaging we're talking about much, much
harder; second, the Java stack is much more mature than the C stack;
third, Java protects you from low-level memory problems just as Python
does, whereas C doesn't.
In addition, I believe I would eventually be able to get a better
quality implementation by going with the approach of using all Python at
first with some C added as needed. There are lots of clever things in
the qmail/postfix approach: one of them is the way that they achieve the
reliability and transactional semantics of a database without the
overhead of actually using a database. This is important both for
performance and for avoiding a dependency on a heavy-duty database. (A
dependency on SQLite would be fine, but it has poor performance for
concurrent access by multiple processes.) The cleverness is in having
an extremely carefully crafted scheme of using the filesystem (in terms
of system calls, directories, choice of filenames). The scheme also
ensures that they don't have to repeatedly do a lot of IO on the message
body; mostly processes just need to look at the message header.
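To make the filesystem idea concrete, here is a minimal Python sketch of
the general write-to-tmp, fsync, rename pattern. It is not the actual
qmail or postfix queue layout: the tmp/ and new/ directory names, the
4-byte length prefix in front of the header, and the enqueue and
read_header functions are all made up for illustration. It assumes a
POSIX filesystem where rename within one filesystem is atomic.

```python
import os
import uuid


def enqueue(queue_dir, header, body):
    """Store one message (header and body are bytes); return its final path."""
    tmp_dir = os.path.join(queue_dir, "tmp")
    new_dir = os.path.join(queue_dir, "new")
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(new_dir, exist_ok=True)

    # Hypothetical naming scheme: a random UUID, assumed never to collide.
    name = uuid.uuid4().hex
    tmp_path = os.path.join(tmp_dir, name)
    final_path = os.path.join(new_dir, name)

    # O_EXCL makes creation fail rather than reuse an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        with os.fdopen(fd, "wb") as f:
            # Length-prefixed header first, so readers can skip the body.
            f.write(len(header).to_bytes(4, "big"))
            f.write(header)
            f.write(body)
            f.flush()
            os.fsync(f.fileno())  # data is on disk before it becomes visible
    except BaseException:
        os.unlink(tmp_path)
        raise

    # Atomic publish: other processes see the whole file or nothing.
    os.rename(tmp_path, final_path)

    # Sync the directory so the rename itself survives a crash.
    dir_fd = os.open(new_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return final_path


def read_header(path):
    """Read only the header of a queued message, leaving the body untouched."""
    with open(path, "rb") as f:
        header_len = int.from_bytes(f.read(4), "big")
        return f.read(header_len)
```

A process that only needs routing information would call read_header and
never read the body bytes, which is the property described above.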
I'm fairly confident that with sufficient thought a similar kind of
scheme could be devised for Telegon. I suspect several aspects might be
very specialized to the needs of Telegon: for example, it will be very
common in Telegon for the bulk of the Body to be a single chunk of
base64 data serialized using MTOM, so we may well want to have some
special handling for that.
This is a very different approach from what Sandesha is doing at the
moment: it relies on the database to do the heavy lifting; the handling
of each message is wrapped in a transaction, and the entire
MessageContext state is serialized to the database. This sort of
approach makes for easy implementation, but if you have each handler
doing this kind of thing, you're going to be stuck with really, really
bad performance (and it will not be fixable by rewriting bits in C).
Another issue is state. As you mentioned, we need to be able to handle
complex state: I don't need any convincing that the combination of RM
and WS-Security and WS-SecureConversation has complex state. State that
is scoped to a message can be handled simply by including this in the
on-disk queue format. However, as you pointed out, there will be state
that has a broader scope than a single message. This represents a
challenge for a multi-process architecture. You can't just pass around
a pointer to it as you can do in a single-process architecture. Also
you don't want to be forced to keep this shared state on disk and
serialize/deserialize it all the time. Now you could handle this by
having Axis2 use some general purpose clustering/shared state system
(like you are doing with WSAS clustering), but I think this is far too
heavyweight for Telegon.
Postfix deals with this issue by having a number of separate processes
that are responsible for maintaining this shared state and whose
lifetime is matched to the lifetime of the state they maintain; a
process handling an individual message can then communicate with the
process maintaining the shared state to retrieve and update the state as
necessary. That's the kind of approach I think we need in Telegon, and
it's more than a trivial detail: each process boundary is potentially a
trust boundary, with the processes on each side of the boundary having
different security properties; there needs to be a defined protocol for
them to communicate, and neither process can blindly trust what it's sent
by the other process.
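As a rough sketch of what such a state-holding process might look like,
the Python below keeps per-sequence acknowledgement state in memory and
answers requests over a socket. Everything here is an assumption for
illustration, not a design for Telegon or a description of postfix: the
newline-delimited JSON protocol, the "get" and "ack" operations, the
seq_id and msg_no fields, and the size limits are all invented. The point
is only that every field is validated before it touches the state.

```python
import json
import socket

MAX_LINE = 4096  # hypothetical limit on one request line, in bytes


def handle_request(state, line):
    """Validate one request and apply it to state; trust nothing from the peer."""
    try:
        req = json.loads(line)
    except ValueError:  # also covers invalid UTF-8
        return {"ok": False, "error": "malformed"}
    if not isinstance(req, dict):
        return {"ok": False, "error": "malformed"}

    seq_id = req.get("seq_id")
    if not isinstance(seq_id, str) or not 0 < len(seq_id) <= 128:
        return {"ok": False, "error": "bad seq_id"}

    op = req.get("op")
    if op == "get":
        return {"ok": True, "acked": sorted(state.get(seq_id, ()))}
    if op == "ack":
        msg_no = req.get("msg_no")
        # type() check rejects booleans, which are a subclass of int.
        if type(msg_no) is not int or msg_no < 1:
            return {"ok": False, "error": "bad msg_no"}
        state.setdefault(seq_id, set()).add(msg_no)
        return {"ok": True}
    return {"ok": False, "error": "unknown op"}


def serve(conn, state):
    """Serve one client connection until it closes or misbehaves."""
    with conn, conn.makefile("rb") as rfile, conn.makefile("wb") as wfile:
        while True:
            line = rfile.readline(MAX_LINE + 1)
            if not line:
                break  # peer closed the connection
            if len(line) > MAX_LINE or not line.endswith(b"\n"):
                break  # oversized or truncated request: drop the peer
            reply = handle_request(state, line)
            wfile.write(json.dumps(reply).encode("utf-8") + b"\n")
            wfile.flush()
```

In a real deployment the state dict would be owned by one long-lived
process that calls serve for each connection it accepts (for example on a
Unix domain socket), and the per-message processes would be the clients.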
James