[wsf-c-dev] AXIOM NG

James Clark james at wso2.com
Fri Dec 8 05:18:21 PST 2006


Here's my thinking about how XML might be handled in a future version of
the SOAP stack.  This design is by no means fully baked.  Please don't
be shy about pointing out things that you think won't work.

There are three fundamental interfaces:

- xml_reader
- xml_writer
- xml_element

The xml_reader and xml_writer interfaces are what you might expect:
xml_reader is a pull-parser style interface, with a next_event method;
xml_writer is a SAX event handler style interface, with start_element,
end_element, etc. methods.
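
To make the pull/push split concrete, here is a rough C sketch of the two
interfaces.  Only next_event, start_element and end_element come from the
description above; the event struct, the characters method, the
array-backed reader, the buffer-backed writer and xml_pump are all
assumptions made up for illustration, and the demo writer does no escaping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { XML_START_ELEMENT, XML_END_ELEMENT, XML_CHARACTERS } xml_event_type;

typedef struct {
    xml_event_type type;
    const char *text; /* element name, or character data */
} xml_event;

/* Pull side: the consumer asks for the next event when it is ready. */
typedef struct xml_reader {
    /* Returns the next event, or NULL at end of input. */
    const xml_event *(*next_event)(struct xml_reader *self);
    const xml_event *events; /* backing array for this demo reader */
    size_t count;
    size_t pos;
} xml_reader;

/* Push side: the producer calls one method per event. */
typedef struct xml_writer {
    int (*start_element)(struct xml_writer *self, const char *name);
    int (*end_element)(struct xml_writer *self, const char *name);
    int (*characters)(struct xml_writer *self, const char *text);
    char buf[256]; /* serialized output for this demo writer */
    size_t len;
} xml_writer;

static const xml_event *array_next_event(xml_reader *self) {
    return self->pos < self->count ? &self->events[self->pos++] : NULL;
}

void array_reader_init(xml_reader *r, const xml_event *events, size_t count) {
    assert(r != NULL && events != NULL);
    r->next_event = array_next_event;
    r->events = events;
    r->count = count;
    r->pos = 0;
}

/* Appends pre + s + post to the buffer; returns 0 on success, -1 if full. */
static int buf_append(xml_writer *w, const char *pre, const char *s, const char *post) {
    const char *parts[3] = { pre, s, post };
    size_t need = strlen(pre) + strlen(s) + strlen(post);
    if (need >= sizeof w->buf - w->len) return -1;
    for (int i = 0; i < 3; i++) {
        size_t n = strlen(parts[i]);
        memcpy(w->buf + w->len, parts[i], n);
        w->len += n;
    }
    w->buf[w->len] = '\0';
    return 0;
}

static int buf_start(xml_writer *w, const char *name) { return buf_append(w, "<", name, ">"); }
static int buf_end(xml_writer *w, const char *name) { return buf_append(w, "</", name, ">"); }
static int buf_chars(xml_writer *w, const char *text) { return buf_append(w, "", text, ""); }

void buffer_writer_init(xml_writer *w) {
    assert(w != NULL);
    w->start_element = buf_start;
    w->end_element = buf_end;
    w->characters = buf_chars;
    w->buf[0] = '\0';
    w->len = 0;
}

/* Drains the reader into the writer; returns 0 on success. */
int xml_pump(xml_reader *r, xml_writer *w) {
    const xml_event *ev;
    while ((ev = r->next_event(r)) != NULL) {
        int rc;
        switch (ev->type) {
        case XML_START_ELEMENT: rc = w->start_element(w, ev->text); break;
        case XML_END_ELEMENT:   rc = w->end_element(w, ev->text); break;
        case XML_CHARACTERS:    rc = w->characters(w, ev->text); break;
        default:                rc = -1; break;
        }
        if (rc != 0) return rc;
    }
    return 0;
}
```

The point of the pairing is that a generic loop like xml_pump can connect
any reader to any writer without knowing how either one is implemented.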

The xml_element interface is a bit more interesting.  The key ideas are
as follows:

- you cannot examine the content of an xml_element directly; instead an
xml_element has a method to create an xml_reader that iterates over that
element

- an xml_element also has a method to output itself to an xml_writer

- an xml_element has an optional use-once semantic.  Each xml_element is
either USE_ONCE or USE_MANY.  When you create an xml_element it starts
out as being USE_ONCE.  Most methods on an xml_element "consume" the
xml_element. For example, creating an xml_reader on an xml_element or
outputting an xml_element to an xml_writer consumes the xml_element. An
xml_element that is USE_ONCE can be consumed at most once. A USE_MANY
xml_element can be consumed arbitrarily many times.  A USE_ONCE
xml_element can be turned into a USE_MANY xml_element by calling an
allow_multiple_use method on the xml_element *before it is consumed*.
Calling a consuming method on an xml_element that has already been
consumed will return an error.  The implementation of allow_multiple_use
may change the representation of an xml_element.

- an xml_element is conceptually immutable (apart from the use-once
semantic)

- there's an implementation of xml_writer that creates an xml_element
from what was written to the xml_writer

- you can create an xml_element from a string or an input byte stream

- if the current event of an xml_reader is a start-element event, then
you can ask the xml_reader to read the rest of the element and give it to
you as an xml_element; you can choose whether the current event should
remain as that start-element event or should be left on the matching
end-element event

- xml_element would also have an extensibility hook, so you can say: if
you can, give me the element in format X, where X is a string (e.g.
"libxml2") 

- there would be some methods on xml_element to support data-binding; I
haven't figured this bit out yet, but it would allow you to go directly
from the xml_element to a strongly-typed, streamed, binary
representation

- there would probably be ways for the user to give hints about how an
xml_element is going to be used in order to allow the implementation to
choose the most efficient representation

- there might also be some sort of XPath support (e.g. filtering an
xml_element using an XPath)
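
The use-once rule from the list above can be sketched in C roughly as
follows.  This is only an illustration of the state transitions: the
struct layout, the status codes, xml_element_from_string and
xml_element_consume (standing in for any consuming method, such as
creating an xml_reader) are assumed names, not a proposed API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { XML_USE_ONCE, XML_USE_MANY } xml_use_mode;
typedef enum { XML_OK = 0, XML_ERR_ALREADY_CONSUMED = -1 } xml_status;

typedef struct {
    const char *source; /* stand-in for whatever the producer handed over */
    xml_use_mode mode;
    bool consumed;
} xml_element;

/* A newly created element always starts out as USE_ONCE. */
xml_element xml_element_from_string(const char *s) {
    xml_element e = { s, XML_USE_ONCE, false };
    return e;
}

/* Must be called before the element is consumed.  A real implementation
   might change the representation here, e.g. buffer a stream. */
xml_status xml_element_allow_multiple_use(xml_element *e) {
    assert(e != NULL);
    if (e->mode == XML_USE_MANY) return XML_OK;
    if (e->consumed) return XML_ERR_ALREADY_CONSUMED;
    e->mode = XML_USE_MANY;
    return XML_OK;
}

/* Stand-in for any consuming method. */
xml_status xml_element_consume(xml_element *e, const char **out) {
    assert(e != NULL && out != NULL);
    if (e->mode == XML_USE_ONCE && e->consumed) return XML_ERR_ALREADY_CONSUMED;
    e->consumed = true;
    *out = e->source;
    return XML_OK;
}
```

With this sketch, a second xml_element_consume on a USE_ONCE element
returns XML_ERR_ALREADY_CONSUMED, while an element that went through
xml_element_allow_multiple_use first can be consumed repeatedly.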

The fundamental idea of an xml_element is to take the laziness that's in
the current AXIOM to the next level.  Do as little work as possible when
you create the xml_element; keep it initially in whatever format is most
efficient for the producer of the xml_element.  Allow the consumer of
the xml_element to ask for the element directly in whatever format is
most efficient for the consumer.  Then you can potentially convert
directly from the producer's preferred format into the consumer's
preferred format.  The optional use-once semantic allows streaming
conversions to be used where they are possible.  The immutability
together with the optional use-once semantic allow for efficient
implementation of operations with functional semantics.

The xml_reader and xml_writer interfaces would provide MTOM support via
a blob interface.  The blob interface would be immutable with an
optional use-once semantic like xml_element, and would be convertible
to/from input/output byte streams.  An xml_reader would be able to give
you the content of an element as a blob, and you would be able to
write a blob to an xml_writer.

The xml_reader/xml_writer would also have support for asynchronous
reading and writing. I'll discuss this in a future message when I talk
about concurrency.

The representation of a SOAP message would be an array of header objects
plus an xml_element for the body or payload. Each header object would
consist of a local name, namespace name, SOAP-defined properties (such
as must understand) and an xml_element.  I think initially we'll want to
parse an incoming message so that we have an xml_element for the SOAP
body.  But at some point before we give it to the user we'll probably
want to change it into an xml_element for the payload.  The xml_element
interface will need to provide efficient support for this
wrapping/unwrapping.
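
As a rough illustration of the message layout just described, here is one
way it might look in C.  The type and field names, the two helper
functions and the idea of looking headers up by qualified name are
assumptions for the sake of the example; xml_element is left opaque.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct xml_element xml_element; /* opaque in this sketch */

typedef struct {
    const char *local_name;
    const char *namespace_name;
    bool must_understand;  /* one of the SOAP-defined properties */
    xml_element *content;  /* the header block itself, still unparsed */
} soap_header;

typedef struct {
    soap_header *headers;
    size_t header_count;
    xml_element *body;     /* SOAP body, or later the unwrapped payload */
} soap_message;

/* Finds a header by qualified name without touching its xml_element. */
const soap_header *soap_message_find_header(const soap_message *m,
                                            const char *namespace_name,
                                            const char *local_name) {
    assert(m != NULL && namespace_name != NULL && local_name != NULL);
    for (size_t i = 0; i < m->header_count; i++) {
        const soap_header *h = &m->headers[i];
        if (strcmp(h->namespace_name, namespace_name) == 0 &&
            strcmp(h->local_name, local_name) == 0) {
            return h;
        }
    }
    return NULL;
}

/* Counts the headers that a receiver is required to process. */
size_t soap_message_count_must_understand(const soap_message *m) {
    assert(m != NULL);
    size_t n = 0;
    for (size_t i = 0; i < m->header_count; i++) {
        if (m->headers[i].must_understand) n++;
    }
    return n;
}
```

Keeping the names and SOAP properties outside the xml_element means
header dispatch can happen without consuming any of the header contents.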

I have a question related to this.  What uses are there currently of
attributes on the SOAP envelope and SOAP body? One thing that springs to
mind is putting a wsu:Id on the SOAP body for signing (and similarly
with encryption except the element won't be called soap:Body, right?).
Anything else?

The SOAP engine would use a variety of different implementations of
xml_element in different contexts.  You wouldn't just plug in a single
implementation.  For example, the xml_element for a SOAP body might be
represented as an input byte stream plus a set of in-scope namespaces,
whereas the xml_element for a header block might be represented as a set
of in-scope namespaces, a byte array (for the original XML) and a list
of XML events encoded as an array of integers.
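
One possible reading of the header-block representation, sketched in C:
the original bytes are kept as-is and each event is three integers
(kind, offset, length) pointing back into them.  The triple encoding, the
HB_* constants and the function name are assumptions; the actual integer
encoding is not specified here, and the in-scope namespaces are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { HB_START = 1, HB_END = 2, HB_TEXT = 3 };

typedef struct {
    const char *bytes;   /* the original XML, untouched */
    const int *events;   /* triples: kind, offset, length */
    size_t event_count;  /* number of triples */
} header_block;

/* Copies the name or text of event i into out as a C string.
   Returns the event kind, or -1 if i is out of range or out is too small. */
int header_block_event_text(const header_block *h, size_t i,
                            char *out, size_t out_size) {
    assert(h != NULL && out != NULL);
    if (i >= h->event_count) return -1;
    const int *ev = &h->events[i * 3];
    size_t len = (size_t)ev[2];
    if (len >= out_size) return -1;
    memcpy(out, h->bytes + ev[1], len);
    out[len] = '\0';
    return ev[0];
}
```

For the bytes "<a>hi</a>", the event array would be
{HB_START,1,1, HB_TEXT,3,2, HB_END,7,1}: nothing is copied until a
consumer actually asks for a name or some text.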

One issue I'm still thinking about is whether it might be better to have
an xml_content interface rather than an xml_element interface.  An
xml_content object would be able to contain anything that an XML element
can contain, i.e. a sequence of zero or more elements, characters,
comments and processing instructions.

James









