[Wsf-general] data services and resources
sumedha rubasinghe
sumedha at wso2.com
Wed Apr 18 01:09:20 PDT 2007
Sanjiva Weerawarana wrote:
> I agree the language integrated query work makes it so much easier to
> do a wide class of applications. Java is without a doubt so behind in
> language innovation now.
>
> On ADO Entities- is that really different from EJB CMP and EJB QL?
Looks almost similar to me. Wonder if the performance is also 'G00D' as
EJB QL, when it comes to large queries with lots of joins.
s/G00D/bad
>
> Sanjiva.
>
> James Clark wrote:
>> The ADO.NET Entity Framework
>>
>> http://msdn2.microsoft.com/en-us/library/aa697427(VS.80).aspx
>>
>> seems to have some ideas that are relevant to this.
>>
>> (It's scary how far Microsoft's language-integrated query work puts it
>> ahead of the Java world.)
>>
>> James
>>
>> On Mon, 2007-03-19 at 17:53 +0700, James Clark wrote:
>>> I've been wondering whether a rather different approach to specifying
>>> the data service might be preferable. I haven't worked this approach
>>> out in anything like the detail that our current approach has been
>>> worked out in, but I will try to explain the basic idea.
>>>
>>> The overall philosophy is to be higher-level, more declarative, easier
>>> to use, but less flexible and less powerful. With this philosophy it's
>>> not a goal that the user should be able to design an arbitrary web
>>> service or REST interface to the information in the database and then
>>> use the configuration file to specify that design. Instead the user
>>> gets to decide the kind of reading, writing and searching of the data
>>> that they require the web service/REST interface to provide and we
>>> automatically create a good-quality web service/REST interface that
>>> meets that requirement, together with some modest level of
>>> tweakability.
>>>
>>> The fundamental concept is an "entity-set". The data service would
>>> declare one or more named entity-sets. An entity-set is (surprise,
>>> surprise) a set of entities. Each entity has an identifier that
>>> uniquely identifies it within its entity-set. On the database side of
>>> things, for each entity set there would be a corresponding table; the
>>> primary key would correspond to the entity's identifier (for
>>> simplicity,
>>> at first I would expect we wouldn't handle multi-part primary keys).
>>> However, not every table corresponds to an entity-set. On the REST
>>> side
>>> of things for each entity there's a resource that directly corresponds
>>> to that entity; there may also be other resources that provide
>>> alternative views of the entity.
>>>
>>> The data service specification would declare one or more top-level
>>> named
>>> entity-sets. In my order database example, the top-level entity-sets
>>> might be named "products", "orders" and "customers". The name of the
>>> database table corresponding with a particular named entity set would
>>> obviously default to the name of the entity set.
>>>
>>> The second key part of this approach is dealing with things like the
>>> order_items table in my example, where information that is logically
>>> associated with one entity is in a separate database table from the
>>> entity-set to which the entity belongs. I think the way to handle this
>>> is to use the concept that a table that does not correspond to a
>>> top-level entity-set can be "owned by" a table that does. So for
>>> example, the order_items table would be owned by the orders tables.
>>> For
>>> some cases, it may be necessary to be explicit about how the rows of
>>> the
>>> owned table relate to the rows of the owning table, but my guess is
>>> that
>>> in most cases you can do the right thing by looking at the primary
>>> key/foreign key information in the database schema.
>>>
>>> I believe it's possible to provide a basic REST interface for many
>>> databases using just
>>>
>>> - the top-level entity-sets and their corresponding tables,
>>>
>>> - ownership relationships between tables, and
>>>
>>> - the database schema
>>>
>>> Obviously there are lots of different ways of providing a REST
>>> interface, but I think most of them can be intelligently defaulted (or
>>> even fixed). Let's assume we want to expose the REST interface at
>>> http://example.com/db/.
>>>
>>> - There needs to be an XML representation of an entity that both can be
>>> generated from the database and also allows the database to be updated
>>> from the XML representation. The database schema is enough to allow a
>>> reasonable default. Any customization facilities mustn't be so
>>> flexible
>>> that they inhibit automatic generation of an XML schema or mapping from
>>> the XML back to the database. The tricky bit will be handling
>>> ownership
>>> relationships. My guess is that you can mostly do the right thing by
>>> looking at the primary key/foreign keys. It should also be possible to
>>> automatically turn foreign keys into the appropriate URI because you
>>> can
>>> tell from the configuration where the URI for the resource
>>> corresponding
>>> to an entity is.
>>>
>>> - There needs to be a URI for the resource corresponding to each
>>> entity-set. This can be defaulted from the name of the entity-set: for
>>> example, the orders entity-set might be at
>>> http://example.com/db/orders/. A GET on that would provide a
>>> listing of
>>> all the entities in the entity-set. There would need to be a
>>> configurable limit on the number of entities returned by such a GET and
>>> a way to iterate over large entity sets (e.g. using queries to specify
>>> the range of the result to return). There should be some configuration
>>> that says what fields of the entity are returned in a GET on the
>>> entity-set: obviously the URI of the entity needs to be there; you
>>> might
>>> want just that, you might want a single title-like field (as in
>>> Atom) to
>>> be returned, or you might want all fields to be returned.
>>>
>>> - There needs to be a URI for the resource corresponding to each
>>> entity.
>>> This would default to the URI for the entity-set plus the canonical
>>> lexical representation of the primary key. For example,
>>> http://example.com/db/orders/12345. A GET on this would return the XML
>>> representation of the entity. The existing entity could be modified by
>>> doing a PUT on its URI. DELETE on the URI will delete the entity.
>>>
>>> - There are two ways that a new entity might be added: doing a POST on
>>> the _entity-set_ URI or doing a PUT on the _entity_ URI. It should be
>>> possible to automatically figure out which is the right way for a
>>> particular entity set based on whether the primary key is autogenerated
>>> (I think you can get this from the database schema): POST if it's
>>> autogenerated, PUT if it's not.
>>>
>>> The next big thing that a REST interface would need is some searching
>>> capability. A starting point is to allow the user to specify that
>>> certain fields are searchable. For example, if they specify that the
>>> country field of the customers entity-set is searchable. Then
>>> http://example.com/db/customers?country=US would return a listing of
>>> all
>>> customers with a country field equal to US. The next step might be to
>>> allow the a query parameter to be associated with an SQL expression.
>>> For
>>> example, we might want http://example.com/db/customers?min-age=18 to
>>> give us a list of all customers aged at least 18. The configuration
>>> might have something like this:
>>>
>>> <query-param name="min-age" type="int"> <field
>>> name="dateOfBirth"/> - now > 18 years
>>> </query-param>
>>>
>>> This would allow query parameters to compose properly with no extra
>>> work
>>> (e.g. http://example.com/db/customers?min-age=18&country=US would "just
>>> work").
>>>
>>> We would also probably want a way to provide different views of the
>>> entities, e.g. that excluded certain fields.
>>>
>>> By working at a relatively high level, we can automatically can do
>>> several nice things for the user:
>>>
>>> - we can automatically provide introspection facilities (e.g. WADL),
>>> complete with XSD and RELAX NG schemas
>>>
>>> - we should (I think) be able to automatically generate ETags; this
>>> important for cacheability and crucial for dealing with concurrent
>>> updates
>>>
>>> - it should be a small step to get an Atom interface as well
>>>
>>> So far I've focused on REST. That's partly because I think we have a
>>> bit of corporate REST deficit at the moment, and partly because I think
>>> it's easy to go from a REST interface to a WSDL (service-oriented)
>>> interface than vice-versa. How might a WSDL interface be specified? I
>>> envisage there being a number of built-in methods such as add,
>>> addMultiple, delete, deleteMatching, search, iterate, get which could
>>> apply to an entity or entity-set. The basic idea would be that the
>>> user
>>> would identify which built-in methods are allowed for which
>>> entity-sets.
>>> Each built-in method would have some number of configurable parameters.
>>> For example, the user might specify that they want to enable the "add"
>>> method for the "customers" entity-set. By default we might choose a
>>> WSDL operation name of addCustomer, but there would be a configurable
>>> parameter that would allow it to be changed to createCustomer. There
>>> might be some configurable parameters at the entity-set level: for
>>> example, the singular noun to be used (e.g. so that you can have a
>>> table
>>> called "people" and get methods called "addPerson", "removePerson").
>>>
>>> Given the built-in method and the database schema it should be possible
>>> to automatically generate a tasteful default WSDL interface. The user
>>> wouldn't need to worry about writing an XSD schema: even when the input
>>> XML is complex, the semantics of the builtin method together with the
>>> database schema should be enough to allow us to create the XSD for the
>>> user. Apart from automatically generating the WSDL, another nice thing
>>> we should be able to do for the user in the WS-* world is automatically
>>> support WS-Transfer and WS-Enumeration. Maybe we could even have a
>>> method that generates events when the database is modified (though this
>>> would require permission to create database triggers).
>>>
>>> In some cases, the built-in methods may not be sufficient. I envisage
>>> providing two ways to go beyond this. The first way would require XML
>>> and SQL skills but not programming skills. This would be quite similar
>>> to what we have at the moment: the user would provide a fragment of
>>> SQL,
>>> perhaps an XSD for the output XML or more likely the input XML, perhaps
>>> an XPath to get the input XML into SQL parameters, perhaps an XSLT to
>>> get the SQL into the desired XML form. The second way, which would
>>> require programming skills, would be to make the set of built-in
>>> methods
>>> extensible. The user would be able to extend the available built-in
>>> methods just by dropping in a jar file containing a class that
>>> implements a particular interface. The tricky bit would be designing
>>> this interface: maybe it would work by generating
>>> SQL/XSD/XPath/XSLT, or
>>> maybe it would work completely differently.
>>>
>>> In terms of tooling, I think this is declarative enough that it should
>>> be possible to create a nice, easy to use Ajax interface that works on
>>> the XML configuration file, which would guided by an XML representation
>>> of the database schema.
>>>
>>> This message is already rather long. I haven't talked about what I see
>>> as the problems with the current approach. I can do that if people
>>> want. The fundamental reason why I prefer the approach I've outlined
>>> above is that I think it's better for the specification to express as
>>> much as it can at as high a semantic level as possible. I don't think
>>> there's a big technical risk in the kind of approach I'm suggesting: it
>>> has a lot of conceptual similarities to object-relational mapping
>>> technologies, such as the Java Persistence API
>>> (http://java.sun.com/developer/technicalArticles/J2EE/jpa/).
>>>
>>> BTW, if anybody's a bit rusty on databases, I would recommend this
>>> book:
>>> http://www.amazon.com/Database-Systems-Complete-Hector-Garcia-Molina/dp/0130319953/
>>> (the Amazon customer reviews page has an amusing mixture of 1-star
>>> and 5-star reviews).
>>>
>>> James
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Wsf-general mailing list
>>> Wsf-general at wso2.org
>>> http://wso2.org/cgi-bin/mailman/listinfo/wsf-general
>>
>>
>
More information about the Wsf-general
mailing list