[Registry-dev] Content Search - Excluding XML Tags

Glen Daniels glen at wso2.com
Mon Jul 28 07:30:25 PDT 2008


Hi Kalani!

Kalani wrote:
> As there is a requirement to exclude XML tags when indexing XML files, I 
> thought to use the SAX API in Java 1.5 to exclude all tags and to 
> extract the text.

Where does this "requirement" come from?  Also, when you say "text", I 
assume you mean both attribute values and element content?
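
To make the question concrete, here is a rough sketch of what "extract 
the text" could mean if both attribute values and element content are 
kept.  It uses only the JDK's SAX API.  The class and method names are 
made up for illustration and are not existing Registry code.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the Registry code base.
class XmlTextExtractor {

    /** Returns attribute values and character data, joined by single spaces. */
    static String extractText(String xml) {
        final StringBuilder out = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Element names are dropped here; only attribute values are kept.
                for (int i = 0; i < atts.getLength(); i++) {
                    append(atts.getValue(i));
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Note: a parser may deliver one text node in several chunks.
                // This sketch treats each chunk separately; a more careful
                // version would buffer until the next element boundary.
                append(new String(ch, start, length));
            }

            private void append(String s) {
                String trimmed = s.trim();
                if (trimmed.isEmpty()) {
                    return; // skip whitespace-only chunks
                }
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(trimmed);
            }
        };

        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return out.toString();
    }
}
```

Calling XmlTextExtractor.extractText(...) on a small document returns 
the attribute values and text in document order, with no tag names.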

I definitely think we need to index tags and namespaces - what if I want 
to search the Registry for all documents containing a <wsa:Metadata> 
element, or all documents that use a particular namespace's tags?

Lucene supports the notion of splitting content into different 
properties (fields, in Lucene's terms) and then searching across one or 
more of them, right?  So if you want, let's pull tags and namespaces 
into separate properties (I haven't thought this all the way through, so 
I'm not 100% on the details) and then perhaps give the option to include 
them or not in the "default" search.  Can we look into something like 
that?
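
As a starting point for that discussion, here is a minimal sketch of 
the splitting step only, again using just the JDK's SAX API.  The 
bucket names "content", "tags" and "namespaces" are assumptions picked 
for illustration, as is the class name.  Each bucket would presumably 
become its own Lucene field, but that mapping is not shown here.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the Registry code base.
class XmlFieldSplitter {

    /** Splits a document into "content", "tags" and "namespaces" buckets. */
    static Map<String, Set<String>> split(String xml) {
        final Map<String, Set<String>> fields =
                new LinkedHashMap<String, Set<String>>();
        fields.put("content", new LinkedHashSet<String>());
        fields.put("tags", new LinkedHashSet<String>());
        fields.put("namespaces", new LinkedHashSet<String>());

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Record the element's local name and, if present, its namespace URI.
                fields.get("tags").add(localName);
                if (!uri.isEmpty()) {
                    fields.get("namespaces").add(uri);
                }
                // Attribute values are treated as searchable content.
                for (int i = 0; i < atts.getLength(); i++) {
                    addContent(atts.getValue(i));
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                addContent(new String(ch, start, length));
            }

            private void addContent(String s) {
                String trimmed = s.trim();
                if (!trimmed.isEmpty()) {
                    fields.get("content").add(trimmed);
                }
            }
        };

        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to get URIs and local names
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return fields;
    }
}
```

With the buckets kept apart like this, a "default" search could look at 
"content" only, while a search for a <wsa:Metadata> element or for a 
particular namespace would look at "tags" or "namespaces".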

Speaking of options, we need to come up with a general framework for 
storing user preferences (which search options you like, etc.) in the 
Registry itself and making them available to the UI code.

Thanks,
--Glen


