
Saturday, March 22, 2008

Generating Sample XML file from Schema


The Eclipse XML editor (part of the Web Tools Platform, WTP) has a nice feature: it can generate a sample XML file from a given schema.


Make sure you have the WTP plugin installed in Eclipse; if not, please check this or search on Google.


Let us create a sample XML file for the following schema (this schema file was created using the Eclipse Schema editor; more details can be found here).


<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.example.org/shiporder" xmlns:tns="http://www.example.org/shiporder" elementFormDefault="qualified">

  <element name="shiporder" type="tns:shiporderType"></element>

  <complexType name="shiporderType">
    <sequence>
      <element name="orderperson" type="string"></element>
      <element name="shipto" type="tns:shiptoType"></element>
      <element name="item" type="tns:itemType" maxOccurs="unbounded" minOccurs="1"></element>
    </sequence>
  </complexType>

  <complexType name="shiptoType">
    <sequence>
      <element name="street" type="string"></element>
      <element name="city" type="string"></element>
      <element name="country">
        <simpleType>
          <restriction base="string">
            <enumeration value="INDIA"></enumeration>
            <enumeration value="US"></enumeration>
            <enumeration value="UK"></enumeration>
            <enumeration value="Japan"></enumeration>
          </restriction>
        </simpleType>
      </element>
      <element name="pincode">
        <simpleType>
          <restriction base="int">
            <pattern value="[0-9]{5}"></pattern>
          </restriction>
        </simpleType>
      </element>
    </sequence>
  </complexType>

  <complexType name="itemType">
    <sequence>
      <element name="title" type="string"></element>
      <element name="quantity" type="int"></element>
      <element name="price" type="double"></element>
    </sequence>
  </complexType>

</schema>



Assume that the above schema is stored in your project or somewhere on your system.


Now let us create a sample XML file for the above schema.


1) Create a new XML file using File -> New -> Other..., select XML, and click “Next”.




2) On the next wizard page, give the file name and the location where the file should be created, and click “Next”.




3) Here, select the option “Create XML file from XML Schema” (if you want to create an XML file from a DTD, you can select the first option). Click “Next” to choose the schema file.





4) On this wizard page, you need to give the schema file location. If your schema file is in your workbench, select the option “Select file from Workbench”. If your schema file is on your local system (but not in your workbench), you can import it using the “Import” button. However, if you want to create an XML file from a standard schema such as WSDL or SOAP, you can choose “Select XML Catalog entry” (more on this later). Click the “Next” button.




5) On this wizard page, you need to select the root element of your document. Let us select “shiporder” as our root element. You can check “Create optional attributes” and “Create optional elements” if you want to. However, as our schema doesn't have any optional attributes or elements, you can simply ignore those. Click “Finish” to create the XML file.







6) This will create an XML file like this:




You can validate this XML file against the schema using the “Validate” option. Right-click in the editor and select the “Validate” option.


Observe that validation reports an error in the pincode element.








The error message says:

cvc-pattern-valid: Value '0' is not facet-valid with respect to pattern '[0-9]{5}' for type '#AnonType_pincodeshiptoType'.



This is because in our schema we defined the pincode to follow this pattern.


<element name="pincode">
  <simpleType>
    <restriction base="int">
      <pattern value="[0-9]{5}"></pattern>
    </restriction>
  </simpleType>
</element>


The Eclipse XML editor can't generate sample data based on a pattern. It simply generates “0” for integers, “0.0” for float or double, and the tag name for strings. Of course, many commercial IDEs also can't generate data based on a pattern restriction.


To fix this error, manually give a valid value for pincode (a five-digit value), save the file, and validate it again. The problem marker will disappear once the value is valid!
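The same check can be reproduced outside Eclipse. Below is a minimal sketch, assuming the standard JAXP validation API (javax.xml.validation) that ships with the JDK. The class name ShipOrderValidation, the helper names, and the sample element values are hypothetical; the schema is the one from this post, condensed onto fewer lines, and the sample only resembles what the wizard generates.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class ShipOrderValidation {
    static final String NS = "http://www.example.org/shiporder";

    // The shiporder schema from this post, condensed (single quotes avoid escaping).
    static final String XSD =
        "<schema xmlns='http://www.w3.org/2001/XMLSchema' targetNamespace='" + NS + "'"
      + " xmlns:tns='" + NS + "' elementFormDefault='qualified'>"
      + "<element name='shiporder' type='tns:shiporderType'/>"
      + "<complexType name='shiporderType'><sequence>"
      + "<element name='orderperson' type='string'/>"
      + "<element name='shipto' type='tns:shiptoType'/>"
      + "<element name='item' type='tns:itemType' maxOccurs='unbounded'/>"
      + "</sequence></complexType>"
      + "<complexType name='shiptoType'><sequence>"
      + "<element name='street' type='string'/>"
      + "<element name='city' type='string'/>"
      + "<element name='country'><simpleType><restriction base='string'>"
      + "<enumeration value='INDIA'/><enumeration value='US'/>"
      + "<enumeration value='UK'/><enumeration value='Japan'/>"
      + "</restriction></simpleType></element>"
      + "<element name='pincode'><simpleType><restriction base='int'>"
      + "<pattern value='[0-9]{5}'/>"
      + "</restriction></simpleType></element>"
      + "</sequence></complexType>"
      + "<complexType name='itemType'><sequence>"
      + "<element name='title' type='string'/>"
      + "<element name='quantity' type='int'/>"
      + "<element name='price' type='double'/>"
      + "</sequence></complexType></schema>";

    // Builds a document shaped like the generated sample, with a chosen pincode.
    static String sample(String pincode) {
        return "<tns:shiporder xmlns:tns='" + NS + "'>"
             + "<tns:orderperson>orderperson</tns:orderperson>"
             + "<tns:shipto><tns:street>street</tns:street><tns:city>city</tns:city>"
             + "<tns:country>INDIA</tns:country>"
             + "<tns:pincode>" + pincode + "</tns:pincode></tns:shipto>"
             + "<tns:item><tns:title>title</tns:title><tns:quantity>0</tns:quantity>"
             + "<tns:price>0.0</tns:price></tns:item>"
             + "</tns:shiporder>";
    }

    // Returns null when the document is valid, otherwise the first error message.
    static String validate(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("pincode 0     -> " + validate(sample("0")));
        System.out.println("pincode 12345 -> " + validate(sample("12345")));
    }
}
```

With the generated value “0” the validator should report the cvc-pattern-valid error quoted above, and with a five-digit value it should report nothing.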



Working with XML Catalogs

Now we will look at the option “Select XML Catalog entry”, which creates an XML file from a catalog entry instead of from a given schema file.





If you want to create an XML file from a standard schema such as WSDL or SOAP, you can select this option and pick the key in the XML Catalog.


Let us select the key “http://schemas.xmlsoap.org/wsdl” and click Next. Select “definitions” as the root element and click Finish. This will generate a file that conforms to the WSDL schema.




You can also add your custom schema to the XML Catalog, so that whenever you want to create an XML file of that type you don't need to select the schema file again.


Now let us add our shiporder.xsd to the XML Catalog.


Select the Window > Preferences menu item. Here select Web and XML > XML Catalog as shown in the figure.




Scroll down, click the “Add” button, and choose either Workspace or File System (wherever your schema file is present).




Select the Workspace option, give your schema location, and click OK.





This will add your schema to the XML Catalog.




Click OK to close the Preferences dialog.


Now you can create an XML file using this catalog entry. The first three steps (up to the “Create XML file from XML Schema” option) are the same. After that, select “Select XML Catalog entry”, choose our shiporder.xsd, and click Next.




Click Next and then click Finish. This will create our required XML file.


The difference between the two files (one generated by giving the schema location directly, the other from the XML Catalog) is the value of the schemaLocation attribute.


xsi:schemaLocation="http://www.example.org/shiporder shiporder.xsd "


xsi:schemaLocation="http://www.example.org/shiporder http://www.example.org/shiporder"


In the first one, the schema file is referenced directly using a relative location, so if your schema file “shiporder.xsd” is not present in the directory where the XML file resides, validation won't happen properly. The file generated from the XML Catalog, however, refers to a web location that does not exist; validation still works because the editor searches the catalog for the XML Schema before going to the web address.


Suppose you have 50 XML files that use a schema, and the schema needs to be moved to some other folder. If you used the first option, you need to update the schemaLocation attribute in all 50 XML files. If you chose to generate the XML files from the XML Catalog, you only need to update the catalog entry using the Edit option in the preferences page, and all 50 XML files automatically pick up the correct schema, with no need to update the XML files.


So, cataloging your XML Schemas is the better way if you want to use a schema repeatedly.
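The lookup idea behind a catalog can be modelled in a few lines. This is only a simplified sketch, not how Eclipse implements its XML Catalog: it uses the standard SAX EntityResolver interface, and the class name MiniCatalogResolver, its put method, and the file locations are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// A toy "catalog": maps a key (here, the schema location written in the XML
// files) to the place where the schema really lives.
class MiniCatalogResolver implements EntityResolver {
    private final Map<String, String> entries = new HashMap<>();

    // Adds or replaces a catalog entry.
    void put(String key, String actualLocation) {
        entries.put(key, actualLocation);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String actual = entries.get(systemId);
        // Returning null tells the parser to fall back to the original address.
        return actual == null ? null : new InputSource(actual);
    }

    public static void main(String[] args) {
        MiniCatalogResolver catalog = new MiniCatalogResolver();
        String key = "http://www.example.org/shiporder";

        catalog.put(key, "file:///workspace/project/shiporder.xsd");
        System.out.println(catalog.resolveEntity(null, key).getSystemId());

        // The schema moved: one catalog update, no XML file needs to change.
        catalog.put(key, "file:///workspace/schemas/shiporder.xsd");
        System.out.println(catalog.resolveEntity(null, key).getSystemId());
    }
}
```

Every XML file keeps referring to the same key, so relocating the schema means changing a single entry rather than every document.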



Creating XML File from Templates

The New XML file wizard in Eclipse also has an option to generate XML files based on a template.


In the “Create XML File from” page select the “Create XML file from an XML template” option.




Click Next; this opens the “Select XML Template” wizard page. Now select the “Use XML Template” option and click the “XML Templates” hyperlink.





This will open the Preferences page.




Click the “New...” button to create a new template.




Give it the name “shipOrder” and a description. Also select “New XML” as the Context so that this template will be used when creating new XML files.


In the pattern field, type (or copy and paste) the XML you want. You can also add variables to the pattern using the Insert Variable... button.


Click OK once you are done.


Now our shipOrder template will appear in the template list as shown in the figure (note that only templates with the context New XML are displayed here).



Select our newly created template “shipOrder” and click Finish.


This will create an XML file using that template.


Sunday, March 2, 2008

SAX Parser tips

Recently I got a couple of interesting questions from friends who work with XML and use a SAX parser to parse the XML data for performance and memory efficiency; a SAX parser can work efficiently even on 2 GB XML files!

Identifying self-closing tags:
In XML, <br/> and <br></br> are equivalent, so a SAX parser does not tell you whether an element was written as a self-closing tag or not. However, there is a workaround for it: using Locator objects!

For <br/>, startElement and endElement report the same location: getLineNumber() and getColumnNumber() return the same values in both callbacks.

For <br></br>, they will be different: the column numbers will differ (or even the line numbers!).

But using a Locator object with a SAX parser might slightly decrease performance.
One more thing: not all SAX parsers support Locators, as this is an optional feature.
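The Locator workaround can be sketched as follows. This is a minimal example, assuming the SAX parser bundled with the JDK, which supplies a Locator; the class name SelfClosingDetector and the method findSelfClosed are hypothetical, and other parsers may report positions differently.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

class SelfClosingDetector extends DefaultHandler {
    private Locator locator;                                   // stays null if the parser has no Locator support
    private final Deque<int[]> starts = new ArrayDeque<>();    // {line, column} at each startElement
    final List<String> selfClosed = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (locator != null) {
            starts.push(new int[] { locator.getLineNumber(), locator.getColumnNumber() });
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (locator == null) {
            return; // without a Locator we cannot tell the two forms apart
        }
        int[] start = starts.pop();
        // Same position in both callbacks: the element was written as <name/>.
        if (start[0] == locator.getLineNumber() && start[1] == locator.getColumnNumber()) {
            selfClosed.add(qName);
        }
    }

    // Parses the given XML and returns the names of elements written as <name/>.
    static List<String> findSelfClosed(String xml) {
        try {
            SelfClosingDetector handler = new SelfClosingDetector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.selfClosed;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findSelfClosed("<root><br/><hr></hr></root>"));
    }
}
```

Pushing the start position onto a stack keeps the comparison correct for nested elements, since each endElement is matched with its own startElement.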

More about Locators can be found at http://www.saxproject.org/apidoc/org/xml/sax/Locator.html


Handling default attributes

Problem:
Input file : <xhtml:td>VI</xhtml:td><xhtml:td align="left">Benzyl</xhtml:td>

Output file :
<xhtml:td rowspan="1" colspan="1">VI</xhtml:td>
<xhtml:td align="left" rowspan="1" colspan="1">Benzyl</xhtml:td>

The output has “rowspan” and “colspan” automatically included, but they are not present in the input.

The DTD declaration for xhtml:td is as follows:
<!ATTLIST %td.qname;
%attrs;
abbr %Text; #IMPLIED
axis CDATA #IMPLIED
headers IDREFS #IMPLIED
scope %Scope; #IMPLIED
xhtml:rowspan %Number; "1"
xhtml:colspan %Number; "1"
%cellhalign;
%cellvalign;
>

These attributes appear because they have a default value in the DTD.

The DTD states that the default value of xhtml:rowspan is 1, so unless you specify some other value, rowspan will be 1.

Even if you don't specify that attribute, the SAX parser automatically takes the value from the DTD (a ‘special’ feature of SAX parsers called DTD defaulting).

You can only handle this with a SAX2 parser (not with SAX version 1.x). I think most SAX parsers available today (like the one that comes with JDK 1.5) are SAX2.

In your startElement method, the Attributes object you receive will also implement Attributes2 (org.xml.sax.ext.Attributes2); Attributes2 is a subinterface of Attributes.

The Attributes2 interface has a method isSpecified() which returns true unless the attribute value was provided by DTD defaulting.

So, keep this check in your startElement method:



import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.ext.Attributes2;

// Inside your ContentHandler (for example, a DefaultHandler subclass):
public void startElement(String uri, String localName,
        String qName, Attributes attributes) throws SAXException
{
    if (attributes instanceof Attributes2) {
        Attributes2 att = (Attributes2) attributes;
        for (int i = 0; i < att.getLength(); i++) {
            if (att.isSpecified(i)) {
                // Present in the XML file.
                System.out.println(att.getQName(i) + "=\"" + att.getValue(i) + "\"");
            } else {
                // Not present in the XML file; the value came from the DTD.
            }
        }
    } // If not, we have no choice but to output all attributes.
}
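To see the isSpecified() check in action, here is a small self-contained sketch, assuming the SAX parser bundled with the JDK. The class name SpecifiedAttributesDemo, the method name, and the inline DTD are made up for illustration; the DTD is a tiny stand-in for the XHTML table DTD, not a copy of it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.helpers.DefaultHandler;

class SpecifiedAttributesDemo {
    // Hypothetical document: rowspan and colspan get their values from the DTD.
    static final String XML =
        "<!DOCTYPE table [\n"
      + "<!ELEMENT table (td)*>\n"
      + "<!ELEMENT td (#PCDATA)>\n"
      + "<!ATTLIST td align CDATA #IMPLIED rowspan CDATA '1' colspan CDATA '1'>\n"
      + "]>\n"
      + "<table><td>VI</td><td align='left'>Benzyl</td></table>";

    // Returns "element@attribute=value" for attributes actually written in the document.
    static List<String> specifiedAttributes(String xml) {
        final List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes attributes) {
                for (int i = 0; i < attributes.getLength(); i++) {
                    // If Attributes2 is unavailable we cannot filter, so keep everything.
                    boolean written = !(attributes instanceof Attributes2)
                            || ((Attributes2) attributes).isSpecified(i);
                    if (written) {
                        result.add(qName + "@" + attributes.getQName(i)
                                + "=" + attributes.getValue(i));
                    }
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(specifiedAttributes(XML));
    }
}
```

Only the align attribute is written in the sample document, so the defaulted rowspan and colspan values should be filtered out.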



A better way to check whether the SAX parser supplies Attributes2 is to query the SAX feature flag http://xml.org/sax/features/use-attributes2 (it is a parser feature, not a system property) with XMLReader.getFeature().
More details at http://www.saxproject.org/apidoc/org/xml/sax/package-summary.html#package_description
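Querying that feature flag could look like the sketch below. XMLReader.getFeature() and the two exceptions are part of the standard SAX API; the class name Attributes2Support and its helper methods are hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class Attributes2Support {
    static final String USE_ATTRIBUTES2 = "http://xml.org/sax/features/use-attributes2";

    // True if the reader states that its Attributes objects implement Attributes2.
    static boolean isSupported(XMLReader reader) {
        try {
            return reader.getFeature(USE_ATTRIBUTES2);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // The parser does not know the flag: treat Attributes2 as unavailable.
            return false;
        }
    }

    // Creates a reader from the default JAXP SAX parser.
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Attributes2 supported: " + isSupported(newReader()));
    }
}
```

Checking the flag once, before parsing, avoids an instanceof test on every startElement call.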

Copyright (c) 2008 - Suresh