How to read XML in Java using the DOM parser

DOM (Document Object Model) is a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure, and style of a document. That means it can represent many document formats, such as HTML and XML documents. The example below uses the XML DOM parser.

The DOM parser loads the entire XML document into memory and then makes it available as a tree structure, called a node tree. All nodes can be accessed via the tree. Their contents can be modified or deleted, and new elements can be created on the fly.
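As a minimal sketch of the load-then-modify cycle described above, the following uses the standard javax.xml.parsers API to parse a small XML string into a tree and add a new element to it. The class name DomTreeSketch, the helper names, and the <staff>/<employee> sample data are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomTreeSketch {

    // Parses an XML string into an in-memory DOM tree.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Creates a new <employee> element on the fly and appends it to the root.
    static void addEmployee(Document doc, String name) {
        Element employee = doc.createElement("employee");
        employee.setTextContent(name);
        doc.getDocumentElement().appendChild(employee);
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        Document doc = parse("<staff><employee>Martin</employee></staff>");
        addEmployee(doc, "Anna");
        // The tree now holds two <employee> elements.
        System.out.println(doc.getElementsByTagName("employee").getLength());
    }
}
```

Parsing from a string keeps the sketch self-contained; a DocumentBuilder can also parse a File or an InputStream.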

DOM is not an ideal choice for parsing very large XML files: because the whole document is held in memory, loading is slow and memory consumption is high. For smaller files DOM works well, and the tree structure is easy to traverse.

Key Terms

Node: In DOM, everything in an XML document is a Node.

  1. The entire document is a document node
  2. Every XML element is an element node
  3. The text in an XML element is a text node
  4. Every attribute is an attribute node
  5. Comments are comment nodes
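The node kinds listed above can be observed by walking a parsed tree and checking each node's getNodeType(). The sketch below is illustrative only: the class NodeTypeSketch, its helpers, and the sample XML are assumptions. Note that in the DOM, attributes are nodes but are not children of their element, so they are reached through getAttributes() rather than through the child list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTypeSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Maps the numeric node type to a readable label.
    static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:  return "document";
            case Node.ELEMENT_NODE:   return "element";
            case Node.TEXT_NODE:      return "text";
            case Node.ATTRIBUTE_NODE: return "attribute";
            case Node.COMMENT_NODE:   return "comment";
            default:                  return "other";
        }
    }

    // Records "type:name" for the node, its attributes, and all descendants.
    static void describe(Node node, List<String> out) {
        out.add(typeName(node) + ":" + node.getNodeName());
        NamedNodeMap attrs = node.getAttributes(); // null unless node is an element
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                out.add(typeName(attr) + ":" + attr.getNodeName());
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            describe(c, out);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        String xml = "<employee id=\"7\"><!--note--><name>Martin</name></employee>";
        List<String> out = new ArrayList<>();
        describe(parse(xml), out);
        out.forEach(System.out::println);
    }
}
```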

Since everything is a Node, DOM arranges the nodes hierarchically in a tree, similar to the way Windows Explorer displays files and directories. The entire XML document is a Document node, and it contains different types of nodes: element nodes, text nodes, attribute nodes and so on. It is important to note that an Element is also a Node.

According to the DOM specification, the different node types, and the child nodes each of them may contain, are:

  • Document — Element (maximum of one), ProcessingInstruction, Comment, DocumentType
  • DocumentFragment — Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
  • DocumentType — no children
  • EntityReference — Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
  • Element — Element, Text, Comment, ProcessingInstruction, CDATASection, EntityReference
  • Attr — Text, EntityReference
  • ProcessingInstruction — no children
  • Comment — no children
  • Text — no children
  • CDATASection — no children
  • Entity — Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
  • Notation — no children
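These child rules are enforced when the tree is modified. As a hedged illustration, the sketch below tries to give a Document a second element child, which breaks the "maximum of one" rule for Document. With the DOM implementation bundled in the JDK this is expected to raise a DOMException with code HIERARCHY_REQUEST_ERR; the class and method names are made up for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

class ChildRulesSketch {

    // Returns true if adding a second element child to a Document is rejected.
    static boolean rejectsSecondRoot() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
        // First element child of the document: allowed.
        doc.appendChild(doc.createElement("staff"));
        try {
            // Second element child: not allowed for a Document node.
            doc.appendChild(doc.createElement("extra"));
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.HIERARCHY_REQUEST_ERR;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsSecondRoot());
    }
}
```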

Another key point: don’t expect an element to contain plain text directly. Text is always stored inside a Text node. For example, consider the XML snippet:

<name>Martin</name>

Here the element node <name> contains a text node with the value Martin. The <name> element node doesn’t hold the value Martin itself; it holds it inside a Text node.
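The element/text distinction can be checked in code. In this small sketch (the class TextNodeSketch and the helper parseRoot are illustrative names), the element's own node value is null, while its first child is a Text node carrying the string. getTextContent() is a convenience method that collects the text of all descendants.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TextNodeSketch {

    // Parses the XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element name = parseRoot("<name>Martin</name>");

        // The element itself carries no value.
        System.out.println(name.getNodeValue());        // null

        // The value lives in the child Text node.
        Node text = name.getFirstChild();
        System.out.println(text.getNodeType() == Node.TEXT_NODE); // true
        System.out.println(text.getNodeValue());        // Martin

        // Shortcut that concatenates all descendant text.
        System.out.println(name.getTextContent());      // Martin
    }
}
```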

Reference: XML DOM

Read an XML using DOM Parser – Example