Adding DOM like features to Neo4j xml graph

This is an extension to my previous post on how to convert xml format data to a neo4j graph. For instance, see the graph generated from an xml file in the following format

<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
.
.
.
</catalog>

And the neo4j graph visualization below..

Catlog-xml

There are 4 <book> nodes under <catalog>, each with its associated metadata. Altogether it’s a pretty standard xml.

Most of our xml operations involve traversing the xml. JAXP is bread and butter for any developer working with xml in Java. I have done my bit of xml parsing and the interface provided is so wonderful (and resource intensive because of the DOM).
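For reference, here is a minimal sketch of what those traversals look like with the JAXP DOM API that ships with the JDK. The class name JaxpDomDemo, the helper childElementNames and the trimmed-down inline catalog (with placeholder values for the second book) are my own assumptions for illustration; only the javax.xml.parsers and org.w3c.dom calls are real API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class JaxpDomDemo {

    // Trimmed-down catalog kept inline so the sketch is self-contained.
    // The second book only carries placeholder values.
    public static final String XML =
        "<catalog>"
      + "<book id=\"bk101\"><author>Gambardella, Matthew</author>"
      + "<title>XML Developer's Guide</title></book>"
      + "<book id=\"bk102\"><author>Placeholder Author</author>"
      + "<title>Placeholder Title</title></book>"
      + "</catalog>";

    // Parses the whole document into memory (this is the resource-hungry part of DOM).
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse xml", e);
        }
    }

    // Tag names of the element children of a node; text nodes are skipped.
    public static List<String> childElementNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        NodeList books = doc.getElementsByTagName("book");       // all elements of a tag
        Element first = (Element) books.item(0);
        System.out.println(books.getLength());                    // 2
        System.out.println(first.getAttribute("id"));             // bk101
        System.out.println(first.getParentNode().getNodeName());  // catalog
        System.out.println(childElementNames(first));             // [author, title]
    }
}
```

These are the DOM operations (elements by tag, parent, children) that the graph version tries to mirror.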

Similar flexibility is possible for the xmlGraph too..

I have tried to implement the following methods..

  • getTags – Fetches the element objects for the specified xml tag
  • getParent – Fetches the parent element object
  • getChildren – Fetches the child element objects
  • getSiblings – Fetches the siblings of the element

The implementation of the methods is in a separate service facade..

public interface XmlTreeService {
  public List<XmlElement> getTags(String tagName);
  public List<XmlElement> getChildren(XmlElement parent);
  public XmlElement getParent(XmlElement child);
  public List<XmlElement> getSiblings(XmlElement element);
}

Implementation Details

Fetching the list of Tags, (the getTags method)

The best solution was to use the GlobalGraphOperations interface. As discussed before, while saving the xml as a graph, each node has its tag name as a label. For instance, the node representing a book element will have the label ‘book‘ associated with it.

node-label-cropped

 

It can be seen that the TAG property ‘book‘ is also a label.

This means fetching all tags of a specified name comes down to fetching all nodes with a specific label, which is easy using GlobalGraphOperations.

GlobalGraphOperations globalGraph = getGlobalGraphOperations();
Label label = DynamicLabel.label(tagName);
List<XmlElement> elements = new ArrayList<>();
try (Transaction tx = graphDb.beginTx()) {
  // Every node carrying the tag name as a label represents one element of that tag
  ResourceIterable<Node> nodes = globalGraph.getAllNodesWithLabel(label);
  for (Node node : nodes) {
    XmlElement element = getXmlElement(node);
    elements.add(element);
  }
  tx.success();
} catch (IOException e) {
  LOGGER.severe(e.getMessage());
  e.printStackTrace();
}

 

Fetching the parent and children

In the graph we created, between a child tag and the parent tag in the xml, there exists a relationship, CHILD_OF from the child to the parent

child-of-crop

This means finding the lineage of a node is about following its relationships and fetching the element at the other end. For instance, see how the child nodes are fetched,

 

public List<XmlElement> getChildren(XmlElement parent) {
  GraphDatabaseService graphDb = Neo4jDatabaseHandler.getGraphDatabase();
  List<XmlElement> childElementList = new ArrayList<>();
  try (Transaction tx = graphDb.beginTx()) {
    Node node = graphDb.getNodeById(parent.getId());
    // CHILD_OF points from child to parent, so the children arrive on INCOMING relationships
    Iterable<Relationship> childRelations = node.getRelationships(Direction.INCOMING);
    for (Relationship relationship : childRelations) {
      Node childNode = relationship.getStartNode();
      XmlElement childElement = getXmlElement(childNode);
      childElementList.add(childElement);
    }
    tx.success();
  } catch (IOException e) {
    e.printStackTrace();
  }
  return childElementList;
}

The process can be outlined as

  1. Fetch the node from GraphDatabaseService using the id which is stored in the XmlElement Object.
  2. Fetch the relationships which are INCOMING to the parent node (CHILD_OF points from the child to the parent).
  3. Iterate through the relationships and fetch the starting nodes.

The same approach can be used to identify the parent node too: follow the OUTGOING CHILD_OF relationship of the child and take its end node..
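To make the direction logic concrete, here is a small sketch under my own assumptions. It does not use the Neo4j API at all: ChildOfGraph and Edge are hypothetical in-memory stand-ins, so it can run without a database. Each edge plays the role of one CHILD_OF relationship, with the start being the child and the end being the parent.

```java
import java.util.ArrayList;
import java.util.List;

public class ChildOfGraph {

    // Hypothetical stand-in for one CHILD_OF relationship: start = child, end = parent.
    static final class Edge {
        final String start;
        final String end;

        Edge(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    public void addChildOf(String child, String parent) {
        edges.add(new Edge(child, parent));
    }

    // Children: start nodes of the edges that are INCOMING to the parent.
    public List<String> getChildren(String parent) {
        List<String> children = new ArrayList<>();
        for (Edge edge : edges) {
            if (edge.end.equals(parent)) {
                children.add(edge.start);
            }
        }
        return children;
    }

    // Parent: end node of the OUTGOING edge of the child, or null for the root.
    public String getParent(String child) {
        for (Edge edge : edges) {
            if (edge.start.equals(child)) {
                return edge.end;
            }
        }
        return null;
    }

    // Tiny sample shaped like the catalog xml (node names are made up for the demo).
    public static ChildOfGraph sample() {
        ChildOfGraph graph = new ChildOfGraph();
        graph.addChildOf("book:bk101", "catalog");
        graph.addChildOf("author:bk101", "book:bk101");
        graph.addChildOf("title:bk101", "book:bk101");
        return graph;
    }

    public static void main(String[] args) {
        ChildOfGraph graph = sample();
        System.out.println(graph.getChildren("book:bk101")); // [author:bk101, title:bk101]
        System.out.println(graph.getParent("title:bk101"));  // book:bk101
        System.out.println(graph.getParent("catalog"));      // null
    }
}
```

In the real service the same two lookups would go through the node's relationships inside a transaction, as in getChildren above, with the direction flipped for the parent.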

Fetching Sibling Nodes

To fetch the sibling nodes, first fetch the parent node, then get its children, and finally remove the node in question from that list. Finding siblings is quite elementary.
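Those three steps can be sketched as a small generic helper. This is only an illustration under my own assumptions: SiblingSketch, siblings and demo are hypothetical names, and the parent and children lookups are passed in as functions (backed by plain maps here) instead of the graph-backed getParent and getChildren.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class SiblingSketch {

    // Siblings = children of the parent, minus the element itself.
    public static <T> List<T> siblings(T element,
                                       Function<T, T> parentOf,
                                       Function<T, List<T>> childrenOf) {
        T parent = parentOf.apply(element);
        if (parent == null) {
            return Collections.emptyList(); // the root has no siblings
        }
        List<T> result = new ArrayList<>(childrenOf.apply(parent));
        result.remove(element); // drop the node in question
        return result;
    }

    // Demo data: four books under one catalog, held in plain maps.
    public static List<String> demo(String element) {
        Map<String, List<String>> children = new HashMap<>();
        Map<String, String> parent = new HashMap<>();
        children.put("catalog", Arrays.asList("bk101", "bk102", "bk103", "bk104"));
        for (String book : children.get("catalog")) {
            parent.put(book, "catalog");
        }
        return siblings(element,
                        parent::get,
                        p -> children.getOrDefault(p, Collections.emptyList()));
    }

    public static void main(String[] args) {
        System.out.println(demo("bk101"));   // [bk102, bk103, bk104]
        System.out.println(demo("catalog")); // []
    }
}
```

The copy into a new ArrayList matters: removing the element from the list returned by the children lookup directly could modify (or fail on) the underlying collection.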

You can always find the complete source code on GitHub.

Summary

To roll all balls at once, let’s try to do a small test,

GraphDatabaseService graphDb = Neo4jDatabaseHandler.getGraphDatabase();
XmlTreeServiceGraph treeServiceGraph = new XmlTreeServiceGraph(graphDb);
System.out.println("-----Test for DOM like methods on XmlGraph----\n");
List<XmlElement> groupElements = treeServiceGraph.getTags("book");
System.out.println("Fetching all Book Nodes...");
for (XmlElement xmlElement : groupElements) {
  System.out.println(xmlElement.getAtrributeString());
}
XmlElement firstElement = groupElements.get(0);
System.out.println("\nFetching children of the first Book Node : " + firstElement.getAtrributeString() + "...");
List<XmlElement> childrenElements = treeServiceGraph.getChildren(firstElement);
for (XmlElement xmlElement : childrenElements) {
  System.out.println(xmlElement.getTagName()+ " : " + xmlElement.getTagValue());
}
XmlElement child = childrenElements.get(0);
System.out.println("\nFetching parent of the first child element : "+child.getTagName()+" : "+child.getTagValue() + "...");
XmlElement parentElement = treeServiceGraph.getParent(child);
System.out.println(parentElement.getTagName() + " : " + parentElement.getAtrributeString());
System.out.println("\nFetching siblings of the first Book Node : " + firstElement.getAttributes());
List<XmlElement> elementSibling = treeServiceGraph.getSiblings(firstElement);
for (XmlElement xmlElement : elementSibling) {
  System.out.println(xmlElement.getTagName() + " : " + xmlElement.getAtrributeString());
}

 

P.S : I know the test code is crappy, but for a demo test, this will do.. 🙂

So, it’s quite straightforward, I am doing the following steps

  1. Fetch all ‘book’ nodes in the xml, and print them
  2. Fetch all the children of the first ‘book’ node, and print them
  3. Fetch the parent of the first child from step #2, which should give us our first book node, and print it
  4. Fetch the siblings of the first node, and print them

Let’s see what the output looks like,

-----Test for DOM like methods on XmlGraph----

Fetching all Book Nodes...
{"id":"bk101"}
{"id":"bk102"}
{"id":"bk103"}
{"id":"bk104"}

Fetching children of the first Book Node : {"id":"bk101"}...
description : An in-depth look at creating applications with XML.
publish_date : 2000-10-01
price : 44.95
genre : Computer
title : XML Developer's Guide
author : Gambardella, Matthew

Fetching parent of the first child element : description : An in-depth look at creating applications with XML....
book : {"id":"bk101"}

Fetching siblings of the first Book Node : {id=bk101}
book : {"id":"bk104"}
book : {"id":"bk103"}
book : {"id":"bk102"}

It looks good, and everything is as expected… 🙂

The HTML DOM (Document Object Model)

Anyone who has developed using  JavaScript should be familiar with the term, DOM or Document Object Model. It’s neat and interesting stuff, and how it evolved over ages is another epic in itself.. but that’s for another day..

So, in the world of Object savvy development, everything is an object. Every web page resides inside a browser Window which can be considered an object. It’s an object complete with cliche windows properties like scrollbar, frame etc..

And the document, the content we see inside the window, is also an object. It’s quite evident that the HTML markup is responsible for the content, or rather, the HTML markup is the content. The HTML document is displayed as an object within the window. Note that I used the words document and object. So, the standard by which the HTML document is represented as an object is called the Document Object Model.

When we say Document Object Model, it means a standard for creating objects out of a markup language. In addition to HTML, XML and XHTML have their own DOM standards.

This is what W3C says about DOM

The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page.

The DOM follows a hierarchical representation.


Let me just put in a few words on the high level objects..

The Window object is the root of the tree. A Window object represents one open window in the browser. Having said that, in the world of dynamic HTML (will have to write something on it), it’s very common to introduce iframes in pages. In that case, the browser constructs one window object for each iframe, which is why iframes act as separate entities. The Window object API can be referenced here.

The History object stores the list of URLs visited from the page. Consider this instance: I have opened two tabs, viz http://www.google.co.in and http://www.bing.com/. Each of them has a separate Window object, let’s call them WIN1 and WIN2. From google, I navigated to the maps tab, i.e. https://maps.google.co.in/. So, the previous page (http://www.google.co.in) will be stored in the history object of WIN1. The other window object will be impervious to these changes. Keeping track of the previous URLs gives the developer the powers of navigation. The History object has two major functions.

  • history.back
  • history.forward

Both return no value and are similar to clicking the ‘back’ and ‘forward’ buttons on the browser. The entire API can be found here.

The Document object is the developer’s blue eyed boy. The Document object stores everything under the <HTML> tag. In other words, each HTML document loaded into the browser becomes a document Object. The Document object can be used to access anything in the HTML document. This is widely used and exploited by developers.


This example by W3C shows how the Document object is used in modern Web apps. The entire API reference can be found here.

The Location object stores the URL details of the current page. The Location object is part of the window object and is accessed through the window.location property. See the Location object dissected.
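To give a feel for that dissection outside the browser, here is a rough sketch that splits a URL with java.net.URI and names the pieces after the corresponding window.location properties. The class LocationParts, the method dissect and the example URL are my own assumptions for illustration, and the mapping is only approximate (for instance, this sketch does not hide a default port).

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class LocationParts {

    // Splits a URL into parts named after the window.location properties.
    public static Map<String, String> dissect(String url) {
        URI uri = URI.create(url);
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("protocol", uri.getScheme() + ":");
        parts.put("hostname", uri.getHost());
        // URI reports -1 when no port is given in the URL
        parts.put("port", uri.getPort() == -1 ? "" : String.valueOf(uri.getPort()));
        parts.put("pathname", uri.getPath());
        parts.put("search", uri.getQuery() == null ? "" : "?" + uri.getQuery());
        parts.put("hash", uri.getFragment() == null ? "" : "#" + uri.getFragment());
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical URL, chosen only because it has every part present
        Map<String, String> parts =
            dissect("https://www.example.org:8080/maps/search?q=neo4j#top");
        for (Map.Entry<String, String> entry : parts.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

For the URL above this prints protocol = https:, hostname = www.example.org, port = 8080, pathname = /maps/search, search = ?q=neo4j and hash = #top, one per line.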
