Navigating XML Graph using Cypher

Cypher is a neat way to manipulate a Neo4j database. It would be equally amazing if the XML graph could be queried with Cypher as well.

Credit goes to Michael for suggesting this possibility here.

Let’s start with a simple XML file.

<library>
  <author firstname="Earnest" lastname="Hemingway">
    <works>
      <book name="A Farewell to Arms" year="1929" />
      <book name="For Whom the bell tolls" year="1940" />
      <book name="The Old man and the sea" year="1951" />
    </works>
    <awards>
      <award name="Pulitzer Prize" category="Fiction" year="1953"></award>
      <award name="Nobel Prize" category="Literature" year="1954"></award>
    </awards>
  </author>
  <author firstname="Victor" lastname="Hugo">
    <works>
      <book name="The Hunchback of Notre-Dame" year="1831" />
      <book name="Les Misérables" year="1862" />
    </works>
  </author>
</library>

It’s a simple XML file with nothing fancy in it. As explained in the previous posts here and here, a neat Neo4j graph can be made out of it.

[Screenshot: the Neo4j graph built from the XML above]

So, let’s traverse this graph using Cypher. And since we are traversing XML, let’s make a rough comparison with XPath.

Let’s fetch all the book nodes.

The XPath to get all the book nodes, no matter where they are in the document, is

//book

For the same purpose, the Cypher query would be,

MATCH (books:book) RETURN books

This returns the following output for the graph above:

[Screenshot: bookNodes]

Let’s now fetch the names of all the books. The XPath requires only a slight modification,

//book/@name

The XPath will return the list as,

Attribute='name="A Farewell to Arms"'
Attribute='name="For Whom the bell tolls"'
Attribute='name="The Old man and the sea"'
Attribute='name="The Hunchback of Notre-Dame"'
Attribute='name="Les Misérables"'
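For readers who want to try the XPath side without any tooling, here is a minimal Python sketch using the standard library’s xml.etree.ElementTree. It is my own illustration, not part of the original setup: the names LIBRARY_XML and book_names are made up, and the XML is a trimmed copy of the file above. ElementTree only implements a subset of XPath and cannot select attribute nodes, so the sketch finds the book elements and reads the name attribute in Python instead.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the library XML shown earlier (awards left out).
LIBRARY_XML = """
<library>
  <author firstname="Earnest" lastname="Hemingway">
    <works>
      <book name="A Farewell to Arms" year="1929" />
      <book name="For Whom the bell tolls" year="1940" />
      <book name="The Old man and the sea" year="1951" />
    </works>
  </author>
  <author firstname="Victor" lastname="Hugo">
    <works>
      <book name="The Hunchback of Notre-Dame" year="1831" />
      <book name="Les Misérables" year="1862" />
    </works>
  </author>
</library>
"""


def book_names(xml_text):
    """Rough equivalent of //book/@name (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # ".//book" is ElementTree's spelling of //book; the @name step is
    # replaced by .get("name") because attribute nodes cannot be selected.
    return [book.get("name") for book in root.findall(".//book")]


names = book_names(LIBRARY_XML)
print(names)
```

The list comes back in document order, which matches the order of the XPath output above.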

The Cypher query also needs only a small modification. Instead of returning the entire node, return the ‘name’ property of each node.

MATCH (books:book) RETURN books.name

[Screenshot: bookName]

Next up, let’s query the awards conferred on Earnest Hemingway.

This can be achieved via XPath as,

//author[@firstname='Earnest']/awards

which gives the output

<awards>
  <award name="Pulitzer Prize" category="Fiction" year="1953" />
  <award name="Nobel Prize" category="Literature" year="1954" />
</awards>
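The same lookup can be sketched in Python with xml.etree.ElementTree, which does support attribute-value predicates of this form. This is only an illustration under my own assumptions: LIBRARY_XML is a trimmed copy of the file above and the variable names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the library XML shown earlier (works left out).
LIBRARY_XML = """
<library>
  <author firstname="Earnest" lastname="Hemingway">
    <awards>
      <award name="Pulitzer Prize" category="Fiction" year="1953"></award>
      <award name="Nobel Prize" category="Literature" year="1954"></award>
    </awards>
  </author>
  <author firstname="Victor" lastname="Hugo" />
</library>
"""

root = ET.fromstring(LIBRARY_XML)

# ElementTree paths are relative to an element, hence the leading ".".
awards = root.findall(".//author[@firstname='Earnest']/awards")

# Iterating over an element yields its child elements.
award_names = [award.get("name") for award in awards[0]]
print(award_names)
```

Like the XPath, this returns the single awards element, and the individual award entries are its children.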

As for Cypher,

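MATCH (author:author {firstname: "Earnest"})-[*]->(award:award) RETURN award

We fetch any node of the type ‘award’ that is reachable, over any number of relationships, from a node of type ‘author’ with firstname = Earnest. Note that, unlike the XPath above, this returns the individual award nodes rather than their awards parent.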
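To make the variable-length part of the pattern (the [*]) a little more concrete, here is a rough Python sketch of the same idea on the XML tree itself. It assumes, as a simplification of my own, that every element is a node labelled with its tag and that relationships point from parent to child; the helper reachable_with_label is hypothetical and has nothing to do with how Neo4j evaluates the query internally.

```python
from collections import deque
import xml.etree.ElementTree as ET

LIBRARY_XML = """
<library>
  <author firstname="Earnest" lastname="Hemingway">
    <works>
      <book name="A Farewell to Arms" year="1929" />
    </works>
    <awards>
      <award name="Pulitzer Prize" category="Fiction" year="1953"></award>
      <award name="Nobel Prize" category="Literature" year="1954"></award>
    </awards>
  </author>
  <author firstname="Victor" lastname="Hugo">
    <works>
      <book name="Les Misérables" year="1862" />
    </works>
  </author>
</library>
"""


def reachable_with_label(start, label):
    """Collect nodes with the given tag reachable in one or more hops.

    Assumption: edges run parent -> child, mirroring -[*]-> on a tree.
    """
    found = []
    queue = deque(start)          # direct children = paths of length 1
    while queue:
        node = queue.popleft()
        if node.tag == label:
            found.append(node)
        queue.extend(node)        # follow one more hop
    return found


root = ET.fromstring(LIBRARY_XML)
earnest = root.find("author[@firstname='Earnest']")
awards = reachable_with_label(earnest, "award")
print([award.get("name") for award in awards])
```

The walk starts at the matching author and keeps following child edges, so it finds the award nodes two hops away without naming the awards node in between.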

[Screenshot: awards]

The above examples are trivial and only aim to show that Cypher can be used to traverse such a database. I am looking for a large and, most importantly, meaningful XML dataset which can be queried to get some useful information. Keep watching this space for more.

XPath for XML

XPath is a language used to navigate through an XML document and to identify elements in it. It is so flexible that technologies like XQuery and XPointer are built on it. XPath uses path expressions to select nodes or node-sets in an XML document. These path expressions look very much like the paths you see when you work with a traditional computer file system.

While using XPath, the XML document is treated as a tree of nodes. See the example below.

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="en">The Joke</title>
    <author>Milan Kundera</author>
    <price>350</price>
  </book>
  <book>
    <title lang="en">After Dark</title>
    <author>Haruki Murakami</author>
    <price>450</price>
  </book>
</bookstore>

Here, <bookstore> is the root element node, <author> is an element node and lang is an attribute node. The nodes also have hierarchical relationships. For instance, <title>, <author> and <price> are children of the <book> node, and they are siblings of one another.
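The tree view is easy to poke at from Python. The following is a small sketch with the standard library’s xml.etree.ElementTree, using a copy of the example trimmed to its first book; the variable names are mine and only there for illustration.

```python
import xml.etree.ElementTree as ET

# The example document, trimmed to its first book.
BOOKSTORE_XML = """
<bookstore>
  <book>
    <title lang="en">The Joke</title>
    <author>Milan Kundera</author>
    <price>350</price>
  </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)       # the root element
first_book = root[0]                      # first child of <bookstore>

# Children of <book>; these three are siblings of one another.
child_tags = [child.tag for child in first_book]

# Attributes hang off their element as a dictionary.
title_attrs = first_book.find("title").attrib

print(root.tag, child_tags, title_attrs)
```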

XPath uses the following expressions to parse through the XML Document.

/bookstore/book - returns all the <book> nodes which are children of <bookstore>
bookstore//book - returns all book elements that are descendants of the <bookstore> element, no matter where they are under it
//@lang - returns all attributes that are named lang
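Here is a hedged Python sketch of these three expressions with xml.etree.ElementTree. Two things are my own adaptation rather than anything from the expressions above: ElementTree paths are written relative to an element (so the root is "."), and attribute nodes cannot be selected, so //@lang is emulated by walking the tree.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """
<bookstore>
  <book>
    <title lang="en">The Joke</title>
    <price>350</price>
  </book>
  <book>
    <title lang="en">After Dark</title>
    <price>450</price>
  </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)

# /bookstore/book : direct children of the root element.
children = root.findall("./book")

# bookstore//book : descendants at any depth.
descendants = root.findall(".//book")

# //@lang : emulated, since ElementTree cannot return attribute nodes.
langs = [el.get("lang") for el in root.iter() if "lang" in el.attrib]

print(len(children), len(descendants), langs)
```

In this small document the first two give the same result, because every book happens to be a direct child of bookstore.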

Specific nodes can also be identified in XPath

/bookstore/book[1] - Returns the first book element
/bookstore/book[last()-1] - Returns the second-to-last book element
/bookstore/book[position()<3] - Returns the first two book elements
//title[@lang] - Returns all title elements that have an attribute lang
/bookstore/book[price>350]/title - Returns the titles of all books which have a price of more than 350
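A rough Python version of these predicates, again with xml.etree.ElementTree, looks like the sketch below. ElementTree understands positional predicates such as [1] and [last()-1] and the [@lang] test, but as far as I know it does not handle position()<3 or numeric comparisons like price>350, so those two are done in plain Python here. The variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """
<bookstore>
  <book>
    <title lang="en">The Joke</title>
    <price>350</price>
  </book>
  <book>
    <title lang="en">After Dark</title>
    <price>450</price>
  </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)
books = root.findall("./book")

first = root.findall("./book[1]")               # /bookstore/book[1]
second_last = root.findall("./book[last()-1]")  # /bookstore/book[last()-1]
first_two = books[:2]                           # position()<3, via slicing
with_lang = root.findall(".//title[@lang]")     # //title[@lang]

# /bookstore/book[price>350]/title, with the comparison done in Python.
pricey_titles = [
    book.findtext("title")
    for book in books
    if float(book.findtext("price")) > 350
]

print(first[0].findtext("title"), pricey_titles)
```

With only two books in the document, the second-to-last book is also the first one.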

XPath supports wildcard characters as well

/bookstore/* - Selects all the child elements of the bookstore element
//title[@*] - Selects all the title elements which have at least one attribute
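The wildcards can be tried out in Python too. In this sketch the * wildcard is passed straight to xml.etree.ElementTree, while the [@*] test is emulated by checking for a non-empty attribute dictionary, because I am not aware of ElementTree supporting that predicate. The names used are mine.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """
<bookstore>
  <book>
    <title lang="en">The Joke</title>
    <price>350</price>
  </book>
  <book>
    <title lang="en">After Dark</title>
    <price>450</price>
  </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)

# /bookstore/* : every child element of the root, whatever its tag.
child_tags = [child.tag for child in root.findall("./*")]

# //title[@*] : titles carrying at least one attribute (emulated).
titles_with_attrs = [t.text for t in root.iter("title") if t.attrib]

print(child_tags, titles_with_attrs)
```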

The detailed list of path syntax can be found here.

So far, everything has been pretty simple. Here comes the most flexible and useful feature of XPath.

XPath Axes

An axis defines a node-set relative to the current node. When we say ‘the children of the current node’, the children form a node-set, and thus child is an axis. Similarly, parent, following-sibling, preceding-sibling and attribute are all axes. The W3C gives the complete list here.

From the examples we saw above, there are two types of location paths in XPath: absolute and relative.

An absolute location path starts with a slash ( / ) and a relative location path does not. In both cases the location path consists of one or more steps, each separated by a slash.

An absolute location path:
/step/step/...
A relative location path:
step/step/...

A step in the examples above can consist of

  • an axis (defines the tree-relationship between the selected nodes and the current node)
  • a node test (identifies a node within an axis)
  • zero or more predicates (to further refine the selected node-set)

Generalizing it, a step would look like this

axis::nodetest[predicate]

Take away the axis and the predicate and you are left with the kind of steps we saw in the above examples. When the axis is left out, it defaults to child.

See some more examples


/bookstore/child::book - Select all the book nodes which are children of bookstore

/bookstore/book/attribute::* - Select all the attributes of the book node

child::*/child::price - Select all the price grandchildren of the current node

Note that the first two examples are absolute paths and the last one is a relative path.

XPath can be evaluated via JavaScript or through a programming language like Java. I will soon write a post on that. Also, I really need to have another post dedicated to XPath axes.
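As a stopgap until that post, here is a small Python sketch of what these three axis examples select. Python’s xml.etree.ElementTree does not, to my knowledge, accept the explicit axis:: syntax, so each one is rewritten in an abbreviated or emulated form; the mapping in the comments is my own reading, and the relative path is evaluated with bookstore as the current node.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """
<bookstore>
  <book>
    <title lang="en">The Joke</title>
    <price>350</price>
  </book>
  <book>
    <title lang="en">After Dark</title>
    <price>450</price>
  </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)

# /bookstore/child::book  ->  the child axis is the default axis.
child_books = root.findall("./book")

# /bookstore/book/attribute::*  ->  each element's attribute dictionary.
book_attrs = [dict(book.attrib) for book in child_books]

# child::*/child::price  ->  price grandchildren of the current node.
grandchild_prices = [price.text for price in root.findall("./*/price")]

print(len(child_books), book_attrs, grandchild_prices)
```

The book elements in this document carry no attributes, so the attribute axis comes back empty for them.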