# Navigating XML Graph using Cypher

Cypher is a neat way to query and manipulate a Neo4j database. It would be equally handy if an XML graph could be queried with Cypher as well.

Honestly, credit goes to Michael for suggesting this possibility.

<library>
  <author firstname="Earnest" lastname="Hemingway">
    <works>
      <book name="A Farewell to Arms" year="1929" />
      <book name="For Whom the bell tolls" year="1940" />
      <book name="The Old man and the sea" year="1951" />
    </works>
    <awards>
      <award name="Pulitzer Prize" category="Fiction" year="1953"></award>
      <award name="Nobel Prize" category="Literature" year="1954"></award>
    </awards>
  </author>
  <author firstname="Victor" lastname="Hugo">
    <works>
      <book name="The Hunchback of Notre-Dame" year="1831" />
      <book name="Les Misérables" year="1862" />
    </works>
  </author>
</library>


It’s a simple XML document with nothing fancy in it. As explained in the previous posts, a neat Neo4j graph can be made out of it.

So, let’s go about traversing this graph using Cypher. And since we are trying to traverse an XML document, let’s make a rough comparison to XPath.

Let’s fetch all the book nodes.

The XPath to get all the book nodes, no matter where they are in the document, is

//book

For the same purpose, the Cypher query would be,

MATCH (books:book) RETURN books


This fetches the following output for the graph above:

Let’s now try to fetch the names of all the books. The XPath requires only a slight modification,

//book/@name

The XPath returns the following list:

Attribute='name="A Farewell to Arms"'
Attribute='name="For Whom the bell tolls"'
Attribute='name="The Old man and the sea"'
Attribute='name="The Hunchback of Notre-Dame"'
Attribute='name="Les Misérables"'
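
If you want to try the XPath side locally, here is a minimal sketch using the JDK's built-in javax.xml.xpath API. The class name XPathBookNames and the trimmed-down XML string are my own assumptions for illustration; they are not part of the original project.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original project.
class XPathBookNames {

    // A trimmed copy of the library XML used in this post.
    static final String LIBRARY_XML =
        "<library>"
        + "<author firstname=\"Earnest\" lastname=\"Hemingway\"><works>"
        + "<book name=\"A Farewell to Arms\" year=\"1929\"/>"
        + "<book name=\"For Whom the bell tolls\" year=\"1940\"/>"
        + "</works></author>"
        + "<author firstname=\"Victor\" lastname=\"Hugo\"><works>"
        + "<book name=\"The Hunchback of Notre-Dame\" year=\"1831\"/>"
        + "</works></author>"
        + "</library>";

    // Evaluates //book/@name and collects the attribute values.
    static List<String> bookNames(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList attributes = (NodeList) xpath.evaluate(
                "//book/@name",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < attributes.getLength(); i++) {
                names.add(attributes.item(i).getNodeValue());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        bookNames(LIBRARY_XML).forEach(System.out::println);
    }
}
```

Swapping the expression string for any of the other XPath queries in this post should work the same way, as long as the returned nodes are read appropriately.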


The Cypher query needs only a small modification too. Instead of returning the entire node, return the ‘name’ property of each node.

MATCH (books:book) RETURN books.name


Next up, let’s query the awards conferred on Earnest Hemingway.

This can be achieved via XPath as,

//author[@firstname='Earnest']/awards

which gives the output

<awards>
<award name="Pulitzer Prize" category="Fiction" year="1953" />
<award name="Nobel Prize" category="Literature" year="1954" />
</awards>

As for Cypher,

MATCH (author:author {firstname: "Earnest"})-[*]->(award:award) RETURN award

Here we fetch every node labelled ‘award’ that is connected, over any number of relationships, to the ‘author’ node whose firstname is Earnest.

The above examples are trivial and only aim to show that Cypher can be used to traverse an XML graph. I am looking for a large and, most importantly, meaningful XML dataset which can be queried for some useful information. Keep watching this space for more.

# Implementing Word Ladder game using Neo4j

Word Ladder is a pretty popular game invented by Lewis Carroll, who is better known as the author of Alice in Wonderland.

In a word ladder puzzle you must make the change occur gradually by changing one letter at a time. At each step you must transform one word into another word; you are not allowed to transform a word into a non-word. For instance, the word “FOOL” can be transformed into the word “SAGE” as

FOOL
POOL
POLL
POLE
PALE
SALE
SAGE


There are many variations of the word ladder puzzle. For example, you might be given a particular number of steps in which to accomplish the transformation, or you might need to use a particular word. In this post we are interested in figuring out the smallest number of transformations needed to turn the starting word into the ending word.

Graphs are the most natural way to solve the Word Ladder problem, and with Neo4j it is even easier.

Here is an outline of where we are going:

• Represent the relationships between the words as a graph.
• Use the graph algorithm known as breadth first search to find an efficient path from the starting word to the ending word.

Our first problem is to figure out how to turn a large collection of words into a graph. What we would like is to have a relation from one word to another if the two words are only different by a single letter. If we can create such a graph, then any path from one word to another is a solution to the word ladder puzzle.

Writing the words onto the graph is another beautiful problem altogether.

We could use several different approaches to create the graph we need to solve this problem. Let’s start with the assumption that we have a list of words that are all the same length. As a starting point, we can create a vertex in the graph for every word in the list. To figure out how to connect the words, we could compare each word in the list with every other. When we compare, we are looking to see how many letters are different. If the two words in question differ by only one letter, we can create an edge between them in the graph. For a small set of words that approach would work fine; however, let’s suppose we have a list of 5,110 words. Roughly speaking, comparing every word to every other word on the list is an O(n²) algorithm. For 5,110 words, n² is more than 26 million comparisons.
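
To make the brute-force idea concrete, here is a rough sketch of the pairwise comparison in plain Java. The class and method names (NaiveWordComparison, differsByOneLetter, countEdges) are made up for this illustration. It compares each unordered pair once, which halves the work but is still quadratic in the number of words.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the naive approach, not code from the project.
class NaiveWordComparison {

    // True when both words have the same length and differ in exactly one position.
    static boolean differsByOneLetter(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        int differences = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                differences++;
            }
        }
        return differences == 1;
    }

    // Counts the edges the graph would get by checking every pair of words once.
    static int countEdges(List<String> words) {
        int edges = 0;
        for (int i = 0; i < words.size(); i++) {
            for (int j = i + 1; j < words.size(); j++) {
                if (differsByOneLetter(words.get(i), words.get(j))) {
                    edges++;
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("fool", "pool", "poll", "pole", "pale", "sale", "sage");
        System.out.println(countEdges(words) + " edges");
    }
}
```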

We can do much better by using the following approach. Suppose that we have a huge number of buckets, each of them with a four-letter word on the outside, except that one of the letters in the label has been replaced by an underscore. For example (see the image below), we might have a bucket labeled “pop_”. As we process each word in our list we compare the word with each bucket, using the ‘_’ as a wildcard, so both “pope” and “pops” would match “pop_”. Every time we find a matching bucket, we put our word in that bucket. Once we have all the words in the appropriate buckets, we know that all the words in a given bucket must be connected to each other.

In Java, a Map is the obvious way to implement this; see the snippet below.

Map<String, List<String>> wordMap = new HashMap<>();
for (String word : words) {
    for (int wordIndex = 0; wordIndex < word.length(); wordIndex++) {
        // Bucket label: the word with one letter replaced by '_'
        String keyWord = word.substring(0, wordIndex) + "_" + word.substring(wordIndex + 1);
        if (!wordMap.containsKey(keyWord)) {
            wordMap.put(keyWord, new ArrayList<String>());
        }
        // Drop the word into its bucket
        wordMap.get(keyWord).add(word);
    }
}



Once we have the ‘well crafted’ map ready, it is easy to write this into a Neo4j database.

• For every word, a separate node is introduced
• For every word under the same bucket, a relationship is made

Note: Neo4j doesn’t support undirected relationships; every relationship is stored with a direction. So, we go ahead and create a directed relationship and ignore the direction when we traverse the graph.

To write to the graph easily, a wrapper like the one shown below can be used,

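import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;

public class WordGraph {

    // Cache of word -> node, so each word is created only once
    Map<String, Node> nodeMap;
    GraphDatabaseService graphDb;

    public WordGraph() {
        graphDb = NeoDatabaseHandler.getGraphDatabase();
        nodeMap = new HashMap<String, Node>();
    }

    // Returns the node for a word, creating it on first use.
    // Note: in Neo4j 2.x and later, writes like this must run inside a transaction.
    private Node getNode(String word) {
        if (!nodeMap.containsKey(word)) {
            Node node = graphDb.createNode(NeoHelper.WordLabel.WORD);
            node.setProperty("word", word);
            nodeMap.put(word, node);
        }
        return nodeMap.get(word);
    }

    // Connects two words with a MOVE relationship
    public void addEdge(String parentWord, String childWord) {
        Node parentNode = getNode(parentWord);
        Node childNode = getNode(childWord);
        parentNode.createRelationshipTo(childNode, NeoHelper.RelationshipTypes.MOVE);
    }
}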

Whenever we need an edge created, we just call the addEdge method and it takes care of the rest.

So, we iterate over the buckets and write the words and their relationships to the graph,

for (List<String> mappedWordsParent : wordMap.values()) {
    for (String parentWord : mappedWordsParent) {
        for (String childWord : mappedWordsParent) {
            if (!childWord.equals(parentWord)) {
                // wordGraph is assumed to be an instance of the WordGraph wrapper above
                wordGraph.addEdge(parentWord, childWord);
            }
        }
    }
}

The graph created using a minimum set of words will look like the one below, (apologies for the clutter in the relationship names)

Once we have the graph ready, we can do a decent traversal to get the required path. Let us assume we are trying to find a path from fool to sage.

The graph algorithm we are going to use is called the “breadth first search” algorithm. Breadth first search (BFS) is one of the easiest algorithms for searching a graph.

From a very neat post that I came across:

Given a graph G and a starting vertex s, a breadth first search proceeds by exploring edges in the graph to find all the vertices in G for which there is a path from s. The remarkable thing about a breadth first search is that it finds all the vertices that are a distance k from s before it finds any vertices that are a distance k+1. One good way to visualize what the breadth first search algorithm does is to imagine that it is building a tree, one level of the tree at a time. A breadth first search adds all children of the starting vertex before it begins to discover any of the grandchildren.

Implementing this ourselves can wait for another day, since Neo4j provides a neat Traversal Framework.

Let’s have a look at the code,

graphDb.traversalDescription()
    .breadthFirst()
    .relationships(NeoHelper.RelationshipTypes.MOVE)
    .evaluator(Evaluators.includeWhereEndNodeIs(endNode))
    .traverse(startNode)

It’s pretty straightforward. We traverse through relationships of type MOVE (which is the only relationship type we have in the graph). We use an Evaluator, which decides what is returned as the output of a traversal; in this case, we pass the node corresponding to the end word. So, whenever the traversal reaches the end word (in this case sage), it returns the corresponding path. The complete code can be found on GitHub.

Once we have the path, printing it to standard out yields,

fool
pool
poll
pall
pale
sale
sage

Which is pretty much what we expected. You could try this out on your own, as the entire project is available on GitHub.
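
If you would like to see the breadth first idea without a Neo4j instance, here is a rough, self-contained sketch in plain Java. It is not the code from the project: the class WordLadderSketch, its method names and the tiny word list are assumptions made purely for illustration. It reuses the bucket trick to find neighbours and keeps a predecessor map so the ladder can be rebuilt once the end word is reached.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory version of the word ladder search.
class WordLadderSketch {

    // Groups words into wildcard buckets such as "po_l" -> [pool, poll].
    static Map<String, List<String>> buildBuckets(List<String> words) {
        Map<String, List<String>> buckets = new HashMap<>();
        for (String word : words) {
            for (int i = 0; i < word.length(); i++) {
                String key = word.substring(0, i) + "_" + word.substring(i + 1);
                buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(word);
            }
        }
        return buckets;
    }

    // Breadth first search; returns a shortest ladder, or an empty list if none exists.
    static List<String> shortestLadder(List<String> words, String start, String end) {
        Map<String, List<String>> buckets = buildBuckets(words);
        Map<String, String> cameFrom = new HashMap<>(); // word -> predecessor on the path
        Deque<String> queue = new ArrayDeque<>();
        cameFrom.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(end)) {
                break;
            }
            // Neighbours are all words sharing any bucket with the current word.
            for (int i = 0; i < current.length(); i++) {
                String key = current.substring(0, i) + "_" + current.substring(i + 1);
                for (String neighbour : buckets.getOrDefault(key, Collections.emptyList())) {
                    if (!cameFrom.containsKey(neighbour)) {
                        cameFrom.put(neighbour, current);
                        queue.add(neighbour);
                    }
                }
            }
        }
        List<String> path = new ArrayList<>();
        if (!cameFrom.containsKey(end)) {
            return path; // end word was never reached
        }
        // Walk the predecessor links back from the end word, then reverse.
        for (String word = end; word != null; word = cameFrom.get(word)) {
            path.add(word);
        }
        Collections.reverse(path);
        return path;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
            "fool", "pool", "poll", "pole", "pall", "pale", "sale", "sage");
        shortestLadder(words, "fool", "sage").forEach(System.out::println);
    }
}
```

Because the queue hands out words in order of their distance from the start, the first time the end word is dequeued the recorded path is a shortest one.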

Much of the content has been adapted from this book, which is the ONE place to start if you want to understand data structures in Python.

# Adding DOM like features to Neo4j xml graph

This is an extension to my previous post on how to convert XML data to a Neo4j graph. For instance, see the graph generated from an XML file in the following format

<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
.
.
.
</catalog>


And the neo4j graph visualization below..

There are 4 <book> nodes under <catalog>, and for each book there is the associated metadata. Altogether it’s a pretty standard XML document.

Most of our XML operations involve traversing the XML. JAXP is bread and butter for any developer working with XML in Java. I have done my bit of XML parsing, and the interface it provides is wonderful (though resource intensive because of the DOM).

Similar flexibility is possible for the XML graph too.

I have tried to implement the following methods:

• getTags – Fetches the element objects for the specified xml tag
• getParent – Fetches the parent element object
• getChildren – Fetches the child element objects
• getSiblings – Fetches the siblings of the element
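
For comparison, here is roughly what the same operations look like against a plain DOM using the JDK's JAXP classes. This is only a sketch for orientation: the class DomNavigationSketch and its method names are hypothetical, and the second book in the sample XML is a made-up placeholder.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: plain JAXP/DOM, no Neo4j involved.
class DomNavigationSketch {

    // Cut-down catalog; the second book's title is a made-up placeholder.
    static final String CATALOG_XML =
        "<catalog>"
        + "<book id=\"bk101\"><author>Gambardella, Matthew</author>"
        + "<title>XML Developer's Guide</title></book>"
        + "<book id=\"bk102\"><title>Placeholder Title</title></book>"
        + "</catalog>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Like getTags: all elements with the given tag name.
    static List<Element> getTags(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        List<Element> elements = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            elements.add((Element) nodes.item(i));
        }
        return elements;
    }

    // Like getChildren: child elements only (text nodes are skipped).
    static List<Element> getChildren(Element parent) {
        List<Element> children = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                children.add((Element) nodes.item(i));
            }
        }
        return children;
    }

    // Like getParent: the enclosing element, or null for the document root.
    static Element getParent(Element child) {
        Node parent = child.getParentNode();
        return parent instanceof Element ? (Element) parent : null;
    }

    public static void main(String[] args) {
        Document doc = parse(CATALOG_XML);
        Element firstBook = getTags(doc, "book").get(0);
        for (Element child : getChildren(firstBook)) {
            System.out.println(child.getTagName() + " : " + child.getTextContent());
        }
        System.out.println("parent : " + getParent(firstBook).getTagName());
    }
}
```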

The methods are declared in a separate service facade:

public interface XmlTreeService {
public List<XmlElement> getTags(String tagName);
public List<XmlElement> getChildren(XmlElement parent);
public XmlElement getParent(XmlElement child);
public List<XmlElement> getSiblings(XmlElement element);
}


### Implementation Details

Fetching the list of Tags, (the getTags method)

The best solution was to use the GlobalGraphOperations interface. As discussed before, while saving the XML as a graph, each node gets its tag name as a label. For instance, the node representing a book element will have the label ‘book’ associated with it.

It can be seen that the TAG property ‘book’ is also a label.

This means fetching all tags of a specified name boils down to fetching all nodes with a specific label, which is easy using GlobalGraphOperations.

GlobalGraphOperations globalGraph = getGlobalGraphOperations();
Label label = DynamicLabel.label(tagName);
List<XmlElement> elements = new ArrayList<>();
try (Transaction tx = graphDb.beginTx()) {
    // Every node carries its tag name as a label, so a label lookup finds all matching tags
    ResourceIterable<Node> nodes = globalGraph.getAllNodesWithLabel(label);
    for (Node node : nodes) {
        elements.add(getXmlElement(node));
    }
    tx.success();
} catch (IOException e) {
    LOGGER.severe(e.getMessage());
    e.printStackTrace();
}


Fetching the parent and children

In the graph we created, each child tag in the XML is linked to its parent tag by a CHILD_OF relationship that points from the child to the parent.

This implies that finding the lineage is a matter of following the relationships of the node and fetching the element at the other end. For instance, see how the children nodes are fetched,

public List<XmlElement> getChildren(XmlElement parent) {
    GraphDatabaseService graphDb = Neo4jDatabaseHandler.getGraphDatabase();
    List<XmlElement> childElementList = new ArrayList<>();
    try (Transaction tx = graphDb.beginTx()) {
        Node node = graphDb.getNodeById(parent.getId());
        // CHILD_OF points from child to parent, so children arrive on INCOMING relationships
        Iterable<Relationship> childRelations = node.getRelationships(Direction.INCOMING);
        Iterator<Relationship> relationshipIterator = childRelations.iterator();
        while (relationshipIterator.hasNext()) {
            Relationship relationship = relationshipIterator.next();
            Node childNode = relationship.getStartNode();
            childElementList.add(getXmlElement(childNode));
        }
        tx.success();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return childElementList;
}


The process can be outlined as

1. Fetch the node from GraphDatabaseService using the id which is stored in the XmlElement Object.
2. Fetch the relationships which are INCOMING to the parent node.
3. Iterate through the relationships and fetch the starting nodes.

The same approach can be used to identify the parent node too: follow the OUTGOING relationship of the child and take its end node.

Fetching Sibling Nodes

To fetch the sibling nodes, first fetch the parent node, then get its children, and finally remove the node in question from that list. Finding siblings is quite elementary.
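
The logic can be sketched without Neo4j at all. In the toy class below (SiblingLookupSketch, a hypothetical stand-in that identifies elements by plain strings), the CHILD_OF relationships are kept as a child-to-parent map, and getSiblings is just getParent followed by getChildren minus the element itself. The project's real implementation works on XmlElement objects and graph nodes instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the graph-backed service.
class SiblingLookupSketch {

    // CHILD_OF edges stored as child -> parent, mirroring the direction used in the graph.
    private final Map<String, String> childOf = new LinkedHashMap<>();

    void addChildOf(String child, String parent) {
        childOf.put(child, parent);
    }

    // Follows the single outgoing CHILD_OF edge; null means the element is the root.
    String getParent(String child) {
        return childOf.get(child);
    }

    // Children are the start points of every CHILD_OF edge that ends at the parent.
    List<String> getChildren(String parent) {
        List<String> children = new ArrayList<>();
        for (Map.Entry<String, String> edge : childOf.entrySet()) {
            if (edge.getValue().equals(parent)) {
                children.add(edge.getKey());
            }
        }
        return children;
    }

    // Siblings = children of the parent, minus the element itself.
    List<String> getSiblings(String element) {
        String parent = getParent(element);
        if (parent == null) {
            return new ArrayList<>(); // the root has no siblings
        }
        List<String> siblings = getChildren(parent);
        siblings.remove(element);
        return siblings;
    }

    public static void main(String[] args) {
        SiblingLookupSketch tree = new SiblingLookupSketch();
        for (String id : new String[] {"bk101", "bk102", "bk103", "bk104"}) {
            tree.addChildOf(id, "catalog");
        }
        System.out.println(tree.getSiblings("bk101"));
    }
}
```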

You can always find the complete source code on GitHub.

### Summary

To put it all together, let’s run a small test,

GraphDatabaseService graphDb = Neo4jDatabaseHandler.getGraphDatabase();
XmlTreeServiceGraph treeServiceGraph = new XmlTreeServiceGraph(graphDb);
System.out.println("-----Test for DOM like methods on XmlGraph----\n");

List<XmlElement> groupElements = treeServiceGraph.getTags("book");
System.out.println("Fetching all Book Nodes...");
for (XmlElement xmlElement : groupElements) {
    System.out.println(xmlElement.getAtrributeString());
}

XmlElement firstElement = groupElements.get(0);
System.out.println("\nFetching children of the first Book Node : " + firstElement.getAtrributeString() + "...");
List<XmlElement> childrenElements = treeServiceGraph.getChildren(firstElement);
for (XmlElement xmlElement : childrenElements) {
    System.out.println(xmlElement.getTagName() + " : " + xmlElement.getTagValue());
}

XmlElement child = childrenElements.get(0);
System.out.println("\nFetching parent of the first child element : " + child.getTagName() + " : " + child.getTagValue() + "...");
XmlElement parentElement = treeServiceGraph.getParent(child);
System.out.println(parentElement.getTagName() + " : " + parentElement.getAtrributeString());

System.out.println("\nFetching siblings of the first Book Node : " + firstElement.getAttributes());
List<XmlElement> elementSibling = treeServiceGraph.getSiblings(firstElement);
for (XmlElement xmlElement : elementSibling) {
    System.out.println(xmlElement.getTagName() + " : " + xmlElement.getAtrributeString());
}


P.S : I know the test code is crappy, but for a demo test, this will do.. 🙂

So, it’s quite straightforward. I am doing the following steps:

1. Fetch all ‘book’ nodes in the xml, and print them
2. Fetch all the children of the first ‘book’ node, and print them
3. Fetch the parent of the first child from step #2, which should give us our first book node, and print it
4. Fetch the siblings of the first node, and print them

Let’s see what the output looks like,

-----Test for DOM like methods on XmlGraph----

Fetching all Book Nodes...
{"id":"bk101"}
{"id":"bk102"}
{"id":"bk103"}
{"id":"bk104"}

Fetching children of the first Book Node : {"id":"bk101"}...
description : An in-depth look at creating applications with XML.
publish_date : 2000-10-01
price : 44.95
genre : Computer
title : XML Developer's Guide
author : Gambardella, Matthew

Fetching parent of the first child element : description : An in-depth look at creating applications with XML....
book : {"id":"bk101"}

Fetching siblings of the first Book Node : {id=bk101}
book : {"id":"bk104"}
book : {"id":"bk103"}
book : {"id":"bk102"}


It looks good, and everything is as expected… 🙂

# The Buendia Family Tree as a Neo4j graph – Tribute to Gabo

One of the most significant and prominent authors passed away last fortnight – Gabriel Garcia Marquez. I read One Hundred Years of Solitude during my senior college years. Magical Realism has long been associated with the vernacular literature in my home state.

The Buendia family in One Hundred Years of Solitude is another epic on its own. See a very nice graphic family tree below.

I always wanted to visualize a family tree using a graph, and the Buendias of Macondo seemed so perfect. My other choice was the political families of India, which is pretttyyy vaasttt..

The visualization is pretty straightforward, two types of nodes

• Male
• Female

And 4 simple relationships

• HUSBAND
• FATHER
• MOTHER
• MISTRESS (Sorry, could not find a euphemism for that)

Apologies if it is too cluttered, but you could play around with the graphgist