RegEx to match Java Comments

Most(if not all) of the regular expressions a developer needs is already available. There are numerous resources online ranging from blogs and even an ever growing library. Not to mention the endless discussions on stackoverflow.

Few days back, I had lost the sources to a project and had to decompile them from the original build. Did not expect the decompiler to throw in so much comments.


The most obivious way to remove comments would be to use the following regEx.


This seems the natural way to do it. /\* finds the start of the comment (note that the literal * needs to be escaped because * has a special meaning in regular expressions), .* finds any number of any character, and \*/ finds the end of the expression.

This will remove all the comments the decompiler put it. But, wanted to pursue more to handle any kind of comments.

The first problem with this approach is that .* does not match new lines. (Note, the text in green color is matched)

/* First comment 
 first comment—line two*/
/* Second comment */

This can be overcome easily by replacing the . with (.|[\r\n]):(\r and \n represent the return and newline characters)


This reveals a second, more serious, problem—the expression matches too much. Regular expressions are greedy, they take in as much as they can. Consider the case in which your file has two comments. This regular expression will match them both along with anything in between:

/* First comment */
/* Second comment */

To fix this, the regular expression must accept less. We cannot accept just any character with a ., we need to limit the types of characters that can be in our expressions. The new RegEx stops parsing when it sees a */ and will begin the next match only when it sees another /*.


The disadvantage is that this simplistic approach doesn’t accept any comments with a * in them, which a standard java comment.

* Common multi-line comment style. 
/* Second comment */

This is where it gets tricky. How do we accept a * without accepting the * that is part of the end comment? The solution is to still accept any character that is not *, but also accept a * and anything that follows it provided that it isn’t followed by a /. The parse will only stop if it sees a */ and will continue when it sees a *.


Also, the section in bold makes sure * comes only in pairs. It will accept any even number of *. It might even accept the * that is supposed to end the comment.

 * Common multi-line comment style.
 * Another common multi-line comment style.

We should also make sure that the comment can end in multiple * s.


This RegEx can be used to find/replace java comments using a standard text editor.

An easier Method

Most regular expression packages support non-greedy matching. This means that the pattern will only be matched if there is no other choice. We can modify our second try to use the non-greedy matcher *? instead of the greedy matcher *. With this change, the middle of our comment will only match if it doesn’t match the end:


On the beauty of Algorithms.. Part 1

As software engineers, our job is not just to find solution to problems, but to find the best solution to a problem.

You don’t aspire to be just another monkey with a fancy title..


Had the chance to comeupon a few beautifully crafted courses online. Highly reccomend to checkout this, this and this..

To demonstate the power of a good algorithms, let’s borrow a case study from Sedgewick and Wayne’s beautifully crafted book.

The Dynamic Connectivity problem 

We start with the following problem specification: The input is a sequence of pairs of integers, where each integer represents an object of some type and we are to interpret the pair p q as meaning “p is connected to q.”

We assume that “is connected to” is an equivalence relation, which means that it is
■ Reflexive : p is connected to p.
■ Symmetric : If p is connected to q, then q is connected to p.
■ Transitive : If p is connected to q and q is connected to r, then p is connected to r.

An equivalence relation partitions the objects into equivalence classes. In this case, two objects are in the same equivalence class if and only if they are connected. Our goal is to write a program to filter out extraneous pairs (pairs where both objects are in the same equivalence class) from the sequence.

See the illustration below,

union find

Assume the universe consists of 10 points numbered from 0-9. Initially none of the points are connected. The inputs are the pair of numbers on the left.

The first pair (4,3). We check if there exists a connection between the pair. So, we create  a connection between the two nodes and prints out the connection. Same for the next 4 inputs of (3,8), (6,5), (9,4) and (2,1).

But the next input, (8,9). Even though there is no direct connection between 8 and 9, they are part of a connected component, [3,4,8,9]. So, no new connection is to be made and nothing is printed. And so on..

Given a grid with numerous connected components, identifying the connected components or if two nodes are connected pose a computational problem. See the universe below,

union find2

You can quickly identify component consisting of five sites at the bottom left, but you might have difficulty verifying that all of the other sites are connected to one another. For a program, the task is
even more difficult, because it has to work just with site names and connections and has no access to the geometric placement of sites in the diagram. How can we tell quickly whether or not any given two sites in such a network are connected?

A solution to the problem provides an effective implementation for,

void union(int p, int q) – add connection between p and q
int find(int p) – component identifier for p (0 to N-1)
boolean connected(int p, int q) – return true if p and q are in the same component

The union() operation merges two components if the two sites are in different components, the find() operation returns an integer component identifier for a given site, the connected() operation determines whether two sites are in the same component.

As we shall soon see, the development of an algorithmic solution for dynamic connectivity thus reduces to the task of developing an implementation of this API. Every implementation has to
■ Define a data structure to represent the known connections
■ Develop efficient union(), find(), connected() implementations that are based on that data structure

As usual, the nature of the data structure has a direct impact on the efficiency of the algorithms, so data structure and algorithm design go hand in hand. The API already specifies the convention that both sites and components will be identified by int values between 0 and N-1, so it makes sense to use a site-indexed array id[] as our basic data structure to represent the components.

Initially, none of the sites are connected and each site corresponds to a component. Initially, we start with N components, each site in its own component, so we initialize id[i] to i for all i from 0 to N-1.

All of our implementa-tions use a one-line implementation of connected() that returns the boolean value find(p) == find(q). The find(int p) methods returns the identified for the component. So, it should make perfect sense to assert that, if two sites correspond to the same component, they are connected.


We shall consider three different implementations, all based on using the site-indexed id[] array, to determine whether two sites are in the same connected component.


One approach is to maintain the invariant that p and q are connected if and only if id[p] is equal to id[q]. In other words, all sites in a component must have the same value in id[]. This method is called quick-find because find(p) just returns id[p], which immediately implies that connected(p, q) reduces to just the test id[p] == id[q] and returns true if and only if p and q are in the same component.

For implementing union, we first check whether they are already in the same component, in which case there is nothing to do. Otherwise, we are faced with the situation that all of the id[] entries corresponding to sites in the same component as p have one value and all of the id[] entries corresponding to sites in the same component as q have another value. To combine the two components into one, we have to make all of the id[] entries corresponding to both sets of sites the same value, as shown in the example. To do so, we go through the array, changing all the entries with values equal to id[p] to the value id[q].

We could have decided to change all the entries equal to id[q] to the value id[p]—the choice between these two alternatives is arbitrary.

union find3

The implementation is pretty simple,

public int find(int p)
{ return id[p]; }
public void union(int p, int q)
{  // Put p and q into the same component.
   int pID = find(p);
   int qID = find(q);
   // Nothing to do if p and q are already in the same component.
   if (pID == qID) return;
      // Rename p’s component to q’s name.
      for (int i = 0; i < id.length; i++){
         if (id[i] == pID) {
           id[i] = qID;

The find() operation is certainly quick, as it only accesses the id[] array once in order to complete the operation. But quick-find is typically not useful for large problems because union() needs to scan through the whole id[] array for each input pair.

In particular, suppose that we use quick-find for the dynamic connectivity problem and wind up with a single component. This requires at least (N-1) calls to union(). union() iterates through all the n elements at least once, which clearly gives us a N 2 complexity. This analysis generalizes to say that quick-find is quadratic for typical applications where we end up with a small number of components. Modern computers can execute hundreds of millions or billions of instructions per second, so this cost is not noticeable if N is small, but we also might find ourselves with millions or billions of sites and connections to process in a modern application. Trying to solve a problem with millions or billions of sites will not give us an acceptable time.

Most defenitely Quick-Find can be impoved..


Visualizing an xml as a graph – Neo4j 101

Neo4j is quite awesome for pretty much everything. Here is an extract I found from this blog

Whenever someone gives you a problem, think graphs. They are the most fundamental and flexible way of representing any kind of a relationship, so it’s about a 50-50 shot that any interesting design problem has a graph involved in it. Make absolutely sure you can’t think of a way to solve it using graphs before moving on to other solution types.

As a baby step, let us try to visualize an xml as a graph.

We can start with a simple xml here,

<?xml version="1.0" encoding="UTF-8"?>
   <name attr="one">Belgian Waffles</name>
   <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>

It is a simple ‘Breakfast Menu’ with just one item, ‘Belgian Waffles‘.

Neo4j has a neat interface which lets us visualize the graph we created. The graph for the above xml looks slick..



The above XML has 6 Tags and we have 6 nodes in our graph. The nested tags/Nodes has a ‘CHILD-OF’ relationship with the Parent tag/Node.

In the graph above,

  • Node numbered 0 is the root node – Corresponds to the <breakfast_menu> tag
  • Node numbered 1 is the immediate child of the root – Corresponds to the <food> tag
  • Nodes numbered from 2-4 are the tags which come under <food> tag, viz <name>, <price>, <description> and <calories>

Transforming XML..

Let us parse the xml into a data structure which can be easily persisted using Neo4j. Personally, I would prefer SAX parser as there are serious memory constraints when creating a DOM object (really painful if you talking big data as xml). Additionally SAX parsing gives you unlimited freedom to do whatever you want to do with the XML.

This is how I represent a node, using the object XmlElement. Each XmlElement is identified by an id. This is basically an integer or long. Only thing to make sure is that the number has to be unique across all XMLElements.

public class XmlElement {

 private String tagName;
 private String tagValue;
 private Map<String, String> attributes = new HashMap<String,String>();
 private HierarchyIdentifier hierarchyIdentifier;
 private int parentId;
 private boolean persisted;

//Setters and Getters for the members

 public String getAtrributeString(){
  ObjectMapper jsonMapper = new ObjectMapper();
   try {
    return jsonMapper.writeValueAsString(this.attributes);
   } catch (JsonProcessingException e) {
  return null;


The XML Tag name and value are stored as string and the attributes are stored as a Map. Also, there is another member Object, HierarchyIdentifier

public class HierarchyIdentifier {
 private int depth;
 private int width;
 private int id;

//Getters and Setters for member element

The HierarchyIdentifier Class contains the id class which is used to identify an XmlElement, which translates to identifying an XML Tag. Each XmlElement has the id of it’s parent stored.

And after the parse, the XML should be represented as a map of <Integer,XmlElement> and each XmlElement identified by the corresponding Integer id. You can see the SAX PARSER here..

So we pass the Map to the GraphWriter object.

We create a node with the following properties

  • Value – The Value of the XML Tag
  • ID – The ID of the current node
  • Parent – The ID of the parent Tag
  • Attributes – The String representation of the Tag attributes

Also, the node has the following Labels

  • NODE – For all nodes
  • PARENT – For the seed ( highest parent ) node
  • XML tag Name – Say, the node for the tag <food> will have label FOOD

Currently, there is only a single type of relationship, ‘CHILD_OF’ , between immediate nested tags.

See the below the example for a bigger XML

  <name attr="one">Belgian Waffles</name>
  <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
  <name attr="two">Masala Dosa</name>
  <description>South India's famous slim pancake with mashed potatoes</description>
   <name first="nikhil"/>

I have added a second item to the ‘breakfast_menu’.. the legendary masala dosa.

Also, the second food item has a new child tag, <eaten>. And the resultant graph looks prettier



The sets of tags,relationships and properties can be seen in the Neo4j local server interface.


The entire codebase can be found here in github..

Visualizing as a GraphGist

An easy representation can be done in GraphGist. I have created the graph of this xml.

Here is the link to the gist. Also, the gist script can be found in the github gist here..


Comparing .csv files and generate sweet HTML reports in rendersnake

Here is a simple library to perform a simple operation. Compare two delimiter separated files and output an .html file.

Simple and straightforward. It can be used to output a ‘Comparison Result’ object, or it can be used to print out an .html report


  1. Compare methodology can be customized using a properties file
  2. Report can be customized using a properties file

Using rendersnake and it pretty awesome.. It is available on github and v1.0 is somewhere between the first cry of the baby and kindergarten.

Here is the gitlab link.

See a report sample,




Git Internals – The commit tree

Git stores the commits as a trees and blobs, and a tree object can consist of both trees and blobs as it’s children. An example of storage as blob can be seen here.

See the file structure of a project below

nikhil@nikhil-Inspiron-3537:~/dev/blog/git$ tree
|-- about.html
|-- css
| `-- simple.css
|-- index.html
`-- list.html

1 directory, 4 files

It is pretty straight forward with a couple of .html files and one .css file in a separate directory. Please note that all these files are committed, and that means it has already been written out to the underlying storage. To make sure we can use the really useful git log command,

nikhil@nikhil-Inspiron-3537:~/dev/blog/git/tree$ git log
commit 300f5c42a5aed68268547a95db4f40b6b122fb5b
Author: nikhil <nikhil@nikhil-Inspiron-3537.(none)>
Date: Sun Mar 16 16:52:54 2014 +0530

initial commit

It shows only one commit has been made. It also returns a hash value that is associated with the commit.

The cat-file command we overused and abused in the previous post can be used to see the tree corresponding to last commit. We pass the hash value of the commit as a parameter.

nikhil@nikhil-Inspiron-3537:~/dev/blog/git/tree$ git cat-file -p 300f5c42a5aed68268547a95db4f40b6b122fb5b
tree 5d0d7785b65180c195f7a1bf3cf02218b56f6f0a
author nikhil <nikhil@nikhil-Inspiron-3537.(none)> 1394968974 +0530
committer nikhil <nikhil@nikhil-Inspiron-3537.(none)> 1394968974 +0530

initial commit

So, the commit hash points to a tree, whose hash is displayed. Let us see what the tree contains,

nikhil@nikhil-Inspiron-3537:~/dev/blog/git/tree$ git cat-file -p 5d0d7785b65180c195f7a1bf3cf02218b56f6f0a
100644 blob 09e16f36b3c4993ba924b1074629283a49869be9 about.html
040000 tree 02ff2e2946f969bc640886861ff8c7039e1a2339 css
100644 blob 9015a7a32ca0681be64471d3ac2f8c1f24c1040d index.html
100644 blob b92b8b70267846c8b21b5ad412666cb99f9c9211 list.html

This tree contains three blobs for the three .html files and another tree for the css directory. Let us go into this tree,

nikhil@nikhil-Inspiron-3537:~/dev/blog/git/tree$ git cat-file -p 02ff2e2946f969bc640886861ff8c7039e1a2339
100644 blob dac138d9e013a2e9a10e67d793bd4703c1b86bd1 simple.css

It contains the .css file. So, the entire structure looks something like this.

commit one

Now, lets make a mall change to the index.html file and do a second commit. After this, this is how the log looks like,

nikhil@nikhil-Inspiron-3537:~/dev/blog/git/tree$ git log
commit ec8b103771588498923711c036ff3280c863f713
Author: nikhil <nikhil@nikhil-Inspiron-3537.(none)>
Date: Sun Mar 16 17:36:33 2014 +0530

second commit

commit 300f5c42a5aed68268547a95db4f40b6b122fb5b
Author: nikhil <nikhil@nikhil-Inspiron-3537.(none)>
Date: Sun Mar 16 16:52:54 2014 +0530

initial commit

There are two commits and both of them have two different hash values,

First Commit : 300f5c42a5aed68268547a95db4f40b6b122fb5b (Initial Commit)

Second Commit : ec8b103771588498923711c036ff3280c863f713 (Second Commit)

Let us follow the second commit tree like we did before,

nikhil@nikhil-Inspiron-3537:~/dev/blog/git/tree$ git cat-file -p ec8b103771588498923711c036ff3280c863f713
tree 455e56fbce4fe35ee64d9e7af572e5b0adef14f6
parent 300f5c42a5aed68268547a95db4f40b6b122fb5b
author nikhil <nikhil@nikhil-Inspiron-3537.(none)> 1394971593 +0530
committer nikhil <nikhil@nikhil-Inspiron-3537.(none)> 1394971593 +0530

second commit

The second commit is a child of the first commit, which is interesting, as it is exactly how we see in tools like gitk. A commit contains a reference to its parent commits. While there is usually just a single parent (for a linear history), a commit can have any number of parents in which case it’s usually called a merge commit. Most workflows will only ever make you do merges with two parents, but you can really have any other number too.

Going further deep into the tree,

nikhil@nikhil-Inspiron-3537:~/dev/blog/git/tree$ git cat-file -p 455e56fbce4fe35ee64d9e7af572e5b0adef14f6
100644 blob 09e16f36b3c4993ba924b1074629283a49869be9 about.html
040000 tree 02ff2e2946f969bc640886861ff8c7039e1a2339 css
100644 blob b110c44fd08f191062636f18cfeeaeccd5be1b73 index.html
100644 blob b92b8b70267846c8b21b5ad412666cb99f9c9211 list.html

The most interesting thing to note here is that, all the hash values, except the one for index.html remains the same.

The two commit trees look something like this.

commit two

In the second commit tree, the hash values for the unchanged files are the same as the previous commit tree. Now, just see the hash values as pointers to files. The second tree points to the unchanged files.

Well, that’s it.. commit trees…

Working with paths and files in Java 7

Well, lots happening in the latest java upgrade and ironically most of us are comfortable with Java 5 or 6. Enough said.

The changes in NIO is quite impressive. Find below the quick reference for paths. The new Classes are found in the java.nio.file package. Lets see though the most common and useful entry points.

  • java.nio.file.Files;
  • java.nio.file.Path;
  • java.nio.file.Paths;

Starting with Path,

The Path class includes various methods that can be used to obtain information about the path, access elements of the path, convert the path to other forms, or extract portions of a path. Let’s see how to create a path

// Microsoft Windows syntax
Path path = Paths.get("C:\\home\\joe\\foo");

// Solaris syntax
Path path = Paths.get("/home/joe/foo");

Path contains all sorts of useful methods to get information about the file system. For instance, the relativize method that can be used to construct a relative path between two paths. Paths can be compared, and tested against each other using the startsWith and endWith methods.

See the example for a Windows file system

Path path = Paths.get("C:\\Temp\\data\\git-new.png");
System.out.format("The path is %s%n", path.toString());
System.out.format("The path's parent is %s%n", path.getParent().toString());
System.out.format("The path's root is %s%n", path.getRoot().toString());

Path relativeRoot = Paths.get("C:\\Temp\\");

System.out.format("%nRelatice root is is %s%n", relativeRoot.toString());
Path relativePath = relativeRoot.relativize(path);

System.out.format("%nRelative path is %s%n", relativePath.toString());

And the output

The path is C:\Temp\data\git-new.png
The parent of the path is C:\Temp\data
The root of the path is C:\

The relative root path is C:\Temp

The relative path is data\git-new.png

Moving onto Files,

This class consists exclusively of static methods that operate on files, directories, or other types of files. Files works like a breeze using Path.

Lets see a couple of uses

Reading and writing from/to a file 

Let us see two methods, one which reads from a file to a byte array and the another which writes to a file from a byte array. Both of them work with Paths.

// The below methods are for small files

byte[] readSmallBinaryFile(String aFileName) throws IOException {
Path path = Paths.get(aFileName);
return Files.readAllBytes(path);

void writeSmallBinaryFile(byte[] aBytes, String aFileName) throws IOException {
Path path = Paths.get(aFileName);
Files.write(path, aBytes); //creates, overwrites

//In case of large files, use a BufferedWriter

Charset charset = Charset.forName("US-ASCII");
String s = "";
try (BufferedWriter writer = Files.newBufferedWriter(path, charset)) {
writer.write(s, 0, s.length());
} catch (IOException x) {
System.err.format("IOException: %s%n", x);

Most beautiful way to move files,

Moving files were a bit cumbersome, considering the amount of code needed for so trivial an operation, see here.

The Files class has a static method move,

public static Path move(Path source,
        Path target,
        CopyOption... options)
                 throws IOException

This takes in two path parameters, one for the source and one for the target. The third parameter being a list of CopyOptions. Right now, there are three Copy Options, and there could be more.

See the example below,

Path source = /var/tmp/source/out.log;
Path target = /vat/tmp/destination/out.log;

And the copy options are pretty cool,


Move the file as an atomic file system operation.

Copy attributes to the new file.

Replace an existing file if it exists.

String intern() function and String Pool

Let’s start with the  JVM spec..

The Java programming language requires that identical string literals (that is, literals that contain the same sequence of code points) must refer to the same instance of class String. In addition, if the method String.intern is called on any string, the result is a reference to the same class instance that would be returned if that string appeared as a literal. Thus, the following expression must have the value true:

Elucidating, if there are more than one string literal with the same value, they share the memory.

Consider the example. An application X processes the results from a questionnaire which expects an YES/NO response. Adding on, there are more than a billion replies. If you store  all those ‘YES’ and ‘NO’ values separately, you’ll end up clogging up your precious memory. So, why don’t we have a single string for ‘YES’ and let the trizillion literals point to it. This is exactly what JAVA does and the table where they store all such string literals is called the String Pool

String sampleOne = "Dylan";
String sampleTwo = "Dylan";
String sampleThree = new String("Dylan");

if(sampleOne == sampleTwo){
System.out.println("One and Two are same");

if(sampleOne == sampleThree){
System.out.println("One and Threeo are same");

In line 1, we create a String with value “Dylan”. In this process, Java automatically adds a reference of the String “Dylan” to the String Pool. In line 2, when we try to create another string, Java sees if this is already available in the String Pool and just points the literal to the value. But in line 3, we use the new operator to create a string. This is a bit off the mark, the new operator forces Java to ignore the String Pool and create a new instance. This’s the reason why line 6 is printed and line 10 is not… This very well explains why it’s not advisable to use a new operator while creating a String.

intern() method

The docs tells us that the intern() method returns a canonical representation of the string. In other words, using the intern() method, we are forcing Java to check the String pool and return the value of the String. Let me modify the above example

String sampleOne = "Dylan";
String sampleTwo = "Dylan";
String sampleThree = new String("Dylan").intern();
if(sampleOne == sampleTwo){
System.out.println("One and Two are same");
if(sampleOne == sampleThree){
System.out.println("One and Threeo are same");

Here, both line 6 and line 10 is printed.
Instead of creating a new instance in line 3, the intern() forces Java to return the String from String Pool.
Enough Said

A Requiem for Marker Interfaces

Having written about Java serialization previously, I thought I’d put in some research into the concept of ‘Marker Interfaces’ or ‘Tag Interfaces’. Well, getting matters straight, technically, the word ‘Marker’ is an invention of bloggers and technical writers. Java specification says nothing about an interface being a ‘Marker Interface’. You won’t find the word ‘Marker’ at all..

A marker interface is an interface with no method declarations. They just tell the compiler that the objects of the classes implementing the interfaces with no defined methods need to be treated differently. These Marker interfaces are used by other code to test for permission to do something.

Some well-known examples are – object implement it can be serialized using ObjectOutputStream.
java.lang.Clonable – objects clone method may be called

They are also known as tag interfaces since they tag all the derived classes into a category based on their purpose

On Interfaces..

Well, what exactly is an interface.

One, I would say an interface is a contract. It is a contract which says the class which implements the interface will implement all the methods specified by the interface.

Two, I say interface is an ‘is a’ relationship. For instance, take a class Sedan which implements the interface Car. We can say that the Seday is a Car. Better example, ArrayList implements List and we say ArrayList is a List.

Let’s take the case of a marker interface Serializable. A class, say Document implements Serializable. Document does not have to implement any methods because the interface Serializable does not specify any methods. This is not right. The Serializable interface does not behave similar to the other interfaces.

Also, it does not make sense to say Document is a Serializable. Instead, Document is Serializable. Again, the interface Serializable does not fit in with the rest. Let me add that, for a marker interface the relationship is, ‘IS’. Class Document is Serializable, or the class Person is Clonable.

Interface Serializable..

This is perhaps the most popular Marker Interface. I won’t talk about serializing an object, you can read more on that. here..

1. FileOutputStream fos = new FileOutputStream("temp.out");
2. ObjectOutputStream oos = new ObjectOutputStream(fos);
3. Match m1 = new Match();
4. oos.writeObject(m1);

In the snippet of code above, I am trying to serialize an object m1 of Type Match. The ObjectOutputStream then attempts to serialize the object. If you inspect the ObjectOutputStream source, you can see a check for Serializable happening.

private void writeObject0(Object obj, boolean unshared) trows IOException
else if (obj instanceof Enum) {
 writeEnum((Enum) obj, desc, unshared);
else if (obj instanceof Serializable) {
 writeOrdinaryObject(obj, desc, unshared);
else {
 if (extendedDebugInfo) {
  throw new NotSerializableException(
   cl.getName() + "\n" + debugInfoStack.toString());
 } else {
  throw new NotSerializableException(cl.getName());

They are doing a check to see if the Object to be serialized is of the type Serializable. If it’s not, a suitable exception is thrown.

So, what do we benfit from implementing the Serializable interface here?

‘Marker Interface Pattern’

A marker interface provides a metadata to the class. The metadata being ‘this class is serializable’. Or in cases like the Clonable interface, the marker interface states the class is ‘Clonable’. There is a wiki page on ‘Marker Interface Pattern’.

“The marker interface pattern is a design pattern in computer science, used with languages that provide run-time type information about objects. It provides a means to associate metadata with a class where the language does not have explicit support for such metadata.”

So, the concept of Marker Interfaces came about in Java at a time when there was no support for metadata. But now we have support in terms of Annotations. I can debate that, whatever is achieved through Marker interfaces, can be achieved through Annotations too.

A worthy substitute?

So, is annotation a worthy substitute for Marker Interfaces? I say yes. See the example below.

01: @interface Drivable{
02: }
03: @Drivable
04: public class Car{
05: public void performAction(Object obj){    
06:    if (!obj.getClass().isAnnotationPresent(Drivable.class)) {         
07:       throw new IllegalArgumentException("cannot perform action...");    
08:    }     
09:    else {        
10:       //do stuff as require    
11:    }
12: }

Pretty straightforward stuff. Using a custom annotation ‘Drivable’, we are checking if a class, in this case Car, is Drivable which perfectly satisfies the ‘IS’ rule.

Having said all this, I must add using annotation comes with a catch. The check for ‘IS’ can occur only at runtime, but for a marker interface, it can throw a compile time error. Plus, annotation work on reflection which is a pretty heavy operation.

As a verdict, I can say that going by core Object Oriented norms, Annotations can beat Marker Interfaces anyday.

Classloaders in Java

I’m an altruist who’s amazed by beautiful things out of my reach. My zeal increases whenever I come to know them better. One of them is the magical land of classloaders, the A-Team that loads all the classes in the JVM. Enough with the twaddling, lets get to business.

Classloaders are pivotal in the lifecycle of a JVM. Like everything in an object oriented language, each Classloader is an object – an instance of class that extends java.lang.ClassLoader. 

Let’s look at what the particular class has got. (An extract from ZeroTurnaround’s whitepaper)



So it’s an abstract class. The major methods we have above are loadClass and defineClass. 

Define class takes in the byte code for a compiled class and returns a Class instance. The byte code can be loaded from the disk or a remote location.

LoadClass is the superstar. It is responsible for loading the class to the JVM. The parameter to be passed is the Class name in the standard enforced by JAVA.(roughly the classname with the package structure)

So, making matters smooth, if you already have a class to be loaded, you can straightaway use the loadClass method, or else you’ll have to use the defineClass first. See the example



There are a couple of things that can me made out from the example above. Understanding them is critical while understanding Classloaders.

  1. Every class has a reference to it’s corresponding static class variable. The class instance was created when this class was initially loaded. 
  2. The static class variable has a reference to it’s ClassLoader. It can be seen in the classloader is fetched using the getter and consequently the loadClass method is invoked.
  3. Since the class is already available, it can be easily loaded using the loadClass method. Otherwise, we would have to use defineClass method.


ClassLoader Hierarchy

It’s not just one classloader that’s behind the play. Let’s split them into three.

  1. Bootstrap Classloader
  2. Extension Classloader
  3. System Classpath Classloader



The bootstrap class loader kicks in first and loads all the java standard classes. Then comes the extension class loader which uses the java extension functionality which loads from the ‘java.ext.dirs’ directory. I wish to write more about the extension framework, but I will save it for later. As far as a developer is concerned, the System classpath loader is the most important one. This loads the classes from the defined classpath. This will include all the user created classes ( minus the user created libraries which should ideally go under extension ).

As it’s evident from the figure above, all the classloader has a parent except the Bootstarp classloader which occupies the highest level in the hierarchy. Whenever there is need for loading a class, the classloader follows a delegation approach. The system classpath loader sees if it can be loaded by extension classloader. If extension class loader can’t, the extension loaded delegates it to the bootstrap class loader.

Having said this order of delegation, from classpath to extension to bootstrap, i have to add that the order is reversed in the case of enterprise applications. I’ll save that discussion for sometime later.

Serialization in JAVA

One of the most interesting corners in ‘Java World’ is serialization. Serialization provides the developer with the ability to store an object’s state in memory, or converting the object to a stream of binary digits, and stored in memory.

Advantages of Serialization

1: Ease of transferring data between objects : In the case of a modern enterprise application, which is spread across various platforms, the need to transmit data among the various components is very critical. In java, where everything is stored as objects, it’s reasonable to transmit data in the form of objects. But for all the platforms to communicate with each other, there should be a common means of communication, a common protocol ( which is nothing but a set of rules ).

In our case, the common protocol is sending data as a stream of bytes. When data is being transferred in the form of objects, the following steps take place in the light of the above figure:
Step 1: The object to be transferred is serialized and stored in the memory of Machine 1.
Step 2: The object stored in the form of a bit stream is transmitted across the network.
Step 3: The Machine 2 receives the bit stream, which is deserialized to get back the object.

2. Freeing up space in the RAM : Now let us see the next advantage of serialization. Making long story short, whenever u feel like your RAM is crowded with a large number of objects and they are eating up your resources, you can serialize some of them to the hard disk. only to deserialize and use them again.
Consider the following example. Suppose, you are running a online store application on your machine where each customer is stored as an object. This object contains all the login details, personal details and the status of the customer’s cart. A single customer object may be so small as a few kilobytes. But in an environment like this, when the number of customers logged in at a moment may go up to several thousands. All of the customers who logged in may not be active. So there is no point in wasting your precious RAM resource for them. So these objects can be serialized to the hard disk, and can be deserialized and used again whenever the particular customer becomes active again.

What is serialized?

In the above discussion, we have not clearly mentioned what part of the object is being serialized. In fact, only the state of the object is being preserved, i.e the values on the variables. Along with this, some metadata on the object itself is being stored. This metadata is used to store some information on the object. Consider the class shown below:



class Match implements Serializable {
public int score = 100;
public int target = 256;
When this class is serialized,

1: The two integer values, viz score and target are converted to bitstream and stored.
2: The metadata about the class is stored. The metadata includes fields such as:
1. The serialization version
2. Length of the class name
3. Number of fields in this class

Also note that, the serialized information can be stored in more than one way. Observe the diagram shown below, it can be plainly to memory, to a file, or to a dedicated database.

Implementing Serialization

Now, let us move on to the most important section, i.e. how to implement serialization.

In order to serialize an object, we have to ensure that the object implements the interface


class Match implements Serializable {
public int score = 100;
public int target = 256;

Serializable Interface and Marker Interface

The Serializable interface is, and empty interface. If u check the code of Serializable interface, you would find something like this.

public interface Serializable

But, dont get fooled by the plain looks. The serializable class does more than what it seems. Serializable is a type of interfce known as marker interface, which do not have any methods, but does the function of tagging a particular object. In this case, the serializable interface tags the object, there by communicating to the JVM that the particualar object can be serialized. An object cannot be serialized until and unless it implements the serializable interface. Other examples of marker interfaces are Clonable, SingleThreadedModel, EventListener etc.

Now that the class or object is eligible for serialization, lets get to the part where we actually serialize it.

See the snippet of code that is employed to serialize the object:

1. FileOutputStream fos = new FileOutputStream("temp.out");
2. ObjectOutputStream oos = new ObjectOutputStream(fos);
3. Match m1 = new Match();
4. oos.writeObject(m1);


The following classes are used here:

FileOutputStream: A FileOutputStream is an OutputStream which is used to write to a file. An OutputStream is a class used to transfer a byte stream. The OutputStream takes in bytes and outputs them to a sink. In the case of FileOutputStream, the sink is a file.

ObjectOutputStream: The ObjectOutputStream is used to write primitive data types and an object to an OutputStream. To make the method of serializable foolproof, only those classes which implements serializable can be written onto an OutputStream.

Now lets see how the code works:

Line 1: A new FileOutputStream instance is created which writes to a file by the name temp.out.
Line 2: A new ObjectOutputStream instance is created which writes the output to the FileOutputStream created in line 1.
Line 3: A new instance of the object to be serialized is created. The same example used above is used here too.
Line 4: The object instantiated in line 3 is written onto the OutputStream, which inturn writes to the file, temp.out. The class of the object, the signature of the class, and the values of the non-transient and non-static fields of the class and all of its supertypes are written.

That is all there to it.