Monday, June 18, 2012

Working with binary trees in Lisp


Success finally! I have been trying to create and work with binary trees in Lisp for sometime. It is really difficult in Lisp in which there are no real data structures. Trees are so obvious in a language like C where one can visualize the left & right branches with pointers. The structure of a node in C is
struct node tree{
int value
struct node *left;
struct node *right;
}
Lisp has no such structures. Lisp is a list or at best a list of lists. It is really how the program interprets the nested list of lists that makes the list a binary or a n-ary tree. As I mentioned before I have not had a whole lot of success in creating the binary tree in Lisp for quite sometime. Finally I happened to come across a small version of adding an element to a set in “Structure and Interpretation of Computer Programs (SICP)” by Harold Abelson, Gerald and Julie Sussman. This is probably one of the best books I have read in a long time and contains some truly profound insights into computer programming.
I adapted the Scheme code into my version of a adding a node. Finally I was able to code inorder, pre-order and post order traversals.
In this version the a node of a binary tree is represented as
(node left right) so
node -> (car tree)
left_branch -> (cadr tree)
right_branch is -> (caddr tree)
Here is the code
(defun entry (tree)
(car tree))
(defun left-branch (tree)
(cadr tree))
(defun right-branch (tree)
(caddr tree))
//Create node in a binary tree
(defun make-tree (entry left right)
(list entry left righta))
// Insert element into tree
add (x tree)
(cond ((null tree) (make-tree x nil nil))
((= x (entry tree)) tree)
((< x (entry tree)) (make-tree (entry tree) (add x
(left-branch tree)) (right-branch tree)))
((> x (entry tree)) (make-tree (entry tree)
(left-branch tree) (add x (right-branch tree))))))
So I can now create a tree with a create-tree function

(defun create-tree(elmnts)
(dolist (x elmnts)
(setf tree (add x tree)))
)
(setf tree nil)
NIL
(setf lst (list 23 12 1 4 5 28 4 9 10 45 89))
(23 12 1 4 5 28 4 9 10 45 89)
(create-tree lst)
NIL

Now I display the tree
tree
(23 (12 (1 NIL (4 NIL (5 NIL (9 NIL (10 NIL NIL))))) NIL) (28 NIL (45 NIL (89 NIL NIL))))

This can be represented pictorially as


Now I created the 3 types of traversals
(defun inorder (tree)
(cond ((null tree))
(t (inorder (left-branch tree))
(print (entry tree))
(inorder (right-branch tree))))))
(defun preorder (tree)
(cond ((null tree))
(t (print (entry tree))
(preorder (left-branch tree))
(preorder (right-branch tree))))))
(defun postorder (tree)
(cond ((null tree))
(t (postorder (left-branch tree))
(postorder (right-branch tree))
(print (entry tree)))))
[89]> (inorder tree)
1
4
5
9
10
12
23
28
45
89
[90]> (preorder tree)
23
12
1
4
5
9
10
28
45
89
T
[91]> (postorder tree)
10
9
5
4
1
12
89
45
28
23
23

Sunday, June 10, 2012

Test driving Apache Hadoop: Standalone & pseudo-distributed mode


The Hadoop paradigm originated from Google and is used in crunching large data sets. It is ideally suited for applications like Big Data, creating an inverted index used by search engines and other problems which require terabytes of data to processed in parallel. One key aspect of Hadoop is that it is made up of commodity servers. Hence server, disk crashes or network issues are assumed to be norm rather than an exception.
The Hadoop paradigm is made of the Map-Reduce  & the HDFS parts. The Map-Reduce has 2 major components to it. The Map part takes as input key- value pairs and emits a transformed key-value pair. For e.g Map could count the number of occurrences of words or created an inverted index of a word and its location in a document. The Reduce part takes as input the emitted key-value pairs of the Map output and performs another operation on the inputs from the Map part for.e.g summing up the counts of words. A great tutorial on Map-Reduce can be found at http://developer.yahoo.com/hadoop/tutorial/module4.html
The HDFS (Hadoop Distributed File System) is the special storage that is used in tandem with the Map-Reduce algorithm. The HDFS distributes data among Datanodes. A Namenode maintains the meta data of where individual pieces of data are stored.
To get started with Apache Hadoop download a stable release of Hadoop from
a) Install Hadoop on your system.
Apache Hadoop requires Java to be installed. Download and install Java on your machine from
b) After you have installed java set $JAVA_HOME in
/usr/share/hadoop/templates/conf/hadoop-env.sh
export JAVA_HOME=${JAVA_HOME}
c) Create a user hduser in group hadoop
For this click  Applications->Other->User & Groups
Choose Add User – hduser  &  Add Group – hadoop
Choose properties and add hduser to the hadoop group.
Standalone Operation
Under root do
/usr/sbin/sshd
then
$ssh localhost


If you cannot do a ssh to localhost with passphrase then do the following
$ ssh-keygen -t dsa -P ” -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Now  re-run
$ssh localhost – This time it should go fine
Create a directory input and copy *.xml files from conf/
$mkdir input
$cp /usr/share/hadoop/templates/conf/*.xml input
Then execute the following. This searches for the string “dfs*” in all the XML files under the input directory
$/usr/bin/hadoop jar /usr/share/hadoop/hadoop-examples-1.0.3.jar grep input output ‘dfs[a-z.]+’
You should see
[root@localhost hadoop]# /usr/bin/hadoop jar /usr/share/hadoop/hadoop-examples-1.0.3.jar grep input output ‘dfs[a-z.]+’
12/06/10 13:00:51 INFO util.NativeCodeLoader: Loaded the native-hadoop library
..
12/06/10 13:01:45 INFO mapred.JobClient:     Reduce output records=38
12/06/10 13:01:45 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
12/06/10 13:01:45 INFO mapred.JobClient:     Map output records=38
Where it indicates that there are 38 such record strings with dfs* in it.
If you get an error
java.lang.OutOfMemoryError: Java heap space
then increase the heap size from 128 to 1024 as below
mapred.child.java.opts
-server -Xmx1024m -Djava.net.preferIPv4Stack=true
….
Pseudo distributed mode
In the pseudo distributed mode separate Java processes are started for the Job Tracker which schedules tasks, the Task tracker which executes tasks and the Namenode which contains the data
Note: Though I created user “hduser” in the group hadoop and also added the user to the wheel group with admin privileges all the commands were issued under root as I was getting a permission denied in the hduser account.
a) Execute the following commands under root
$mkdir -p /home/hduser/hadoop/tmp
$chown hduser:hadoop /home/hduser/hadoop/tmp
$chmod 750 /home/hduser/hadoop/tmp
b) Do the following
Note: Files  core-site.xml, mapred-site,xml & hdfs-site.xml exist under
/usr/share/hadoop/templates/conf & /etc/hadoop
It appears that Apache hadoop gives precedence to /etc/hadoop. So add the following between
In file /etc/hadoop/core-site.xml
hadoop.tmp.dir
/home/hduser/hadoop/tmp
A base for other temporary directories.
fs.default.name
hdfs://localhost:54310
The name of the default file system.  Either the
literal string “local” or a host:port for NDFS.
true
In /etc/hadoop/mapred-site.xml
mapred.job.tracker
localhost:54311
true
In /etc/hadoop/hdfs-site.xml add
dfs.replication
1
c) Since the pseudo distributed mode will use the HDFS file system we need to format this.So run the following command
$/usr/bin/hadoop namenode -format
[root@localhost hduser]#  /usr/bin/hadoop namenode -format
12/06/10 15:48:16 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost.localdomain/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.3
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by ‘hortonfo’ on Tue May  8 20:16:59 UTC 2012
************************************************************/
12/06/10 15:48:17 INFO common.Storage: Image file of size 110 saved in 0 seconds.
12/06/10 15:48:18 INFO common.Storage: Storage directory /home/hduser/hadoop/tmp/dfs/name has been successfully formatted.
12/06/10 15:48:18 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
d) Now start all the Hadoop processes
$/usr/sbin/start-all.sh
starting namenode, logging to /var/log/hadoop/root/hadoop-root-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /var/log/hadoop/root/hadoop-root-datanode-localhost.localdomain.out
localhost: starting secondarynamenode, logging to /var/log/hadoop/root/hadoop-root-secondarynamenode-localhost.localdomain.out
starting jobtracker, logging to /var/log/hadoop/root/hadoop-root-jobtracker-localhost.localdomain.out ocalhost: starting tasktracker, logging to /var/log/hadoop/root/hadoop-root-tasktracker-localhost.localdomain.out
Verify that all processes have started by executing /usr/java/jdk1.7.0_04/bin/jps
[root@localhost hduser]# /usr/java/jdk1.7.0_04/bin/jps
10971 DataNode
10866 NameNode
11077 SecondaryNameNode
11147 JobTracker
11264 TaskTracker
11376 Jps
You will see JobTracker,taskTracker,NameNode,DataNode and SecondaryNameNode
You can also do netstat -plten | grep java
tcp        0      0 0.0.0.0:50090               0.0.0.0:*                   LISTEN      0          166832     11077/java
tcp        0      0 0.0.0.0:50060               0.0.0.0:*                   LISTEN      0          167407     11264/java
tcp        0      0 0.0.0.0:50030               0.0.0.0:*                   LISTEN      0          166747     11147/java
tcp        0      0 0.0.0.0:50070               0.0.0.0:*                   LISTEN      0          165669     10866/java
tcp        0      0 0.0.0.0:50010               0.0.0.0:*                   LISTEN      0          166951     10971/java
tcp        0      0 0.0.0.0:50075               0.0.0.0:*                   LISTEN      0          166955     10971/java
tcp        0      0 127.0.0.1:55839             0.0.0.0:*                   LISTEN      0          166816     11264/java
tcp        0      0 0.0.0.0:50020               0.0.0.0:*                   LISTEN      0          165843     10971/java
tcp        0      0 127.0.0.1:54310             0.0.0.0:*                   LISTEN      0          165535     10866/java
tcp        0      0 127.0.0.1:54311             0.0.0.0:*                   LISTEN      0          166733     11147/java
e) Now copy files from your local directory /home/hduser/input  to the HDFS file system
Now you can check the web interface for JobTracker & NameNode
This is as per mapred-site.xml & hdfs-site.xml in /conf directory. They are at
ñJobTracker – http://localhost:50030/
ñNameNode - http://localhost:50070/
f) Copy files from your local directory to HDFS
$/usr/bin/hadoop dfs -copyFromLocal /home/hduser/input /user/hduser/input
Ensure that the files have been copies by listing the contents of HDFS
g) Check that files have been copied
$/usr/bin/hadoop dfs -ls /user/hduser/input
Found 9 items
-rw-r–r–   1 root supergroup       7457 2012-06-10 10:31 /user/hduser/input/capacity-scheduler.xml
-rw-r–r–   1 root supergroup       2447 2012-06-10 10:31 /user/hduser/input/core-site.xml
-rw-r–r–   1 root supergroup       2300 2012-06-10 10:31 /user/hduser/input/core-site_old.xml
-rw-r–r–   1 root supergroup       5044 2012-06-10 10:31 /user/hduser/input/hadoop-policy.xml
-rw-r–r–   1 root supergroup       7595 2012-06-10 10:31 /user/hduser/input/hdfs-site.xml
h) Now execute the grep functionality
[root@localhost hduser]# /usr/bin/hadoop jar /usr/share/hadoop/hadoop-examples-1.0.3.jar grep /user/hduser/input /user/hduser/output ‘dfs[a-z.]+’
12/06/10 10:34:22 INFO util.NativeCodeLoader: Loaded the native-hadoop library
….
12/06/10 10:34:23 INFO mapred.JobClient: Running job: job_201206101010_0003
12/06/10 10:34:24 INFO mapred.JobClient:  map 0% reduce 0%
12/06/10 10:34:48 INFO mapred.JobClient:  map 11% reduce 0%
12/06/10 10:35:21 INFO mapred.JobClient:  map 88% reduce 22%
12/06/10 10:35:24 INFO mapred.JobClient:  map 100% reduce 22%
12/06/10 10:35:27 INFO mapred.JobClient:  map 100% reduce 29%
12/06/10 10:35:36 INFO mapred.JobClient:  map 100% reduce 100%
12/06/10 10:35:42 INFO mapred.JobClient: Job complete: job_201206101010_0003
….
12/06/10 10:36:16 INFO mapred.JobClient:     Reduce input groups=3
12/06/10 10:36:16 INFO mapred.JobClient:     Combine output records=0
12/06/10 10:36:16 INFO mapred.JobClient:     Physical memory (bytes) snapshot=180502528
12/06/10 10:36:16 INFO mapred.JobClient:     Reduce output records=36
12/06/10 10:36:16 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=695119872
12/06/10 10:36:16 INFO mapred.JobClient:     Map output records=36
i) Check the result
[root@localhost hduser]# /usr/bin/hadoop dfs -cat /user/hduser/output/*
6          dfs.data.dir
2          dfs.
2          dfs.block.access.token.enable
2          dfs.cluster.administrators
2          dfs.datanode.address
2          dfs.datanode.data.dir.perm
2          dfs.datanode.http.address
2          dfs.datanode.kerberos.principal
2          dfs.datanode.keytab.file
2          dfs.exclude
…..
j)/usr/sbin/stop-all.sh
Have fun with hadoop…

Saturday, June 9, 2012

Sneak preview of Windows 8 with VMWare Workstation 8.0.3


Here’s a sneak preview of Windows 8 evaluation version using VMWare’s Workstation 8.0.3. For those who read my earlier post “Experiences with VMWare Workstation 8.0.3 : The good, bad and the Ugly” the Windows 8 VM experience  must definitely rate as good. The setup and installation of Windows 8 in Workstation was a breeze. There was just one hiccup which is mentioned below.
The initial experience with Windows 8 is truly breath taking. The metro-style screen with its mosaic of tiles looks really great. Besides, Microsoft with Windows 8 is definitely taking the right path with a tile for the App Store and the SkyDrive. More on that later…
To get started download Windows 8 Release preview ISO image fromhttp://windows.microsoft.com/en-US/windows-8/iso. Make a note of the Product key in the page.
Start your VMWare Workstation and choose “Create a new VM”. Browse to the directory which has the ISO image start the VM. Use the product key that you made a note of in the download page. While the installation will start you are bound to run into the error “Windows cannot read the product key from the unattend answer file”. To fix this issue power off the Windows 8 VM. Now select the “Settings” of the VM and remove floppy drive from the settings. Now Power on your VM. This time things should go smoothly and you installation process should begin.
Soon you should see Windows 8 installation screen
Choose the custom option as shown below
The installation should start and you should see
Follow the prompts and pretty soon you should see a really appealing Windows 8 metro style screen. The screen has a really cool tiled look. In fact with this look icons seem almost passe.
A quick look at this screen and you will see that Microsoft has now included the Store (App Store) and the SkyDrive. I am certain both of these will be put to great use in the future. Games and apps will be downloaded from the App Store. Play around the desktop.
Windows 8 is supposed to be based on touch where the user touches the screen to select an application. To navigate between applications or to get back to the metro-style screen move the mouse to the lower left corner of the screen and you should see a small metro-style screen. The top left corner has your current running applications.
I wanted to check out the Skydrive. So I created 2 text files in my Documents folder and selected Skydrive.
You can right click the files and select them. Go the bottom right corner and right click. You should see the task bar pop up. Click add and you will get a screen as shown below
Uploading files and folders to the cloud is bound to be commonly used in the not too distant future. The Skydrive right on your desktop will be a god send for users who want to keep a back up copy on the Cloud.
The App Store is another alluring addition.
If the boot times and load times of applications are really  fast in Windows 8 then Windows 8 looks to be a clear winner.

In fact with the stylish tiled look, touch interface, app store and Skydrive Windows 8 may actually give iPad a run for its money given the fact that Windows 8 provides actual computing capability in addition to consuming content.

Friday, June 8, 2012

Taking baby steps with Lisp


Lisp can be both fascinating and frustrating. Fascinating, because you can write compact code to solve really complex problems. Frustrating, because you can easily get lost in its maze of parentheses. I, for one, have been truly smitten by Lisp. My initial encounter with Lisp did not yield much success as I tried to come to terms with its strange syntax. The books I read on the Lisp language typically gloss over the exotic features of Lisp like writing Lisp code to solve the Towers of Hanoi or the Eight Queens problem. They talk about functions returning functions, back quotes and macros that can make your head spin.

I found this approach extremely difficult to digest the language. So I decided to view Lisp through the eyes of any other regular programming language like C, C++,, Java, Perl, Python or Ruby. I was keen on being able to do regular things with Lisp before I try out its unique features. So I decided to investigate Lisp from this view point and learn how to make Lisp do mundane things like an assignment, conditional, loop, array, input and output etc.

This post is centered on this fact.

Assignment statement
The most fundamental requirement for any language is to perform an assignment. For e.g. these are assignment statements in Lisp and its equivalent in C for e.g.
$ (setf x 5)                                                         - - - > ; $ x = 5
$ (setf x (+  (* y 2) (* z 8))                               - - - ->  $x = 2y + 8z

Conditional statement

There are a couple of forms of conditional statement in Lisp. The most basic is the ‘if’ statement which is special case. You can do if-then-else without the possibility of if-then-else if-else if - else
if (condition) statement else-statement
In Lisp this is written as
$(setf x 5)
$ (if (= x 5)
(setf x  (+ x 5))
(setf  (- x 6)))
10

In C this equivalent to
$ x = 5
$ if (x == 5)
x = x + 5;
else
x = x -6;

However Lisp allows the if-then-else if – else if –else through the use of the COND statement
So we could write

$ (setf x 10)
$ (cond ((< x 5) (setf x (+ x 8)) (setf y (* 2 y)))
((= x 10) (setf x (* x 2)))
(t (setf x 8)))
20

The above statement in C would be
$ x = 2
$ y = 10
$ if (x < 5)
{
x = x + 8;
y = 2 * y;
}
else if (x == 10)
{
x = x * 2;
}
else
x = 8;

Loops
Lisp has many forms of loops dotimes, dolist, do , loop for etc. I found the following most intuitive and best to get started with
$  (setf x 5)
$ (let ((i 0))
(loop
(setf y (* x i))
(when (> i 10) (return))
(print i) (prin1 y)
(incf i
)))
In C this could be written as
$ x = 5
(for i = 0; i < 10; i++)
{
y = x * i
printf(“%d %d\n”,i,y);
}
Another easy looping construct in C is
(loop for x from 2 to 10 by 3
do (print x))
In C this would be
(for x=2; x < 10; x = x+3)
print x;

Arrays
To create an array of 10 elements with initial value of 20
(setf numarray (make-array 10 :initial-element 20))
#(20 20 20 20 20 20 20 20 20 20)
To read an array element it is
$ (aref  numarray 3)                    - - -    numarray[3]
For e.g.
(setf x (* 2 (aref numarray 4)))     - - - -   x = numarray[4] * 2

Functions
(defun square (x)
(* x x))
This is the same as
int square (x)
{
return (x * x)
}
While in C you would invoke the function as
y = square (8)
In Lisp you would write as
(setf y (square 8))
Note: In Lisp the function is invoked as (function arg1 arg2… argn) instead of (function (arg1 arg2  … argn))

Structures
a) Create a global variable *db*
(defvar *db* nil)

b) Make a function to add an employee
$(defun make-emp (name age title)
(list :name name :age age :title title))
$(add-emp (make-emp "ganesh" 49 "manager"))
$(add-emp (make-emp "manish" 50 "gm"))
$(add-emp (make-emp "ram" 46 "vp"))
$ (dump-db)
For a more complete and excellent post on managing a simple DB looks at Practical Common Lisp by Peter Siebel

Reading and writing to standard output
To write to standard output you can use
(print “This is a test”) or
(print ‘(This is a test))
To read from standard input use
(let ((temp 0))
(print '(Enter temp))
(setf temp (read))
(print (append '(the temp is) (list temp))))

Reading and writing to a file
The typical way to do this is to use
a) Read
(with-open-file (stream "C:\\acl82express\\lisp\\count.cl")
(do ((line (read-line stream nil)
(read-line stream nil)))
((null line))
(print line)))
b) Write
(with-open-file (stream "C:\\acl82express\\lisp\\test.txt"
:direction :output
:if-exists :supersede)
(write-line "test" stream)
nil)
I found the following construct a lot easier
(let ((in (open "C:\\acl82express\\lisp\\count.cl" :if-does-not-exist nil)))
(when in
(loop for line = (read-line in nil)
while line do (format t "~a~%" line))
(close in)))


With the above you can get started on Lisp. However with just the above constructs the code one writes will be very “non-Lispy”. Anyway this is definitely a start.