Monday, March 28, 2011

A Roundup of Web Technologies


The internet and the World Wide Web are woven into our daily lives so intricately that life without them is unimaginable. We use the web for our daily news, to finding directions(maps), socializing(Facebook), sending/receiving emails, and buying e-tickets and books over e-retail stores on the net. With a click, a drag and drop or by just moving the mouse over a web page we see results instantaneously. But what are the technologies that power the Web outside of the routers and hubs of the data communication world?

Actually if one peeks into the technologies that power Web 2.0 one would be amazed at the bewildering array of technological choices that one is confronted with. My curiosity was whetted when I found that there were so many possibilities that go behind different websites from Gmail, www.amazon.com. Twitter, Facebook or maps.yahoo.com.

This article tries to give a bird's eye view of the different technologies at the different layers. In many ways this article will be more of name dropping of the technologies rather than doing any real justice to each individual piece. I am merely presenting the different technologies as an interested spectator rather than as a web expert.

Presentation Layer: This is the layer which presents the web page to user. In the presentation layer most of the pages are made of elements of from HTML,CSS, PHP, Javascript, AJAX. These are diferent scripting mechanisms to display or take input from the user. Subsequently there arose the need for technologies called Rich Internet Application (RIA) to provide a much more superior user experience. These technologies are used to display video content and animations. Hence, we have Flash, Flex to more sophisticated technologies like Liferay, Primefaces, Myfaces and Java Server Faces (JSF) to the current HTML5. These technologies allow for drag-and-drop functionality, incorporating videos and animations in the web pages making the user experience similar to what he experiences on the desktop.

Enterprise Layer: At this layer the user input is processed and the client makes necessary requests to the back end server to get the appropriate results. This layer also there is a virtual explosion of technologies that make this possible. In this layer from the earlier C++, Java programs the movement was towards Enterprise Java Beans (EJB) invoked through servlets or Java Server Pages. To make the life of the web developer easier (?) there are several web frameworks that automate some of the common tasks of the developer. Some of them are Django with Python, Ruby on Rails (RoR), Groovy Grails, Perl-Catalyst, Python-Flask, JAX-RS and so on. Each web framework has it pros and cons and has different learning curves. While Python developers thrive on "there is only one way to do a thing", die-hard Ruby developers believe in the "do not repeat yourself (DRY)" philosophy. So the technology choice will be a matter of taste combined with deadlines for the project.

Persistence Layer: At the persistence layer there is Hibernate which converts a relational model to an object model and vice-versa making it easy to manipulate the rows and columns of tables. Usually this layer is coupled with Spring frameworks. Another competing technology is Struts framework.

Database Layer: While Hibernate can be used as a persistence layer it is also possible to access the database through ODBC, JDBC etc.

Exchange of Data: In the earlier days sending and receiving data or invoking remote procedure calls were through CORBA or RPC (Remote Procedure Calls). Subsequently other methods have been implemented for data exchange between servers. They are XML, JSON (Javascript Object Notation),SOAP (Simple Object Access Protocol) to the more current REST (Representational State Transfer)

Hence there are plethora of choices to make prior in the design of web sites complete with back end processing. The choices that are made will depend on the look and feel of the web site coupled with the ease of implementation of the site given the project deadlines.

INWARDi Technologies

Friday, March 25, 2011

The Anatomy of Latency

Latency is a measure of the time delay experienced in a system. In data communications, latency would be measured as the round-trip delay between sending a packet and receiving response from the destination. In the world of web applications latency is the response time of a web site. In web applications latency is dependent on both the round trip time on the communication link and also the processing time of the application, Hence we could say that

latency = 2 * round trip time + Processing time

The round trip time is probably less susceptible to increasing traffic than the processing time taken for handling the increased loads. The processing time of the application is particularly pernicious in that it susceptible to changing traffic. This article tries to analyze why the latency or response times of web applications typically increase with increasing traffic. While the latency increases exponentially as the traffic increases the throughput increases to a point and then finally starts to drop substantially. The ideal situation for all internet applications is to have the ability to scale horizontally allowing the application to handle increasing traffic by simply adding more commodity servers to the application while maintaining the response times to acceptable limits. However in the real world this never happens.







The price of Latency
Latency hurts business. Amazon found out that every 100 ms of latency cost them 1% of sales. Similarly Google realized that a 0.5 second increase in search results dropped the search traffic by 20%. Latency really matters. Reactions to bad response times in web sites range from minor annoyance to complete frustration and loss of users and business.

The cause of processing latency
One of the fundamental requirements of scalable systems is that they should be loosely coupled. The application needs to have a modular architecture with well defined interfaces with the other modules. Ideally, applications which have been designed with fairly efficient processing times of the order of O(logn) or O(nlogn) will be immune to changing loads but will be impacted by changes in number of data elements So the algorithms adopted by the applications themselves do not contribute the increasing response times for increase traffic. So finally what really is the performance bottleneck for increasing latencies and decreasing throughput for increased loads?

Contention- the culprit
One of the culprits behind the deteriorating response is the thread locking and resource contention. Assuming that application has been designed with Reader-Writer locks or message queue based synchronization mechanism then the time spent in waiting for resources to become free, while traffic increases, will result in the degraded performance.

Let us assume that the application is read-heavy, write-light and has implemented Reader-Writer synchronization mechanism. Further let us assume that a write-thread locks a resource for 250 ms. At low loads we could have 4 such threads each locking the resource for 250 ms for a total span of 1s. Hence in 1s there can be a maximum of 4 threads each of which has executed a write lock for 250 ms for a total of 1s. In this interval all reader threads will be forced to wait. When the traffic load is low the number of reader threads waiting for the lock to be released will be low and will not have much impact but as the traffic increases the number of threads that are waiting for the lock to be released will be increase. Since a write lock takes a finite amount of time to complete processing we cannot go over the 4 write threads in 1 second with the given CPU speed.

However as the traffic further increases the number of waiting threads not only increases but also consume CPU and memory. Now this adversely impacts the writer threads which find that they have lesser CPU cycles and less memory and hence take longer times to complete. This downward cycle worsens and hence results in an increase in the response time and a worsening throughput in the application.

The solution to this problem is not easy. We need to revisit the areas where the application blocks waiting for something. Locking besides causing threads to wait also adds the overhead of getting scheduled prior to being able to execute again. We need to minimize the time a thread holds a resource before allowing others threads access to it.

INWARDi Technologies

Wednesday, March 23, 2011

Getting started with memcached-libmemcached

Memcached is the free, high performance, open source distributed caching system. It was designed to alleviate a high number of database queries by caching the data in memory. Since memcached is a distributed caching system the application data is distributed across servers. Data is inserted and retrieved from the distributed cache using a key,value pair.

Memcached uses a consistent hashing scheme to distribute the keys across the servers. The consistent hashing algorithm handles server crashes and servers joining-in by redistributing the keys across the necessary servers.

This article focuses on getting started with memcached-libmecached and making the process as painless as possible. After you have downloaded and installed memcached & libmemcached you are good to go.

First start 4 memcached servers
$ memcached -p 11221 &
$ memcached -p 11222 &
$ memcached -p 11223 &
$ memcached -p 11223 &

They start on the local host. (For full options check memcached -help)
Verify they are running using ps -ef.

libmemcached is the C client which can be used to connect to the memcached servers which you have started above.
A snippet of the libmemcached code client_test1.c is shown
client_test1.c
....
const char *server_string= "localhost:11221, localhost:11222, localhost:11223, localhost:11224";
memc= memcached_create(NULL);
servers= memcached_servers_parse(server_string);
rc= memcached_server_push(memc, servers);
rc= memcached_flush(memc, 0);
rc= memcached_set(memc, key, strlen(key),in_value, strlen(in_value),(time_t)0, (uint32_t)0);
rc= memcached_append(memc, key, strlen(key)," the", strlen(" the"),(time_t)0, (uint32_t)0);
rc= memcached_append(memc, key, strlen(key)," people here", strlen(" people here"), time_t)0, (uint32_t)0);
out_value= memcached_get(memc, key, strlen(key),&value_length, &flags, &rc);
printf("Out value is: %s\n",out_value);
memcached_server_list_free(servers);
free(out_value);
....

When you execute this client you should see
$ Out value is: We the people here

You can check which server the key is stored by doing
$ memdump --servers localhost:11221
fig
$memdump --servers localhost:11222
$
This shows that the key data is stored in the 1st servers localhost:11221

Now assume that we store a lot more data through client_test2.c
client test2.c
.....
const char *server_string= "localhost:11221, localhost:11222, localhost:11223, localhost:11224";
memc= memcached_create(NULL);
servers= memcached_servers_parse(server_string);
rc= memcached_server_push(memc, servers);
rc= memcached_flush(memc, 0);
for (i=0; i < 100; i++)
{
sprintf(str,"%d",i);
sprintf(str1,"%d",2*i);
printf("String %s string1 %s\n",str,str1);
printf("reached here\n");
rc= memcached_set(memc,str, strlen(str), str1, strlen(str1),(time_t)0, (uint32_t)0);
test_true(rc == MEMCACHED_SUCCESS);
}
for(i=0; i < 10; i++)
{
printf("Input value:");
scanf("%s",testvalue);
printf("Value to search for %s",testvalue);
value= (uint32_t *)memcached_get(memc, testvalue, strlen(testvalue), &value_length, &flags, &rc)
test_true(rc == MEMCACHED_SUCCESS);
printf("Value is %s\n",value);
}
.....

After executing this when we dump the key values from the servers we will see
$ memdump --servers localhost:11221
97
94
92
89
....
....

Similarly
$ memdump --servers localhost:11222
99
98
91
87
86
....
....

Hence the keys are hashed across servers. The consistent hashing mechanism takes O(log(n)) to get to cache server as against a naive hashing scheme which would take O(1).

Happy memcaching ...

Friday, March 18, 2011

The Business of Cloud Computing

Cloud Computing is the spanking new paradigm in the world of computing. The key differentiator in this technology is that the enterprise only pays for the amount of resources used - be it CPUs, memory or databases. While it does away with Capital Expenditure for organizations by providing a utility model of pricing it results in recurring Operating Expenses for the organization. However the important thing is that the cloud grows and shrinks according to demand and hence the cost to the organization is dependent on the traffic it generates. While web based applications are prime candidates for the cloud other equally eligible candidates are batch processing jobs, nightly builds or CPU intensive analytics. Except for the case of web application, for other types of applications, a reasonable estimate can be made on the resources needed and appropriate choice be made on the cloud.

This article looks at web applications where the traffic on the site can be seasonal and can vary during periods of the day. Besides web sites should be capable of handling bursty traffic with enormous loads at particular intervals.

The important consideration for web sites is to ensure that the application is truly optimized and exhibits the property of scaling horizontally. While it appears that scaling out will occur for any reasonably designed application the issue is that as the number of hits increase on the web site the response time increases steeply but the number of transactions per second plateaus at some particular load level and does not increase after that. It can be said that for a certain CPU instance configuration the peak transaction per second will reach a particular limit and cannot be increased any further. However the cloud also provides a key component namely the load balancer along with auto scaling which create a new instances when this threshold is reached.



What are the business considerations that need to be taken while designing for the cloud?
One needs to be conservative in choosing the instance type. While larger instances will provide a better performance they also cost more. Hence the instance type should be large enough and no larger. It would be wasteful of using extremely large instances where the last instance only uses a part of the total traffic while costing a lot more.

The analogy is that if 16 units if task have to be performed it is better to have a small CPU instance capable of handling 3 units of task requiring a total of 6 CPUs (6 * 3 = 18 > 16) rather than having a large CPU instance capable of handling 5 units of task requiring a total of 4 large CPUs (5 * 4= 20> 16). The second option would result in a waste processing power.

Assuming that the upfront cost to the organization for hosting the website in-house is 'P' and the cost amortized over a period of 1 years is 'p' per hour. Further if the instance cost is 'c' and 'n' is number of instances needed to support the projected demand and the revenue to the organization hosting the website is 'r' per 1000 hits then a cloud deployment will make business sense when

(rh– n * ch) – ph > 0 where h is the hour

As long as the right hand side is positive the organization will profit. However as the traffic increases and the throughput of website plateaus the enterprise will hit a 'window of diminishing returns'.

However if the performance of the application is poor and the number of instances needed to support the traffic is disproportionately large then the above equation will be negative and will result in loss to the organization.


(rh – n * ch) – ph < 0

Hence deployment to the cloud besides requiring a strong technical background also needs a sound business sense in order to reap the benefits of the cloud.

INWARDi Technologies

Thursday, March 17, 2011

Designing for Cloud Worthiness

Cloud Computing is changing the rules of computing to the enterprise. Enterprises are no longer constrained by capital costs of upfront equipment purchase. Rather they can concentrate on the application and deploy it on the cloud and pay in a utility style based on usage. Cloud computing essentially presents a virtualized platform on which applications can be deployed.

The Cloud exhibits the property of elasticity by automatically adding more resources to the application as demand grows and shrinking the resources when the demand drops. It is this property of elasticity of the cloud and the ability to pay based on actual usage that makes Cloud Computing so alluring.

However to take full advantage of the Cloud the application must use the available cloud resources judiciously. It is important for applications that are to be deployed on the cloud to have the property of scaling horizontally. What this implies is that the application should be able to handle more transactions per second when more resources are added to application. For example if the application has been designed to run in a small CPU instance of 1.7GHz,32 bit and 160 GB of instance storage with a throughput of 800 transactions per second then one should be able to add 4 such instances and scale to handling 4000 transactions per second.

However there is a catch in this. How does one determine what should be theoretical limit of transactions per second for a single instance? Ideally we should maximize the throughput and minimize the latency for each instance prior to going to the next step of adding more instances on the cloud. One should squeeze the maximum performance from the application in the instance of choice prior to using multiple instances on the cloud. Typical applications perform reasonably well under small loads but as the traffic is increased the response time increases and the throughput also starts dipping.






There is a need to run some profiling tools and remove bottlenecks in the application. The standard refrain for applications to be deployed on the cloud is that they should be loosely coupled and also be stateless. However, most applications tend to be multi-threaded with resource sharing in various modules. The performance of the application because of locks and semaphores should be given due consideration. Typically a lot of time wasted in the wait state of threads in the application. A suitable technique should be used for providing concurrency among threads. The application should be analyzed whether it read-heavy and write-light or write-heavy and read-light. Suitable synchronization techniques like reader-Writer, message queue based exclusion or monitors should be used.

I have found callgrind for profiling and gathering performance characteristics along with KCachegrind for providing a graphical display of performance times extremely useful.

Another important technique to improve performance is the need to maintain in-memory cache of frequently accessed data. Rather than making frequent queries to the database periodic updates from the database need to be made and stored in in-memory cache. However while this technique works fine with a single instance the question of how to handle in-memory caches for multiple instances in the cloud represents quite a challenge. In the cloud when there are multiple instances there is a need for a distributed cache which is shared among multiple instances. Memcached is appropriate technique for maintaining a distributed cache in the cloud.

Once the application has been ironed out for maximum performance the application can be deployed on the cloud and stress tested for peak loads.
Some good tools that can be used for generating loads on the application are loadUI and multi-mechanize. Personally I prefer multi-mechanize as it uses test scripts that are based on Python which can be easily modified for the testing. One can simulate browser functionality to some extent with Python in multi-mechanize which can prove useful.

Hence while the cloud provides CPUs, memory and database resources on demand the enterprise needs to design applications such that the use of these resources are done judiciously. Otherwise the enterprise will not be able to reap the benefits of utility computing if it deploys inefficient applications that hog a lot of resources without appropriate revenue generating performance.

INWARDi Technologies

Friday, March 4, 2011

Optimal Cloud Computing

Published in CIOL, Jul 11, 2011 - Cloud Computing - Windows of Performance
Published in Dataquest, Nov 16, 2011 - Cloud Computing - Cloud all the way

The murmur of Cloud Computing today, is bound to build up to crescendo in the years to come simply because it makes sound business sense. Cloud Computing is a new paradigm in the world computing. The cloud essentially creates an illusion of infinite computing resources which are available on demand to the user who only pays based on the usage. While on the surface it appears extremely simple and straightforward, making an optimal use of the cloud is no trivial task.

Prior to deploying on the cloud the enterprise has to decide the CPU, memory and bandwidth usage of the application. For e.g. the Amazon EC2 provides several variants of CPUs based on different pricing schemes namely $0.085/hr, $0.34/hr or $0.68/hr for small, large or extra large CPU instances. There are different pricing schemes for memory and bandwidth usage as well.

While the technological challenge of deploying the cloud is a separate endeavor in itself, the business considerations needed for deciding the cloud computing resources, optimally, is a separate and an equally important endeavor. This article focuses on the business considerations needed for making an optimal choice of resources while deploying on the cloud.

Since the enterprise is free to choose different CPUs which typically consists of CPU processors with different clock speeds or multi core CPUs for extra large instance the choice is really complicated.

The designer needs to consider how his application scales up with respect to increasing, decreasing or burst demands in traffic. To estimate the kind of resources that would be needed would require a good understanding of how the application scales with respect to increasing traffic. Ideally it will be remarkable if the application can scale linearly with increasing traffic. The key parameters that need to be considered for application performance is application latency and throughput versus the instance type.



Also another consideration is to choose is the kind of resources types that need to be added. Ideally it would make more sense to add small CPU instances which can be added incrementally rather than adding extra large CPU instances which only handle part of the traffic. If we choose the large instance which is only partially used but has to be instantiated, nevertheless, to handle the extra traffic then it could result it wasting of precious resources.



A prime consideration is the choice of CPU resource type and the need to understand how the CPU loads up with increasing traffic with respect to latency and throughput. Once the CPU type, small, medium, large or extra large is chosen the designer needs to monitor how the loading of the CPU resource performs with increasing traffic.

Hence regardless of the choice there will be 3 windows of performance to consider





a) Window of Optimality: In the optimal window the cost of cloud computing resources, for handling the incoming traffic versus the revenue for the enterprise is truly profitable. In the optimal window the application will be capable of scaling extremely well to increasing traffic thus resulting in excellent revenue for the enterprise.

b) Window of Diminishing Returns: In this window the addition of extra resources at additional cost will not result in a proportional increase in scalability. In fact the increasing cost of adding additional resource will offset the revenue to the enterprise as the application will not scale appropriately and will result in diminishing returns.

c) Window of Loss: This is the window, in which no enterprise should not find itself in. In this window the cost of adding the extra resources will be larger than the revenue to the enterprise as an inordinate amount of resources will have to be added for small incremental increase in scalability. This will be the result of a poorly designed application. In this situation the enterprise must go back to the drawing room and re-architect the application.

Hence cloud computing, while truly alluring for the enterprise, it is a path that must be tread very carefully by the enterprise.


INWARDi Technologies

Tuesday, March 1, 2011

When NoSQL makes better sense than MySQL

In large web applications where performance and scalability are key concerns a non –relational database like NoSQL is a better choice to the more traditional databases like MySQL, ORACLE, PostgreSQL etc. While the traditional databases are designed to preserve the ACID (atomic, consistent, isolated and durable) properties of data, these databases are capable of only small and frequent reads/writes.

However when there is a need to scale the application to be capable of handling millions of transactions the NoSQL model works better. There are several examples of such databases - the more reputed are Google’s BigTable, HBase, Amazon’s Dynamo, CouchDB & MongoDB. These databases are based on a large number of regular commodity servers. Accesses to the data are based on get(key) or set(key,value) type of APIs.

The database is itself distributed across several commodity servers. Accesses to the data are based on a consistent hashing scheme for example the Distributed Hash Table (DHT) method. In this method the key is hashed efficiently to one of the servers which can be visualized as lying on the circumference of the circle. The Chord System is one such example of the DHT algorithm. Once the destination server is identified the server does a local search in its data for the key value. Hence the key benefit of the DHT is that it is able to spread the data across multiple servers rather than having a monolithic database with a hot standby present.

The ability to distribute data and the queries to one of several servers provides the key benefit of scalability. Clearly having a single database handling an enormous amount of transactions will result in performance degradation as the number of transaction increases.

However the design of distributing data across several commodity servers has its own challenges, besides the ability to have an appropriate function to distribute the queries to. For e.g. the NoSQL database has to be able handle the requirement of new servers joining the system. Similarly since the NoSQL database is based on general purpose commodity servers the DHT algorithm must be able to handle server crashes and failures. In a distributed system this is usually done as follows. The servers in the system periodically convey messages to each other in order to update and maintain their list of the active servers in the database system. This is performed through a method known as “gossip protocol”

While databases like NoSQL, HBase, Dynamo etc do not have ACID properties they generally follow the CAP postulate. The CAP (Consistency, Availability and Partition Tolerance) theorem states that it is difficult to achieve all the 3 CAP features simultaneously. The NoSQL types of databases in order to provide for availability, typically also replicates data across servers in order to be able to handle server crashes. Since data is replicated across servers there is the issue of maintaining consistency across the servers. Amazon’s Dynamo system is based on a concept called “eventual consistency” where the data becomes consistent after a few seconds. What this signifies is that there is a small interval in which it is not consistent.

The NoSQL since it is non-relational does not provide for the entire spectrum of SQL queries. Since NoSQL is not based on the relational model queries that are based on JOINs must necessarily be iterated over in these applications. Hence the design of any application that needs to leverage the benefits of such non-relational databases must clearly separate the data management layer from the data storage layer. By separating the data management layer from how the data is stored we can easily accrue the benefits of databases like NoSQL.

While NoSQL kind of databases clearly have an excellent advantage over regular relational databases where high performance and scalability are key requirements the applications must be appropriately be tailored to take full advantage of the non-relation and distributed aspect of the database.You may also find the post "To Hadoop, or not to Hadoop" interesting.