Addressing the Challenges of Big Applications

March 2, 2012

By Alex Heneveld

The challenge of handling big data has spawned a wealth of new data stores and file systems, addressing scale and distribution, wide-area operation and availability, and the trade-offs between consistency and latency. If the data isn't accessed too frequently, and if the infrastructure doesn't have to grow, these technologies can mean "job done".

More often, however, solving the big data challenge is only part of the solution: most of the time, if you have big data, you'll have one or more "big apps", and sooner or later (better sooner than later) you'll have to address three more questions.

The first concerns how you will compute on and deliver this data; the second, how to operate in or across different infrastructure environments; and the third, how to monitor system behavior and effect topology changes.

Computing Big Data

The first question follows from the fact that every read and write against data involves compute.  In some systems, this computation adds up and quickly becomes the bottleneck; so even if the storage can scale to petabytes, the number of concurrent requests might max out at an unacceptably low level.

The solution is to design for processing scalability, and broadly speaking there are two popular approaches.  In grid computing, “jobs” or “tasks” are given to “worker” nodes, and scalability becomes a question of scaling the number of such worker nodes.  Hadoop is a powerful extension of this approach where each job is decomposed into multiple tasks each of which runs as near to its target data segment as possible.
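
To make the shape of this concrete, the sketch below is a single-process analogy of the worker-pool idea (it is not Hadoop, and the task class, pool size, and data segments are purely illustrative): tasks are submitted to a pool of workers, their partial results are combined, and in a real grid "scaling out" would mean adding worker nodes rather than threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class GridSketch {

    // A hypothetical "task": count the words in one segment of a large data set.
    static class WordCountTask implements Callable<Long> {
        private final String segment;
        WordCountTask(String segment) { this.segment = segment; }
        public Long call() {
            return (long) segment.trim().split("\\s+").length;
        }
    }

    public static void main(String[] args) throws Exception {
        // The "workers"; in a real grid these would be separate nodes,
        // and scaling would mean adding nodes rather than threads.
        ExecutorService workers = Executors.newFixedThreadPool(4);

        List<String> segments = Arrays.asList("big data", "needs big apps", "to compute it");
        List<Future<Long>> partials = new ArrayList<Future<Long>>();
        for (String segment : segments) {
            partials.add(workers.submit(new WordCountTask(segment)));
        }

        long total = 0;
        for (Future<Long> partial : partials) {
            total += partial.get();   // combine the partial results
        }
        System.out.println("total words: " + total);
        workers.shutdown();
    }
}
```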

The actor model is a more general approach to scalable processing which has lately seen a resurgence in popularity, in part because it can address grid-style compute as well as more sophisticated cases where a unique serialization point is required or shared memory / datastore lookup is too expensive.  In the actor model, application code is decomposed into individual actors, and at runtime messages are passed to the relevant actors or chains of actors.  In some ways this is similar to SOA, although often at a much finer granularity.  Actor model systems, such as Erlang, Smalltalk, and Akka, are frequently championed for simplifying the design of applications with good scalability and robustness characteristics, particularly in the face of concurrency and distribution, because of how they force code to be structured.
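
As a rough illustration of the model only (a toy, not how Erlang or Akka are implemented or used), the class below gives one actor a private mailbox and a single thread that processes messages one at a time, so its state never needs locking:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A toy actor: one mailbox, one processing thread, and private state touched
// only by that thread. Frameworks such as Akka add supervision, routing, and
// location transparency on top of this basic shape.
public class CounterActor implements Runnable {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<String>();
    private long count = 0;   // state owned exclusively by this actor

    public void tell(String message) {       // asynchronous send
        mailbox.offer(message);
    }

    @Override
    public void run() {
        try {
            while (true) {
                String msg = mailbox.take();  // process one message at a time
                if ("increment".equals(msg)) count++;
                else if ("report".equals(msg)) System.out.println("count = " + count);
                else if ("stop".equals(msg)) return;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        CounterActor actor = new CounterActor();
        Thread t = new Thread(actor);
        t.start();
        actor.tell("increment");
        actor.tell("increment");
        actor.tell("report");
        actor.tell("stop");
        t.join();
    }
}
```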

When working with big data, the important aspect is that actors can be situated near the most relevant data; and with some frameworks, these actors can be moved at will, locally or wide area, with negligible overhead.  This capability allows processing to be optimized in real time for scale, latency, or even cost of compute resource.

At a high level, the thrust of this question is to ensure that, whatever big data solution is being used, the processing fabric is also sufficiently elastic and resilient to handle the corresponding compute load.  Some data systems include some of the above approaches (such as Hadoop), but a crucial part of the architect’s job is to make sure that the compute strategy suits not just the data but also the consumers.

Operating in Hybrid and Multi-Cloud Environments

The second question is a pragmatic one, recognizing that deploying and operating at scale often means using heterogeneous resources, from local hardware or virtualization through to off-site private and public clouds.  In order to use these resources, the application — at least for its deployment — has to interact with the providers and navigate the various models of compute, storage, and networking.  When designing for resilience, or operating at scale, or optimizing for performance or cost, some of the subtleties in these implementations can be quite dramatic.

One way of tackling this is to standardize at the virtualization level across an organization, including external suppliers.  CloudStack, OpenStack, and vCloud are some of the leading choices in this area.  When working in such an environment, the application need only be designed around that provider's model and built against its API.

This solution brings its own difficulties, however:  these virtual infrastructure platforms are still evolving rapidly, and the APIs change from release to release.  Worse, these changes are not always backwards compatible.  It also requires a heavyweight dev/test environment, when developers might prefer a lightweight local implementation, such as VirtualBox, to test against.

A complementary approach which can deliver the best of both worlds is to use a cloud abstraction API, such as jclouds (Java), libcloud (Python), or Deltacloud (REST).  These projects present a uniform concept model and API for use in the application, with implementations that allow the code to work against a wide range of cloud providers and platforms.  This means the choice of infrastructure becomes a runtime choice rather than a design-time choice, and writing big apps for portability, or to span multiple clouds, becomes natural and consistent.  Furthermore, in many cases the provider-specific implementations include additional robustness and performance considerations which make the application developer's life even better.
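
As a minimal sketch of what this looks like in practice (using the jclouds compute abstraction in its later ContextBuilder style; the provider name and credentials below are placeholders), the same few lines list the nodes in a deployment whether the target is EC2, an OpenStack cloud, or any other supported provider:

```java
import org.jclouds.ContextBuilder;
import org.jclouds.compute.ComputeService;
import org.jclouds.compute.ComputeServiceContext;
import org.jclouds.compute.domain.ComputeMetadata;

public class ListNodes {
    public static void main(String[] args) throws Exception {
        // "aws-ec2" and the credentials are placeholders; swapping the provider
        // string for e.g. "openstack-nova" leaves the rest of the code unchanged.
        ComputeServiceContext context = ContextBuilder.newBuilder("aws-ec2")
                .credentials("identity", "credential")
                .buildView(ComputeServiceContext.class);
        try {
            ComputeService compute = context.getComputeService();
            for (ComputeMetadata node : compute.listNodes()) {
                System.out.println(node.getId() + "  " + node.getName());
            }
        } finally {
            context.close();
        }
    }
}
```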

The rise of Platform-as-a-Service is another reason to consider an abstraction API.  It is true that writing for a PaaS can greatly accelerate development and insulate an application from the subtleties of different physical and virtual infrastructure layers (and in many cases they do this by using jclouds or libcloud).  However, it can cause lock-in at a different level:  an application designed for a specific PaaS can be very tricky to port to a different PaaS.  Cloud abstraction APIs can protect against this by making applications portable from the outset, whether targeting infrastructure or PaaS.  Some PaaS entities, such as load-balancers and blobstores, are already available in both jclouds and libcloud, and because these projects are open source, new abstractions can and almost certainly will emerge.

Effecting Topology Change

One of the most exciting facets of cloud is the ability to have new IT infrastructure “on tap”.  Unfortunately, however, simply having this tap doesn’t mean that applications will automatically benefit.  Designing apps so that they can take advantage of flexible infrastructure — and ultimately communicate how much infrastructure they want — is one of the biggest unsung challenges of cloud.

The naïve answer is not to think outside the box, but to make the box bigger (whether a VM or VLAN or virtual disk).  But with Big Data and Big Apps, this scaling up hits its limit far too early:  the only viable option is to scale out, that is, to get more boxes and to be able to use them.

Recognizing the need for more capacity is not difficult:  the application will respond slowly or not at all, and will report errors.  What is substantially more difficult is to anticipate this need, and more difficult still, to get new capacity online and ready in time.

NoSQL data fabrics, such as Gemfire and Terracotta, can simplify part of this issue, making it easier for applications to incorporate new compute instances, but they tend not to pass judgment on when or how to request these instances (or give them back).  Equally, PaaS offerings can be a good answer where an application’s shape fits a common pattern, such as at the presentation tier.  However in the realm of Big Apps, with large data volumes and multiple locations, the shape and the constraints tend to be unique and the problem remains with the application designers.

The answer in practice is almost always some combination of a CMDB, one or more monitoring frameworks, and scripts to detect and resolve pre-defined situations.  CFEngine, Puppet, Chef and Whirr are noteworthy emerging players tackling various parts of this motley strategy, but even with these tools, writing good scalability and management logic for an application is no small undertaking, even when the management policies are relatively straightforward.
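
To give a flavour of what such detect-and-resolve logic boils down to, here is a deliberately simplified sketch; MetricSource and Provisioner are hypothetical stand-ins for a real monitoring feed and a real provisioning API (such as a jclouds ComputeService), and a production policy would also need damping, scale-in, and failure handling:

```java
import java.util.concurrent.TimeUnit;

// A simplified "detect and resolve" loop of the kind usually scripted around a
// monitoring framework. MetricSource and Provisioner are hypothetical
// interfaces standing in for a real monitoring feed and provisioning API.
public class ScaleOutPolicy {

    interface MetricSource { double averageLatencyMillis(); }
    interface Provisioner  { void addNode(); int nodeCount(); }

    private final MetricSource metrics;
    private final Provisioner provisioner;
    private final double latencyThresholdMillis;
    private final int maxNodes;

    ScaleOutPolicy(MetricSource metrics, Provisioner provisioner,
                   double latencyThresholdMillis, int maxNodes) {
        this.metrics = metrics;
        this.provisioner = provisioner;
        this.latencyThresholdMillis = latencyThresholdMillis;
        this.maxNodes = maxNodes;
    }

    // Poll the metric and request capacity before the application falls over.
    void evaluateOnce() {
        if (metrics.averageLatencyMillis() > latencyThresholdMillis
                && provisioner.nodeCount() < maxNodes) {
            provisioner.addNode();
        }
    }

    void run() throws InterruptedException {
        while (true) {
            evaluateOnce();
            TimeUnit.SECONDS.sleep(30);   // evaluation interval
        }
    }
}
```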

That said, it is an unavoidable part of writing a modern application.  The following collection of suggested best practices is about the most that can be said:

* Design so that anything can be changed, with as little disruption as possible

* Be consistent between how the application is initially rolled out and how changes are subsequently effected

* Decentralize the monitoring and management, pushing logic out to be near the relevant application components

* Consider “infrastructure-as-code”, so that the deployment topology can be tracked and, ideally, replayed

* Consider “policy-as-code”, as the logic which drives an application’s elasticity, fault tolerance, and run-time optimization is an important part of an application (especially with big applications)

* Treat the above “code” like any other code, with version control and testing

* Keep these pieces small and hierarchical, modular and substitutable

* Watch out for new developments in this space, as the current level of difficulty is not sustainable!

Conclusion

In this article we’ve looked at the key design questions facing Big Apps, the flip side of big data: put simply, how do you ensure that, having made the right choice at the data tier, the system doesn’t fall down at the processing tier or the infrastructure layer? One way of addressing this is to focus on a virtuous cycle of provisioning, middleware, and management. Getting this right delivers a robust, powerful runtime environment, where Big Apps can get the most out of big data.

About the Author

Alex Heneveld, CTO and co-founder of Cloudsoft, brings twenty years’ experience designing software solutions in the enterprise, start-up, and academic sectors. Most recently Alex was with Enigmatec Corporation, where he led the development of what is now the Monterey® Middleware Platform™. Previous to that, he founded PocketWatch Systems, commercialising results from his doctoral research.

Alex holds a PhD (Informatics) and an MSc (Cognitive Science) from the University of Edinburgh and an AB (Mathematics) from Princeton University. Alex was both a USA Today Academic All-Star and a Marshall Scholar.
