Devoxx 2013 @ Antwerpen, Belgium - Markus Günther IT-Beratung

A very exciting week has finally come to an end - the mother of all Devoxx incarnations took place this week at the Metropolis Business Center in Antwerpen, Belgium. With roughly 3500 fellow software engineers, over 200 presentations held by international speakers and 9 tracks, Devoxx is one of the biggest Java-related events in the world. And always a blast. This is my third time at this conference as a guest and I have to say that it is always an ideal place to mingle and to exchange ideas about new technologies and trends with other developers.

The conference itself is organized by the Belgian Java User Group (BeJUG). It is spread over two floors at the Metropolis: While the lower floor hosts partners that showcase their products, a Pearson book store and - as of this year - an Internet-of-Things store (which sells Raspberry Pis and other embedded devices), the upper floor provides access to the conference rooms. Since the Metropolis is essentially a cinema, the rooms in which the sessions take place are quite comfortable. You can sit back and relax in comfy chairs while listening to the session being presented. You'll also find community whiteboards at the upper floor, which provide a great way to gather information on trending topics in the Java community. Between sessions, the big screens show information on upcoming sessions and a Twitter wall, which is a good way to provide early feedback on selected sessions or discuss topics further.

Apart from the keynotes, the sessions are divided into several tracks. Regarding my personal interests I mostly attended sessions that fall into the categories Architecture & Security, Java EE, Java SE and Cloud & Big Data. I'd like to talk about selected sessions to share some of the things I learned.

The Keynotes

Oracle

With the upcoming release of Java 8, Lambdas, Streams and all the new and shiny features of Java 8 were the prominent topic of discussion during Oracle's keynote. Mark Reinhold opened with some insights on the history of Java and its development process. Java was designed to be a blue-collar language, a language which maximizes developer productivity and enables working developers to simply get their job done without exposing them to unnecessary technical details (e.g. pointers). Therefore, back at the time of Java's inception, the feature set of the platform seemed rather like an odd combination of risky things on the one hand and conservative things on the other. Still, customers requested features like garbage collection, JIT compilation, dynamic linkage, threading and the like. And Java succeeded in providing (almost) the right abstractions for those kinds of features. Still, Java as a language and the JVM as a platform must evolve in order to correct inconsistencies, holes or poor user experiences, but at the same time also adapt to change: Change in hardware (multicore), change in attitudes and fashions, like functional style becoming more and more popular. This is indeed not an easy task, since Oracle works hard to preserve the core of the language while integrating new features in a seamless way. And although Oracle took their time, I must admit that they really did a great job integrating the new features in Java 8. Working with lambda expressions feels natural and I'm excited to see to what extent lambdas are going to get used in upcoming projects, libraries, and frameworks. I think Brian Goetz made the key point very clear when he stated: Reading code is more important than writing code. Indeed. Developers spend most of their time reading code written by other developers. Code is a means of communication and simplicity matters when we want to communicate effectively.
I think that lambdas are a great leap forward for Java and enable us to achieve this goal. But changing the status quo is not done with the sole addition of new features to a language. It will also require a major re-thinking on how we as developers approach the design of our programs.
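To make the readability argument concrete, here is a small sketch that is not taken from the keynote. The class and method names are made up for illustration. Both methods do the same thing (keep names of at most five characters and sort them), once in pre-Java-8 style and once with a lambda and the Streams API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class LambdaReadability {

    // Pre-Java-8 style: the intent is buried in loop and anonymous-class boilerplate.
    static List<String> shortNamesSortedOld(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() <= 5) {
                result.add(name);
            }
        }
        Collections.sort(result, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareTo(b);
            }
        });
        return result;
    }

    // Java 8 style: the pipeline reads like the requirement itself.
    static List<String> shortNamesSortedNew(List<String> names) {
        return names.stream()
                .filter(name -> name.length() <= 5)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Mark", "Brian", "Stephen", "Lars");
        System.out.println(shortNamesSortedOld(names));
        System.out.println(shortNamesSortedNew(names));
    }
}
```

Both calls in main produce the list [Brian, Lars, Mark]. The second version is shorter, but more importantly it can be read top to bottom without mentally executing a loop.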

Google

The second conference day started off with Google's keynote. The keynote speaker was Lars Bak, who worked in the past on the HotSpot VM and the V8 engine and is currently responsible for Google Dart. Thus, his keynote focused on Google Dart and the problem space it addresses. The Web has changed tremendously over the last five years, but our programming tools have not. Since web applications are getting larger and more complex, the lack of structure in JavaScript and the lack of proper tool support are both major hindrances and create a bottleneck for programming productivity, especially if you want to take full advantage of the browser as a platform. Dart aims to solve those problems, thus enabling programmers to be more productive when programming for the Web.

Dart features a rich set of reliable libraries and a productive IDE that supports type checking, refactoring and debugging. It also features a translator to JavaScript, which plays a major role if you want to run your Dart web application in today's browsers. It is supposed to be easy to pick up if you already have a programming background in JavaScript, Java or C#. Currently, the Dart VM is part of Chromium. For all other browsers out there, you first have to compile your Dart codebase to JavaScript. Early adopters will face this problem, because until Dart (if ever) takes off and becomes a major driver for web application programming, they have to deal with a codebase written in the Dart language and with bugs that might occur in environments where the generated JavaScript code is being run. Such additional levels of indirection are undesirable, since they can affect productivity tremendously. Google invited three companies that are already building products with Dart. According to the developer who works on Blossom - which is an agile project management tool - they migrated their existing JavaScript codebase to Dart without that much of a hassle.

Dart indeed does the right things: The Dart VM seems to be very well-designed and the Dart language itself supports key features that will surely help to increase a developer's productivity. But even though the platform seems well-designed, I wonder if it will really take off and get the attention it requires to thrive. Given that Google dismisses products that don't generate enough revenue rather quickly (cf. Google Wave, ...) I don't believe that it is the right time to base your products on Dart. But if you plan to do so, build a significant prototype beforehand in order to evaluate if Dart is really the thing you need.

Fault Tolerance Made Easy (Uwe Friedrichsen)

Achieving fault tolerance is not a trivial endeavour and requires careful planning and design of your application. Especially scale-out systems as well as systems that have high demands regarding service availability require a design that incorporates fault-tolerance mechanisms from early on. This actually is a very important topic, since most of the applications you design will at some point communicate with other systems. And since you can't guarantee that external systems fulfill their tasks and always behave correctly, you need to ensure that your application does not diminish the quality of experience for users by failing to respond. Uwe Friedrichsen discussed some typical case studies in which an application reacts in an inappropriate way in the presence of errors. He showed simple patterns that you can implement in pure Java code in order to mitigate the aforementioned problems.

Some key points I took away from his talk are (without going into too much detail):

Use timeouts when dispatching requests to an external system (other applications, databases, ...). Timeouts give you back control over your code and you can respond with a (temporary) error message in a graceful way.
If you know that you are going to fail, fail fast. Do not wait for long timeouts if you know that the resource is unavailable.
Use a fail-fast mechanism especially when executing expensive actions that rely on multiple external resources. If you successfully queried resources A, B and C, time is wasted if the last resource D does not respond and the operation fails because of it. Keep an alive-state of those resources and query it before executing expensive operations.
Shed load if clients put too much load on servers. Whoever gets a response, gets a fair and timely response. If you do shed some requests, respond by setting the Retry-After HTTP header to indicate to a client that the overload situation is supposed to be temporary and that the request might be fulfilled at a later time.
Background processes that perform routine tasks use your resources as well, which might lead to an overload situation even if you have the capacity to handle all user requests. Limit the batch size of your background tasks to a size that guarantees that the system does not suffer from an overload situation because of those tasks. Your system still needs to respond to user requests in a timely fashion.
Error situations might be transient. Use a leaky bucket which is filled whenever an error occurs. The leaky bucket is drained periodically. However, a strong indication of a persistent error situation is given if the fill state of the leaky bucket exceeds a certain threshold. Such an approach also comes with the additional benefit that you have to separate error handling from business logic.
Limit retries. If the error situation is transient, just retry after waiting some time, but limit those retries. Be careful though: Actions must be idempotent.
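The timeout and limited-retry points from the list above can be sketched in plain Java. This is not the code from Uwe's talk. The class GuardedCall, the method callGuarded and all of its parameters are names assumed for illustration. The action is run on a worker thread so that the caller regains control after the timeout, and it is retried only a bounded number of times.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class GuardedCall {

    // Runs the (idempotent!) action with a timeout per attempt and at most
    // maxAttempts attempts. Returns the fallback if no attempt succeeds.
    static <T> T callGuarded(Callable<T> action, long timeoutMillis,
                             int maxAttempts, long pauseMillis, T fallback) {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> future = executor.submit(action);
                try {
                    return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    // Take back control instead of waiting for the external system.
                    future.cancel(true);
                } catch (ExecutionException e) {
                    // The action itself failed; treat it as possibly transient.
                }
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // wait a little before retrying
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        } finally {
            executor.shutdownNow();
        }
        return fallback;
    }
}
```

A call such as GuardedCall.callGuarded(() -> remoteLookup(id), 200, 3, 50, "temporarily unavailable") would then answer within a bounded time, where remoteLookup stands for any hypothetical idempotent remote call. Creating an executor per call keeps the sketch short; a real application would most likely share one.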

There are many more patterns (more complicated ones as well), but the talk gave a good overview of what you can do in order to achieve a higher resilience of your application. The simplicity of the code provided with the talk makes you think: Why didn't I do this earlier? After all, it's all about your code running safely in a production environment.
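The leaky bucket mentioned in the list is just as small. The following is a minimal sketch under my own assumptions (class name, method names and the integer fill level are not from the talk): errors fill the bucket, a periodic task drains it, and a fill level above the threshold signals a persistent error.

```java
class LeakyBucket {
    private final int threshold;
    private int level;

    LeakyBucket(int threshold) {
        this.threshold = threshold;
    }

    // Called by the error handling code whenever an error occurs.
    synchronized void recordError() {
        level++;
    }

    // Called periodically (for example by a scheduler) to drain the bucket.
    synchronized void leak() {
        if (level > 0) {
            level--;
        }
    }

    // A fill level above the threshold hints at a persistent error situation.
    synchronized boolean isErrorPersistent() {
        return level > threshold;
    }
}
```

Business code only ever asks isErrorPersistent(), which is the separation of error handling from business logic that the pattern encourages.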

CQRS for Great Good (Oliver Wolf)

Oliver Wolf held an interesting talk concerning the architectural pattern CQRS, which is short for Command-Query Responsibility Segregation. I first heard of it while diving into the concepts behind Domain-Driven Design (DDD). In contrast to the traditional layered architecture, which does not separate actions that modify data from actions that just read the data, CQRS strives for exactly that kind of separation in order to achieve a higher level of scalability. The concept behind CQRS is based on good object-oriented design principles. You might have already heard of CQS, which stands for Command-Query Separation. When applying CQS, you divide the design of a class into methods that only read data without introducing any kind of side-effects (query) and methods that modify the internal state of objects of that class (command). CQRS is basically CQS in the large, meaning we apply that same basic principle to a bounded context. Bounded contexts are usually relatively small in the sense that they are just large enough to capture the complete ubiquitous language of an isolated business domain. I won't get into much detail here, just assume for the sake of simplicity that a bounded context might just stand for a whole application that is concerned with a single business domain. Achieving CQRS can be done by splitting our existing domain model into a query model and a command model, such that the resulting query model consists only of a thin layer between data retrieval from a data store and the delivery to the client, while the command model is used to create new or manipulate existing data. Thus, the command model is the richer of the two models, since it captures all the business logic of your domain. That said, CQRS only makes sense if your domain model actually incorporates the business logic of your domain. CQRS is not a reasonable thing to achieve if you have an anemic domain model.
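At class level, CQS can be sketched as follows. The Account class is a made-up example, not something shown in the talk: the query returns a value and changes nothing, while the commands change state and return nothing.

```java
class Account {
    private long balanceInCents;

    // Query: reads state, has no side effects.
    long balance() {
        return balanceInCents;
    }

    // Command: modifies state, returns nothing.
    void deposit(long cents) {
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        balanceInCents += cents;
    }

    // Command: modifies state and enforces a business rule.
    void withdraw(long cents) {
        if (cents <= 0 || cents > balanceInCents) {
            throw new IllegalArgumentException("invalid withdrawal");
        }
        balanceInCents -= cents;
    }
}
```

CQRS takes the same split and applies it one level up, to whole models instead of single methods.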

What is an anemic domain model? An anemic domain model is comprised of POJOs that don't expose behaviour, but only data in the sense of getters and setters. Domain-Driven Design is all about emphasizing behaviour over data-centric domain modeling. DDD states that you should not design data holders with simple getters and setters on their individual attributes, but rather use the language of the actual domain and expose behaviour on business objects, that change their internal attributes and thus their state. This goes together with good object-oriented design principles quite well, since we should always favor exposing behaviour on objects instead of setters and getters - even if it is just for the benefit to reduce the complexity of clients that are using our business objects.
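The difference is easiest to see side by side. Both classes below are hypothetical examples of my own. The anemic variant is a plain data holder that leaves every business rule to its clients, whereas the behaviour-rich variant speaks the language of the domain and guards its own state transitions.

```java
// Anemic: nothing stops a client from setting an order to "SHIPPED" before it is paid.
class AnemicOrder {
    private String status = "NEW";

    String getStatus() {
        return status;
    }

    void setStatus(String status) {
        this.status = status;
    }
}

// Behaviour-rich: the object exposes domain operations and protects its invariants.
class Order {
    private enum Status { NEW, PAID, SHIPPED }

    private Status status = Status.NEW;

    void pay() {
        if (status != Status.NEW) {
            throw new IllegalStateException("only a new order can be paid");
        }
        status = Status.PAID;
    }

    void ship() {
        if (status != Status.PAID) {
            throw new IllegalStateException("only a paid order can be shipped");
        }
        status = Status.SHIPPED;
    }

    boolean isShipped() {
        return status == Status.SHIPPED;
    }
}
```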

In the Java world however, we often face anemic domain models because we have to be compliant with the JavaBeans specification, for instance when we are using an ORM or expose our objects to JSPs. However, ORM frameworks like Hibernate are able to set the value of private fields of an object directly nowadays, so there is absolutely no need to expose those attributes using public setters and getters any longer.

Oliver motivated an architecture that follows the CQRS-pattern by discussing six assumptions that we as software architects often take for granted, but that are not necessarily true.

Assumption 1: Reads and writes are strongly cohesive, so they must be part of the same bounded context. Why do reads and writes have to be part of the same application? If you split them apart, they can scale independently which helps in accommodating highly asymmetric load on both operations.
Assumption 2: Reads and writes use the same data, so they must be served from and applied to the same domain model. Actually, queries can benefit from a specialized query model. The query model might use data in a pre-aggregated or a de-normalized way. The command model on the other hand comprises all the behavioural logic of your business domain. Naturally, a de-normalized query model that does not manipulate any data is relatively easy to scale-out if need be (unlike the command model!).
Assumption 3: Even for queries, we have to go through the domain model to abstract from the underlying database model. Queries are not about behaviour (most of the time), so why do we need objects? A thin read layer that makes optimized use of the database's query capabilities (no ORM, no fuss, just plain old SQL) might be sufficient.
Assumption 4: We must use the same database for queries and commands. Most of the time, this assumption does not hold. Eventual consistency is sufficient in many cases. Thus, you can use a separate query database in combination with an event handler that gets notified by events from the command model and performs the respective updates to the query database, independently from all actions that are routed through the query model.
Assumption 5: Commands must be processed immediately to ensure data consistency. In many cases, users do not care if their actions have an immediate effect, as long as they eventually get some kind of feedback.
Assumption 6: The current state of domain objects must be persistent. CQRS plays well with an Event Sourcing architecture. You just store events and re-create the state of domain objects as needed. You could even hold the whole domain model in-memory.
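Assumptions 4 and 6 can be illustrated together with a toy sketch. None of the names below come from the talk or from any framework; they are assumptions made for illustration. The command side stores only events and notifies subscribers, the query side keeps a de-normalized view that is updated by those events, and the current state can be rebuilt at any time by replaying the event store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// An immutable event: the only thing the command side persists.
final class MoneyDeposited {
    final String accountId;
    final long cents;

    MoneyDeposited(String accountId, long cents) {
        this.accountId = accountId;
        this.cents = cents;
    }
}

// Command side: business rules plus an append-only event store.
class CommandModel {
    private final List<MoneyDeposited> eventStore = new ArrayList<>();
    private final List<Consumer<MoneyDeposited>> subscribers = new ArrayList<>();

    void subscribe(Consumer<MoneyDeposited> subscriber) {
        subscribers.add(subscriber);
    }

    void deposit(String accountId, long cents) {
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        MoneyDeposited event = new MoneyDeposited(accountId, cents);
        eventStore.add(event);
        // Delivered synchronously here; a real system would typically do this
        // asynchronously, which is where eventual consistency comes from.
        for (Consumer<MoneyDeposited> subscriber : subscribers) {
            subscriber.accept(event);
        }
    }

    // Event sourcing: state is not stored but re-created from the events.
    long replayBalance(String accountId) {
        long balance = 0;
        for (MoneyDeposited event : eventStore) {
            if (event.accountId.equals(accountId)) {
                balance += event.cents;
            }
        }
        return balance;
    }
}

// Query side: a thin, de-normalized read model fed by events.
class BalanceView {
    private final Map<String, Long> balances = new HashMap<>();

    void on(MoneyDeposited event) {
        balances.merge(event.accountId, event.cents, Long::sum);
    }

    long balanceOf(String accountId) {
        return balances.getOrDefault(accountId, 0L);
    }
}
```

Wiring the two sides is a single line, commands.subscribe(view::on). After that, reads never touch the command model.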

Oliver talked a bit more about Event Sourcing as an implementation of the CQRS pattern and gave some advice as to when CQRS can be a feasible solution for your situation. You should not apply CQRS if your application follows a simple CRUD style (chances are high that your domain model is anemic anyway) or if you don't have any scaling issues. There are some frameworks which provide abstractions in order to achieve CQRS like Axon (Java) or Lokad (.NET), but since I haven't used either of them, I can't give any recommendation. However, CQRS seems a very reasonable architectural pattern if you have an asymmetric load on read and write operations and must be able to scale-out. I also encourage you to get yourself acquainted with Domain-Driven Design. The book Implementing Domain-Driven Design by Vaughn Vernon provides - in my opinion - an excellent and pragmatic introduction to the topic.

Distributed Systems using Hazelcast (Peter Veentjer)

Hazelcast is used in products like vert.x, Apache Camel and Mule ESB. Its main goal is to simplify the development of scalable and highly available systems. Since I haven't had the chance to dive into Hazelcast for myself, I decided to attend this talk by Peter Veentjer to learn a few things about it. And I wasn't disappointed at all. Peter did a great job introducing the key features of Hazelcast while providing many examples that showcase how those features actually work.

The nice thing about Hazelcast is certainly that it is just a 2.5 MB sized JAR file that you put into your classpath. There is no need to install it and there are no other external dependencies required in order to run it. Hazelcast is not a framework, but rather a library which provides distributed data structures that are highly scalable and highly available. While traditional applications used to scale-up solely, with Hazelcast you typically follow a scale-out approach, which also increases the availability of your system. Hazelcast could be your library of choice in achieving these requirements if you implement

messaging solutions,
event processing,
clustered scheduling,
cluster management,
HTTP session clustering or
caching

and need to scale-out.

Hazelcast falls into the NoSQL family of data grids, but you can also use it as a computing grid by spreading out computational operations across your cluster. It supports both XML and programmatic configurations. The distributed data structures it provides are enriched with features unknown to their traditional counterparts. For instance, a distributed map also features a TTL parameter, which, when set to an admissible value, ensures that items in the map reside in it only as long as that time-to-live. This turns a highly scalable and highly available distributed map into an efficient caching solution if need be.
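To show what a time-to-live on map entries means, here is a single-JVM toy sketch in plain Java. It is explicitly not Hazelcast's API or implementation. The class TtlMap and the injectable clock are assumptions made so that the idea is easy to follow and to test.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

class TtlMap<K, V> {

    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    TtlMap(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    synchronized void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Returns null for unknown keys and for entries whose time-to-live has passed.
    synchronized V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key); // evict lazily on access
            return null;
        }
        return entry.value;
    }
}
```

Used as new TtlMap<String, String>(60_000, System::currentTimeMillis), this behaves like a small cache whose entries silently disappear after a minute. A distributed map with TTL gives you the same semantics across a cluster.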

Hazelcast ensures availability. The data stored via Hazelcast is partitioned across your cluster. However, if a machine leaves the cluster for whatever reason, the data it stored is reconstructed using replicated backups and distributed across the remaining machines. The distribution of your data across partitions is done automatically if you don't interfere. But actually, partitioning your data across a set of Hazelcast machines is a prudent design task that you should take very seriously. Data locality might be an issue if you notice too much remoting going on (data being fetched from multiple machines in order to carry out the requested operation).
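The idea of partitioned data with replicated backups can be pictured with another toy sketch. Again, this is not Hazelcast's actual partitioning algorithm. The class PartitionTable and the simple modulo assignment are assumptions chosen for illustration. Each key maps to a partition, each partition has an owner, and a copy lives on a different member so that the data survives if the owner leaves the cluster.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionTable {
    private final int partitionCount;
    private final List<String> members;

    PartitionTable(int partitionCount, List<String> members) {
        if (partitionCount < 1 || members.size() < 2) {
            throw new IllegalArgumentException(
                    "need at least one partition and two members");
        }
        this.partitionCount = partitionCount;
        this.members = new ArrayList<>(members);
    }

    // Keys with the same partition id always end up on the same member.
    int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // The member that holds the primary copy of the key's partition.
    String ownerOf(Object key) {
        return members.get(partitionOf(key) % members.size());
    }

    // The next member in the list holds the backup, so it is never the owner.
    String backupOf(Object key) {
        return members.get((partitionOf(key) + 1) % members.size());
    }
}
```

The sketch also hints at the data locality issue: an operation that touches keys owned by different members needs remote calls, so it pays off to think about which keys should share a partition.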

Hazelcast supports a variety of distributed data structures. If you happen to implement solutions that need to be highly scalable and highly available, Hazelcast may just be the thing you should look into and I highly recommend that you do so.

Other stuff worth mentioning

The Devoxx team records all sessions and, after some post-processing of the material, hosts them at Parleys. Attendees get access to this content free of charge. If you did not attend but are interested in watching the sessions, you can buy access to the whole package or just wait for another year until the content is freely accessible. The material is usually available at Parleys around the Christmas holidays.

Unfortunately I was unable to see all the talks that I wanted to see live. If you have the chance, you should definitely see the talks I talked about in the retro as well as the talks which are still on my personal watch-list:

The Bleeding Edge (Martijn Verburg, Richard Warburton)
The Habits of Highly Effective Technical Teams (Martijn Verburg)
Going Reactive: Event-Driven, Scalable & Resilient Systems (Jonas Bonér)

Conferences are a good way to gather information on new trends in the community or simply to talk about interesting topics with fellow developers. Devoxx provides just the right place if you want to get up-to-speed on new developments in the Java world. I highly recommend that you get the experience for yourself if you have the chance to.