Modularizing an existing codebase

This is the first part of A Journey to Java 9 modules. See the table underneath for links to the other parts.

Part 1: Modularizing an existing codebase
Part 2: Using service loaders
Part 3: Selecting services based on quality aspects
Part 4: Using a default service provider

Introduction

During the course of this article we will migrate an existing Java 8 application to a fully modularized Java 9 application that leverages the capabilities of the Java Platform Module System (JPMS). This article is the first installment of a series of articles that showcases a migration path towards a loosely coupled and modularized application architecture.

The domain of the example application is simple on purpose and deals with string matching. Many string matching algorithms exist, and they differ widely in quality characteristics such as runtime performance or the number of comparisons they require. Our example application will implement two such algorithms:

A naive, brute-force algorithm that performs rather poorly on large bodies of text.
The more advanced Knuth-Morris-Pratt matching algorithm, which performs quite well on large bodies of text.

The output of both algorithms is a list of indices that indicates all the starting positions of a given fragment of text within the larger text.
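To make the expected output concrete, here is a minimal sketch of the brute-force idea. The class name NaiveSearch, the method name findAll and the handling of an empty needle are assumptions made for illustration; they are not taken from the example application.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveSearch {

    // Returns the starting index of every occurrence of needle in haystack.
    static List<Integer> findAll(String haystack, String needle) {
        List<Integer> positions = new ArrayList<>();
        if (needle.isEmpty()) {
            return positions; // assumption: an empty needle yields no matches
        }
        // Try every possible starting position in the haystack.
        for (int start = 0; start + needle.length() <= haystack.length(); start++) {
            int offset = 0;
            // Compare character by character until a mismatch or a full match.
            while (offset < needle.length()
                    && haystack.charAt(start + offset) == needle.charAt(offset)) {
                offset++;
            }
            if (offset == needle.length()) {
                positions.add(start);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(findAll("abracadabra", "abra")); // prints [0, 7]
    }
}
```

Every starting position is compared from scratch, which is why this approach degrades on large texts.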

The Java 8 application also features a CLI which enables its users to search for a fragment of text in a larger text using one of the supported algorithms. Its command-line interface consumes three arguments:

The short name / ID of the algorithm. For the brute-force algorithm, this is simply naive while the Knuth-Morris-Pratt matcher is selected by providing the ID kmp.
The full text to search in, given without any whitespace.
A fragment of text to look for in the given full text.

We use Apache Maven for dependency management and for building the application.

The Java 8 application is our baseline for the migration to a modularized Java 9 application using the JPMS.

Maven modules

The baseline is a multi-module Maven project consisting of two modules.

matchers-core: Contains the Java API for string matching algorithms as well as the implementation of the aforementioned algorithms.
matchers-cli: Contains the CLI that runs the string matching algorithms against a body of text.

The compile-time dependencies on the level of our Maven modules are shown in the following diagram.

Fig. 1: The application comprises two Maven modules: matchers-core which defines the API and strategies and matchers-cli which builds on top of the core library.

Let us take a brief look at the internal structure of these Maven modules.

Fig. 2: Both modules share the same packages.

Maven module matchers-core provides the interface Matcher with a clear contract: an implementation of the Matcher interface consumes a large body of text (call it haystack) and a fragment of text (call it needle) and returns the starting indices of all occurrences of the needle within the haystack. This interface is located in package net.mguenther.matchers.

Co-located with the interface are the aforementioned implementations in classes BruteForceMatcher and KnuthMorrisPrattMatcher.
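The following sketch shows what the interface and a Knuth-Morris-Pratt implementation could look like. Only the method name match and the List<Integer> return type appear in this article; the String parameter types, the wrapper class KmpSketch and the treatment of an empty needle are assumptions, and the actual classes in the example application may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class KmpSketch {

    // Assumed shape of the API: parameter types are a guess.
    interface Matcher {
        List<Integer> match(String haystack, String needle);
    }

    static class KnuthMorrisPrattMatcher implements Matcher {

        @Override
        public List<Integer> match(String haystack, String needle) {
            List<Integer> positions = new ArrayList<>();
            if (needle.isEmpty()) {
                return positions; // assumption: an empty needle yields no matches
            }
            int[] failure = failureTable(needle);
            int matched = 0; // number of needle characters matched so far
            for (int i = 0; i < haystack.length(); i++) {
                // On a mismatch, fall back to the longest border instead of restarting.
                while (matched > 0 && haystack.charAt(i) != needle.charAt(matched)) {
                    matched = failure[matched - 1];
                }
                if (haystack.charAt(i) == needle.charAt(matched)) {
                    matched++;
                }
                if (matched == needle.length()) {
                    positions.add(i - needle.length() + 1);
                    matched = failure[matched - 1]; // keep going to find overlapping matches
                }
            }
            return positions;
        }

        // failure[i] is the length of the longest proper prefix of needle[0..i]
        // that is also a suffix of needle[0..i].
        private static int[] failureTable(String needle) {
            int[] failure = new int[needle.length()];
            int k = 0;
            for (int i = 1; i < needle.length(); i++) {
                while (k > 0 && needle.charAt(i) != needle.charAt(k)) {
                    k = failure[k - 1];
                }
                if (needle.charAt(i) == needle.charAt(k)) {
                    k++;
                }
                failure[i] = k;
            }
            return failure;
        }
    }

    public static void main(String[] args) {
        Matcher matcher = new KnuthMorrisPrattMatcher();
        System.out.println(matcher.match("aaaa", "aa")); // prints [0, 1, 2]
    }
}
```

The haystack index never moves backwards, which is what gives Knuth-Morris-Pratt its advantage over the brute-force approach on large texts.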

Although Maven module matchers-cli uses the interface Matcher to run a specific algorithm, it requires knowledge about the supported implementations. This is due to the fact that a user is able to select the algorithm she wants to use by its (abbreviated) name, which is mapped in class MatchersCli to an instance of the class that implements the algorithm. The listing underneath shows an excerpt of that class demonstrating the resolution and execution of the string matching algorithm.

Matcher matcher = null;

switch (algorithm) {
  case "kmp":
    System.out.println("Using Knuth-Morris-Pratt matcher");
    matcher = new KnuthMorrisPrattMatcher();
    break;
  case "naive":
  default:
    System.out.println("Using Brute-Force matcher");
    matcher = new BruteForceMatcher();
}

List<Integer> matchingPositions = matcher.match(haystack, needle);

Shortcomings

Obviously, an approach like this has a couple of flaws with regard to encapsulation, due to the limitations of the visibility and access control imposed by Java up until version 8. This results in a tight coupling of both modules. Even though we could provide a third Maven module matchers-api to separate the API from the implementation, this separation would be somewhat artificial, since the CLI would require both modules matchers-api and matchers-core nonetheless, giving it access to the implementation details in matchers-core.

This is not the only code smell. For instance, the CLI does not comply with the Open-Closed Principle: a new implementation added to matchers-core is not available in the CLI unless it is explicitly wired into it. Smells like this one can be mitigated through a proper choice of design, though, and have nothing to do with the aforementioned limitations that the JPMS tries to solve.
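As an illustration of such a design-level mitigation, the switch statement could be replaced by a small registry that maps algorithm IDs to factories. This is only a sketch: MatcherRegistry, register, resolve and the indexOf-based stand-in matcher are hypothetical names that do not exist in the example application, and the Matcher interface is repeated here with assumed parameter types to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class MatcherRegistry {

    // Assumed shape of the API, repeated to keep the sketch self-contained.
    interface Matcher {
        List<Integer> match(String haystack, String needle);
    }

    private final Map<String, Supplier<Matcher>> factories = new LinkedHashMap<>();

    // New algorithms are added by registration, not by editing a switch statement.
    void register(String id, Supplier<Matcher> factory) {
        factories.put(id, factory);
    }

    // Resolves the matcher for the given ID, falling back to another ID if it is unknown.
    Matcher resolve(String id, String fallbackId) {
        Supplier<Matcher> factory = factories.getOrDefault(id, factories.get(fallbackId));
        if (factory == null) {
            throw new IllegalArgumentException("Unknown algorithm: " + id);
        }
        return factory.get();
    }

    // Stand-in implementation based on String.indexOf, used only for this demo.
    static Matcher indexOfMatcher() {
        return (haystack, needle) -> {
            List<Integer> positions = new ArrayList<>();
            if (needle.isEmpty()) {
                return positions;
            }
            for (int i = haystack.indexOf(needle); i >= 0; i = haystack.indexOf(needle, i + 1)) {
                positions.add(i);
            }
            return positions;
        };
    }

    public static void main(String[] args) {
        MatcherRegistry registry = new MatcherRegistry();
        registry.register("naive", MatcherRegistry::indexOfMatcher);
        Matcher matcher = registry.resolve("unknown-id", "naive");
        System.out.println(matcher.match("abcabc", "abc")); // prints [0, 3]
    }
}
```

Note that whoever performs the registration still needs access to the implementation classes, so a registry addresses the Open-Closed smell but not the encapsulation problem described above.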

With Java 9 and the JPMS we strive for a solution that provides a better encapsulation by truly hiding implementation details from consuming modules in such a way that we only need to express compile-time dependencies via JPMS (and of course via Maven on the Maven level), while being able to contribute implementation details through the module path by simply putting the respective JARs onto it.

Let's see how this goes.

Adapt the Maven build to use Java 9

So, first of all, we will update our Maven build so that it properly compiles Java 9 sources. This step is pretty basic, as we only have to adjust the configuration of the maven-compiler-plugin and - of course - environment variables, so that Maven can pick up the proper JDK.

<plugins>
  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>3.7.0</version>
    <configuration>
      <source>9</source>
      <target>9</target>
      <showWarnings>true</showWarnings>
      <showDeprecation>true</showDeprecation>
    </configuration>
  </plugin>
</plugins>

Depending on the settings on your local machine, you can either use a global JAVA_HOME environment variable that points to JDK 9 or you can use the Maven resource file (~/.mavenrc). Either way, your PATH variable should include a reference to JDK 9.

On my machine, the path to JDK 9 is /usr/lib/jvm/java-9-oracle, so the configuration looks just like this:

export JAVA_HOME=/usr/lib/jvm/java-9-oracle
export PATH=${PATH}:$JAVA_HOME/bin

Running mvn clean install from the project source folder should still succeed. The configuration of the maven-compiler-plugin states that we want to see any kind of warnings and deprecated usages during the build. In our particular case, there are none: our string matching algorithms work perfectly fine up to this point.

This is expected, as Java 9 can be used just like any previous version of Java. You do not have to commit to a fully modularized build to begin with. If you do not, any code outside of a specific module ends up in the so-called unnamed module. The build will not break, since the unnamed module is able to read all other modules.
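You can observe the unnamed module at runtime through the reflection API that Java 9 added to java.lang.Class. The sketch below uses the hypothetical class name UnnamedModuleCheck; what it reports for its own class depends on whether you launch it from the class path or from the module path.

```java
public class UnnamedModuleCheck {

    // Describes the module that the given class belongs to.
    static String describe(Class<?> type) {
        Module module = type.getModule();
        return module.isNamed()
                ? "named module " + module.getName()
                : "unnamed module";
    }

    public static void main(String[] args) {
        // A JDK class always belongs to a named platform module.
        System.out.println(describe(String.class));
        // Launched from the class path, this class belongs to the unnamed module.
        System.out.println(describe(UnnamedModuleCheck.class));
        // Whichever module this class ends up in, it can read java.base.
        Module own = UnnamedModuleCheck.class.getModule();
        System.out.println(own.canRead(String.class.getModule()));
    }
}
```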

Please note that although you do not have to commit to the JPMS, the JDK itself is modularized as of Java 9. This may lead to broken builds if you reference code that is not reachable through module java.se, which is the root module by default if you do not use designated modules for your code. This is for instance the case if your code uses the popular JAXB API.

Before we dive into the modularization of our existing codebase, we have to talk a bit about the implications of using the JPMS together with a build tool like Maven. From the perspective of the JPMS, it is perfectly viable to have multiple modules inside a single src folder, as long as module boundaries are properly set by providing the necessary module descriptors for your modules. Maven, however, imposes a restriction that enforces a 1:1 relationship between a Maven module and the Java module it contains. So for instance, if our matchers-core module comprises two Java modules within source folder src, call them matchers.api and matchers.impl, the IDE will present the following error condition:

module-info.java already exists within module

In this case, our solution will not compile. This is pretty self-explanatory actually: The target artifact of a Maven module is a single JAR (or POM, or WAR, ...) which is addressable by its Maven coordinates. Allowing multiple Java modules (which are packaged as JARs themselves) within a Maven module would break the semantics of the Maven coordinates.

Throughout this migration guide, we will denote Maven modules using hyphens, like matchers-core, and Java modules using dots, like matchers.impl, to distinguish them in the text. It is perfectly fine though if you keep both the Maven and Java module name synchronized.

Separating API from implementation

Recall the layout of Maven module matchers-core.

Fig. 3: Both API and implementation are sharing the same package.

All interfaces and classes reside in the same package. First of all, we will create a new Maven module, call it matchers-api, that will host a Java module, call it matchers.api, comprising the API for our string matching algorithms. The interface Matcher is copied over to this module and we will expose it to other modules by exporting the whole package¹. We will keep the interface Matcher in its original module as well, so the build does not break just yet.

module matchers.api {
  exports net.mguenther.matchers;
}

After applying our changes, the source tree of the module looks like this:

matchers-api
├── pom.xml
└── src
    └── main
        └── java
            ├── module-info.java
            └── net
                └── mguenther
                    └── matchers
                        └── Matcher.java

As for the actual implementations of interface Matcher, we will re-use Maven module matchers-core and introduce the Java module matchers.impl to it. This is done by adding a module-info.java to the folder src/main/java of the Maven module that declares a requires-relationship with matchers.api (note that we explicitly address the Java module, not a package) and that also declares the package that it exports.

module matchers.impl {
    requires matchers.api;
    exports net.mguenther.matchers;
}

Introducing this module-info.java will yield a couple of errors. First, the Maven module matchers-api - which contains the required dependency matchers.api - is not in the set of Maven dependencies for Maven module matchers-core. Using the JPMS does not relieve us from dealing with dependencies on the level of Maven. However, we can fix this quite easily by adding the following code to the pom.xml of the respective Maven module.

<dependency>
  <groupId>net.mguenther.matchers</groupId>
  <artifactId>matchers-api</artifactId>
  <version>0.1.0-SNAPSHOT</version>
</dependency>

After introducing this dependency, we can finally delete the interface Matcher from module matchers.impl and use the one from module matchers.api instead.

But we still have an error, though, and that error is currently breaking our build. The IDE already points it out:

Package 'net.mguenther.matchers' exists in another module: 'matchers.api'

Let us take a step back and look at what we achieved so far.

Fig. 4: The JPMS prohibits packages that are shared across multiple modules.

Although the separation into two distinct modules fits quite nicely, the package boundaries do not: we introduced a so-called split package. A split package is a package that spans multiple modules. The JPMS requires that each package belongs to exactly one Java module and thus prohibits the use of split packages. From a package design perspective, this seems a bit unnatural as both modules contribute to the same component and, at least from my understanding, a component can surely span multiple modules².
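The conflict can be reproduced in isolation with the java.lang.module.ModuleDescriptor API by building both descriptors in memory and intersecting their packages. This is merely an illustration of the rule, not the check that the compiler or the IDE actually runs; SplitPackageCheck and sharedPackages are hypothetical names.

```java
import java.lang.module.ModuleDescriptor;
import java.util.HashSet;
import java.util.Set;

public class SplitPackageCheck {

    // Returns the packages that both modules claim for themselves.
    static Set<String> sharedPackages(ModuleDescriptor first, ModuleDescriptor second) {
        Set<String> shared = new HashSet<>(first.packages());
        shared.retainAll(second.packages());
        return shared;
    }

    public static void main(String[] args) {
        // In-memory equivalents of the two module-info.java files shown above.
        ModuleDescriptor api = ModuleDescriptor.newModule("matchers.api")
                .exports("net.mguenther.matchers")
                .build();
        ModuleDescriptor impl = ModuleDescriptor.newModule("matchers.impl")
                .requires("matchers.api")
                .exports("net.mguenther.matchers")
                .build();

        // Both modules contain the same package: a split package.
        System.out.println(sharedPackages(api, impl)); // prints [net.mguenther.matchers]

        // Giving the implementation its own package resolves the conflict.
        ModuleDescriptor renamed = ModuleDescriptor.newModule("matchers.impl")
                .requires("matchers.api")
                .exports("net.mguenther.matchers.impl")
                .build();
        System.out.println(sharedPackages(api, renamed)); // prints []
    }
}
```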

Anyhow, to fix the issue, we have to abide by the rule that a Java module cannot share packages it contains with another module, so we rename package net.mguenther.matchers of module matchers.impl to net.mguenther.matchers.impl for the time being. We end up with a working solution that looks like this.

Fig. 5: Each Java module has to use its own package.

The standard Maven directory layout still applies, even if we take the JUnit tests into account that exist for the matcher implementations. After applying all changes, it should look like this.

matchers-core
├── pom.xml
└── src
    ├── main
    │   └── java
    │       ├── module-info.java
    │       └── net
    │           └── mguenther
    │               └── matchers
    │                   └── impl
    │                       ├── BruteForceMatcher.java
    │                       └── KnuthMorrisPrattMatcher.java
    └── test
        └── java
            └── net
                └── mguenther
                    └── matchers
                        └── impl
                            ├── BruteForceMatcherTest.java
                            └── KnuthMorrisPrattMatcherTest.java

Refactoring the CLI to a top-level application module

Although we provided a clean and working separation for the API and its implementation, our current application will not compile. This is due to the fact that Maven module matchers-cli does not implement a Java module. Hence, it is moved to the unnamed module and from there it has no access to the classes and interfaces that are exported from Java modules matchers.api and matchers.impl. So, if we try to compile the solution anyway, Maven will fail with a compilation error as it cannot resolve the symbols for classes that the CLI imports.

We simply repeat the procedure that we followed when we separated the API from the implementation and introduce a new Java module as part of matchers-cli, which we aptly name matchers.cli. Its module descriptor needs to declare a requires-relationship to both modules:

module matchers.cli {
  requires matchers.api;
  requires matchers.impl;
}

With this module descriptor in place, the solution comprising all three modules should build and run perfectly fine. Have a look at what we achieved so far in terms of (enclosing) Maven modules. Note that there is a 1:1 correspondence between the dependency graph of the Maven modules and the dependency graph of the respective Java modules.

Fig. 6: The CLI still has an unwanted dependency on matchers.impl.

But contrary to our initial mission statement, this makes one thing quite obvious: we have still not gotten rid of the unwanted dependency from the CLI to the implementation. We will address this issue in the next article of this series.

¹ This is okay, since this module comprises only classes and interfaces that contribute to the public API of our application.
² The closure of a component does not even need to be known at compile time. Think of pluggable application architectures, in which you simply add modules (JARs) to the module path to extend the functionality of a component.