How to name modules, automatic and otherwise

This note is in reply to the concerns about automatic modules raised by Robert Scholte and Brian Fox [1], and by Stephen Colebourne and others [2]. I've collected my conclusions here rather than in separate messages because there are several distinct yet intertwined issues.

Summary:

- Module names should not include Maven group identifiers, because modules are more abstract than the artifacts that define them.

- Module names should use the reverse-domain-name-prefix convention or, preferably, the project-name-prefix convention.

- We should not abandon automatic modules, since they are a key tool for migration and adoption.

- We can address the problems of automatic modules with two fairly minor technical enhancements.

If any of these points strikes you as controversial, please read on!

* * *

Module names should not include Maven group identifiers, as Robert Scholte and Brian Fox suggest [1], even for modules declared explicitly in `module-info.java` files.

Modules in JPMS are a construct of the Java programming language, implemented in both the compiler and the virtual machine. As such, they are more abstract entities than the artifacts that define them. This distinction is useful, both conceptually and practically, hence module names should remain more abstract.

This distinction is useful conceptually because it makes it easier, as we read source code, to think clearly about the nature of a module. We can reason about a module's dependences, exports, services, and so forth without cluttering our minds with the details of group identifiers and version constraints.
Today, e.g., we can write, and read:

    module foo.data {
        exports com.bar.foo.data;
        requires hibernate.core;
        requires hibernate.jcache;
        requires hibernate.validator;
    }

If we were to extend the syntax of module names to include group identifiers, and encourage people to use them, then we'd be faced with something much more verbose:

    module com.bar:foo.data {
        exports com.bar.foo.data;
        requires org.hibernate:hibernate.core;
        requires org.hibernate:hibernate.jcache;
        requires org.hibernate:hibernate.validator;
    }

Group identifiers make perfect sense in the context of a build system such as Maven, where they bring necessary structure to the names of the millions of artifacts available across different repositories. Such structure is superfluous and distracting in the context of a module system, where the number of relevant modules in any particular situation is more likely to be in the tens, or hundreds, or (rarely) thousands. All else being equal, simpler names are better.

At a practical level, the distinction between modules and artifacts is useful because it leaves the entire problem of artifact selection to the build system. This allows us to switch from one artifact to another simply by editing a `pom.xml` file to adjust a version constraint or a group identifier; if module names included group identifiers then we'd also have to edit the `module-info.java` file. This flexibility can be helpful if, e.g., a project is forked and a new module with the same name and artifact identifier is published under a different group identifier.

We long ago decided not to do version selection in the module system, which surprised some people but has worked out fairly well. We should treat group selection in the same manner.

Another practical benefit of the module/artifact distinction is that it keeps the module system independent of any particular build system, so that build systems can continue to improve and evolve independently over time.
Maven-style coordinates are the most popular way to name artifacts in repositories today, but that might not be true ten years from now. It would be unwise to adopt Maven's naming convention for module names just because it's popular now, and doubly so to bake Maven's group-identifier concept into the Java programming language.

* * *

If module names don't include group identifiers, then how should modules be named? What advice should we give to someone who's creating a new module from scratch, or modularizing an existing component by writing a `module-info.java` file for it? (Continue to set aside, for the moment, the problems of automatic modules.)

In structuring any particular space of names we must balance (at least) three fundamental tensions: We want names that are long enough to be descriptive, short enough to be memorable, and unique enough to avoid needless conflicts.

If you control all of the modules upon which your module depends, and all of the modules that depend upon it, then you can of course name your module whatever you want, and change its name at any time. If, however, you're going to publish your module for use by others -- whether just within your own organization or to a global repository such as Maven Central -- then you should take more care. There are two well-known ways to go about this.

- Choose module names that start with the reversed form of an Internet domain name that you control, or are at least associated with. The Java Language Specification has long suggested this convention as a way to minimize conflicts amongst package names, and it has been widely though not universally adopted for that purpose.

- Choose module names that start with the name of your project or product.
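To make the two conventions concrete, here is a minimal sketch. It uses the JDK 9 `java.lang.module.ModuleDescriptor` API to build descriptors for the `foo.data` module shown earlier, once under each naming convention. It also checks a Maven-style `group:artifact` string, which the module system rejects as an illegal module name. The group-identifier syntax considered earlier would therefore require a change to the language itself. The class and method names (`ModuleNameConventions`, `fooData`, `isLegalModuleName`) are hypothetical, and a real module would be declared in a `module-info.java` file rather than built programmatically.

```java
import java.lang.module.ModuleDescriptor;

public class ModuleNameConventions {

    // Builds a descriptor equivalent to the foo.data declaration shown earlier,
    // with the module's own name supplied by the caller.
    static ModuleDescriptor fooData(String moduleName) {
        return ModuleDescriptor.newModule(moduleName)
                .exports("com.bar.foo.data")
                .requires("hibernate.core")
                .requires("hibernate.jcache")
                .requires("hibernate.validator")
                .build();
    }

    // Returns true if the module system accepts the string as a module name;
    // newModule throws IllegalArgumentException for illegal names.
    static boolean isLegalModuleName(String name) {
        try {
            ModuleDescriptor.newModule(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Reverse-domain-name-prefix convention.
        System.out.println(fooData("com.bar.foo.data").name());      // com.bar.foo.data
        // Project-name-prefix convention: shorter, and it leads with the project.
        System.out.println(fooData("foo.data").name());              // foo.data
        // A Maven-style "group:artifact" string is not a legal module name.
        System.out.println(isLegalModuleName("com.bar:foo.data"));   // false
    }
}
```

Both conventions produce legal, equivalent descriptors. The difference is purely in how readable and how collision-resistant the name is.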
Module (and package) names that start with reversed domain names are less likely to conflict, but they're unnecessarily verbose, they start with the least-important information (e.g., `com`, `org`, or `net`), and they don't read well after exogenous changes such as open-source donations or corporate acquisitions (e.g., `com.sun.*`).

The reversed domain-name approach was sensible in the early days of Java, before we had development tools sophisticated enough to help us deal with the occasional conflict. We have such tools now, so going forward the superior readability of short module and package names that start with project or product names is preferable to the onerous verbosity of those that start with reversed domain names.

This advice will strike some readers as controversial. I respect those who will choose, for the sake of tradition or an abundance of caution, to use the reversed domain-name convention for module names and continue to use that convention for package names. I do know, however, of at least one major, well-known project whose developers intend to adopt the project-name-prefix convention for their module names.

* * *

If module names don't include group identifiers, then how should automatic modules be named? Or are automatic modules so troublesome that we should remove them from the design?

To answer the second question first: It would be a tragic shame to drop automatic modules, since otherwise top-down migration is impossible if you're not willing to modify artifacts that you don't maintain, which most people (quite sensibly) aren't. Even if you limit your use of automatic modules to closed systems, as Stephen Colebourne suggests [2], they're still of significant value.

Let's see if we can rescue them. The present algorithm for naming automatic modules has two problems:

(A) Conflicts are possible across large artifact repositories, since the name of an automatic module is computed from the name of the artifact that defines it. [1]

(B) It's risky to publish a module that `requires` some other module that has not yet been modularized, and hence must be used as an automatic module. If the maintainer of that module later chooses an explicit name different from the automatic name then you must publish a new version of your module with an updated `requires` directive. [2]

As to (A), yes, conflicts exist, though it's worth observing that many of the conflicts in the Maven Central data are due to poorly-chosen artifact names: `parent`, `library`, `core`, and `common` top the list, which then falls off in a long-tail distribution. When conflicts are detected then build tools can rename artifacts either automatically or, preferably, to user-specified names that map to sensible automatic-module names. If renaming artifacts in the filesystem proves impractical then we could extend the syntax of the `--module-path` option to allow a module name to be specified for each specifically-named artifact, though strictly speaking that would be a feature of the JDK rather than JPMS.

We can address (B) by enabling the maintainers of existing components to specify the module names that should be given to their components when used as automatic modules, without having to write `module-info.java` files. This can be done very simply, with a single new JAR-file manifest `Module-Name` attribute, as first suggested almost a year ago [3].

If we add this one feature then the maintainer of an existing component that, e.g., must still build and run on JDK 7 can choose a module name for that component, record it in the manifest by adding a few lines to the `pom.xml` file, and tell users that they can use it as an automatic module on JDK 9 without fear that the module name will change when the component is properly modularized some years from now. The actual change to the component is small and low-risk, so it can reasonably be done in a patch release.
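Problems (A) and (B), and the manifest-based remedy, can be sketched in a few lines of Java. This is a minimal sketch, not the JDK's implementation. It approximates the file-name rules that JDK 9 ultimately documented for `java.lang.module.ModuleFinder.of`: drop the `.jar` suffix, drop everything from the first hyphen that is followed by a digit, turn non-alphanumeric characters into dots, and collapse and trim the dots. It lets an explicit manifest attribute override the derived name, as proposed above. It also groups JAR files whose derived names collide, as a build tool might do before renaming them. This note proposes the attribute name `Module-Name`; the attribute that eventually shipped in JDK 9 is `Automatic-Module-Name`, which is what the sketch assumes. The class name, method names, and sample JAR file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.Manifest;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AutomaticModuleNames {

    // Attribute name as shipped in JDK 9; this note proposes "Module-Name".
    static final String MODULE_NAME_ATTRIBUTE = "Automatic-Module-Name";

    // A hyphen followed by a digit that starts a version, e.g. "-5.2.6.Final".
    private static final Pattern VERSION = Pattern.compile("-(\\d+(\\.|$))");

    // Derives a module name from a JAR file name, approximating the JDK 9 rules.
    // The real algorithm also rejects results that are not legal module names;
    // that check is omitted here.
    static String fromFileName(String jarFileName) {
        String name = jarFileName.endsWith(".jar")
                ? jarFileName.substring(0, jarFileName.length() - 4)
                : jarFileName;
        Matcher m = VERSION.matcher(name);
        if (m.find()) {
            name = name.substring(0, m.start());      // drop the version suffix
        }
        return name.replaceAll("[^A-Za-z0-9]", ".")  // non-alphanumerics become dots
                   .replaceAll("\\.{2,}", ".")       // collapse repeated dots
                   .replaceAll("^\\.|\\.$", "");     // trim a leading or trailing dot
    }

    // Problem (B): an explicit name recorded in the manifest wins over the
    // name derived from the file, so it survives later modularization.
    static String automaticName(String jarFileName, Manifest manifest) {
        if (manifest != null) {
            String declared = manifest.getMainAttributes().getValue(MODULE_NAME_ATTRIBUTE);
            if (declared != null) {
                return declared.trim();
            }
        }
        return fromFileName(jarFileName);
    }

    // Problem (A): groups JAR files by derived name and keeps only the names
    // claimed by more than one file, which a build tool would need to rename.
    static Map<String, List<String>> conflicts(List<String> jarFileNames) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (String jar : jarFileNames) {
            byName.computeIfAbsent(fromFileName(jar), k -> new ArrayList<>()).add(jar);
        }
        byName.values().removeIf(jars -> jars.size() < 2);
        return byName;
    }

    public static void main(String[] args) {
        System.out.println(fromFileName("hibernate-core-5.2.6.Final.jar")); // hibernate.core

        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue(MODULE_NAME_ATTRIBUTE, "acme.library");
        System.out.println(automaticName("library-2.1.jar", manifest));     // acme.library

        // Two unrelated artifacts that both use the artifact identifier "core".
        System.out.println(conflicts(List.of("core-1.0.jar", "core-3.1.jar",
                "hibernate-jcache-5.2.6.Final.jar")));
        // {core=[core-1.0.jar, core-3.1.jar]}
    }
}
```

The manifest lookup is deliberately checked first. Once a maintainer records a name, users who `requires` it are insulated from later changes to the JAR's file name.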
There's no need to write a `module-info.java` file, and in fact doing so may be inadvisable at this point if the component depends on other components that have not yet been given module names.

This approach for (B) does add one more (optional) step to the migration path, but it will hopefully lead to a larger number of explicitly-named modules in the world -- and in particular in Maven Central -- sooner rather than later.

- Mark

[1] http://mail.openjdk.java.net/pipermail/jpms-spec-experts/2017-January/000537.html
[2] http://mail.openjdk.java.net/pipermail/jigsaw-dev/2017-January/011106.html
[3] http://openjdk.java.net/projects/jigsaw/spec/issues/#ModuleNameInManifest