Every change is an incompatible change. A

risk/benefit analysis is always required.



—Martin Buchholz



Veteran JDK Engineer

When evolving the JDK, compatibility concerns

are taken very seriously.

However, different standards are applied to evolving various aspects

of the platform. From a certain point of view, it is true that any

observable difference could potentially cause some unknown

application to break. Indeed, just changing the reported version

number is incompatible in this sense because, for example, a JNLP file

can refuse to run an application on later versions of the platform.

Therefore, since not making any changes at all is clearly not viable

for evolving the platform, changes need to be evaluated against and

managed according to a variety of compatibility contracts.

For Java programs, there are three main categories of compatibility:

Source: Source compatibility concerns translating Java source code into class files.

Binary: Binary compatibility is defined in The Java Language Specification (JLSv3 §13.2, "What Binary Compatibility Is and Is Not") as preserving the ability to link without error.

Behavioral: Behavioral compatibility includes the semantics of the code that is executed at runtime.



Note that non-source compatibility is sometimes colloquially referred

to as "binary compatibility." Such usage is incorrect since the JLS

spends an entire chapter precisely defining the term binary

compatibility; often behavioral compatibility is the intended notion

instead.

There are many other observable aspects of the JDK not related to Java

programs, such as file layout, etc. Those will not be further

discussed in this note.

The basic challenge of compatibility is the difficulty of finding and

modifying all the software and systems impacted by a change. In a

closed-world scenario where all the clients of an API are known and

can in principle be simultaneously changed, introducing "incompatible"

changes is just a matter of being able to coordinate the engineering

necessary to evaporate the liquid in a small body of water, perhaps only

a puddle or pot on a stove. In contrast, for APIs that are used as

widely as the JDK, rigorously finding all the possible programs

impacted by an incompatible change is as impractical as boiling the

oceans, so evolving such APIs is quite constrained by comparison.

Generally, we will consider whether a program P is compatible in some fashion (or not) with respect to two versions of a library, L1 and L2, that differ in some way.

(We will not consider the compatibility impact of such changes to

independent implementers of L.)

Sometimes only a particular program is of interest; is the change from L1 to L2 compatible with this program? When evaluating how the platform should evolve, a broader consideration of the programs of concern is used. For example, does the change from L1 to L2 cause a problem for any program that currently exists? If so, what fraction of existing programs is affected? Finally, the broadest consideration is whether the change affects any program that could exist. Often once a platform version is released, the latter two notions are similar because imperfect knowledge about the set of actual programs means it can be more tractable to consider the worst possible outcome for any potential program rather than estimate the impact over actual programs. Stated more formally, depending on the change being considered, judging the change based on the worst possible outcome for any program is more appropriate than judging based on some other kind of norm of the disruption over the space of known programs (see "Joe on Norms: How to Measure Size").

Generally each kind of compatibility has both positive and negative aspects (see "Joe on Balance of Error"); that is, the positive aspect of keeping things that "work" working and the negative aspect of keeping things that "don't work" not working. For

example, the TCK tests for Java compilers include both positive tests

of programs that must be accepted and negative tests of programs that

must be rejected.

In many circumstances, preserving or expanding the positive behavior

is more acceptable and important than maintaining the negative

behavior and we will focus on positive compatibility in this entry.

In terms of relative severity, source compatibility problems are

usually the mildest since there are often straightforward workarounds,

such as adjusting import statements or switching to fully qualified

names. Gradations of source compatibility are identified and

discussed below. Behavioral compatibility problems can have a range of

impacts while true binary compatibility issues are problematic since

linking is prevented.

The basic job of any linker or loader is simple: It binds more

abstract names to more concrete names, which permits programmers to

write code using the more abstract names. (Linkers and Loaders, http://linker.iecc.com/)

A Java compiler's job also includes mapping more abstract names to

more concrete ones, specifically mapping simple and qualified names

appearing in source code into binary names in class files.

Source compatibility concerns this mapping of source code into class

files, not only whether or not such a mapping is possible, but also

whether or not the resulting class files are suitable. Source

compatibility is influenced by changing the set of types available

during compilation, such as adding a new class, as well as changes

within existing types themselves, such as adding an overloaded method.

There is a large set of possible changes to classes (JLSv3 §13.4, "Evolution of Classes") and interfaces (JLSv3 §13.5, "Evolution of Interfaces") examined for their binary compatibility impact. All these changes could also be classified according to their source compatibility repercussions, but only a few kinds of changes will be analyzed below.

The most rudimentary kind of positive source compatibility is whether code that compiles against L1 will continue to compile against L2; however, that is not the entirety of the space of concerns since the class file resulting from compilation might not be equivalent.

Java source code often uses simple names for types (JLSv3 §6.5.5.1); using information about imports, the compiler will interpret these simple names (JLSv3 §6.5) and transform them into binary names (JLSv3 §13.1) for use in the resulting class file(s). In a class file, the binary name of an entity (along with its signature in the case of methods and constructors) serves as the unique, universal identifier to allow the entity to be referenced.
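As a small illustration of the difference between the name a programmer writes and the binary name recorded in a class file, the following sketch prints both forms for a nested type. The class name BinaryNameDemo and its helper methods are made up for this example; Class.getName and Class.getCanonicalName are standard core reflection methods.

```java
import java.util.Map;

public class BinaryNameDemo {
    /** Returns the binary name, the form that identifies the type in class files. */
    static String binaryName(Class<?> type) {
        return type.getName();
    }

    /** Returns the canonical name, the dotted form normally written in source code. */
    static String sourceName(Class<?> type) {
        return type.getCanonicalName();
    }

    public static void main(String... args) {
        // Map.Entry is a member interface of java.util.Map.
        System.out.println(sourceName(Map.Entry.class)); // java.util.Map.Entry
        System.out.println(binaryName(Map.Entry.class)); // java.util.Map$Entry
    }
}
```

The "$" separator in the second line of output is what lets a nested type and a same-named top-level type in a package remain distinct at the class file level.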

So different degrees of source compatibility can be identified:

Does the code still compile (or not compile)?

If the code still compiles, do all the names resolve to the

same binary names in the class file?

If the code still compiles and the names do not all

resolve to the same binary names, does a behaviorally

equivalent class file result?

Whether or not a program is valid can also be affected by language changes. Usually previously invalid programs are made valid, as when generics were added, but sometimes existing programs are rendered invalid, as when keywords were added (strictfp, JLS §8.1.1.3; assert, JSR 41; and enum, JSR 201).

The version number of the resulting class file is also an external

compatibility issue of sorts since that affects which platform

versions the code can be run on.

Full source compatibility with any existing program is usually not achievable because of * imports. For example, consider L1 with packages foo and bar where foo includes the class Quux. Then L2 adds class bar.Quux. This program

    import foo.*;
    import bar.*;

    public class HelloQuux {
        public static void main(String... args) {
            Object o = Quux.class;
            System.out.println("Hello " + o.toString());
        }
    }

will compile under L1 but not under L2 since the name "Quux" is now ambiguous as reported by javac:

    HelloQuux.java:6: reference to Quux is ambiguous, both class bar.Quux in bar and class foo.Quux in foo match
            Object o = Quux.class;
                       ^
    1 error



An adversarial program could almost always include * imports that conflict with a given library (see note 1). Therefore, judging source compatibility by requiring all possible programs to compile is an overly restrictive criterion. However, when naming their types, API designers should not reuse "String", "Object", and other names of core classes from packages like java.lang and java.util to avoid this kind of annoying name conflict.

Due to the * import wrinkle, a more reasonable definition of source compatibility considers programs transformed to only use fully qualified names (JLSv3 §6.7). Let FQN(P, L) be program P where each name is replaced by its fully qualified form in the context of libraries L. Call a library transformation from L1 to L2 binary-preserving source compatible with source program P if FQN(P, L1) equals FQN(P, L2). This is a strict form of source compatibility that will usually result in class files for P using the same binary names when compiled against both versions of the library. Class files with the same binary names will result when each type has a distinct fully qualified name. Multiple types can have the same fully qualified name but differing binary names; those cases do not arise when the standard naming conventions are being followed (see note 2).

Adding overloaded methods has the potential to change method

resolution and thus change the signatures of the method call sites in

the resulting class file. Whether or not such a change is problematic

with respect to source compatibility depends on what semantics are

required and how the different overloaded methods operate on the same

inputs, which interacts with behavioral equivalence notions. Assume

class C originally has a method void m(T t)

and then an overload void m(S s) is added. Some cases

of interest include:

S and T are both reference types: If there is no typing relationship between S and T, overload resolution will generally not be affected. If there is a typing relationship between S and T, such as S being a subtype of T, call sites in existing source may now resolve to the new method. Well-written programs will follow the Liskov substitution principle and C will do "the same" operation on the argument no matter which overloaded method is called. Less than well-written programs may fail to follow this principle.

S and T are both primitive types: By extension, if a numerical value can be represented in multiple primitive types, overloaded methods taking a type with that value should usually perform an equivalent operation. However, the silent loss of precision in primitive widening conversion can affect the actual value that gets passed to an overloaded method. Concretely, consider class C with methods m(int) and m(double). The call site "m(123L)" will undergo primitive widening conversion, converting the argument value to double before m(double) is called. Now if m(long) is added to C, the call site will resolve to the new method. Even assuming each m method does an equivalent operation when passed a numerically equal value, there can still be differences after the third method is added since some long values lose precision when converted to double, for example, Long.MAX_VALUE. Therefore, a client compiled against the two versions of C can have different runtime behavior even if each m method behaves reasonably. This kind of subtle change in overloading behavior occurred with the addition of a BigDecimal constructor taking a long as part of JSR 13 (Decimal Arithmetic Enhancement).

One of S and T is a reference type, the other is primitive: Before generics were added to the language, two methods which differed in the primitive/reference status of the i-th parameter could not possibly be applicable to the same arguments. But, along with generics came boxing (JLSv3 §5.1.7) and unboxing (JLSv3 §5.1.8) conversions that can map, for example, a value of an int primitive type to a java.lang.Integer object with a reference type, and vice versa. These mappings have the potential to introduce ambiguities in method resolution such that adding a method could introduce an ambiguity that prevented previously valid code from compiling; however, the rules for method invocation expressions were updated (JLSv3 §15.12.2) to avoid such potential ambiguities from boxing/unboxing as well as var-args.
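The effect of those updated rules can be seen in a short sketch. The class BoxingPhasesDemo and its methods are hypothetical; the point is only that overloads differing in primitive/reference status do not make a call ambiguous, because candidates that need no boxing or unboxing are considered first.

```java
public class BoxingPhasesDemo {
    // Hypothetical original method taking a primitive.
    static String m(long value) { return "m(long)"; }

    // Hypothetical overload taking a reference type.
    static String m(Integer value) { return "m(Integer)"; }

    /** An int argument reaches m(long) by primitive widening, with no boxing needed. */
    static String callWithIntLiteral() { return m(1); }

    /** An Integer argument matches m(Integer) directly, with no unboxing needed. */
    static String callWithBoxedValue() { return m(Integer.valueOf(1)); }

    public static void main(String... args) {
        System.out.println(callWithIntLiteral()); // m(long)
        System.out.println(callWithBoxedValue()); // m(Integer)
    }
}
```

Neither call site is rejected as ambiguous, even though each argument could be converted to either parameter type once boxing and unboxing are allowed.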

If a new method cannot change resolution, then adding it is a binary-preserving source transformation. If a new method can change resolution but the different class file that results has acceptably similar behavior, the change may still be acceptable, while changing resolution in a way that does not preserve semantics is likely problematic. Changing a library in such a way that current clients no longer compile is seldom appropriate.
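To make the resolution change concrete, here is a minimal sketch of the m(int)/m(double)/m(long) scenario. The nested classes CV1 and CV2 stand in for two versions of the hypothetical class C, and the helper losesPrecision is an assumption added only to show which values are affected.

```java
import java.math.BigDecimal;

public class WideningDemo {
    /** Stand-in for the first version of C: m(int) and m(double) only. */
    static class CV1 {
        static String m(int i) { return "int:" + i; }
        static String m(double d) { return "double:" + d; }
    }

    /** Stand-in for the second version of C: the same methods plus m(long). */
    static class CV2 {
        static String m(int i) { return "int:" + i; }
        static String m(double d) { return "double:" + d; }
        static String m(long l) { return "long:" + l; }
    }

    /** True if converting the long to double changes its numerical value. */
    static boolean losesPrecision(long value) {
        return new BigDecimal((double) value).compareTo(BigDecimal.valueOf(value)) != 0;
    }

    public static void main(String... args) {
        // The same source-level call resolves differently against each version.
        System.out.println(CV1.m(Long.MAX_VALUE)); // widened to double first
        System.out.println(CV2.m(Long.MAX_VALUE)); // passed as a long
        System.out.println(losesPrecision(123L));           // false
        System.out.println(losesPrecision(Long.MAX_VALUE)); // true
    }
}
```

For a value like 123L both versions see a numerically equal argument, so the resolution change is benign; for Long.MAX_VALUE the two versions receive different numbers.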

JLSv3 §13.2

What Binary Compatibility Is and Is Not



A change to a type is binary compatible with (equivalently,

does not break binary compatibility with) preexisting binaries

if preexisting binaries that previously linked without error will

continue to link without error.

The JLS defines binary compatibility strictly according to linkage; if P links with L1 and continues to link with L2, the change made in L2 is binary compatible. The runtime behavior after linking is not included in binary compatibility:





JLSv3 13.4.22 Method and Constructor Body



Changes to the body of a method or constructor do not break [binary]

compatibility with pre-existing binaries.

As an extreme example, if the body of a method is changed to throw an

error instead of compute a useful result, while the change is

certainly a compatibility issue, it is not a binary

compatibility issue since client classes would continue to link.

Also, it is not a binary compatibility issue to add methods to an interface (JLSv3 §13.5.3). Class files compiled against the old version of the interface will still link against the new interface despite the class not having an implementation of the new method. If the new method is called at runtime, an AbstractMethodError is thrown; if the new method is not called, the existing methods can be used without incident. (Adding a method to an interface is a source incompatibility that can break compilation, though.)

A design requirement from the addition of generics via JSR 14 was migration compatibility. Migration compatibility requires that a library can be generified and existing (nongeneric) clients can continue to compile and link against the generic version. Meeting this constraint led to the use of erasure, a controversial aspect of the generics design. During JSR 14, it was not known how to add generics in a way that supported both reification and migration compatibility; future work might address this shortcoming (Sun bug 5098163, "Add reification of generic type parameters to the Java programming language").

Intuitively, behavioral compatibility should mean that with the same

inputs program P does "the same" or an "equivalent" operation

under different versions of libraries or the platform. Defining

equivalence can be a bit involved; for example, even just defining a

proper equals method in a class can be nontrivial. In this
case, to formalize this concept would require an operational
semantics for the JVM for the aspects of the system a program

was interested in. For example, there is a fundamental difference in

visible changes between programs that introspect on the system and

those that do not. Examples of introspection include calling core

reflection, relying on stack trace output, using timing measurements

to influence code execution, and so on. For programs that do not use,
say, core reflection, changes to the structure of libraries, such as
adding new public methods, are entirely transparent. In

contrast, a (poorly behaved) program could use reflection to look up

the set of public methods on a library class and throw an

exception if any unexpected methods were present. A tricky program
could even make decisions based on information like a timing side
channel. For example, two threads could repeatedly run different
operations and make some indication of progress, for example,
incrementing an atomic counter (AtomicInteger.incrementAndGet), and

the relative rates of progress could be compared. If the ratio is

over a certain threshold, some unrelated action could be taken, or

not. This allows a program to create a dependence on the optimization

capabilities of a particular JVM, which is generally outside a

reasonable behavioral compatibility contract.
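The poorly behaved reflective program described above might look something like the following sketch. LibraryV1 and LibraryV2 are hypothetical stand-ins for two versions of a library class, and requireExactly is an invented check; only the core reflection calls are real APIs.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

public class BrittleClient {
    /** Hypothetical library class, version 1. */
    public static class LibraryV1 {
        public void run() {}
    }

    /** Hypothetical version 2: one public method added, nothing else changed. */
    public static class LibraryV2 {
        public void run() {}
        public void stop() {}
    }

    /** Names of the public methods declared directly by the given class. */
    static Set<String> declaredPublicMethods(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers()) && !m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    /** The poorly behaved check: reject any class with methods that were not expected. */
    static void requireExactly(Class<?> type, Set<String> expected) {
        Set<String> actual = declaredPublicMethods(type);
        if (!actual.equals(expected)) {
            throw new IllegalStateException("Unexpected methods: " + actual);
        }
    }

    public static void main(String... args) {
        requireExactly(LibraryV1.class, Set.of("run")); // passes
        try {
            requireExactly(LibraryV2.class, Set.of("run"));
        } catch (IllegalStateException e) {
            System.out.println("Broke on upgrade: " + e.getMessage());
        }
    }
}
```

A client that simply calls run() is unaffected by the added method; only the client that introspects on the library's structure breaks.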

The evolution of a library is constrained by the library's contract

included in its specification; for final classes this

contract doesn't usually include a prohibition of adding new public

methods! While an end-user may not care why a program does not work

with a newer version of a library, what contracts are being followed

or broken should determine which party has the onus for fixing the

problem. That said, there are times in evolving the JDK when
differences are found between the specified behavior and the actual
behavior (for example, Sun bug 4707389, "{Float, Double}.valueOf
erroneously accepts integer strings", and Sun bug 6365176,
"java.math.BigInteger.ZERO.multiply(null)"). The two basic

approaches to fixing these bugs are to change the implementation to

match the specified behavior or to change the specification (in a

platform release) to match the implementation's (perhaps

long-standing) behavior; often the latter option is chosen since it

has a lower de facto impact on behavioral compatibility.

Consider two versions of a simple enum representing the crew of the

USS Enterprise, one for the first season:

    public enum StarTrekCast {
        JAMES_T_KIRK("Jim"),
        LEONARD_MCCOY("Bones"),
        JANICE_RAND("Yeoman Rand"),
        MONTGOMERY_SCOTT("Scotty"),
        SPOCK("Spock"),
        HIKARU_SULU("Sulu"),
        UHURA("Uhura"); // Any first name for Uhura is non-canon.

        private String nickname;

        StarTrekCast(String nickname) {
            this.nickname = nickname;
        }

        public String nickname() { return nickname; }
    }

and another for the second season:

    public enum StarTrekCast {
        JAMES_T_KIRK("Jim"),
        SPOCK("Spock"),
        MONTGOMERY_SCOTT("Scotty"),
        LEONARD_MCCOY("Bones"),
        /* JANICE_RAND("Yeoman Rand"), */ // Only in 8 episodes!
        HIKARU_SULU("Sulu"),
        PAVEL_CHEKOV("Chekov"), // Introduced in season 2.
        UHURA("Uhura"); // Any first name for Uhura is non-canon.

        private String nickname;

        StarTrekCast(String nickname) {
            this.nickname = nickname;
        }

        public String nickname() { return nickname; }
    }

Compared to the first season, the second season:

Deletes yeoman Janice Rand

Adds Pavel Chekov

Reorders Bones, Scotty, and Spock to better reflect the order of who commands the ship if the Captain and others are unavailable.



These changes have varying source, binary, and behavioral compatibility

effects:

Deleting JANICE_RAND is source incompatible, able to break compilations. The deletion is also binary incompatible. Besides being observable via reflection, the deletion affects the behavior of various built-in methods on the enum, including values and valueOf. In addition, the deletion will break previously serialized streams with this constant.

Adding PAVEL_CHEKOV is binary-preserving source compatible. Likewise, the addition of a new public static final field is binary compatible. However, the addition of a new constant is visible to reflection and alters the behavior of built-in enum methods. Existing serialized instances continue to work after a new constant is added.

Reordering McCoy, Scotty, and Spock is a binary-preserving source compatible and binary compatible change, but the reordering changes the behavior of built-in methods, most notably compareTo.
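The behavioral effects on the built-in methods can be checked directly. The sketch below uses two abbreviated enums, CastV1 and CastV2, as stand-ins for the two versions of StarTrekCast (nicknames omitted so both can live in one file); the helper method names are invented for the example.

```java
public class EnumEvolutionDemo {
    /** Stand-in for the first-season enum, constants in the original order. */
    enum CastV1 {
        JAMES_T_KIRK, LEONARD_MCCOY, JANICE_RAND, MONTGOMERY_SCOTT,
        SPOCK, HIKARU_SULU, UHURA
    }

    /** Stand-in for the second-season enum: one deletion, one addition, one reordering. */
    enum CastV2 {
        JAMES_T_KIRK, SPOCK, MONTGOMERY_SCOTT, LEONARD_MCCOY,
        HIKARU_SULU, PAVEL_CHEKOV, UHURA
    }

    /** compareTo follows declaration order, so reordering flips this result. */
    static boolean spockBeforeMcCoyV1() {
        return CastV1.SPOCK.compareTo(CastV1.LEONARD_MCCOY) < 0;
    }

    static boolean spockBeforeMcCoyV2() {
        return CastV2.SPOCK.compareTo(CastV2.LEONARD_MCCOY) < 0;
    }

    /** valueOf throws IllegalArgumentException for a constant that no longer exists. */
    static boolean existsInV2(String name) {
        try {
            CastV2.valueOf(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String... args) {
        System.out.println(spockBeforeMcCoyV1());        // false
        System.out.println(spockBeforeMcCoyV2());        // true
        System.out.println(existsInV2("JANICE_RAND"));   // false
        System.out.println(existsInV2("PAVEL_CHEKOV"));  // true
    }
}
```

A client that stored constants in a sorted collection or persisted ordinal values would observe these differences without any change to its own code.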



The compatibility policies we apply to platform releases, like JDK 7, differ from those applied to maintenance and update releases, like JDK 6 updates (see the JDK 6 Release Notes on Compatibility).

For both kinds of releases, binary compatibility must be maintained

for JCP-managed APIs. Update releases must maintain source

compatibility, but platform releases are able to break source

compatibility given sufficient justification. In update releases,

behavioral compatibility is regarded as very important; programs may

be relying on specified-to-be-unspecified behavior of a particular

implementation and switching to another update in the same release

family should be seamless whenever possible. In contrast, platform releases have fewer

restrictions on changing such behavior. So, for example, modifying the
order of iteration of elements in a HashMap to allow faster
hashing algorithms would be quite appropriate for a platform release
(the Java SE 6 specification for HashMap states: "This class makes no
guarantees as to the order of the map; in particular, it does not
guarantee that the order will remain constant over time."), but
would be much less suited to an update release.
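From the client's side, tolerating such a change means not depending on the unspecified order in the first place. The following is one possible sketch, with invented class and method names, contrasting a fragile use of HashMap iteration with a version that imposes its own order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IterationOrderDemo {
    /** Fragile: the order is whatever this HashMap implementation happens to produce. */
    static List<String> keysInMapOrder(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    /** Robust: sorts the keys, so any valid HashMap implementation gives the same list. */
    static List<String> keysSorted(Map<String, Integer> map) {
        List<String> keys = new ArrayList<>(map.keySet());
        Collections.sort(keys);
        return keys;
    }

    public static void main(String... args) {
        Map<String, Integer> ranks = new HashMap<>();
        ranks.put("spock", 2);
        ranks.put("jim", 1);
        ranks.put("bones", 3);
        System.out.println(keysInMapOrder(ranks)); // order not specified
        System.out.println(keysSorted(ranks));     // [bones, jim, spock]
    }
}
```

Only the second method has output that is determined by the HashMap specification rather than by a particular implementation.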





Original Preface to JLS



Except for timing dependencies or other non-determinisms and given

sufficient time and sufficient memory space, a program written in the

Java programming language should compute the same result on all

machines and in all implementations.

The above statement from the original JLS could be regarded as

vacuously true about any platform: except for the non-determinisms, a

program is deterministic. The difference was that in Java, with

programmer discipline, the set of deterministic programs was nontrivial

and the set of predictable programs was quite large. In other

words, the platform provider and the programmer both have

responsibilities in making programs portable in practice; the platform

should abide by the specification and conversely programs should

tolerate any valid implementation of the specification.

To make continued evolution of the platform more tractable, it may be helpful to introduce more structured ways of tracking behavioral changes so that programs could in principle be audited for depending on aspects of the platform in ways that are not recommended. For example, potentially annotations could be used to:

Mark classes and methods whose specification has changed in a release (analogous to change bars in a written document).

Record stability information about a method's contract, such as deterministic, non-deterministic, or volatile (expected to change over time); for example, whether the hashCode of a class is specified to return particular values or just obey the general contract of Object.hashCode.

Using com.sun.* annotations, annotate constructs whose implementations we have changed in our specific implementation in a particular release, such as HashMap ordering.



Annotation processing is a general purpose meta-programming framework, standardized as part of the platform as of JDK 6. Annotation processors, probably also using the javac tree API, could be written to check for usage of changed or problematic APIs in source code. The D compiler in DTrace can enforce analogous limits on the stability levels and dependency classes of D scripts.
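As a rough sketch of what such markup could look like, the example below defines an entirely hypothetical @Contract annotation and Stability enum; no such annotations exist in the JDK. For brevity it audits a class with runtime reflection rather than with a compile-time annotation processor, which is a simplification of the scheme described above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class StabilityAuditDemo {
    /** Hypothetical stability levels for a method's contract. */
    enum Stability { DETERMINISTIC, NON_DETERMINISTIC, VOLATILE }

    /** Hypothetical annotation recording stability and the release where the spec changed. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Contract {
        Stability value();
        String specChangedIn() default "";
    }

    /** Hypothetical library class carrying the markup. */
    static class Library {
        @Contract(Stability.DETERMINISTIC)
        public int size() { return 0; }

        @Contract(value = Stability.VOLATILE, specChangedIn = "7")
        public String iterationOrder() { return ""; }
    }

    /** Lists the methods whose contract is marked as expected to change over time. */
    static List<String> volatileMethods(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            Contract c = m.getAnnotation(Contract.class);
            if (c != null && c.value() == Stability.VOLATILE) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String... args) {
        System.out.println(volatileMethods(Library.class)); // [iterationOrder]
    }
}
```

A real tool would more likely inspect client source for calls to the marked methods, which is where an annotation processor combined with the tree API would come in.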

While there would be considerable cost and complication to designing

such a scheme and retrofitting it onto at least a subset of the JDK,

the ability to define and then programmatically test policies for

behavioral compatibility issues could enable platform providers and

programmers to have a smoother joint stewardship of keeping

applications running and Java usage growing.

Conclusion

Compatibility is a multifaceted concept, with nuances within each

broad category. In the future, annotation processors or other

program analyzers might help manage source, binary, and behavioral

analysis by direct analysis or program markup.

Acknowledgments

Éamonn McManus gave useful feedback on a draft of this entry.

Notes

1

There are some cases where such an adversarial program could be

thwarted in practice. For example, when the Unicode version supported

by the JDK platform is upgraded, previously illegal identifier strings are

often allowed. A new JDK platform class could use the newly valid

names not open to preexisting malicious clients; although new

adversaries could afterward use the new name. This assumes the

compatibility threat model only includes class files generated from

Java sources. As of class file version 49.0 for JDK 5 and later, at

the JVM level many more identifiers are legal than those accepted in

Java source.

2

Even code that always uses fully qualified names is not completely

immune from ambiguities and unintended (or malicious) changes in the

meaning of names stemming from changes in the library environment

since distinct types can have the same fully qualified name. For example, the type name "a.b.C" could refer to:

class C in package a.b:

    package a.b;
    public class C {}

class C nested inside class b where class b is a member of package a:

    package a;
    public class b {
        public static class C {}
    }

class C nested inside class b which is in turn nested inside class a where class a is a member of an unnamed package (JLSv3 §7.4.2, "Unnamed Packages"; unnamed packages are not Immortal, so "There can be only one!" does not apply):

    public class a {
        public static class b {
            public static class C {}
        }
    }

These three classes cannot all be compiled together ("package a.b
clashes with class of same name"); however, they can be compiled

separately to the same output location and so can all appear on a

classpath when another file is compiled. If all three are on the

classpath together, when other code is compiled the qualified name "a.b.C" resolves to the doubly-nested class C in an unnamed package. To avoid such name collisions, binary names use "$" instead of "." to separate the name of an enclosing class from a nested class, leading to the distinct binary names "a.b.C", "a.b$C", and "a$b$C", respectively, for the classes in question. Following the recommended naming conventions (JLSv3 §6.8) avoids such name clashes. Therefore, such name clashes should be rare in practice when compiling against libraries following the conventions, as JCP moderated java.* and javax.* APIs should do. As an extreme case, do not write this program:

    public class java {
        public static class lang {
            public static class String {
                String(Object o) {}
            }
        }

        public static void main(String... args) {
            java.lang.String s =
                new java.lang.String("I don't think this means " +
                                     "what you think it means.");
            if (!s.getClass().getName().equals("java.lang.String"))
                System.out.println("Inconceivable!");
        }
    }

In this perverse example, the nested class java.lang obscures (JLSv3 §6.3.2) the venerable java.lang package and the local java.lang.String declaration shadows (JLSv3 §6.3.1) the standard java.lang.String.

Further Reading