constant pool futures

On Jun 25, 2018, at 9:37 AM, Remi Forax <forax at univ-mlv.fr> wrote:
>
> Hi Peter,
> you can simulate the equivalent of a Constant_Bytes by base64 encoding your bytes
> and pass the resulting string to a Constant Dynamic which at runtime will load it,
> decode it and return it as a byte array.

You can also simulate a CONSTANT_Group by using a long array of arguments to the bootstrap method. As of JDK 10, the limit of 255 arguments has been lifted; the class file format can handle up to 2^16-1 extra arguments. The BSM has to be a varargs method of course, since no method can get more than 255 positional arguments on the stack, including receiver and (if present) method handle.

As of JDK 11, the CONSTANT_Dynamic constant allows direct "ldc" of a constant computed directly by a BSM (without an intermediate indy or CallSite). The constant can be of any type expressible by the JVM. Crucially, such a constant can be an input to a BSM for a larger constant or call site; this introduces something like expression trees into the constant pool. For example, if you combine List.of with ConstantBootstraps.invoke plus a list of constants, you get a constant List of anything expressible in the constant pool (including other Lists).

That's the present. The future of Java is shaped by the people who work hard to create it, but even they can't predict it accurately, although they try to write roadmaps and discuss their aspirations with the community. (We are not hiding secret plans or schedules!) I can say a few things about how constant pool features fit into the big picture, but I can't predict which releases they will land in.

The future Bytes and Group constants are envisioned as helpers for scaling complex class files toward lower overheads and higher limits. The existing workarounds have certain overheads and limits that we would probably like to remove in the longer term.

For example, a base64-encoded string has a maximum payload length of about 0.75 * 2^16 octets.
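
To make the base64 workaround concrete, here is a minimal sketch of a bootstrap method with the parameter shape that CONSTANT_Dynamic expects (lookup, name, type, then static arguments). The class name `Base64Bytes` and its method names are hypothetical, not part of any JDK API. Java source has no syntax for condy, so the sketch simply calls the BSM directly; a real class file would need a bytecode tool to reference it from the constant pool. The size helper assumes padded base64 stored in a single CONSTANT_Utf8 entry.

```java
import java.lang.invoke.MethodHandles;
import java.util.Base64;

// Hypothetical helper class; the names are illustrative only.
final class Base64Bytes {
    private Base64Bytes() {}

    // Bootstrap method in the shape condy expects:
    // (Lookup, String name, Class<?> type, static arguments...).
    // The single static argument is the base64 text stored as a String constant.
    static byte[] bootstrap(MethodHandles.Lookup lookup, String name,
                            Class<?> type, String base64) {
        if (type != byte[].class) {
            throw new IllegalArgumentException("expected byte[] but got " + type);
        }
        // Decoding allocates a fresh array: this is the copy the workaround cannot avoid.
        return Base64.getDecoder().decode(base64);
    }

    // A CONSTANT_Utf8 entry has a 16-bit length, so it holds at most 65535 bytes.
    // Padded base64 uses 4 ASCII characters for every 3 payload octets.
    static int maxPaddedPayload() {
        return (0xFFFF / 4) * 3;
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, (byte) 0xFF};
        String encoded = Base64.getEncoder().encodeToString(payload);
        // Stand-in for what the JVM would do when resolving the constant.
        byte[] decoded = bootstrap(MethodHandles.lookup(), "_", byte[].class, encoded);
        System.out.println("decoded length: " + decoded.length);
        System.out.println("max padded payload: " + maxPaddedPayload());
    }
}
```
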
*Also*, using base64 requires one or more throwaway intermediate copies of the envelope and payload; this is the overhead of using a simulation instead of direct expression of a sequence of octets. Personally, I'd like to see more *zero copy* data structures in classfiles, just like Linux object files support zero copy read-only data, using file mapping. But zero copy data requires every layer of the system to support either immutability or (at worst) lazy copy on write, and also support "views" on the original data. This means we have to tune certain Java APIs to avoid statefulness and avoid certain data types. Java arrays can't do zero copy views for the same reasons they can't do slices. Value types will eventually help reduce the overheads of views, reducing zero copy API overheads.

The Bytes constant is envisioned as scaling to at least 2^31-1 and perhaps beyond, and will not require decoding or copying. As such it is a potential replacement for resource files, as well as a carrier for short "binary string" data. Before we can do this in a copy-free manner, we need either an interface like CharSequence for bytes, or else a kind of ByteBuffer which has no state and is read-only. IMO the ByteBuffer improvements should play out a little longer before we decide what is the type of an "ldc" of a CONSTANT_Bytes constant. The requirements on this structure are close to (but not identical with) a ByteBuffer. My money is on a simple ByteSequence interface which ByteBuffer (and other types) will implement for interoperability, but the "ldc" of a Bytes constant should load a flyweight object (perhaps a value type) which does little more than hold the virtual address of a slice of a classfile (perhaps mapped from disk); this is probably lighter than any ByteBuffer.

Likewise, the simulation of groups using BSM arguments has a maximum payload length of 2^16-1, and burns a constant pool index for each (distinct) BSM argument.
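
The group simulation can be sketched in the same way. The following is only an illustration, assuming a hypothetical class `GroupBootstrap`; it shows a varargs bootstrap method that gathers its trailing static arguments into an immutable List. As before, the BSM is called directly because Java source cannot express condy; in a class file, each element would be a separate static argument to the bootstrap method.

```java
import java.lang.invoke.MethodHandles;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
final class GroupBootstrap {
    private GroupBootstrap() {}

    // Varargs BSM: the trailing static arguments arrive collected in one array,
    // so the argument count is not bounded by the 255-slot limit on positional arguments.
    static List<Object> bootstrap(MethodHandles.Lookup lookup, String name,
                                  Class<?> type, Object... elements) {
        // List.of copies the array into an immutable list and rejects null elements.
        return List.of(elements);
    }

    public static void main(String[] args) {
        // Stand-in for constant resolution by the JVM.
        List<Object> group = bootstrap(MethodHandles.lookup(), "_", List.class,
                                       "a", 42, 3.5);
        System.out.println(group);
    }
}
```

Note that every distinct element supplied this way still needs its own constant pool entry.
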
Since there are only 2^16-1 possible CP entries, this is a serious cost. The Group format is envisioned as expressing a range of CP entries which do *not* consume global CP entry indexes. Uses of Group constants which don't have a large number of resolved constants might be better coded as serialized bundles of bits, wrapped in a CONSTANT_Bytes envelope and deserialized from an ad hoc encoding. You can see how the simulation overheads and limits stack up if you then require such a bundle in a base64 string.

It's not obvious at first glance, but if you think about it, a long group of constants has a likely scalability requirement that the constants not be resolved eagerly for their bootstrap method. I.e., the BSM should somehow be able to control the sequencing of constant resolution, including even deferring some constants to be resolved *after* the BSM returns (at some future point when the BSM logic needs the resolved constant). This requires another bit of VM functionality, the BootstrapCallInfo API, which allows a BSM to fully control resolution. This is still a WIP although it is (non-public) in the sources. In order for the BootstrapCallInfo API to properly express unresolved constants, we first need to land the JVM Constants API (JEP 334).

Given the number of people available to work on various projects, we are working on these features as quickly as we can. There's a mix of sizes: small things like ByteBuffer upgrades, medium like CONSTANT_Dynamic, and large like value types. Because such features tend to be interrelated, there's also a natural order in which we are approaching them. Also, the speed of development depends not only on the number of hands working on the technology but also on the surprising complexity of properly developing core JVM features, even supposedly simple ones like CONSTANT_Dynamic. Coding is a fraction of the required work; there is also test development, integration, spec.
development, several kinds of review, and iterative polishing; all are necessary to get a result our users will enjoy, and we'll be proud of, even a decade later. So even the simplest JVM feature requires a total of years of labor.

We'll talk more about this at the JVM Language Summit and Oracle Committers' Workshop. I hope we can continue to spread the work around more to current partners like Intel and Red Hat. I will be delighted to talk to conference attendees in great detail about this stuff.

HTH
— John