Intel have announced the next big instruction set extension, AVX512, to be implemented in 2015 or 2016. The details are defined in the Intel Architecture Instruction Set Extensions Programming Reference. There are many interesting extensions:

The size of the vector registers is extended from 256 bits (YMM registers) to 512 bits (ZMM registers). There is room for further extensions to at least 1024 bits; what will those registers be called?

The number of vector registers is doubled to 32 registers in 64-bit mode. There will still be only 8 vector registers in 32-bit mode.

Eight new mask registers, k0 to k7, allow masked and conditional operations. Most vector instructions can be masked so that they operate only on selected vector elements, while the remaining elements are either left unchanged or set to zero. This will replace the use of ordinary vector registers as masks.
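To make the difference between the two masking modes concrete, here is a minimal scalar C model of a masked single-precision add on a 512-bit register (16 lanes). It is an illustration, not Intel code or intrinsics: the function name masked_add_ps and the zero_masking flag are hypothetical, and the 16-bit integer k stands in for a mask register, with bit i selecting lane i.

```c
#include <stdint.h>

#define ZMM_FLOATS 16 /* a 512-bit ZMM register holds 16 single-precision floats */

/* Scalar model of a masked vector add, dst{k} = a + b.
   Lanes whose bit in k is 1 receive a[i] + b[i].
   Unselected lanes keep their old value (merge-masking) or are
   set to 0 (zero-masking), depending on zero_masking. */
void masked_add_ps(float dst[ZMM_FLOATS], const float a[ZMM_FLOATS],
                   const float b[ZMM_FLOATS], uint16_t k, int zero_masking)
{
    for (int i = 0; i < ZMM_FLOATS; i++) {
        if (k & (1u << i))
            dst[i] = a[i] + b[i];
        else if (zero_masking)
            dst[i] = 0.0f;
        /* otherwise dst[i] is left unchanged (merge-masking) */
    }
}
```

With k = 0x5555, the even-numbered lanes receive a + b; the odd-numbered lanes keep their old value under merge-masking or become 0 under zero-masking.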

Most vector instructions with a memory operand have an option for broadcasting a scalar memory operand to all elements of the vector.
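A scalar C model of a broadcast memory operand may help here. It assumes a packed single-precision add on 16 lanes, written in Intel assembly syntax roughly as vaddps zmm1, zmm2, [mem]{1to16}; the function name add_broadcast_ps is hypothetical. Only one 4-byte element is read from memory, and the same value is used in every lane, so no separate broadcast instruction or extra register is needed.

```c
#include <stddef.h>

#define ZMM_FLOATS 16 /* a 512-bit ZMM register holds 16 single-precision floats */

/* Scalar model of an add with an embedded broadcast memory operand:
   a single float is loaded from mem and added to every lane of a. */
void add_broadcast_ps(float dst[ZMM_FLOATS], const float a[ZMM_FLOATS],
                      const float *mem)
{
    const float s = *mem; /* one 4-byte load instead of a 64-byte load */
    for (size_t i = 0; i < ZMM_FLOATS; i++)
        dst[i] = a[i] + s;
}
```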

Floating point vector instructions have options for specifying the rounding mode and for suppressing exceptions.

There is a new addressing mode called compressed displacement. When an instruction has a memory operand with a pointer and an 8-bit sign-extended displacement, the displacement is multiplied by the size of the memory operand. This makes it possible to address a larger range with a single-byte displacement, as long as the memory operands are properly aligned. In some cases this makes instructions shorter, which helps compensate for the longer prefix.
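The scaling rule can be sketched in C as below. This is a simplified model under stated assumptions: n is taken to be the size in bytes of the memory operand (64 for a full ZMM operand), whereas the real scaling factor N also depends on the instruction's tuple type and on whether broadcasting is used. The helper names encode_disp8xN and decode_disp8xN are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of compressed displacement (disp8*N).
   n is the assumed memory operand size in bytes. A byte offset disp
   can be stored in one byte only if it is a multiple of n and disp / n
   fits in a signed 8-bit value. On success the stored byte goes to *disp8. */
bool encode_disp8xN(int32_t disp, int32_t n, int8_t *disp8)
{
    if (n <= 0 || disp % n != 0)
        return false;          /* misaligned: needs a 4-byte displacement */
    int32_t scaled = disp / n;
    if (scaled < INT8_MIN || scaled > INT8_MAX)
        return false;          /* out of range: needs a 4-byte displacement */
    *disp8 = (int8_t)scaled;
    return true;
}

/* Address calculation reverses the scaling: effective offset = disp8 * n. */
int32_t decode_disp8xN(int8_t disp8, int32_t n)
{
    return (int32_t)disp8 * n;
}
```

With n = 64, a single displacement byte covers -8192 to 8128 in steps of 64, compared with -128 to 127 for an ordinary 8-bit displacement. A misaligned or out-of-range offset such as 100 or 8192 would need a 4-byte displacement instead.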

More than 100 new instructions.

The 512-bit registers can do vector operations on 32-bit and 64-bit signed and unsigned integers and on single- and double-precision floats, but unfortunately not on 8-bit and 16-bit integers.

A year ago, Intel announced a similar instruction set with 512-bit registers in the Intel Xeon Phi Coprocessor Instruction Set Architecture Reference Manual. The two instruction sets are very similar, and both are backwards compatible, but they are not compatible with each other: they differ by a single prefix bit, even for otherwise identical instructions. I assume that the Knights Corner (Xeon Phi) instruction set will have a short life and be replaced by AVX512.

The AVX512 instruction set uses a new 4-byte prefix named EVEX, which is similar to the 2- or 3-byte VEX prefix but has 62 (hexadecimal) as its first byte. (Actually, I predicted several years ago that the byte 62 would be used for such a prefix, because it was the only remaining byte that could be used in the same way as the VEX prefix bytes.) The extra bits in the EVEX prefix are used for doubling the number of registers, for specifying the vector size, and for the extra features: broadcasting, masking, zeroing, specifying the rounding mode, and suppressing floating point exceptions.

The calling conventions for the new registers are partially defined in a draft ABI, but it is still being discussed whether the new registers should have callee-save status; see the GNU libc-alpha mailing list. I have commented on the AVX512 instruction set and suggested various improvements on Intel's blog and Intel's forum. The new instruction sets are supported by my objconv disassembler.
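To show how the EVEX prefix packs these features into its three payload bytes, here is a C sketch of a decoder for the field layout given in Intel's programming reference for AVX512. It is not taken from objconv; the struct and function names are hypothetical, and it does not validate the reserved bits. As in VEX, several fields are stored inverted, so the code complements them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of a 4-byte EVEX prefix: 0x62 followed by payload P0, P1, P2. */
typedef struct {
    unsigned r, x, b, r2; /* register-extension bits R, X, B, R' */
    unsigned mm;          /* opcode map: 1 = 0F, 2 = 0F38, 3 = 0F3A */
    unsigned w;           /* operand-size / opcode-extension bit */
    unsigned vvvv;        /* low 4 bits of the extra source register */
    unsigned pp;          /* implied prefix: 0 none, 1 = 66, 2 = F3, 3 = F2 */
    unsigned z;           /* 1 = zero-masking, 0 = merge-masking */
    unsigned ll;          /* vector length: 0 = 128, 1 = 256, 2 = 512 bits */
    unsigned bcst;        /* broadcast / rounding control / SAE bit */
    unsigned v2;          /* V': fifth bit of the extra source register */
    unsigned aaa;         /* mask register k0..k7 (0 means no masking) */
} evex_fields;

/* Returns false if p does not start with the EVEX escape byte 0x62. */
bool decode_evex(const uint8_t p[4], evex_fields *f)
{
    if (p[0] != 0x62)
        return false;
    unsigned p0 = p[1], p1 = p[2], p2 = p[3];
    f->r    = !((p0 >> 7) & 1u);      /* stored inverted */
    f->x    = !((p0 >> 6) & 1u);      /* stored inverted */
    f->b    = !((p0 >> 5) & 1u);      /* stored inverted */
    f->r2   = !((p0 >> 4) & 1u);      /* stored inverted */
    f->mm   = p0 & 3u;
    f->w    = (p1 >> 7) & 1u;
    f->vvvv = ~(p1 >> 3) & 0xFu;      /* stored inverted */
    f->pp   = p1 & 3u;
    f->z    = (p2 >> 7) & 1u;
    f->ll   = (p2 >> 5) & 3u;
    f->bcst = (p2 >> 4) & 1u;
    f->v2   = !((p2 >> 3) & 1u);      /* stored inverted */
    f->aaa  = p2 & 7u;
    return true;
}
```

The prefix bytes 62 F1 6C 48 should be the start of vaddps zmm1, zmm2, zmm3 (assumed here; a disassembler can confirm it). They decode to map 1 (0F), vvvv = 2 (zmm2), L'L = 2 (512 bits) and no mask. The 32 registers come from combining R' and R with the 3-bit reg field of the ModR/M byte: register = (R' << 4) | (R << 3) | reg.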