GCC 4.1 Release Series

Changes, New Features, and Fixes

The latest release in the 4.1 release series is GCC 4.1.2.

Caveats

General Optimizer Improvements

GCC now has infrastructure for inter-procedural optimizations and the following inter-procedural optimizations are implemented:

Profile guided inlining. When doing profile feedback guided optimization, GCC can now use the profile to make better informed decisions on whether inlining of a function is profitable or not. This means that GCC will no longer inline functions at call sites that are not executed very often, and that functions at hot call sites are more likely to be inlined.

A new parameter min-inline-recursive-probability is also now available to throttle recursive inlining of functions with small average recursive depths. Discovery of pure and const functions, a form of side-effects analysis. While older GCC releases could also discover such special functions, the new IPA-based pass runs earlier so that the results are available to more optimizers. The pass is also simply more powerful than the old one. Analysis of references to static variables and type escape analysis, also forms of side-effects analysis. The results of these passes allow the compiler to be less conservative about call-clobbered variables and references. This results in more redundant loads being eliminated and in making static variables candidates for register promotion. Improvement of RTL-based alias analysis. The results of type escape analysis are fed to the RTL type-based alias analyzer, allowing it to disambiguate more memory references. Interprocedural constant propagation and function versioning. This pass looks for functions that are always called with the same constant value for one or more of the function arguments, and propagates those constants into those functions. GCC will now eliminate static variables whose usage was optimized out. -fwhole-program --combine can now be used to make all functions in program static allowing whole program optimization. As an exception, the main function and all functions marked with the new externally_visible attribute are kept global so that programs can link with runtime libraries.

GCC can now do a form of partial dead code elimination (PDCE) that allows code motion of expressions to the paths where the result of the expression is actually needed. This is not always a win, so the pass has been limited to only consider profitable cases. Here is an example: int foo (int *, int *); int bar (int d) { int a, b, c; b = d + 1; c = d + 2; a = b + c; if (d) { foo (&b, &c); a = b + c; } printf ("%d

", a); } The a = b + c can be sunk to right before the printf . Normal code sinking will not do this, it will sink the first one above into the else-branch of the conditional jump, which still gives you two copies of the code.

The can be sunk to right before the . Normal code sinking will not do this, it will sink the first one above into the else-branch of the conditional jump, which still gives you two copies of the code. GCC now has a value range propagation pass. This allows the compiler to eliminate bounds checks and branches. The results of the pass can also be used to accurately compute branch probabilities.

The pass to convert PHI nodes to straight-line code (a form of if-conversion for GIMPLE) has been improved significantly. The two most significant improvements are an improved algorithm to determine the order in which the PHI nodes are considered, and an improvement that allow the pass to consider if-conversions of basic blocks with more than two predecessors.

Alias analysis improvements. GCC can now differentiate between different fields of structures in Tree-SSA's virtual operands form. This lets stores/loads from non-overlapping structure fields not conflict. A new algorithm to compute points-to sets was contributed that can allows GCC to see now that p->a and p->b , where p is a pointer to a structure, can never point to the same field.

and , where is a pointer to a structure, can never point to the same field. Various enhancements to auto-vectorization:

Incrementally preserve SSA form when vectorizing. Incrementally preserve loop-closed form when vectorizing. Improvements to peeling for alignment: generate better code when the misalignment of an access is known at compile time, or when different accesses are known to have the same misalignment, even if the misalignment amount itself is unknown. Consider dependence distance in the vectorizer. Externalize generic parts of data reference analysis to make this analysis available to other passes. Vectorization of conditional code. Reduction support.

GCC can now partition functions in sections of hot and cold code. This can significantly improve performance due to better instruction cache locality. This feature works best together with profile feedback driven optimization.

A new pass to avoid saving of unneeded arguments to the stack in vararg functions if the compiler can prove that they will not be needed.

functions if the compiler can prove that they will not be needed. Transition of basic block profiling to tree level implementation has been completed. The new implementation should be considerably more reliable (hopefully avoiding profile mismatch errors when using -fprofile-use or -fbranch-probabilities ) and can be used to drive higher level optimizations, such as inlining. The -ftree-based-profiling command-line option was removed and -fprofile-use now implies disabling old RTL level loop optimizer ( -fno-loop-optimize ). Speculative prefetching optimization (originally enabled by -fspeculative-prefetching ) was removed.

New Languages and Language specific improvements

C and Objective-C

The old Bison-based C and Objective-C parser has been replaced by a new, faster hand-written recursive-descent parser.

Ada

The build infrastructure for the Ada runtime library and tools has been changed to be better integrated with the rest of the build infrastructure of GCC. This should make doing cross builds of Ada a bit easier.

C++

ARM-style name-injection of friend declarations is no longer the default. For example: struct S { friend void f(); }; void g() { f(); } will not be accepted; instead a declaration of f will need to be present outside of the scope of S . The new -ffriend-injection option will enable the old behavior.

will not be accepted; instead a declaration of will need to be present outside of the scope of . The new option will enable the old behavior. The (undocumented) extension which permitted templates with default arguments to be bound to template template parameters with fewer parameters has been deprecated, and will be removed in the next major release of G++. For example: template <template <typename> class C> void f(C<double>) {} template <typename T, typename U = int> struct S {}; template void f(S<double>); makes use of the deprecated extension. The reason this code is not valid ISO C++ is that S is a template with two parameters; therefore, it cannot be bound to C which has only one parameter.

Runtime Library (libstdc++)

Optimization work: A new implementation of std::search_n is provided, better performing in case of random access iterators. Added further efficient specializations of istream functions, i.e., character array and string extractors. Other smaller improvements throughout.

Policy-based associative containers, designed for high-performance, flexibility and semantic safety are delivered in ext/pb_assoc .

. A versatile string class, __gnu_cxx::__versa_string , providing facilities conforming to the standard requirements for basic_string , is delivered in <ext/vstring.h> . In particular: Two base classes are provided: the default one avoids reference counting and is optimized for short strings; the alternate one, still uses it while improving in a few low level areas (e.g., alignment). See vstring_fwd.h for some useful typedefs. Various algorithms have been rewritten (e.g., replace), the code streamlined and simple optimizations added. Option 3 of DR 431 is implemented for both available bases, thus improving the support for stateful allocators.

, providing facilities conforming to the standard requirements for , is delivered in . In particular: As usual, many bugs have been fixed (e.g., libstdc++/13583, libstdc++/23953) and LWG resolutions put into effect for the first time (e.g., DR 280, DR 464, N1780 recommendations for DR 233, TR1 Issue 6.19). The implementation status of TR1 is now tracked in the docs in tr1.html .

Objective-C++

A new language front end for Objective-C++ has been added. This language allows users to mix the object oriented features of Objective-C with those of C++.

Java (GCJ)

Core library (libgcj) updates based on GNU Classpath 0.15 - 0.19 features (plus some 0.20 bug-fixes) Networking The java.net.HttpURLConnection implementation no longer buffers the entire response body in memory. This means that response bodies larger than available memory can now be handled. (N)IO NIO FileChannel.map implementation, fast bulk put implementation for DirectByteBuffer (speeds up this method 10x). FileChannel.lock() and FileChannel.force() implemented. XML gnu.xml fix for nodes created outside a namespace context. Add support for output indenting and cdata-section-elements output instruction in xml.transform . xml.xpath corrections for cases where elements/attributes might have been created in non-namespace-aware mode. Corrections to handling of XSL variables and minor conformance updates. AWT GNU JAWT implementation, the AWT Native Interface, which allows direct access to native screen resources from within a Canvas's paint method. GNU Classpath Examples comes with a Demo, see libjava/classpath/examples/README . awt.datatransfer updated to 1.5 with support for FlavorEvents . The gtk+ awt peers now allow copy/paste of text, images, URIs/files and serialized objects with other applications and tracking clipboard change events with gtk+ 2.6 (for gtk+ 2.4 only text and serialized objects are supported). A GNU Classpath Examples datatransfer Demo was added to show the new functionality. Split gtk+ awt peers event handling in two threads and improve gdk lock handling (solves several awt lock ups). Speed up awt Image loading. Better gtk+ scrollbar peer implementation when using gtk+ >= 2.6. Handle image loading errors correctly for gdkpixbuf and MediaTracker . Better handle GDK lock. Properly prefix gtkpeer native functions ( cp_gtk ). GdkGraphics2D has been updated to use Cairo 0.5.x or higher. BufferedImage and GtkImage rewrites. All image drawing operations should now work correctly (flipping requires gtk+ >= 2.6) Future Graphics2D , image and text work is documented at: http://developer.classpath.org/mediation/ClasspathGraphicsImagesText When gtk+ 2.6 or higher is installed the default log handler will produce stack traces whenever a WARNING, CRITICAL or ERROR message is produced. Free Swing The RepaintManager has been reworked for more efficient painting, especially for large GUIs. The layout manager OverlayLayout has been implemented, the BoxLayout has been rewritten to make use of the SizeRequirements utility class and caching for more efficient layout. Improved accessibility support. Significant progress has been made in the implementation of the javax.swing.plaf.metal package, with most UI delegates in a working state now. Please test this with your own applications and provide feedback that will help us to improve this package. The GUI demo ( gnu.classpath.examples.swing.Demo ) has been extended to highlight various features in our Free Swing implementation. And it includes a look and feel switcher for Metal (default), Ocean and GNU themes. The javax.swing.plaf.multi package is now implemented. Editing and several key actions for JTree and JTable were implemented. Lots of icons and look and feel improvements for Free Swing basic and metal themes were added. Try running the GNU Classpath Swing Demo in examples ( gnu.classpath.examples.swing.Demo ) with: -Dswing.defaultlaf=javax.swing.plaf.basic.BasicLookAndFeel or -Dswing.defaultlaf=javax.swing.plaf.metal.MetalLookAndFeel Start of styled text capabilites for java.swing.text . DefaultMutableTreeNode pre-order, post-order, depth-first and breadth-first traversal enumerations implemented. JInternalFrame colors and titlebar draw properly. JTree is working up to par (icons, selection and keyboard traversal). JMenus were made more compatible in visual and programmatic behavior. JTable changeSelection and multiple selections implemented. JButton and JToggleButton change states work properly now. JFileChooser fixes. revalidate() and repaint() fixes which make Free Swing much more responsive. MetalIconFactory implemented. Free Swing Top-Level Compatibility. JFrame , JDialog , JApplet , JInternalFrame , and JWindow are now 1.5 compatible in the sense that you can call add() and setLayout() directly on them, which will have the same effect as calling getContentPane().add() and getContentPane().setLayout() . The JTree interface has been completed. JTree s now recognizes mouse clicks and selections work. BoxLayout works properly now. Fixed GrayFilter to actually work. Metal SplitPane implemented. Lots of Free Swing text and editor stuff work now. Free RMI and Corba Andrew Watson, Vice President and Technical Director of the Object Management Group, has officially assigned us 20 bit Vendor Minor Code Id: 0x47430 ("GC") that will mark remote classpath-specific system exceptions. Obtaining the VMCID means that GNU Classpath now is a recogniseable type of node in a highly interoperable CORBA world. GNU Classpath now includes the first working draft to support the RMI over IIOP protocol. The current implementation is capable of remote invocations, transferring various Serializables and Externalizables via RMI-IIOP protocol. It can flatten graphs and, at least for the simple cases, is interoperable with 1.5 JDKs. org.omg.PortableInterceptor and related functionality in other packages is now implemented: The sever and client interceptors work as required since 1.4. The IOR interceptor works as needed for 1.5. The org.omg.DynamicAny package is completed and passes the prepared tests. The Portable Object Adapter should now support the output of the recent IDL to java compilers. These compilers now generate servants and not CORBA objects as before, making the output depend on the existing POA implementation. Completing POA means that such code can already be tried to run on Classpath. Our POA is tested for the following usager scenarios: POA converts servant to the CORBA object. Servant provides to the CORBA object. POA activates new CORBA object with the given Object Id (byte array) that is later accessible for the servant. During the first call, the ServantActivator provides servant for this and all subsequent calls on the current object. During each call, the ServantLocator provides servant for this call only. ServantLocator or ServantActivator forwards call to another server. POA has a single servant, responsible for all objects. POA has a default servant, but some objects are explicitly connected to they specific servants. The POA is verified using tests from the former cost.omg.org . The CORBA implementation is now a working prototype that should support features up to 1.3 inclusive. We invite groups writing CORBA dependent applications to try Classpath implementation, reporting any possible bugs. The CORBA prototype is interoperable with Sun's implementation v 1.4, transferring object references, primitive types, narrow and wide strings, arrays, structures, trees, abstract interfaces and value types (feature of CORBA 2.3) between these two platforms. Remote exceptions are transferred and handled correctly. The stringified object references (IORs) from various sources are parsed as required. The transient (for current session) and permanent (till jre restart) redirections work. Both Little and Big Endian encoded messages are accepted. The implementation is verified using tests from the former cost.omg.org. The current release includes working examples (see the examples directory), demonstrating the client-server communication, using either CORBA Request or IDL-based stub (usually generated by a IDL to java compiler). These examples also show how to use the Classpath CORBA naming service. The IDL to java compiler is not yet written, but as our library must be compatible, it naturally accepts the output of other idlj implementations. Misc Updated TimeZone data against Olson tzdata2005l. Make zip and jar packages UTF-8 clean. "native" code builds and compiles (warning free) on Darwin and Solaris. java.util.logging.FileHandler now rotates files. Start of a generic JDWP framework in gnu/classpath/jdwp . This is unfinished, but feedback (at classpath@gnu.org ) from runtime hackers is greatly appreciated. Although most of the work is currently being done around gcj/gij we want this framework to be as VM neutral as possible. Early design is described in: https://gcc.gnu.org/ml/java/2005-05/msg00260.html QT4 AWT peers, enable by giving configure --enable-qt-peer . Included, but not ready for production yet. They are explicitly disabled and not supported. But if you want to help with the development of these new features we are interested in feedback. You will have to explicitly enable them to try them out (and they will most likely contain bugs). Documentation fixes all over the place. See http://developer.classpath.org/doc/



New Targets and Target Specific Improvements

IA-32/x86-64

The x86-64 medium model (that allows building applications whose data segment exceeds 4GB) was redesigned to match latest ABI draft. New implementation split large datastructures into separate segment improving performance of accesses to small datastructures and also allows linking of small model libraries into medium model programs as long as the libraries are not accessing the large datastructures directly. Medium model is also supported in position independent code now. The ABI change results in partial incompatibility among medium model objects. Linking medium model libraries (or objects) compiled with new compiler into medium model program compiled with older will likely result in exceeding ranges of relocations. Binutils 2.16.91 or newer are required for compiling medium model now.

RS6000 (POWER/PowerPC)

The AltiVec vector primitives in <altivec.h> are now implemented in a way that puts a smaller burden on the preprocessor, instead processing the "overloading" in the front ends. This should benefit compilation speed on AltiVec vector code.

are now implemented in a way that puts a smaller burden on the preprocessor, instead processing the "overloading" in the front ends. This should benefit compilation speed on AltiVec vector code. AltiVec initializers now are generated more efficiently.

The popcountb instruction available on POWER5 now is generated.

The floating point round to integer instructions available on POWER5+ now is generated.

Floating point divides can be synthesized using the floating point reciprocal estimate instructions.

Double precision floating point constants are initialized as single precision values if they can be represented exactly.

S/390, zSeries and System z9

Support for the IBM System z9 109 processor has been added. When using the -march=z9-109 option, the compiler will generate code making use of instructions provided by the extended immediate facility.

option, the compiler will generate code making use of instructions provided by the extended immediate facility. Support for 128-bit IEEE floating point has been added. When using the -mlong-double-128 option, the compiler will map the long double data type to 128-bit IEEE floating point. Using this option constitutes an ABI change, and requires glibc support.

option, the compiler will map the data type to 128-bit IEEE floating point. Using this option constitutes an ABI change, and requires glibc support. Various changes to improve performance of generated code have been implemented, including: In functions that do not require a literal pool, register %r13 (which is traditionally reserved as literal pool pointer), can now be freely used for other purposes by the compiler. More precise tracking of register use allows the compiler to generate more efficient function prolog and epilog code in certain cases. The SEARCH STRING , COMPARE LOGICAL STRING , and MOVE STRING instructions are now used to implement C string functions. The MOVE CHARACTER instruction with single byte overlap is now used to implement the memset function with non-zero fill byte. The LOAD ZERO instructions are now used where appropriate. The INSERT CHARACTERS UNDER MASK , STORE CHARACTERS UNDER MASK , and INSERT IMMEDIATE instructions are now used more frequently to optimize bitfield operations. The BRANCH ON COUNT instruction is now used more frequently. In particular, the fact that a loop contains a subroutine call no longer prevents the compiler from using this instruction. The compiler is now aware that all shift and rotate instructions implicitly truncate the shift count to six bits.

Back-end support for the following generic features has been implemented: The full set of built-in functions for atomic memory access. The -fstack-protector feature. The optimization pass avoiding unnecessary stores of incoming argument registers in functions with variable argument list.



SPARC

The default code model in 64-bit mode has been changed from Medium/Anywhere to Medium/Middle on Solaris.

TLS support is disabled by default on Solaris prior to release 10. It can be enabled on TLS-capable Solaris 9 versions (4/04 release and later) by specifying --enable-tls at configure time.

MorphoSys

Support has been added for this new architecture.

Obsolete Systems

Documentation improvements

Other significant improvements

GCC can now emit code for protecting applications from stack-smashing attacks. The protection is realized by buffer overflow detection and reordering of stack variables to avoid pointer corruption.

Some built-in functions have been fortified to protect them against various buffer overflow (and format string) vulnerabilities. Compared to the mudflap bounds checking feature, the safe builtins have far smaller overhead. This means that programs built using safe builtins should not experience any measurable slowdown.

GCC 4.1.2

This is the list of problem reports (PRs) from GCC's bug tracking system that are known to be fixed in the 4.1.2 release. This list might not be complete (that is, it is possible that some PRs that have been fixed are not listed here).

When generating code for a shared library, GCC now recognizes that global functions may be replaced when the program runs. Therefore, it is now more conservative in deducing information from the bodies of functions. For example, in this example:

void f() {} void g() { try { f(); } catch (...) { cout << "Exception"; } }

G++ would previously have optimized away the catch clause, since it would have concluded that f cannot throw exceptions. Because users may replace f with another function in the main body of the program, this optimization is unsafe, and is no longer performed. If you wish G++ to continue to optimize as before, you must add a throw() clause to the declaration of f to make clear that it does not throw exceptions.