Hi Everyone,

This is the first in what I hope will be a continuing series of blog posts describing the monthly changes in the GNU toolchain (gcc, binutils, newlib and possibly gdb as well). One of my jobs at Red Hat is to take the changes in the public versions of the toolchain sources and copy them into our internal repository. I do this on a monthly basis, and each time I produce a short report detailing what has happened. One of my friends here suggested that people outside of Red Hat might be interested in these monthly reports, which is why I have started this blog.

So here is the report for September 2008, suitably sanitized to remove any internal-only details:

Whilst there has been very little change in the binutils this month, there has been a lot going on in gcc-land...

Firstly, a new register allocator has been added, with the promise that it will bring a performance boost to all of the ports. (Register allocation is one of the hardest tasks for any compiler, and GCC has had several different allocators during its history.)

The new allocator is also being used as a test to trim away any unmaintained ports. Any port that has not been switched over to the new register allocator by the end of the month will be marked as deprecated and removed after the next release of gcc.

The new allocator, called IRA (the Integrated Register Allocator), brings some new command-line options with it to control its behaviour:

-fira-algorithm=
Use the specified algorithm for the integrated register allocator. The argument should be one of "regional", "CB" or "mixed". The first algorithm can give the best results for machines with a small number of registers, and the second one is faster and generates the smallest code, but the third algorithm usually gives the best results in most cases and for most architectures. That is why it is the default.

-fira-coalesce
Do optimistic register coalescing.
This option might be profitable for architectures with big regular register sets.

Secondly, a new set of loop transformation optimizations has been added to GCC. Supported by yet another internal representation, called "Graphite" this time, these transformations allow for some simple but potentially very useful optimizations. And the new internal representation should make it easier to add more loop optimizations in the future. The new optimizations are:

-floop-interchange
Perform loop interchange transformations on loops. Interchanging two nested loops switches the inner and outer loops. For example, given a loop like:

    DO J = 1, M
      DO I = 1, N
        A(J, I) = A(J, I) * C
      ENDDO
    ENDDO

loop interchange will transform the loop as if the user had written:

    DO I = 1, N
      DO J = 1, M
        A(J, I) = A(J, I) * C
      ENDDO
    ENDDO

which can be beneficial when N is larger than the data cache(s). This example uses Fortran, where arrays are stored by columns, not rows, so the first version is sub-optimal since it accesses the data on a per-row basis. The optimization itself is not restricted to Fortran, however, and can be useful in any language.

-floop-strip-mine
Perform loop strip mining transformations on loops. Strip mining splits a loop into two nested loops. The outer loop has strides equal to the strip size and the inner loop has strides of the original loop within a strip. For example, given a loop like:

    DO I = 1, N
      A(I) = A(I) + C
    ENDDO

loop strip mining will transform the loop as if the user had written:

    DO II = 1, N, 4
      DO I = II, min (II + 3, N)
        A(I) = A(I) + C
      ENDDO
    ENDDO

With a strip size of 4, each pass of the inner loop covers the four iterations II to II + 3.

-floop-block
Perform loop blocking transformations on loops. Blocking strip mines each loop in the loop nest such that the memory accesses of the element loops fit inside caches.
For example, given a loop like:

    DO I = 1, N
      DO J = 1, M
        A(J, I) = B(I) + C(J)
      ENDDO
    ENDDO

loop blocking will transform the loop as if the user had written:

    DO II = 1, N, 64
      DO JJ = 1, M, 64
        DO I = II, min (II + 63, N)
          DO J = JJ, min (JJ + 63, M)
            A(J, I) = B(I) + C(J)
          ENDDO
        ENDDO
      ENDDO
    ENDDO

which can be beneficial when M is larger than the data cache(s), because the innermost loop will iterate over a smaller amount of data.

But wait, there's more. More new optimizations, that is. The following are not based on the Graphite representation or the new register allocator, but have been added separately:

-fipa-cp-clone
Perform function cloning to make interprocedural constant propagation stronger. When enabled, externally visible functions that take constant arguments are cloned so that one version exists for each known set of possible arguments. This then allows more opportunities for constant propagation. Since this optimization can create multiple copies of functions, it may significantly increase code size, so it is only enabled by default at -O3.

-fselective-scheduling
Schedule instructions using the selective scheduling algorithm. The selective instruction scheduler is an alternative to GCC's default scheduler which may produce better results in some cases. This option enables the selective scheduler for the first instruction scheduling pass.

-fselective-scheduling2
This option enables the selective scheduler for the second instruction scheduling pass.

-fsel-sched-pipelining
Enable software pipelining of innermost loops during selective scheduling. This option has no effect unless one of -fselective-scheduling or -fselective-scheduling2 is turned on.

-fsel-sched-pipelining-outer-loops
When pipelining loops during selective scheduling, also pipeline outer loops. This option has no effect unless -fsel-sched-pipelining is turned on.

-fprofile-correction
Profiles collected using an instrumented binary for multi-threaded programs may be inconsistent due to missed counter updates.
When this option is specified, GCC will use heuristics to correct or smooth out such inconsistencies. By default, GCC will emit an error message when an inconsistent profile is detected.

Finally, a new port has been partially added to GCC. The "picoChip" port, for the CPU created by picoChip Designs Ltd. (http://www.picochip.com), exists in the GCC sources but not yet in the binutils sources. So you can compile code for it, but you cannot yet assemble or link that code.
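The loop interchange description says the optimization is not restricted to Fortran. Here is a rough C sketch of the same idea. It is written by hand and is not GCC output. C stores arrays by rows, so the cache-friendly order is the reverse of the Fortran case: the inner loop should vary the last subscript. The function names and the 64 x 48 array size are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical dimensions, chosen only for this illustration. */
#define ROWS 64
#define COLS 48

/* Before interchange: C is row-major, so a[i][j] and a[i][j + 1] are
   adjacent in memory.  This inner loop walks down a column, so
   consecutive iterations touch elements COLS doubles apart. */
void scale_column_order(double a[ROWS][COLS], double c)
{
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            a[i][j] *= c;
}

/* After interchange: the inner loop walks along a row, so consecutive
   iterations touch adjacent memory.  Each element is still multiplied
   exactly once, so the final array is identical. */
void scale_row_order(double a[ROWS][COLS], double c)
{
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            a[i][j] *= c;
}
```

Both functions produce the same array. The second one simply walks through memory in address order, which is the kind of rewrite -floop-interchange is meant to perform automatically when the swap is safe.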
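Strip mining can be sketched in C in a similar way. This version uses a strip size of 4, as in the Fortran example, and the function names are hypothetical. With C's zero-based indices, the inner loop covers the half-open range [ii, end), where end is the smaller of ii + STRIP and n. As a result, every element is updated exactly once, including those in a final, shorter strip.

```c
#include <stddef.h>

/* Strip size: 4 matches the Fortran example, but any positive value works. */
#define STRIP 4

/* Original loop: add c to every element of a[0..n-1]. */
void add_plain(double *a, size_t n, double c)
{
    for (size_t i = 0; i < n; i++)
        a[i] += c;
}

/* Strip-mined form of the same loop.  The outer loop steps from strip to
   strip.  The inner loop walks the half-open range [ii, end), where end is
   clamped to n so that a final, shorter strip does not run past the array. */
void add_strip_mined(double *a, size_t n, double c)
{
    for (size_t ii = 0; ii < n; ii += STRIP) {
        size_t end = (n - ii < STRIP) ? n : ii + STRIP;
        for (size_t i = ii; i < end; i++)
            a[i] += c;
    }
}
```

On its own, strip mining does not change the amount of work done. It only regroups the iterations, which is why the -floop-block description can describe blocking in terms of strip mining each loop in a nest.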
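Here is a similar C sketch of the -floop-block example, using the same block size of 64 as the Fortran version. To keep the code self-contained, the n-by-m matrix is stored row-major in a flat array, so element (i, j) lives at a[i * m + j]. The function and helper names are hypothetical.

```c
#include <stddef.h>

/* Block (tile) size, matching the 64 used in the Fortran example. */
#define BLOCK 64

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Unblocked nest: a is n-by-m, stored row-major in a flat array. */
void fill_plain(double *a, const double *b, const double *c, size_t n, size_t m)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            a[i * m + j] = b[i] + c[j];
}

/* Blocked nest: the two outer loops step over BLOCK x BLOCK tiles.  The two
   inner loops visit the elements of one tile, clamped at the matrix edges. */
void fill_blocked(double *a, const double *b, const double *c, size_t n, size_t m)
{
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t jj = 0; jj < m; jj += BLOCK)
            for (size_t i = ii; i < min_size(ii + BLOCK, n); i++)
                for (size_t j = jj; j < min_size(jj + BLOCK, m); j++)
                    a[i * m + j] = b[i] + c[j];
}
```

The blocked version writes the same value to every element as the plain one. Only the order in which the elements are visited changes.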
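To illustrate what -fipa-cp-clone does, the sketch below writes out by hand the kind of specialized copy the compiler might create. It assumes a hypothetical externally visible function, sum_strided, and a local caller that always passes the constant 1 as the stride. The original function has to be kept because unknown callers elsewhere may pass other values. The clone has the constant propagated into it. All of the names here are illustrative.

```c
/* General, externally visible version: callers elsewhere may pass any
   positive stride, so this copy must be kept unchanged. */
long sum_strided(const int *v, int n, int stride)
{
    long total = 0;
    for (int i = 0; i < n; i += stride)
        total += v[i];
    return total;
}

/* Hand-written equivalent of a clone specialized for stride == 1.  With
   the constant propagated, the loop becomes a plain unit-stride sum,
   which is easier to optimize further. */
static long sum_strided_stride1(const int *v, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* A local caller that always passes stride 1.  After cloning, its call
   behaves as if it had been written against the specialized copy. */
long sum_all(const int *v, int n)
{
    return sum_strided_stride1(v, n);
}
```

This also shows where the code size cost mentioned above comes from: the program now contains two copies of the loop.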