Google Summer Of Code

Thanks for your interest in the GNU Compiler Collection as your mentoring organization in Google's Summer of Code (GSoC).

GCC has applied independently to be a GSoC mentoring organization in 2020. The primary org-admin is Martin Jambor. In the past, GCC has also applied under the umbrella of the GNU Project mentoring organization.

If you are a student with a project idea or want to work on any of the ideas below, please discuss it as soon as possible (way before the application) via the mailing list and feel free to raise it on IRC. Also make sure you have read the Before you apply and Application sections on this page.

GCC is owned by the Free Software Foundation (FSF). As such, all contributors must assign their copyright to the FSF before any of their changes are accepted. The copyright assignment process is described in Contributing to GCC. See also GettingStarted.

Selected Project Ideas for 2020

When discussing GSoC project ideas for 2020 in the community, we found that we are especially interested in the following few. Apart from their particular relevance and usefulness this year, one of their main advantages is that we are confident we can find mentors for them. We will, however, also consider other projects, and we will be happy to discuss your own ideas with you. Nevertheless, please do consider applying for one of the following:

Bypass assembler when generating LTO object files. Currently we create link-time optimization (LTO) object files with the help of the assembler, which however only creates ELF files with the provided binary contents. The aim of this project is to create them directly from the compiler. A preliminary patch is at https://gcc.gnu.org/ml/gcc/2014-09/msg00340.html. Finishing this would require work on libiberty simple object file handling, in the GCC wrapper and in GCC itself. If finished, the compile time performance should improve by several percent. This project would be mentored by Jan Hubička. Required skills include: C/C++, working with the ELF file format.

Create a general jobserver client/server library which will enable modern build systems like Ninja or Meson to manage the degree of parallelism when GCC spawns LTO ltrans jobs and in the future when it does other things in parallel. See https://make.mad-scientist.net/papers/jobserver-implementation/. This project would be mentored by Martin Liška or Martin Jambor. Required skills include C/C++ and designing portable and reusable shared libraries.
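As a rough illustration of the token-pipe scheme described in the linked jobserver paper, the following C sketch preloads a pipe with one byte per available job slot; a client takes a byte before starting a parallel job and writes it back when the job finishes. The function names `js_server_init`, `js_try_acquire` and `js_release` are hypothetical and are not part of GCC, GNU make or any existing library; the actual project would have to design its own interface.

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper names; not an existing GCC or GNU make API. */

/* Server side: create the token pipe and preload it with `slots` tokens.
   fds[0] is the read end, fds[1] the write end.  Returns 0 on success. */
int js_server_init(int fds[2], int slots)
{
    if (pipe(fds) != 0)
        return -1;
    for (int i = 0; i < slots; i++) {
        char token = '+';
        if (write(fds[1], &token, 1) != 1) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    return 0;
}

/* Client side: try to take one token without blocking.
   Returns 1 and stores the token on success, 0 if no slot is free,
   -1 on error.  Note: with several clients, another one may grab the
   token between poll() and read(); a real client must handle that. */
int js_try_acquire(int rfd, char *token)
{
    struct pollfd pfd = { .fd = rfd, .events = POLLIN };
    int ready = poll(&pfd, 1, 0);
    if (ready < 0)
        return -1;
    if (ready == 0 || !(pfd.revents & POLLIN))
        return 0;
    ssize_t n = read(rfd, token, 1);
    return n == 1 ? 1 : -1;
}

/* Client side: give the token back once the job has finished. */
int js_release(int wfd, char token)
{
    return write(wfd, &token, 1) == 1 ? 0 : -1;
}
```

A caller would acquire one token per additional ltrans job it wants to run and release each token as soon as the corresponding job exits, so that the total number of running jobs never exceeds the number of slots the build system handed out.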

Enable incremental Link Time Optimization (LTO) linking. At the moment, LTO re-optimizes and generates code for the whole program or library if just one object file changes, even in an insignificant way. The student working on this project will write code determining what changes in LTO input are significant for various stages of LTO processing and try not to redo the work which would lead to the same results as before. The mentor of this project would be Jan Hubička. Required skills include C/C++ and familiarity with the LTO model.
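The simplest possible form of "do not redo work whose input did not change" can be sketched in C as a per-unit cache keyed by a content hash. This is only an illustration under strong assumptions: it treats every byte change as significant, whereas the project is precisely about deciding which changes matter for which LTO stage. The names `fnv1a64`, `unit_cache` and `unit_needs_rework` are hypothetical and do not exist in GCC.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a hash over a byte buffer. */
uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

/* Hypothetical cache entry remembering the input seen last time. */
struct unit_cache {
    int valid;       /* non-zero once a hash has been recorded */
    uint64_t hash;   /* hash of the input used for the cached result */
};

/* Returns 1 if the unit has to be processed again, 0 if the cached
   result can be reused.  The cache entry is updated on a miss. */
int unit_needs_rework(struct unit_cache *c,
                      const unsigned char *input, size_t len)
{
    uint64_t h = fnv1a64(input, len);
    if (c->valid && c->hash == h)
        return 0;
    c->valid = 1;
    c->hash = h;
    return 1;
}
```

A more realistic design would presumably hash only the parts of the input that a given stage actually consumes, so that an irrelevant edit leaves the key unchanged.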

Implement something similar to Clang's -ftime-trace feature which generates performance reports that show where the compiler spends compile time. For more information, please check the following blog post. There's also an existing bugzilla entry for this (if this becomes a GSoC project, the assignee will of course change). The project would be mentored by Martin Liška. Required skills include C/C++ and finding a way through a large code-base.

Extend the static analysis pass. GCC 10 has gained an experimental static analysis pass which performs some rudimentary checking of malloc/free and the stdio FILE stream API. There is plenty of scope for extending this pass in ways that may interest a student, such as (a) generalizing the double-free checker to attribute-marking of acquire/release API entrypoints so that the user can mark the entrypoints and get a checker for that API "for free", (b) checking of the POSIX file-descriptor APIs (int rather than FILE *), or some other POSIX API that we're not yet checking, (c) adding plugin support and writing a plugin to add a project-specific checker for a project of interest to the student (Linux kernel?), (d) C++ support (new/delete checking, exceptions, etc). This project would be mentored by David Malcolm. Required skills include C/C++ and finding a way through a large code-base.

Fortran – run-time argument checking. Older Fortran code which does not use modules, but also code which uses implicit-size or explicit-size arrays, is particularly prone to argument mismatches. The goal of this item is to add an optional run-time test which works by storing the argument type and size data in a global variable before the call and checking against it in the callee. (A pointer to the called function is stored alongside to permit calls from uninstrumented code to instrumented code.) This project would be mentored by Tobias Burnus. Required skills include C/C++; some knowledge of Fortran helps, but is not needed.
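The mechanism described above can be sketched in C roughly as follows. Everything here is an assumption made for illustration: the record layout, the names `argcheck_record` and `argcheck_verify`, and the rule that an actual argument must have the same kind and at least as many bytes as the callee expects. The stored callee pointer is what lets an instrumented procedure recognize a call from uninstrumented code (the record does not name it) and skip the check.

```c
#include <stddef.h>

typedef void (*generic_fn)(void);

enum arg_kind { ARG_INTEGER, ARG_REAL };
enum check_result { CHECK_OK, CHECK_MISMATCH, CHECK_SKIPPED };

/* Hypothetical description of one actual or dummy argument. */
struct arg_desc {
    enum arg_kind kind;
    size_t bytes;
};

/* The global record written by instrumented callers.  A real
   implementation would have to consider threads; this sketch does not. */
static struct {
    generic_fn callee;
    const struct arg_desc *args;
    size_t nargs;
} pending_call;

/* Caller side: describe the actual arguments just before the call. */
void argcheck_record(generic_fn callee,
                     const struct arg_desc *args, size_t nargs)
{
    pending_call.callee = callee;
    pending_call.args = args;
    pending_call.nargs = nargs;
}

/* Callee side: compare the record against what this procedure expects. */
enum check_result argcheck_verify(generic_fn self,
                                  const struct arg_desc *expected,
                                  size_t nexpected)
{
    if (pending_call.callee != self)
        return CHECK_SKIPPED;        /* caller was not instrumented */

    const struct arg_desc *actual = pending_call.args;
    size_t nactual = pending_call.nargs;
    pending_call.callee = NULL;      /* consume, so a stale record is not reused */

    if (nactual != nexpected)
        return CHECK_MISMATCH;
    for (size_t i = 0; i < nexpected; i++) {
        if (actual[i].kind != expected[i].kind
            || actual[i].bytes < expected[i].bytes)
            return CHECK_MISMATCH;
    }
    return CHECK_OK;
}

/* Stand-in for an instrumented procedure that expects a REAL array of
   10 doubles and one INTEGER.  It only returns the check result. */
enum check_result demo_sub(double *a, int *n)
{
    static const struct arg_desc expected[] = {
        { ARG_REAL, 10 * sizeof(double) },
        { ARG_INTEGER, sizeof(int) },
    };
    (void)a;
    (void)n;
    return argcheck_verify((generic_fn)demo_sub, expected, 2);
}
```

In generated code the compiler would emit the equivalent of `argcheck_record` before each call and the equivalent of `argcheck_verify` in the procedure prologue, and would report a run-time error instead of returning a status.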

Fortran – shared-memory coarrays. Coarrays are a means of parallelizing code; conceptually, all memory is local memory, except for coarrays, which exist on multiple processes ("images") and can be accessed directly from remote images. (Internally: one-sided communication.) GCC/gfortran supports "single" (compiles but does not do any actual parallelization) and "lib" (requires a communication library). The goal of this task is to add a shared-memory implementation, such that parallel coarray programs run out of the box without additional external libraries. This project would be mentored by Tobias Burnus. This project consists of work mostly on a run-time library written in C but also on the compiler itself written in C/C++. Hence, required skills include C/C++ and knowledge about POSIX Threads; some knowledge of Fortran helps, but is not needed.
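To make the idea of images writing directly into each other's coarray data more concrete, here is a small C sketch that assumes one process per image and an anonymous shared mapping created with `mmap`. This is only one conceivable design and not necessarily what the project would choose; the function name `run_images` is hypothetical. Each image stores its own 1-based number into the element owned by its right-hand neighbour, and the parent waits for all images before reading the result, which plays the role of a final synchronization.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `images` processes sharing one int per image.  On success,
   copies the shared array into result[] and returns 0; else -1. */
int run_images(int images, int *result)
{
    size_t bytes = (size_t)images * sizeof(int);
    int *coarray = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (coarray == MAP_FAILED)
        return -1;

    int started = 0, failed = 0;
    for (int me = 0; me < images; me++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0) {
            /* Child = one image.  One-sided "remote" write, roughly
               x[neighbour] = this_image() in coarray terms. */
            int neighbour = (me + 1) % images;
            coarray[neighbour] = me + 1;
            _exit(0);
        }
        started++;
    }

    /* Wait for every image that was started. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status)
            || WEXITSTATUS(status) != 0)
            failed = 1;
    }

    if (!failed) {
        for (int i = 0; i < images; i++)
            result[i] = coarray[i];
    }
    munmap(coarray, bytes);
    return failed ? -1 : 0;
}
```

With four images, the element owned by image 1 ends up holding 4 and the elements of images 2 to 4 hold 1 to 3. A real implementation would additionally need barriers, locks and allocation of coarrays at run time, which is where the POSIX Threads knowledge mentioned above would come in.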

Fortran – User-defined Parameterized Derived Types (PDT) – In Fortran, derived types can be declared such that some details can be deferred to the declaration of a variable, e.g. which kind of real type should be used ("kind" parameter, value known at compile time) or which array sizes or string length shall be used ("len" parameter, might be known only at run time). This project would be mentored by Tobias Burnus. Required skills include C/C++; some knowledge of Fortran helps, but is not needed.

Parallelize compilation using threads, part two. In 2019 we had our first and successful Summer of Code project on GCC parallelization. In this continuation, the student will further the work to avoid issues with global state as much as possible by partitioning the compilation pipeline into pieces that share as little global state as possible and ensuring that each thread only works in one of those partitions. The biggest roadblock will be the memory allocator of the GCC garbage collector, which is not thread-safe. The goal of this project is to have a compilation pipeline driven by a scheduler assigning functions to be optimized to the partitions in the pipeline. This project would be mentored by Richard Biener. Required skills include: C/C++, ability to analyze a big complex code base, parallelization.

Implementation of OMPD in GCC, libgomp and GDB. OMPD is a standard to make OpenMP (parallel) programs easier to debug, presenting the programmers with the state of the program in terms of the OpenMP standard (e.g. OpenMP threads, teams, tasks and so on) as opposed to underlying layers such as pthreads. The aim of this project is to create a prototype implementation in the GNU toolchain which may not cover all use cases but is well designed so that it can be finished afterwards. Jakub Jelinek would mentor the GCC side (we plan to reach out to the GDB community for a co-mentor). Required skills include: C/C++, parallel computing in general and at least rudimentary OpenMP in particular.

Binutils support for AIX 7.2. GNU Binutils (Gas, Gld, etc.) currently supports AIX 4.3.3 and partially supports AIX 5.1. The GNU BFD library has existing support for XCOFF and GDB functions. This project would update AIX support in GNU Binutils for the latest release, AIX 7.2. The main goal is the ability to bootstrap GCC with GNU Binutils (accept all current GCC instructions, directives and options for the AIX Assembler and Linker) and produce a correct, functioning GCC executable and GCC runtime shared object libraries (libgcc, libstdc++). This project would be mentored by David Edelsohn. Required skills include: C/C++, Binutils, AIX or at least willingness to learn it (access to an AIX system will be provided).

If we had to estimate the difficulty of the above projects, we would probably rate all of them as hard. GCC is a production compiler and working on one is hard, especially if you are new. On the other hand, the community of GCC developers is very nice and helpful and goes out of its way to assist newcomers with the various difficulties they inevitably encounter.

Other/older Project Ideas

Note that some of the ideas found below might be fully or partially obsolete. This is another reason why it is always a good idea to discuss the project of interest on the mailing list and/or via IRC before submitting a GSoC proposal.

Link-time and interprocedural optimization improvements

Link-time optimization (LTO) is a powerful infrastructure in GCC and there are many ways to improve it, for example:

Implement tree level section anchors to improve code generation on ARM/PPC.

Language front-ends and run-time libraries

New optimization passes

Implement code motion of stores towards entry (and use this to improve code for int to float conversion on rs6000-based targets)

Implement a prototype for early instruction selection

Propagate interprocedural dataflow from GIMPLE to RTL

Add Factored Use-Def (FUD) chains to RTL

Loop optimizations and automatic parallelization based on Graphite

Implement a basic-block local scheduling pass to improve SSA name coalescing opportunities at RTL expansion time

Implement a (prototype) addressing mode selection (AMS) pass as a replacement of auto-inc-dec. For more details see PR 56590.

Other projects and project ideas

Type Sanitizer. The LLVM and GCC compilers share a common sanitizer library called libsanitizer. The library has recently received support for type-based sanitization (TySan). The goal of the task would be to investigate and prototype the use of type-based aliasing rules information provided by GCC in order to detect violations of strict aliasing rules.

Replace libiberty with gnulib. See http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00362.html. Initial work was done in GSoC 2016 (replacelibibertywithgnulib).

Finish the implementation of a stable introspection plugin API (with the possibility of extending it to cover non-introspection cases)

Modify any GCC optimization decisions externally through plugins (see MILEPOST GCC, for example). -- G. Fursin, 2014.

Systematize learning of optimal optimization decisions for multiple benchmarks, data sets and architectures (see c-mind.org/repo, for example). -- G. Fursin, 2014.

Extend GCC plugin framework to enable code instrumentation (insert calls to external function after individual instructions) for dynamic code analysis. We need it to extend our TM/TLS models. -- G. Fursin, 2014.

Fix -ftrapv so that it works.

Improve the regression testing system, for example to detect places where the generated code changed (useful for refactoring).

Promote C++ operator new to alloca when pointer does not escape and user allows non-conformance to C++ standard

Improve loop unrolling heuristics and enable loop unrolling with default optimization

Analyze and improve inlining, loop unrolling, reassociation and predictive commoning heuristics for PowerPC architecture

Use TARGET_EXPAND_TO_RTL_HOOK for pipelined divide on PowerPC

Support AIX XCOFF file format for LTO (David Edelsohn)

There are several pages with general ideas for GCC, many of which we linked below for easy access. These ideas usually are not just one project but a group of distinct projects.

Or invent your own project. We're always open to good ideas. But note that we are probably not too interested in projects to add new extensions to the C or C++ languages. We've found over time that these tend to introduce more problems than they solve.

Thanks, and we look forward to your submissions!

Improving GCC Developer Documentation

The rules of the GSoC program do not allow projects to consist of documentation improvements only. Nevertheless, note that writing documentation may be an important part of your project, or even an essential one if you introduce user-visible changes, so plan your work accordingly.

Before you apply

...and perhaps before you even reach out to us on the mailing list, make sure that you can check out the GCC source code from its Git repository, build GCC from it, run the testsuite and save the results, and then build and run the testsuite a second time and compare the results of the two runs (this is something that would need doing very many times in the course of any project working on GCC).

The following links should help you:

How to check out our sources using Git is described at https://gcc.gnu.org/git.html.

Steps linked from https://gcc.gnu.org/install/ show you how to configure, build and test GCC (look for --disable-bootstrap, among other things). The Installing GCC page shows an easy way to obtain the libraries required to build GCC, which is the step people often find most problematic, and gives other advice related to building and installing GCC for the first time.

Make sure you also look at the Getting Started wiki page.

The wiki page DebuggingGCC, David Malcolm's blog post on Debugging GCC and the manual page about Developer options are of particular interest. Read through those, compile a simple but non-trivial program with

-O3 -S -fdump-tree-all -fdump-ipa-all -fdump-rtl-all

and look through the generated files. Look at the source code, especially in the gcc subdirectory, and try to set a breakpoint somewhere and hit it. Then look around in gdb.

If you have done all of the above and still find it a little bit intimidating or if you have difficulties figuring out where to start looking for particular things, do not despair. That is something the mentors and the community at large are willing to help you with.

Application

Students applying for a GCC Google Summer of Code project need to have experience coding in C/C++ and should have at least some theoretical background in the area of compilers and compiler optimizations.

First, you need to select a project. If you have been following GCC development, you might have an idea of your own; otherwise look at the suggested projects above and try to pick one there. In the course of selecting a project, do not hesitate to ask questions or request more details from the community by email to the gcc@gcc.gnu.org mailing list with the string "GSoC" in the email subject or on our #gcc IRC channel at irc.oftc.net. Please note that the mailing list does not accept HTML messages, so you must set your email client to plain text. We also encourage you to browse through our web site at https://gcc.gnu.org/ and of course this wiki.

After you have chosen your project, please make sure you send us an email about your intention to apply to the gcc@gcc.gnu.org mailing list with the string "GSoC" in the email subject, in addition to any general required steps to apply to the GSoC program.

Last but not least, GCC is owned by the Free Software Foundation (FSF). As such, all contributors must assign their copyright to the FSF before any of their changes are accepted. The copyright assignment process is described in Contributing to GCC; see also GettingStarted.

Formal application document

GCC does not have any application form or a mandatory application format to follow.

In the formal application document that you submit to GSoC you should primarily describe the project and clearly define its goals. Generally speaking, it is probably a good idea to accompany the proposed project description with a brief motivation, an expected time-line (we understand it is likely to change) and a brief introduction of your technical background, skills and/or accomplishments. The project description is the most important part, however, and each project is perhaps best explained differently. We will mostly judge your ability to finish the project from your interactions with us, on mailing lists and IRC, rather than from a CV.

Further tips and guidelines

A GCC Summer of Code participant for 2006, Laurynas Biveinis, wrote a blog about it.

The Drupal project has a great page on How to write an SOC application.

Be honest and realistic. We prefer a smaller project with clearly defined goals to a far-reaching but vague proposal (that is likely never going to be finished by the student).

Students who have already submitted good patches give a much better impression to reviewers and potential mentors.

Starting with a small patch in the area you are interested in before the proposal submission period can help (ask for guidance and a simple enough project). It helps you to get to know the code and to decide whether you really want to do the project, it shows you what the development procedure is like, and it helps potential mentors to judge the proposal based on actual work. Besides, small fixes are also useful, and getting to know people by email (or IRC) exchange is nice in itself.

And let's stress again that you need to present your project on the mailing list gcc@gcc.gnu.org to be sure it is a good idea. Prepend "GSoC" to the subject.

Accepted GCC Projects

2019

2018

Project | Student | Mentor
LTO dump tool (project page) | Hrishikesh Kulkarni | Martin Liška and Jan Hubička

2016

2015

2014

Project | Student | Mentor
Coarray support in GNU GFortran | Alessandro Fanfarillo | Tobias Burnus
Concepts Separate Checking | Braden Obrzut | Andrew Sutton
Integration of ISL code generator into Graphite | Roman Gareev | Tobias Grosser
Generating folding patterns from meta description | Prathamesh Kulkarni | Richard Biener
GCC Go escape analysis | Ray Li | Ian Lance Taylor

2013

2012

2011

2010

The source code for finished projects can be found at Google's code hosting site and their respective SVN branches.

2009

The source code for finished projects can be found at Google's code hosting site.

Project | Student | Mentor
Automatic parallelization in Graphite | Li Feng | Tobias Grosser
Enable generic function cloning and program instrumentation in GCC to be able to create static binaries adaptable to varying program and system behavior or different architectures at run-time | Liang Peng | Grigori Fursin
gfortran: Procedure Pointer Components & OOP | Janus Weil | Tobias Burnus
Traditional Loop Transformations | pranav garg | Sebastian Pop
Make the OpenCL Platform Layer API and Runtime API for the Cell Processor and CPUs | phil prattszeliga | Paolo Bonzini
Provide fine-grain optimization selection and tuning abilities in GCC to be able to tune default optimization heuristic of the compiler or fine optimizations for a given program on a given architecture entirely automatically using statistical and machine learning techniques from the MILEPOST project. | Yuanjie Huang | Grigori Fursin

2008

The source code for finished projects can be found at Google's code hosting site.

2007

The source code for finished projects can be found at Google's code hosting site.

Project | Student | Mentor
Propagating array data dependence information from Tree-SSA to RTL | Alexander Monakov | Daniel Berlin
Better_Uninitialized_Warnings | Manuel López-Ibáñez | Diego Novillo
Speeding up GCC for fun and profit | James Webber | Eric Marshall Christopher
Fortran 2003 features for GCC | Janus Weil | Steven Bosscher
Open Mutliprogramming Interprocedural Analasis and Optimalizations | Jakub Staszak | Daniel Berlin
Integrating OpenJDK's javac bytecode compiler into gcj | Dalibor Topic | Mark J. Wielaard
New static scheduling heuristic for GCC | Dmitry Zhurikhin | Vladimir Makarov
GCC support for Windows-compatible Structured Exception Handling (SEH) on the i386 platform | Michele Cicciotti | Ian Lance Taylor

2006