Here it is a new guide, to collect and organize all the knowledge that you need to create your programming language from scratch.

Creating a programming language is one of the most fascinating challenge you can dream of as a developer.

The problem is that there are a lot of moving parts, a lot of things to do right and it is difficult to find a well detailed map, to show you the way. Sure, you can find a tutorial on writing half a parser there, an half-baked list of advices on language design, an example of a naive interpreter. To find those things you will need to spend hours navigating forums and following links.

We thought it was the case to save you some time by collecting relevant resources, evaluate them and organize them. So you can spend time using good resources, not looking for them.

We organized the resources around the three stages in the creation of a programming language: design, parsing, and execution.

Section 1

Designing the Language

When creating a programming language you need to take ideas and transform them in decisions. This is what you do during the design phase.

Before You Start…

Some good resources to beef up your culture on language design.

Articles

Designing the next programming language? Understand how people learn!, a few considerations on how to design a programming language that it’s easy to understand.

Five Questions About Language Design, (some good and some random) notes on programming language design by Paul Graham.

Programming Paradigms for Dummies: What Every Programmer Should Know (PDF), this is actually a chapter of a book and it’s not really for dummies, unless they are the kind of dummies with a degree in Computer Science. Apart from that, it’s a great overview of the different programming paradigms, which can be useful to help you understand where your language will fit.

Books

Design Concepts in Programming Languages, if you want to make deliberate choices in the creation of your programming language, this is the book you need. Otherwise, if you don’t already have the necessary theoretical background, you risk doing things the way everybody else does them. It’s also useful to develop a general framework to understand how the different programming languages behave and why.

Practical Foundations for Programming Languages, this is, for the most part, a book about studying and classifying programming languages. But by understanding the different options available it can also be used to guide the implementation of your programming language.

Programming Language Pragmatics, 4th Edition, this is the most comprehensive book to understand contemporary programming languages. It discusses different aspects, of everything from C# to OCaml, and even the different kinds of programming languages such as functional and logical ones. It also covers the several steps and parts of the implementation, such as an intermediate language, linking, virtual machines, etc.

Structure and Interpretation of Computer Programs, Second Edition, an introduction to computer science for people that already have a degree in it. A book widely praised by programmers, including Paul Graham directly on the Amazon Page, that helps you developing a new way to think about programming language. It’s quite abstract and examples are proposed in Scheme. It also covers many different aspect of programming languages including advanced topics like garbage collection.

Type Systems

Long discussions and infinite disputes are fought around type systems. Whatever choices you end up making it make sense to know the different positions.

Articles

These are two good introductory articles on the subject of type systems. The first discuss the dichotomy Static/Dynamic and the second one dive into Introspection.

What To Know Before Debating Type Systems, if you already know the basics of type systems this article is for you. It will permit you to understand them better by going into definitions and details.

Type Systems (PDF), a paper on the formalization of type systems that also introduces more precise definitions of the different type systems.

Books

Types and Programming Languages, a comprehensive book on understanding type systems. It will impact your ability to design programming languages and compilers. It has a strong theoretical support, but it also explains the practical importance of individual concepts.

Functional programming and type systems, an interesting university course on type systems for functional programming. It is used in a well known French university. There are also notes and presentation material available. It is as advanced as you would expect.

Type Systems for Programming Language, is a simpler course on type system for (functional) programming languages.

Section 2

Parsing

Parsing transforms the concrete syntax in a form that is more easily manageable by computers. This usually means transforming text written by humans in a more useful representation of the source code, an Abstract Syntax Tree.

There are usually two components in parsing: a lexical analyzer and the proper parser. Lexers, which are also known as tokenizers or scanners, transform the individual characters in tokens, the atom of meaning. Parsers instead organize the tokens in the proper Abstract Syntax Tree for the program. But since they are usually meant to work together you may use a single tool that does both the tasks.

Flex, as a lexer generator and (Berkeley) Yacc or Bison, for the generation of the proper parser, are the venerable choices to generate a complete parser. They are a few decades old and they are still maintained as open source software. They are written in and thought for C/C++. They still works, but they have limitations in features and support for other languages.

ANTLR, is both a lexer and a parser generator. It’s also more actively developed as open source software. It is written in Java, but it can generate both the lexer and the parser in languages as varied as C#, C++, Python, Javascript, Go, etc.

Your own lexer and parser. If you need the best performance and you can create your own parser. You just need to have the necessary computer science knowledge.

Tutorials

Flex and Bison tutorial, a good introduction to the two tools with bonus tips.

Lex and Yacc Tutorial, at 40 pages this is the ideal starting point to learn how to put together lex and yacc in a few hours.

Video Tutorial on lex/yacc in two parts, in an hour of video you can learn the basics of using lex and yacc.

ANTLR Mega Tutorial, the renown and beloved tutorial that explains everything you need to know about ANTLR, with bonus tips and tricks and even resources to know more.

Books

lex & yacc, despite being a book written in 1992 it’s still the most recommended book on the subject. Some people say because the lack of competition, others because it is good enough.

flex & bison: Text Processing Tools, the best book on the subject written in this millennium.

The Definitive ANTLR 4 Reference, written by the main author of the tool this is really the definitive book on ANTLR 4. It explains all of its secrets and it’s also a good introduction about how the whole parsing thing works.

Parsing Techniques, 2nd edition, a comprehensive, advanced and costly book to know more than you possibly need about parsing.

Section 3

Execution

To implement your programming language, that is to say to actually making something happens, you can build one of two things: a compiler or an interpreter. You could also build both of them if you want. Here you can find a good overview if you need it: Compiled and Interpreted Languages.

The resources here are dedicated to explaining how compilers and/or interpreters are built, but for practical reasons often they also explain the basics of creating lexers and parsers.

Compilers

A compiler transforms the original code into something else, usually machine code, but it could also be simply any lower level language, such as C. In the latter case some people prefer to use the term transpiler.

LLVM, a collection of modular and reusable compiler and toolchain technologies used to create compilers.

CLR, is the virtual machine part of the .NET technologies, that permits to execute different languages transformed in a common intermediate language.

JVM, the Java Virtual Machine that powers the Java execution.

Babel, a JavaScript compiler. It has a very modern structure, based upon plugins, which means that it can be easily adapted, but it doesn’t do anything by default, really, that is what the documentation says. Its creators present it as a tool to support new featuers of JavaScript in old environment, but it can do much more. For instance, you can use to create JavaScript-based languages, such as LightScript.

Articles & Tutorials

Books

Compilers: Principles, Techniques, and Tools, 2nd Edition, this is the widely known Dragon book (because of the cover) in the 2nd edition (purple dragon). There is a paperback edition, which probably costs less but it has no dragon on it, so you cannot buy that. It is a theoretical book, so don’t expect the techniques to actually include a lot of reusable code.

Modern Compiler Implementation in ML, this is known as a the Tiger book and a competitor of the Dragon book. It is a book that teaches the structure and the elements of a compiler in detail. It’s a theoretical book, although it explains the concept with code. There are other versions of the same book written using Java and C, but it’s widely agreed that the ML one is the best.

Engineering a Compiler, 2nd edition, it is another compiler book with a theoretical approach, but that it covers a more modern approach and it is more readable. It’s also more dedicated to the optimization of the compiler. So if you need a theoretical foundation and an engineering approach this is the best book to get.

Interpreters

An interpreter directly executes the language without transforming it in another form.

Articles & Tutorials

A simple interpreter from scratch in Python, a four-parts series of articles on how to create an interpreter in Python, simple yet good.

Let’s Build A Simple Interpreter, a twelve-parts series that explains how to create a interpreter for a subset of Pascal. The source code is in Python, but it has the necessary amount of theory to apply to another language. It also has a lot of funny images.

How to write a simple interpreter in JavaScript, a well organized article that explains how to create a simple interpreter in JavaScript.

Books

Writing An Interpreter In Go, despite the title it actually shows everything from parsing to creating an interpreter. It’s contemporary book both in the sense that is recent (a few months old), and it is a short one with a learn-by-doing attitude full of code, testing and without 3-rd party libraries. We have interviewed the author, Thorsten Ball.

Crafting Interpreters, a work-in-progress and free book that already has good reviews. It is focused on making interpreters that works well, and in fact it will builds two of them. It plan to have just the right amount of theory to be able to fit in at a party of programming language creators.

Section 4: General

This are resources that cover a wide range of the process of creating a programming language. They may be comprehensive or just give the general overview.

In this section we include tools that cover the whole spectrum of building a programming language and that are usually used as standalone tools.

Xtext, is a framework part of several related technologies to develop programming languages and especially Domain Specific Languages. It allows you to build everything from the parser, to the editor, to validation rules. You can use it to build great IDE support for your language. It simplifies the whole language building process by reusing and linking existing technologies under the hood, such as the ANTLR parser generator.

JetBrains MPS, is a projectional language workbench. Projectional means that the Abstract Syntax Tree is saved on disk and a projection is presented to the user. The projection could be text-like, or be a table or diagram or anything else you can imagine. One side effect of this is that you will not need to do any parsing, because it is not necessary. The term Language Workbench indicates that Jetbrains MPS is a whole system of technologies created to help you create your own programming language: everything from the language itself to IDE and supporting tools designed for your language. You can use it to build every kind of language, but the possibility and need to create everything makes it ideal to create Domain Specific Languages that are used for specific purposes, by specific audiences.

Racket, is described by its authors as “a general-purpose programming language as well as the world’s first ecosystem for developing and deploying new languages”. It’s a pedagogical tool developed with practical ambitions that has even a manifesto. It is a language made to create other languages that has everything: from libraries to develop GUI applications to an IDE and the tools to develop logic languages. It’s part of the Lisp family of languages, and this tells everything you need to know: it’s all or nothing and always the Lisp-way.

Articles

Tutorials

Books

Section 5: Summary

Here you have the most complete collection of high-quality resources on creating programming languages. You have just to decide what you are going to read first.

At this point we have two advices for you:

Get started. It does not matter how many amazing resources we will send you, if you do not take the time to practice, trying and learning from your mistake you will never create a programming language If you are interested in building programming languages you should subscribe to our newsletter. You will receive updates on new articles, more resources, ideas, advices and ultimately become part of a community that share your interests on building languages

You should have all you need to get started. If you have questions, advices or ideas to share feel free to write at [email protected]. We read and answer every email.

Our thanks to Krishna and the people on Hacker News for a few good suggestions.