How to Write an Emacs Major Mode for Syntax Coloring

This page shows you how to write an Emacs major mode that does syntax coloring for your own language.

[Screenshot: syntax coloring of your own language.]

Problem

You are writing a major mode for a new language. You want the language's keywords to be syntax colored.

Suppose your language source code looks like this:

Sin[x]^2 + Cos[y]^2 == 1
Pi^2/6 == Sum[1/x^2,{x,1,Infinity}]

You want the words “Sin”, “Cos”, and “Sum” colored as functions, and “Pi” and “Infinity” colored as constants.

Solution

Save the following in a file.

;; Each rule pairs a regex with the face used to color its matches.
(setq mymath-highlights
      '(("Sin\\|Cos\\|Sum" . font-lock-function-name-face)
        ("Pi\\|Infinity" . font-lock-constant-face)))
;; Define the mode and tell font-lock which rules to use.
(define-derived-mode mymath-mode fundamental-mode "mymath"
  "major mode for editing mymath language code."
  (setq font-lock-defaults '(mymath-highlights)))

Now, copy and paste the above code into a buffer, then Alt + x eval-buffer.

Now, type the following code into a buffer:

Sin[x]^2 + Cos[y]^2 == 1
Pi^2/6 == Sum[1/x^2,{x,1,Infinity}]

Now, Alt + x mymath-mode. You will see the words colored.

How Does it Work?

The string "Sin\\|Cos\\|Sum" is a regex that matches “Sin”, “Cos”, or “Sum”. font-lock-function-name-face is a predefined face. It holds the default font and coloring spec used for function names.

[see Elisp: Regex Tutorial]
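To see what the two regexes pick out of the sample code, here is a minimal sketch that collects every match in a string. The helper name demo-all-matches is made up for this illustration and is not part of Emacs or of the mode above. It binds case-fold-search to nil on the assumption that font-lock uses its default, case-sensitive matching.

```elisp
;; demo-all-matches is a hypothetical helper, for illustration only.
(defun demo-all-matches (regex text)
  "Return a list of every substring of TEXT that REGEX matches, in order."
  (let ((case-fold-search nil)   ; match case-sensitively
        (start 0)
        (found '()))
    (while (and (<= start (length text))
                (string-match regex text start))
      (push (match-string 0 text) found)
      ;; Always move forward, even if the regex matched an empty string.
      (setq start (max (match-end 0) (1+ (match-beginning 0)))))
    (nreverse found)))

(demo-all-matches "Sin\\|Cos\\|Sum" "Sin[x]^2 + Cos[y]^2 == 1")
;; ⇒ ("Sin" "Cos")
(demo-all-matches "Pi\\|Infinity" "Pi^2/6 == Sum[1/x^2,{x,1,Infinity}]")
;; ⇒ ("Pi" "Infinity")
```

Each substring returned here is a piece of text that the matching rule in mymath-highlights would color.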

The line define-derived-mode defines your mode, named “mymath-mode”, based on fundamental-mode. fundamental-mode is the most basic mode.

The line (setq font-lock-defaults '(mymath-highlights)) sets up the syntax highlighting for your mode.

Writing a Mode for a Language that Has Hundreds of Keywords

Typically, a language has hundreds of keywords. Elisp has a function, regexp-opt, that generates a regex from a list of keywords.
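As a small sketch of that idea, the code below turns a short word list into a single regex with regexp-opt and checks what it matches. The names prefixed with demo- are hypothetical and exist only for this illustration. The second argument 'words asks regexp-opt to wrap the result in word boundaries, so a name is matched only as a whole word.

```elisp
;; All demo-* names are hypothetical, for illustration only.
(defvar demo-function-names '("Sin" "Cos" "Sum")
  "A tiny keyword list, borrowed from the mymath example.")

(defvar demo-function-regexp (regexp-opt demo-function-names 'words)
  "One regex that matches any name in `demo-function-names' as a whole word.")

(defun demo-function-name-p (text)
  "Return t if TEXT contains one of the names as a whole word, else nil."
  (let ((case-fold-search nil))   ; match case-sensitively
    (and (string-match-p demo-function-regexp text) t)))

(demo-function-name-p "Sum[1/x^2]")  ; ⇒ t
(demo-function-name-p "Summary")     ; ⇒ nil, "Sum" is only part of a word
```

The same call works unchanged on a list of several hundred keywords, which is why the list itself, not the regex, is what you maintain by hand.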

Suppose you are writing a mode for the Linden Scripting Language (LSL). LSL has about 553 keywords. First, here's a sample of LSL source code so you get some idea of how we want it colored.

integer score = 0;
string mySay = "i ♥ you";
vector v = <3,4,5>;
list myList = [2,4,7,3];
integer sum(integer a, integer b) {
    integer result = a + b;
    return result;
}
default {
    state_entry() {
        llSay(0, mySay);
    }
    touch_start(integer total_number) {
        if (score == 1) {
            llSay(0, mySay);
        } else {
            llWhisper(0, "Ouch!");
        }
    }
}

Each type of keyword uses a different color.

Here's the code.

;; Build the font-lock rules: one regex per keyword category.
(setq mylsl-font-lock-keywords
      (let* (
             ;; define several categories of keywords
             (x-keywords '("break" "default" "do" "else" "for" "if" "return" "state" "while"))
             (x-types '("float" "integer" "key" "list" "rotation" "string" "vector"))
             (x-constants '("ACTIVE" "AGENT" "ALL_SIDES" "ATTACH_BACK"))
             (x-events '("at_rot_target" "at_target" "attach"))
             (x-functions '("llAbs" "llAcos" "llAddToLandBanList" "llAddToLandPassList"))
             ;; generate a regex for each category of keywords
             (x-keywords-regexp (regexp-opt x-keywords 'words))
             (x-types-regexp (regexp-opt x-types 'words))
             (x-constants-regexp (regexp-opt x-constants 'words))
             (x-events-regexp (regexp-opt x-events 'words))
             (x-functions-regexp (regexp-opt x-functions 'words)))
        ;; pair each regex with a face; earlier rules take priority
        `(
          (,x-types-regexp . font-lock-type-face)
          (,x-constants-regexp . font-lock-constant-face)
          (,x-events-regexp . font-lock-builtin-face)
          (,x-functions-regexp . font-lock-function-name-face)
          (,x-keywords-regexp . font-lock-keyword-face)
          )))
;; Define the mode, inheriting comment and indentation handling from c-mode.
(define-derived-mode mylsl-mode c-mode "lsl mode"
  "Major mode for editing LSL (Linden Scripting Language)…"
  (setq font-lock-defaults '((mylsl-font-lock-keywords))))
;; add the mode to the `features' list
(provide 'mylsl-mode)

Note that font-lock applies the rules given in font-lock-defaults on a first-come, first-served basis. Once a sequence of characters is colored, a later rule won't change it. So, the order of your list is important. In general, put rules for longer keywords first. This won't fix every case where a keyword matches part of another keyword. If your language has many such keywords, you need to use other rule forms to solve this problem. See (info "(elisp) Search-based Fontification").
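The following is a rough sketch of that ordering behavior, not code from the mode above. It fontifies a string in a temporary buffer with two rules that match the same word. All demo- names are hypothetical. The second rule list assumes the (MATCHER SUBEXP FACE OVERRIDE) rule form described in the Search-based Fontification info node, where a t in the OVERRIDE slot lets a later rule recolor text.

```elisp
;; All demo-* names are hypothetical, for illustration only.
(defvar demo-first-wins
  '(("\\<default\\>" . font-lock-keyword-face)
    ("\\<default\\>" . font-lock-type-face))
  "Two rules match the same text. The earlier rule keeps its color.")

(defvar demo-override
  '(("\\<default\\>" . font-lock-keyword-face)
    ;; (MATCHER SUBEXP FACE OVERRIDE): t in the last slot recolors
    ;; text that an earlier rule already colored.
    ("\\<default\\>" 0 font-lock-type-face t))
  "Same as `demo-first-wins', but the second rule overrides the first.")

(defun demo-face-at-start (keywords-symbol text)
  "Fontify TEXT with the rules held in the variable KEYWORDS-SYMBOL.
Return the face property of the first character of TEXT."
  (with-temp-buffer
    (insert text)
    (setq-local font-lock-defaults (list keywords-symbol))
    (font-lock-ensure)
    (get-text-property (point-min) 'face)))
```

Evaluating (demo-face-at-start 'demo-first-wins "default {") should report font-lock-keyword-face, while the same call with 'demo-override should report font-lock-type-face.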

The `( ,a ,b …) form is Lisp's backquote syntax. It quotes a list the way ' does, except that elements preceded by a comma (,) are evaluated.
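A short sketch of the backquote behavior follows. The function demo-make-rule is a hypothetical helper written for this illustration. It builds one font-lock rule the same way the let* body above does, by evaluating the comma-prefixed parts and leaving the rest quoted.

```elisp
;; Inside a backquoted list, only the comma-prefixed elements are evaluated.
(let ((a 1) (b 2))
  `(,a ,b c))
;; ⇒ (1 2 c)

;; demo-make-rule is a hypothetical helper, for illustration only.
(defun demo-make-rule (words face)
  "Return a font-lock rule that colors WORDS (a list of strings) with FACE."
  `(,(regexp-opt words 'words) . ,face))

;; The result is a (REGEX . FACE) pair, like the rules in the mode above.
(cdr (demo-make-rule '("Pi" "Infinity") 'font-lock-constant-face))
;; ⇒ font-lock-constant-face
```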

In the above, we based our mode on c-mode, because the syntax is similar. Basing your mode on a similar language's mode saves you the time of coding many features, such as handling comments and indentation.

The line:

( provide 'mylsl-mode)

adds the symbol mylsl-mode to the list held in the variable features, so that (require 'mylsl-mode) knows the mode is already loaded. [see Elisp: provide, require, features]
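As a quick check of what provide does, the sketch below uses a made-up feature name, demo-feature, which does not correspond to any real file or package.

```elisp
;; demo-feature is a hypothetical feature name, for illustration only.
(provide 'demo-feature)

;; The symbol is now a member of the `features' list.
(featurep 'demo-feature)              ; ⇒ t
(and (memq 'demo-feature features) t) ; ⇒ t

;; Because the feature is already provided, `require' does not look
;; for a file to load. It just returns the feature symbol.
(require 'demo-feature)               ; ⇒ demo-feature
```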

Now, to run the code, Alt + x eval-buffer. [see Evaluate Emacs Lisp Code]

Open the LSL language sample file given above, then Alt + x mylsl-mode. Here's the result:

[Screenshot: sample mylsl-mode syntax highlighting result.]

Complex Syntax Coloring

For many languages, the text to color is not a fixed set of strings. For example, in XML, you have the <xyz>…</xyz> pattern, where the “xyz” can be anything.

[Screenshot: emacs html-mode syntax coloring.]
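As a first taste of pattern-based coloring, here is a minimal sketch that colors whatever name follows < or </. It is not how html-mode is implemented. The mode name demo-tag-mode and the variable demo-tag-highlights are hypothetical, and the regex is a simplified assumption about what a tag name looks like. The rule uses the (MATCHER SUBEXP FACE) form, so only the text captured by group 1 (the tag name) is colored, not the angle bracket.

```elisp
;; All demo-* names are hypothetical, for illustration only.
(defvar demo-tag-highlights
  ;; Group 1 captures the tag name after "<" or "</".
  '(("</?\\([A-Za-z][-A-Za-z0-9_:.]*\\)" 1 font-lock-function-name-face))
  "Rule that colors the tag name in <name> and </name>.")

(define-derived-mode demo-tag-mode fundamental-mode "demo-tag"
  "Hypothetical mode that colors tag names only."
  (setq font-lock-defaults '(demo-tag-highlights)))

(defun demo-tag-name (text)
  "Return the first tag name found in TEXT, or nil if there is none."
  (let ((case-fold-search nil))
    (when (string-match (car (car demo-tag-highlights)) text)
      (match-string 1 text))))

(demo-tag-name "<xyz>hello</xyz>")  ; ⇒ "xyz"
```

To try it, evaluate the code, type <xyz>hello</xyz> into a buffer, then Alt + x demo-tag-mode.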

To handle more complex syntax coloring, continue to Elisp: Font Lock Mode Basics.