To answer this question, we should look at two things: any potentially relevant specifications, and what is actually done in the real world. You've already mentioned what the relevant specifications say about the lang attribute: it is used for indicating the human language of the content, not the programming language. While BCP 47 mentions the zxx tag for non-linguistic content, I don't believe that it is really appropriate to use the lang attribute and the zxx subtag for specifying the programming language. The reason is that most source code does actually have some linguistic content in a natural language: comments, variable names, strings, and the like. The lang attribute should probably be used to indicate the language of these, especially in cases like the use of CJK characters, where font selection might be based on the lang attribute. The programming language of a code example is really orthogonal to the human language contained within it; conflating the two will likely lead to confusion, not clarity.

So, let's check the specs for an alternative to the lang attribute. As Pekka points out in another answer, the <code> element is more semantically meaningful for marking up source code than the <pre> element, so let's check there. According to the HTML5 spec:

The code element represents a fragment of computer code. This could be an XML element name, a filename, a computer program, or any other string that a computer would recognize. Although there is no formal way to indicate the language of computer code being marked up, authors who wish to mark code elements with the language used, e.g. so that syntax highlighting scripts can use the right rules, may do so by adding a class prefixed with " language- " to the element. ... The following example shows how a block of code could be marked up using the pre and code elements. <pre><code class="language-pascal">var i: Integer; begin i := 1; end.</code></pre> A class is used in that example to indicate the language used.

Now, this isn't a formal specification, just an informal recommendation for how you could use a class to indicate the language represented. The example also shows how to use both a <pre> tag and <code> tag to mark up a block of code.
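To make the convention concrete, here is a minimal sketch of how a server-side tool might emit that markup. The function name render_code_block is my own invention for illustration, not something taken from the spec or from any library; the only real API used is html.escape from the Python standard library.

```python
from html import escape


def render_code_block(source: str, language: str) -> str:
    """Wrap source code in <pre><code> with a language- prefixed class.

    Hypothetical helper: the name and signature are assumptions made
    for this example only.
    """
    # Escape &, <, > and quotes so the code is displayed, not parsed as markup.
    class_name = escape("language-" + language, quote=True)
    return f'<pre><code class="{class_name}">{escape(source)}</code></pre>'


print(render_code_block("var i: Integer;\nbegin\n  i := 1;\nend.", "pascal"))
```

The class is placed on the <code> element rather than on <pre>, matching the spec's example.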

We can look elsewhere for any sort of standards, but I haven't found any; there are no microformats for code formatting, and I haven't found any other specs that mention it. So, we move on to what people actually do. The best way to discover this is to look at what HTML syntax highlighting libraries do, since they are the main producers and consumers of code embedded in web pages in which the language actually matters.

There are two main types of HTML syntax highlighters: those that run on the server or offline, in Ruby or Python or PHP, and produce static HTML and CSS to be displayed by the browser, and those written in JavaScript, which find and highlight <pre> or <code> elements on the client side. The second category is more interesting, as these need to detect the language from the HTML provided to them. In the first category, you usually specify the language manually through the API or through some mechanism specific to your wiki, blog, or CMS syntax, and so there is no actual consumer of any language information that might be embedded in the HTML. We'll take a look at both categories for the sake of completeness.

For JavaScript syntax highlighters, I've found the following, with examples of their syntax for specifying a code block and its language:

SyntaxHighlighter: <pre class="brush: html">...</pre>. Appears to completely ignore how class should be used by introducing its own syntax for class attributes based on CSS syntax, with the brush keyword used to indicate the language. Also has an option for using the <script> tag, to make it easier to copy and paste code in without having to escape <, using the same class syntax.

Highlight.js: <pre><code class="html">...</code></pre> or class="language-html" or the same on <pre>. This gives you several options: one corresponds to the recommendation in the HTML5 spec, while the other simply uses the bare language name as the class name.

SHJS: <pre class="sh_html">...</pre>. Uses its own prefix for language names in the class, and only works on <pre>, not other elements.

beautyOfCode: <pre class="code"><code class="html">...</code></pre>. Based on SyntaxHighlighter, but with a somewhat less weird syntax. Requires the <pre> tag with class code and the <code> tag with a class indicating the language.

Chili: <code class="html">...</code>. Uses just the <code> tag, and uses the bare language as a class name.

Lighter.js: <pre class="html">...</pre>. Uses the bare language as a class name. You select the elements it will apply to using the API, but the example demonstrates it on <pre> tags.

DlHighlight: <pre name="code" class="html">...</pre>. Uses the bare language as a class name. You choose via the API what type of element to highlight (the example used pre) and the value of the name attribute to look for to indicate that you want syntax highlighting. I believe that this is an abuse of the name attribute.

google-code-prettify: <pre class="prettyprint lang-html">. Uses class names prefixed with lang- to specify the language, and the class prettyprint to indicate that you want syntax highlighting. The language class is optional; it will try to auto-detect the language if not specified.

JUSH: <code class="jush-html">...</code> or <code class="language-html">...</code>. Uses the code tag, with languages in a class prefixed by jush- or language-.

Rainbow: <pre><code data-language="javascript">...</code></pre>. Uses the custom attribute data-language, applied to either a <code> element or a <pre> element, in order to support sites like Tumblr which strip out <code> elements.

Prism: <pre><code class="language-css">...</code></pre>. Follows the HTML5 spec for nested <pre> and <code>, and the recommendation for the class name.

For server-based and offline syntax highlighters, the majority (CodeRay, UltraViolet, Pygments, Highlight) do not embed any language information in the HTML they output at all. GeSHi is the only one I found that embeds the language, as <pre class="html">...</pre> , a <pre> tag with a bare language name as the class.

Out of that list, there seems to be no real consensus. The most popular option is just using the bare language name as a class. The next most popular is using some form of prefixed language name, either prefixed by the library name, lang- , or language- . There are a few that have their own strange conventions, or don't specify the language in the HTML at all.
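To illustrate why the lack of consensus matters to anything consuming this markup, here is a rough sketch of what a tool would have to do to read the language out of a class attribute across the class-based conventions listed above. Everything here is an assumption for illustration: the function name, the prefix list (drawn from the survey), and especially the whitelist of bare names, which a real tool would need because a bare class like html is indistinguishable from any other class.

```python
from typing import Optional

# Prefixes seen in the survey above: HTML5 suggestion, google-code-prettify,
# JUSH, and SHJS respectively.
PREFIXES = ("language-", "lang-", "jush-", "sh_")

# Hypothetical whitelist: bare class names are ambiguous, so a consumer
# has to know in advance which ones are languages.
KNOWN_BARE_NAMES = frozenset({"html", "css", "javascript", "python", "pascal"})


def language_from_class(class_attr: str) -> Optional[str]:
    """Guess the programming language from a class attribute value."""
    tokens = class_attr.split()
    # An explicit prefix is unambiguous, so check for those first.
    for token in tokens:
        for prefix in PREFIXES:
            if token.startswith(prefix) and len(token) > len(prefix):
                return token[len(prefix):]
    # Fall back to bare names, but only ones we recognize.
    for token in tokens:
        if token in KNOWN_BARE_NAMES:
            return token
    return None


print(language_from_class("prettyprint lang-html"))  # html
print(language_from_class("code"))                   # None
```

Note that the prefixed forms need no whitelist at all, which is part of why I consider them less ambiguous than the bare name.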

While the only thing common enough to be a de-facto standard is using the bare language name as a class, I would recommend going with what the HTML5 spec recommends: a class name of language- followed by the name of the language. This is supported by a few of the syntax highlighters; the rest could probably be easily modified to support it. It is less ambiguous and less likely to conflict with other classes than just the bare language name as a class. And, even if not formally specified, it is at least mentioned in a spec.

I would also use the <code> tag to indicate source code, either bare or embedded in a <pre> tag. The combination of a <code> tag and a language- prefixed class indicates that you have source code in a particular language, and can also signal that you want it highlighted. This is clearer and better matches the semantics of the elements than some of the other indicators used by syntax highlighting libraries. For cases in which a <code> tag can't be used, such as embedding in sites like Tumblr that accept only a limited HTML subset, just using the <pre> tag with the same class convention is probably best.

edit to add: The CommonMark specification, which attempts to standardize Markdown so that implementations can be interoperable, producing the same HTML given the same input, has also adopted this suggested convention. It adds fenced code blocks to Markdown, surrounded with ``` or ~~~ , which can be easier to use than indentation based code blocks. Immediately following the opening fence can be an info string, which is defined as:

An info string can be provided after the opening code fence. Opening and closing spaces will be stripped, and the first word, prefixed with language- , is used as the value for the class attribute of the code element within the enclosing pre element.
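The rule quoted above is simple enough to sketch in a few lines. This is my own simplified reading of it, not code from the CommonMark reference implementation: the function name is made up, and it ignores details such as backslash escapes and entity handling in the info string.

```python
from typing import Optional


def info_string_to_class(info_string: str) -> Optional[str]:
    """Map a fenced code block's info string to a class attribute value.

    Simplified sketch of the quoted rule: strip surrounding whitespace,
    take the first word, and prefix it with "language-".
    """
    words = info_string.strip().split()
    if not words:
        # No info string: the code element gets no class at all.
        return None
    return "language-" + words[0]


print(info_string_to_class(" python "))          # language-python
print(info_string_to_class("ruby startline=3"))  # language-ruby
print(info_string_to_class(""))                  # None
```

Only the first word is used; anything after it in the info string is left for the implementation to interpret or ignore.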

It can also be instructive to check what actual implementations do. Trying out a fenced code block on Babelmark, among the implementations that support fenced code blocks (not all do, as they are an extension to the original Markdown), we see the following breakdown:

showdown, blackfriday, haskell markdown: <pre><code class="python">...</code></pre>

marked: <pre><code class="lang-python">...</code></pre>

commonmark, parsedown, cebe/markdown: <pre><code class="language-python">...</code></pre>

cheapskate, minima: <pre class="python">...</pre>

pandoc: <div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">...</code></pre></div> (quite the overkill)

Maruku: <pre class="python"><code class="python">...</code></pre>

Looking at other document markup languages that convert to HTML and have some understanding of code blocks:

AsciiDoc: <pre>...</pre> ; simply uses Pygments to highlight and does not include language information in the HTML.

rst2html gave me <pre class="code python literal-block">...</pre> , highlighted with Pygments.

Sphinx: <div class="highlight-python"><div class="highlight"><pre>...</pre></div></div> , also highlighted with Pygments.