The Magical Power of Attributes in Spirit – Primitives

By Hartmut Kaiser

Spirit V2.1 was released only a couple of weeks ago. The first feedback we have received has been overwhelming! Many people seem to grasp the concepts of the new library quickly and are able to apply its power to their needs. We try to follow people’s comments and watch for reported problems. One theme recurs over and over again: attributes. This concept, new for Spirit, turns out to be the most difficult idea to understand, especially for somebody just starting to work with the library.

While writing this article I realized that it would be too long for a single entry, so I decided to split it into several parts, which will be published over the next couple of days. Splitting the article also leaves more room for detailed descriptions and examples. My plan is to write three articles: about primitives, operators, and non-terminals.

Introduction

So here is the breaking news: all components in Spirit expose their specific attributes, and all grammars are fully attributed. This is different from earlier versions of the library, where almost all parsers exposed the pair of iterators pointing to the matched input sequence. Only very few parsers exposed an attribute (most notably the numeric parsers). Additionally, Spirit is not only about parsers anymore, so I will explain the concept of attributes applied to Qi (the parser library) and Karma (the generator library) at the same time.

Generally, the idea to work with attribute grammars is not new. Let me quote Wikipedia:

An attribute grammar is a formal way to define attributes for the productions of a formal grammar, associating these attributes to values. The evaluation occurs in the nodes of the abstract syntax tree, when the language is processed by some parser or compiler. The attributes are divided into two groups: synthesized attributes and inherited attributes. The synthesized attributes are the result of the attribute evaluation rules, and may also use the values of the inherited attributes. The inherited attributes are passed down from parent nodes.

The key to understanding all this is that all elements of a grammar have their associated, synthesized attribute. For parsers (Spirit.Qi) this is the value matched from the input. For generators (Spirit.Karma) this is the value for which to emit output, and as in this case it is not really ‘synthesized’ anymore we call it the consumed attribute.

As quoted above, in attribute grammars all elements additionally may accept inherited attributes. Inherited attributes are additional arbitrary values passed down from parent nodes during parsing or output generation. These are used to pass along additional contextual information. In Spirit almost all components support receiving inherited attributes as well. We will see examples of that later.
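To make the two directions of attribute flow concrete before any Spirit code appears, here is a small hand-written sketch in plain C++. It is not Spirit code and says nothing about how Spirit implements attributes; the function name parse_unsigned and its signature are made up for this illustration. The base argument plays the role of an inherited attribute (context passed down by the caller), while value plays the role of the synthesized attribute (the result passed back up).

```cpp
#include <cctype>
#include <string>

// Hypothetical hand-written parser, for illustration only (not part of Spirit).
// base  : "inherited attribute", contextual information handed down by the caller
// value : "synthesized attribute", the result computed from the matched input
bool parse_unsigned(std::string const& input, unsigned base, unsigned& value)
{
    // This sketch only supports bases 2..10 and non-empty input.
    if (input.empty() || base < 2 || base > 10)
        return false;

    unsigned result = 0;
    for (char ch : input) {
        if (!std::isdigit(static_cast<unsigned char>(ch)))
            return false;                   // not a digit at all
        unsigned digit = static_cast<unsigned>(ch - '0');
        if (digit >= base)
            return false;                   // digit not valid in this base
        result = result * base + digit;     // accumulate the synthesized value
    }
    value = result;                         // only written on success
    return true;
}
```

For the input "101", passing base 2 yields the value 5, while passing base 10 yields 101: the same input synthesizes a different attribute depending on the inherited one.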

Different Spirit components handle attributes differently. Some parser components generate their attribute directly from the input; others combine the attributes synthesized by more primitive components according to well-defined rules. Generators are similar. Some generators consume their attribute directly, converting it into the character stream passed to the output; other, more complex generators split their attribute into smaller pieces, which then serve as the consumed attributes of the more primitive generator components they invoke as children. All this happens according to well-defined attribute propagation rules. Below we will explain those attribute propagation rules for all different types of components (parsers and generators).

We will use the following functions in the examples below. The first pair of functions will be used to demonstrate different concepts for parsers. They take a parser component, some input, and (optionally) the expected attribute. Both return true if the parser matches the given input; the overload taking an expected attribute additionally requires that the whole input is consumed and that the parsed attribute equals the expected value.

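#include <boost/spirit/include/qi.hpp>
#include <cstring>

namespace qi = boost::spirit::qi;

// Succeeds if p matches at the start of input; no attribute is requested
// and trailing input is not checked.
template <typename P>
bool test_parser(P const& p, char const* input)
{
    using boost::spirit::qi::parse;
    char const* f(input);
    char const* l(f + std::strlen(f));
    return parse(f, l, p);
}

// Succeeds if p matches all of input and its attribute equals expected.
template <typename P, typename Attr>
bool test_parser(P const& p, char const* input, Attr const& expected)
{
    using boost::spirit::qi::parse;
    char const* f(input);
    char const* l(f + std::strlen(f));
    Attr attr;
    return parse(f, l, p, attr) && f == l && attr == expected;
}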

The second pair of functions are simple wrappers to demonstrate generators. They take the generator, the (optional) attribute to output, and the expected output. The function succeeds if the expected output has been successfully emitted.

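#include <boost/spirit/include/karma.hpp>
#include <iterator>
#include <string>

namespace karma = boost::spirit::karma;

// Succeeds if g (which needs no attribute) emits exactly expected.
template <typename G>
bool test_generator(G const& g, char const* expected)
{
    using boost::spirit::karma::generate;
    std::string s;
    std::back_insert_iterator<std::string> sink(s);
    return generate(sink, g) && s == expected;
}

// Succeeds if g, fed with attr, emits exactly expected.
template <typename G, typename Attr>
bool test_generator(G const& g, Attr const& attr, char const* expected)
{
    using boost::spirit::karma::generate;
    std::string s;
    std::back_insert_iterator<std::string> sink(s);
    return generate(sink, g, attr) && s == expected;
}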

Primitive Components

The primitive components are the basic building blocks for any parser or generator. Primitive parsers are used to match the input stream based on their settings and usually convert the matched data into their attribute. Primitive generators do the opposite: they convert their attribute into a character stream based on their settings. The attribute type is component specific.

For example, the primitive parser qi::int_ tries to match a character sequence representing a C++ integer (int). In case of a match it exposes the matched integer as an int attribute. Correspondingly, the primitive generator karma::int_ expects an int attribute, which it converts into a character sequence. Here are two examples using int_:

assert(test_parser(qi::int_, "1234", 1234));
assert(test_generator(karma::int_, -12, "-12"));

Other primitive components have equally intuitive attributes: char_ has char, double_ has double, and ulong_ has an unsigned long. For a full list of available components and their attributes please see the documentation (Qi parsers and Karma generators).

In addition to the type of the synthesized attribute for parsers and the consumed attribute for generators we have to consider whether a given (user supplied) attribute is compatible with the component. For primitive components this means that the usual C++ convertibility rules apply. For parsers you can use any attribute type as long as the attribute of the component is (C++) convertible to it. For generators any attribute type can be used that is (C++) convertible to the generator’s consumed attribute. For instance, it is possible to pass a long to an int_ parser (parsing an integer into a long), or to pass a short to an int_ generator (emitting output for a short using the integer component):

assert(test_parser(qi::int_, "1234", long(1234)));
assert(test_generator(karma::int_, short(-12), "-12"));

Some primitive components do not expose any attribute. Well, actually these expose a special attribute type, spirit::unused_type, but all parts of the library treat this as if the component didn’t expose any attribute at all. The most notable example is the lit() component, which takes character, integer, or string arguments. The qi::lit parser matches its literal parameter and the karma::lit generator emits it. Neither version exposes or consumes an attribute. We will see later why this is needed and very useful. Here are two small examples:

assert(test_parser(qi::lit("abc"), "abc"));
assert(test_generator(karma::lit(-12), "-12"));

Semantic Actions

I’m planning to write an entire article about Spirit’s semantic actions, which are the primary means of attaching code to components. Semantic actions let you execute user-defined code either after a parser component has successfully matched its input (Qi) or before a generator is invoked (Karma). Their relation to attributes is simple: semantic actions for parsers are called with the parser’s synthesized attribute, while those attached to generators are responsible for providing the value the generator is supposed to emit output from (the generator’s consumed attribute). The syntax for attaching semantic actions is: qi::int_[…your code here…].

In the context of the articles about attributes I will demonstrate their usage and the implications for attributes based on constructs implemented with Boost.Phoenix. In short:

The Phoenix library enables functional programming (FP) techniques such as higher order functions, lambda (unnamed functions), currying (partial function application) and lazy evaluation in C++.

If you need more information about Phoenix, please refer to the documentation. Let me give you an example:

assert(test_parser(qi::int_[std::cout << qi::_1], "1234"));    // 1234
assert(test_generator(karma::int_[karma::_1 = -12], "-12"));

Spirit introduces special constructs that make it possible to build semantic actions seamlessly with Phoenix. The qi::_1 and karma::_1 used in the example above are Phoenix placeholders referring to the parser’s synthesized attribute and the generator’s consumed attribute, respectively.

The expression std::cout << qi::_1 will be executed after the parser qi::int_ has matched its input and, as the placeholder qi::_1 refers to the synthesized attribute, it will print 1234 to the standard output.

Correspondingly, the expression karma::_1 = -12 will be executed before the generator is invoked and, as the placeholder karma::_1 refers to the consumed attribute of the generator, the generator will emit -12.

Those placeholders represent the corresponding attribute literally; that is, they have the type of the attribute they refer to. You can think of the Phoenix expressions inside the semantic actions as special-purpose lambda functions which ‘know’ how to access the attributes of the components they are attached to.

Conclusion

So far no magic has shown up, but we covered the basics needed to understand the attribute handling and propagation rules of Spirit’s operators, directives and non-terminals. Stay tuned!