Below is an example of the final product:

Note how we call the jsx() function. Here’s a fiddle for you to try.

Before we rush into implementing the parser, let’s understand what we’re aiming for. JSX simply takes an HTML-like syntax and transforms it into nested React.createElement() calls. What makes JSX unique is that we can use string interpolation within our HTML templates, so we can provide it with data that doesn’t have to be serialized: things like functions, arrays, or objects.

So given the following code:

Raw JSX input.

We should get the following output once compiling it with Babel:

JSX compilation output.
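As a rough illustration of that transformation, here is a hand-written sketch (not actual Babel output). The `createElement` function below is a local stand-in that only mirrors the `(type, props, ...children)` argument order of React.createElement(), so the nesting can be inspected without loading React; the sample markup is an assumption chosen to match the AST example later in this article.

```javascript
// Stand-in for React.createElement(type, props, ...children). It only records
// its arguments, which is enough to see how the calls nest.
const createElement = (type, props, ...children) => ({ type, props, children });

const onClick = () => {};

// The JSX we have in mind (kept in a comment, since it is not plain JavaScript):
//   <div onClick={onClick}>
//     <img src="icon.svg" />
//     <span>text</span>
//   </div>

// Roughly what a JSX compiler using the classic runtime would emit for it:
const tree = createElement(
  "div",
  { onClick },
  createElement("img", { src: "icon.svg" }),
  createElement("span", null, "text")
);

console.log(tree.children.map((child) => child.type));
```

Note that `onClick` is passed as a real function reference, which is exactly the kind of non-serializable data mentioned above.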

Just a quick reminder: the compiled result is used internally by ReactDOM to diff changes in the virtual DOM and then render them. This is React-specific and has nothing to do with JSX, so at this point we have achieved our goal.

Essentially, there are 3 things we should figure out when parsing JSX code:

The name / component of the React element.

The props of the React element.

The children of the React element; for each child, this process repeats itself recursively.

As I mentioned earlier, it would be best if we could break down the code into nodes first and represent it as an AST. Looking at the input of the example above, we can roughly visualize how we would pluck the nodes from the code:

Analyzing the JSX code.

And to put things simply, here’s a schematic representation of the analysis above:

A schematic representation of the analysis.

Accordingly, we’re gonna have 3 types of nodes:

Element node.

Props node.

Value node.

Let’s decide that each node has a base schema with the following properties:

node.type, which represents the type name of the node, e.g. element, props and value. Based on the node type we can also determine the additional properties that the node’s gonna carry. In our parser, each node type should have the following additional properties:

Node type schemas.

node.length, which represents the length of the substring that the node occupies in the code. This will help us trim the code string as we go through the parsing process, so we can always focus on the part of the string that is relevant to the current node:

Any time we parse a small part of the string, we slice the part we’ve just parsed.
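To make the slicing idea concrete, here is a minimal sketch. The `consume` helper and the three regular expressions are hypothetical and only serve this illustration; the parser built later in this article inlines the same steps.

```javascript
// Hypothetical helper: match `regexp` at the start of `str` and report how
// many characters were consumed, plus what is left to parse.
const consume = (str, regexp) => {
  const match = str.match(regexp);
  if (!match) return { match: null, length: 0, rest: str };
  const length = match[0].length;
  return { match, length, rest: str.slice(length) };
};

let code = '<img src="icon.svg" />';
let nodeLength = 0;

// Tag opener, one key="value" prop, then the self-closing tag closer.
for (const regexp of [/^<\w+/, /^ *\w+="[^"]*"/, /^ *\/>/]) {
  const { length, rest } = consume(code, regexp);
  nodeLength += length; // accumulate what this node occupies
  code = rest;          // keep only the part we have not parsed yet
}

console.log(nodeLength, JSON.stringify(code)); // 22 ""
```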

In the function that we’re gonna build we’ll be taking advantage of ES6’s tagged templates. Tagged templates are string literals which can be processed by a custom handler according to our needs (see MDN docs).

So essentially the signature of our function should look like this:

JSX function signature
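A tag function receives the literal string parts as its first argument and the interpolated values as the remaining arguments. The sketch below only shows that shape; the returned object is a placeholder body, not the real implementation, and the parameter names `splits` and `values` are my own choice.

```javascript
// `splits` holds the literal string parts, `values` the interpolated expressions.
const jsx = (splits, ...values) => {
  // Placeholder body: the real function would parse the template and
  // build React elements out of it.
  return { splits: [...splits], values };
};

const name = "World";
const result = jsx`<span>Hello ${name}</span>`;

console.log(result.splits); // [ '<span>Hello ', '</span>' ]
console.log(result.values); // [ 'World' ]
```

Because `values` arrive as real JavaScript values rather than strings, anything can be interpolated: functions, arrays, objects or components.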

Since we’re gonna rely heavily on regular expressions, it will be much easier to deal with a consistent string, so we can unleash the regexp’s full potential. For now let’s focus on the string part without the interpolated values, and parse a regular HTML string. Once we have that logic, we can implement string interpolation handling on top of it.

Starting with the core — an HTML parser

As I already mentioned, our AST will consist of 3 node types, which means that we will have to create an ENUM that contains the values element, props and value. This way the node types won’t be hardcoded and patching the code will be very easy:

Since we have 3 node types, each of them should get a dedicated parsing function:

Each function creates the basic node type and returns it. Note that at the beginning of the scope of each function I’ve defined a couple of variables:

let match - which will be used to store regular expression matches on the fly.

let length - which will be used to store the length of the match so we can trim the JSX code string right after and accumulate it in node.length.

For now the parseValue() function is pretty straightforward and just returns a node which wraps the given string.
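Put together, the enum and the three skeleton functions could look roughly like this. It is a sketch under my own assumptions: the enum is a frozen object named `types`, and parseValue() trims the wrapped string while keeping the untrimmed length.

```javascript
// The node type "enum": a frozen object, so the names live in one place.
const types = Object.freeze({
  element: "element",
  props: "props",
  value: "value",
});

const parseValue = (str) => ({
  type: types.value,
  length: str.length, // length of the raw substring, whitespace included
  value: str.trim(),
});

const parseProps = (str) => {
  let match;  // will hold regular expression matches
  let length; // will hold the length of the latest match
  const node = { type: types.props, length: 0, props: {} };
  // Parsing logic goes here.
  return node;
};

const parseElement = (str) => {
  let match;
  let length;
  const node = {
    type: types.element,
    props: parseProps(""),
    children: [],
    length: 0,
    name: "",
  };
  // Parsing logic goes here.
  return node;
};

console.log(parseValue(" text "));
```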

We will begin with the implementation of the element node and we will branch out to other nodes as we go. First we will try to figure out the name of the element. If an element tag opener was not found, we will assume that the current part of the code is a value:

Up next, we need to parse the props. To make things more efficient, we first need to find the tag closer, so we can provide the parseProps() function with only the relevant part of the string:

Now that we’ve plucked the right substring, we can go ahead and implement the parseProps() function logic:

The logic is pretty straightforward: we iterate through the string, and each time we try to match the next key->value pair. Once no pair is found, we return the node with the accumulated props. Note that providing only an attribute with no value is also valid syntax, which sets its value to true by default, thus the / *\w+/ regexp. Let’s proceed from where we left off with the element parsing implementation.

We need to figure out whether the current element is self-closing or not. If it is, we will return the node; otherwise we will continue on to parse its children:
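The loop described above can be sketched as follows. The exact regular expressions are my assumption; they only accept double-quoted values and attribute names made of word characters, so names such as data-id are not covered.

```javascript
const parseProps = (str) => {
  let match;
  let length;
  const node = { type: "props", length: 0, props: {} };

  // Try the most specific pattern first: key="value", then a bare key.
  const matchNextProp = () =>
    str.match(/^ *(\w+)="([^"]*)"/) || str.match(/^ *(\w+)/);

  while ((match = matchNextProp())) {
    const [whole, key, value] = match;
    // A bare attribute such as `hidden` defaults to true.
    node.props[key] = value === undefined ? true : value;
    length = whole.length;
    str = str.slice(length); // trim the part we have just parsed
    node.length += length;   // and remember how much we consumed
  }

  return node;
};

const propsNode = parseProps(' src="icon.svg" hidden');
console.log(propsNode.props, propsNode.length);
```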

Accordingly, we’re gonna implement the children parsing logic:

Children parsing is recursive. We keep calling the parseElement() function for the current substring until there are no more matches. Once we’ve gone through all the children, we can finish the process by finding the closing tag:

The HTML parsing part is finished! Now we can call parseElement() for any given HTML string and we should get a JSON-like output which represents an AST, like the following:
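Here is one way the whole element parser could fit together. Treat it as a sketch under simplifying assumptions rather than the definitive implementation: attribute values must not contain a > character, malformed input is swallowed instead of reported, and whitespace-only text between tags is consumed but not kept as a child.

```javascript
const types = { element: "element", props: "props", value: "value" };

const parseValue = (str) => ({
  type: types.value,
  length: str.length,
  value: str.trim(),
});

const parseProps = (str) => {
  const node = { type: types.props, length: 0, props: {} };
  let match;
  while ((match = str.match(/^ *(\w+)="([^"]*)"/) || str.match(/^ *(\w+)/))) {
    node.props[match[1]] = match[2] === undefined ? true : match[2];
    str = str.slice(match[0].length);
    node.length += match[0].length;
  }
  return node;
};

const parseElement = (str) => {
  let match;
  let length;
  const node = {
    type: types.element,
    props: parseProps(""),
    children: [],
    length: 0,
    name: "",
  };

  // 1. Tag opener and name. Without one, everything up to the next "<" is a value.
  match = str.match(/^\s*<(\w+)/);
  if (!match) return parseValue(str.split("<")[0]);
  node.name = match[1];
  length = match[0].length;
  str = str.slice(length);
  node.length += length;

  // 2. Find the tag closer and hand only the part before it to parseProps().
  match = str.match(/\s*(\/?)>/);
  if (!match) return node; // malformed input is swallowed
  node.props = parseProps(str.slice(0, match.index));
  length = match.index + match[0].length;
  str = str.slice(length);
  node.length += length;

  // 3. A self-closing element has no children.
  if (match[1]) return node;

  // 4. Parse children until nothing more can be consumed.
  while (true) {
    const child = parseElement(str);
    if (child.length === 0) break;
    if (child.type === types.element || child.value) node.children.push(child);
    str = str.slice(child.length);
    node.length += child.length;
  }

  // 5. Consume the closing tag.
  match = str.match(new RegExp(`^\\s*</${node.name}\\s*>`));
  if (match) node.length += match[0].length;

  return node;
};

const html = '<div onclick="onclick()"><img src="icon.svg" /><span>text</span></div>';
const ast = parseElement(html);
console.log(JSON.stringify(ast, null, 2));
```

In this sketch every node’s length covers its whole substring, tags included, so the root node’s length equals the length of the input string. The exact numbers depend on how tags and whitespace are counted.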

{
  "type": "element",
  "props": {
    "type": "props",
    "length": 20,
    "props": {
      "onclick": "onclick()"
    }
  },
  "children": [
    {
      "type": "element",
      "props": {
        "type": "props",
        "length": 15,
        "props": {
          "src": "icon.svg"
        }
      },
      "children": [],
      "length": 18,
      "name": "img"
    },
    {
      "type": "element",
      "props": {
        "type": "props",
        "length": 0,
        "props": {}
      },
      "children": [
        {
          "type": "value",
          "length": 4,
          "value": "text"
        }
      ],
      "length": 12,
      "name": "span"
    }
  ],
  "length": 74,
  "name": "div"
}

Leveling up — string interpolation

Now we’re gonna add string interpolation on top of the HTML string parsing logic. Since we still wanna use the power of regexp at its full potential, we’re gonna assume that the given string would be a template with placeholders, where each of them should be replaced with a value. That would be the easiest and most efficient way, rather than accepting an array of string splits.

[
  "<__jsxPlaceholder>Hello __jsxPlaceholder</__jsxPlaceholder>",
  [MyComponent, "World", MyComponent]
]

Accordingly, we will update the parsing functions’ signature and their calls, and we will define a placeholder constant:

Note how I used the Date.now() function to define a postfix for the placeholder. This way we can be fairly sure that the user won’t pass the same value as a string (possible, but very unlikely). Now we will go through each parsing function and make sure that it knows how to deal with placeholders correctly. We will start with the parseElement() function.

We will add an additional property to the node called: node.tag . The tag property is the component that will be used to create the React element. It can either be a string or a React.Component. If node.name is a placeholder, we will be taking the next value in the given values stack:
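The tag resolution can be sketched in isolation. Both the exact form of the placeholder constant and the `resolveTag` helper are assumptions made for this illustration; in the parser itself this logic would sit inside parseElement().

```javascript
// Assumed shape of the placeholder constant: a fixed prefix plus a timestamp.
const placeholder = `__jsxPlaceholder${Date.now()}`;

// Hypothetical helper: a placeholder name takes the next interpolated value,
// any other name (e.g. "div") is used as-is.
const resolveTag = (name, values) =>
  name === placeholder ? values.shift() : name;

const MyComponent = () => null;
const values = [MyComponent, "World", MyComponent];

const openingTag = resolveTag(placeholder, values); // MyComponent
const childValue = values.shift();                  // "World"
const closingTag = resolveTag(placeholder, values); // MyComponent again

// The closing tag must resolve to the same component as the opening tag.
const tagsMatch = openingTag === closingTag;
console.log(tagsMatch, resolveTag("div", values)); // true div
```

The values are consumed in the same order in which their placeholders appear in the template, which is why a stack that we shift from is enough.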

We also made sure that the closing tag matches the opening tag. I’ve decided to “swallow” errors rather than throwing them for the sake of simplicity, but generally speaking it would make a lot of sense to implement error throws within the parsing functions.

Up next would be the props node. This is fairly simple, we’re only gonna add an additional regexp to the array of matchers, and that regexp will check for placeholders. If a placeholder was detected, we’re gonna replace it with the next value in the values stack:
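A possible version of the extended props parser is sketched below. The matcher order and the regular expressions are my assumptions; the placeholder matcher is tried first, so a quoted string that happens to look like the placeholder is still treated as a plain string.

```javascript
// Assumed shape of the placeholder constant: a fixed prefix plus a timestamp.
const placeholder = `__jsxPlaceholder${Date.now()}`;

const parseProps = (str, values) => {
  const node = { type: "props", length: 0, props: {} };

  const placeholderProp = new RegExp(`^ *(\\w+)=${placeholder}`); // key=${value}
  const stringProp = /^ *(\w+)="([^"]*)"/;                        // key="value"
  const bareProp = /^ *(\w+)/;                                    // key

  // Return the next match together with the value it should produce.
  const matchNextProp = () => {
    let match;
    if ((match = str.match(placeholderProp))) return { match, value: values.shift() };
    if ((match = str.match(stringProp))) return { match, value: match[2] };
    if ((match = str.match(bareProp))) return { match, value: true };
    return null;
  };

  let found;
  while ((found = matchNextProp())) {
    node.props[found.match[1]] = found.value;
    str = str.slice(found.match[0].length);
    node.length += found.match[0].length;
  }

  return node;
};

const onClick = () => "clicked";
const source = ` onClick=${placeholder} title="hi" disabled`;
const propsNode = parseProps(source, [onClick]);
console.log(propsNode.props);
```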

Last but not least is the value node. This is the most complex of the 3 nodes to handle, since it requires us to split the input string and create a dedicated value node out of each split. So now, instead of returning a single value node, we will return an array of them. Accordingly, we will also be changing the name of the function from parseValue() to parseValues():

The reason why I’ve decided to return an array of nodes and not a single node which contains an array of values, just like the props node, is that it matches the signature of React.createElement() perfectly. The values will be passed as children with a spread operator ( ... ), and you will see further down in this tutorial how well this fits.
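The splitting could be sketched like this, assuming the same placeholder constant as before. Every split becomes a value node, and every placeholder between two splits becomes a value node that wraps the next interpolated value. Empty nodes are kept here so the lengths still add up; filtering them is left to the caller.

```javascript
// Assumed shape of the placeholder constant: a fixed prefix plus a timestamp.
const placeholder = `__jsxPlaceholder${Date.now()}`;

const parseValues = (str, values) => {
  const nodes = [];

  str.split(placeholder).forEach((split, index, splits) => {
    // Literal text before, between or after placeholders.
    nodes.push({ type: "value", length: split.length, value: split.trim() });

    // Every split except the last one was followed by a placeholder.
    if (index < splits.length - 1) {
      nodes.push({
        type: "value",
        length: placeholder.length,
        value: values.shift(),
      });
    }
  });

  return nodes;
};

const text = `Hello ${placeholder}!`;
const nodes = parseValues(text, ["World"]);

// Only nodes with actual content would end up as children.
const children = nodes.filter((node) => node.value !== "");
console.log(children.map((node) => node.value)); // [ 'Hello', 'World', '!' ]
```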

Note that we’ve also changed the way we accumulate children in the parseElement() function. Since parseValues() returns an array now, and not a single node, we flatten it using an empty array concatenation ( [].concat() ), and we only push the children whose contents are not empty.

The grand finale — execution

At this point we should have a function which can transform a JSX code into an AST, including string interpolation. The only thing which is left to do now is build a function which will recursively create React elements out of the nodes in the tree.

The main function of the module should be called with a template tag. If you went through the previous step, you should know that a consistent string has an advantage over an array of splits of strings, since we can unleash the full potential of a regexp with ease. Accordingly, we will take all the given splits and join them with the placeholder constant.

['<', '>Hello ', '</', '>'] -> '<__jsxPlaceholder>Hello __jsxPlaceholder</__jsxPlaceholder>'

Once we join the string we can create React elements recursively:
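Both steps can be sketched without the parser by feeding a hand-written AST into the recursive function. The names `joinSplits` and `createReactElement` are hypothetical, and `createElement` is a local stand-in that mirrors the argument order of React.createElement(); with React loaded, you would call React.createElement() instead.

```javascript
// Assumed shape of the placeholder constant: a fixed prefix plus a timestamp.
const placeholder = `__jsxPlaceholder${Date.now()}`;

// Step 1: turn the template splits into one consistent string.
const joinSplits = (splits) => splits.join(placeholder);

// Stand-in for React.createElement(type, props, ...children).
const createElement = (type, props, ...children) => ({ type, props, children });

// Step 2: walk the AST. Value nodes become plain children, element nodes
// become createElement() calls with their children spread in.
const createReactElement = (node) => {
  if (node.type === "value") return node.value;

  return createElement(
    node.tag,
    node.props.props,
    ...node.children.map(createReactElement)
  );
};

const MyComponent = () => null;

// Hand-written AST, shaped like the parser output for
// jsx`<${MyComponent}>Hello ${"World"}</${MyComponent}>`
const ast = {
  type: "element",
  tag: MyComponent,
  props: { type: "props", length: 0, props: {} },
  children: [
    { type: "value", length: 6, value: "Hello" },
    { type: "value", length: placeholder.length, value: "World" },
  ],
};

const joined = joinSplits(["<", ">Hello ", "</", ">"]);
const element = createReactElement(ast);
console.log(joined);
console.log(element.children);
```

This is where returning an array of value nodes pays off: the children map directly onto the rest arguments of the createElement() call.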