Introduction to Abstract Syntax Trees for Web Developers

Abstract Syntax Trees (ASTs) are not a common topic of discussion among web developers. We are often so busy deciding which tools to use that we don’t dig very deeply into how any of those tools work. In this post, I will try to give you a basic understanding of what ASTs are and why you might care about them as an application developer, and then walk through an example of how I’m attempting to use this knowledge to make web development easier.

What is an Abstract Syntax Tree?

An AST is a data structure used to represent code. In order for our JavaScript code to run, it has to be compiled or interpreted by another program. This program, some kind of compiler or interpreter, translates our high-level JavaScript code (directly or through intermediate forms such as bytecode) into instructions that our computer’s CPU can execute. To do this translation, it first parses our code into an AST. So why is this step necessary, and what information does the resulting AST actually contain?

To answer these questions, let’s look at the problem from the perspective of a compiler author. A compiler has to work for any JavaScript code that is valid according to the language specification. The clearest way to accomplish this is to abstract away from the specific code given as input and consider what all valid JavaScript code has in common. At the highest level, JavaScript code consists of declarations (e.g. variable and function declarations), statements (e.g. if statements and loops), and expressions. These constructs very often contain other declarations, statements, or expressions, which is how we arrive at a tree structure for representing code. For example, a function declaration might contain another function declaration, which might contain yet another, and so on. So at the most fundamental level, an AST is simply a hierarchy of nodes where each node has a type and may also have child nodes.
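To make the "hierarchy of nodes" idea concrete, here is a minimal sketch of a generic depth-first walker. It assumes only that a node is a plain object with a string type property, which is how Babel and other ESTree-style parsers represent nodes. The function name walk is hypothetical; real tools ship far richer traversal APIs. The example tree is hand-written and mirrors a function declaration nested inside another function declaration.

```javascript
// Visit every AST node depth-first, calling visit(node, parent) on each one.
// Assumption: a node is any plain object with a string `type` property.
function walk(node, visit, parent = null) {
  if (node === null || typeof node !== 'object') return;
  if (Array.isArray(node)) {
    // Lists of children (e.g. Program.body) are arrays of nodes.
    for (const child of node) walk(child, visit, parent);
    return;
  }
  const isNode = typeof node.type === 'string';
  if (isNode) visit(node, parent);
  // Descend into every property; primitive values are ignored by the guard above.
  for (const key of Object.keys(node)) {
    walk(node[key], visit, isNode ? node : parent);
  }
}

// Hand-written AST for: function outer() { function inner() {} }
const tree = {
  type: 'Program',
  body: [{
    type: 'FunctionDeclaration',
    id: { type: 'Identifier', name: 'outer' },
    params: [],
    body: {
      type: 'BlockStatement',
      body: [{
        type: 'FunctionDeclaration',
        id: { type: 'Identifier', name: 'inner' },
        params: [],
        body: { type: 'BlockStatement', body: [] },
      }],
    },
  }],
};

const types = [];
walk(tree, node => types.push(node.type));
console.log(types);
// [ 'Program', 'FunctionDeclaration', 'Identifier', 'BlockStatement',
//   'FunctionDeclaration', 'Identifier', 'BlockStatement' ]
```

Because every node carries its type, code that works on the tree never needs to know the original source text; it only branches on node types.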

To help us further understand how an AST is created and what kind of information it contains, let’s walk through what happens when the parser encounters a simple statement like const myVar = 30. First, the parser reads the keyword const. At this point it creates a new node in the AST with the type VariableDeclaration and records what kind of variable declaration it is. In this case it’s a const declaration, but it might also be a let or var declaration. Now that the parser knows this is a variable declaration, it uses the next token, myVar, as the identifier of the variable being declared. Finally, it reads the initial value of the variable, in this case the numeric literal 30. Because a single declaration can declare several variables at once (const a = 1, b = 2), parsers such as Babel’s store each identifier and its initial value together in a child node of type VariableDeclarator, kept in the declaration’s declarations array. We’ll end up with a data structure like this:
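Written out as a JavaScript object, and simplified by leaving out the position information (start, end, loc) and the extra field that @babel/parser also records, the node for const myVar = 30 looks roughly like this. Babel names the literal node NumericLiteral; parsers that follow the ESTree spec more literally use a generic Literal type instead.

```javascript
// Simplified shape of the node produced for `const myVar = 30`
// (position fields such as start, end and loc are omitted).
const declaration = {
  type: 'VariableDeclaration',
  kind: 'const', // could also be 'let' or 'var'
  declarations: [
    {
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'myVar' },   // the variable's identifier
      init: { type: 'NumericLiteral', value: 30 }, // the initial value
    },
  ],
};

console.log(declaration.declarations[0].id.name); // prints "myVar"
```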

Once an entire program has been parsed this way, many useful operations on our code become much easier. For example, we could check whether a variable declaration uses the same identifier as a previous declaration in the same scope and, if so, warn the author.
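As a rough sketch of that duplicate check, the hypothetical function below scans only the top-level statements of a program (its Program.body array) and reports names declared more than once. Redeclaring a name with let or const is already a syntax error that the parser itself reports, but var allows it, so a lint-style check still has something to catch. Nested scopes and destructuring are deliberately ignored; ESLint’s built-in no-redeclare rule handles the general case.

```javascript
// Find names declared more than once among the top-level variable
// declarations of a program. Simplified: ignores nested scopes, function
// declarations and destructuring patterns such as `var { a } = obj`.
function findDuplicateDeclarations(programBody) {
  const seen = new Set();
  const duplicates = [];
  for (const statement of programBody) {
    if (statement.type !== 'VariableDeclaration') continue;
    for (const declarator of statement.declarations) {
      if (declarator.id.type !== 'Identifier') continue; // skip patterns
      const { name } = declarator.id;
      if (seen.has(name)) {
        duplicates.push(name);
      } else {
        seen.add(name);
      }
    }
  }
  return duplicates;
}

// Hand-written AST body for: var count = 0; var total = 0; var count = 1;
const varDeclaration = (name, value) => ({
  type: 'VariableDeclaration',
  kind: 'var',
  declarations: [{
    type: 'VariableDeclarator',
    id: { type: 'Identifier', name },
    init: { type: 'NumericLiteral', value },
  }],
});
const body = [varDeclaration('count', 0), varDeclaration('total', 0), varDeclaration('count', 1)];

console.log(findDuplicateDeclarations(body)); // [ 'count' ]
```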

Another important detail about ASTs is that while they are often used only to analyze code, they can also be used to modify and generate code. Just as your code can be parsed into an AST, an AST can be translated back into code. This is an extremely powerful property of ASTs, and compilers and transpilers rely on it to convert code from one language to another. The process has three steps: parse the source code into an AST, transform this AST into an AST for the target language, and finally generate code from the resulting AST.
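A toy version of the last two steps is sketched below, starting from a hand-written AST instead of a real parser. Both generate and constToVar are hypothetical and only understand the handful of node types used in this post; real projects would use a library such as @babel/generator. Rewriting const as var is only meant to show the transform step: Babel’s real block-scoping transform has to deal with shadowing and closures, which this sketch ignores.

```javascript
// Step 3: turn a (tiny subset of an) AST back into source code.
// Only the node types used in this post are supported; anything else throws.
function generate(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map(generate).join('\n');
    case 'VariableDeclaration':
      return `${node.kind} ${node.declarations.map(generate).join(', ')};`;
    case 'VariableDeclarator':
      return node.init ? `${generate(node.id)} = ${generate(node.init)}` : generate(node.id);
    case 'Identifier':
      return node.name;
    case 'NumericLiteral':
      return String(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Step 2: transform the AST. Toy rule: rewrite every const/let declaration as var.
// Returns a new tree and leaves the input untouched.
function constToVar(program) {
  return {
    ...program,
    body: program.body.map(statement =>
      statement.type === 'VariableDeclaration' ? { ...statement, kind: 'var' } : statement
    ),
  };
}

// Step 1 (parsing) is replaced by a hand-written AST for `const myVar = 30`.
const program = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    kind: 'const',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'myVar' },
      init: { type: 'NumericLiteral', value: 30 },
    }],
  }],
};

console.log(generate(program));             // prints "const myVar = 30;"
console.log(generate(constToVar(program))); // prints "var myVar = 30;"
```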

As application developers, why should we care about ASTs?

Now that you have a better idea of what an AST is, you might be wondering why, as application developers, we should concern ourselves with this topic at all. After all, we’re primarily writing application code, and chances are you’ve never had to work on an application that takes external code as input. While such cases may come up, the importance of ASTs in our day-to-day work lies primarily in our tooling. Many important tools, such as Babel and ESLint, use ASTs under the hood.

Babel is widely used to transform code written against the latest JavaScript specifications into code that’s compatible with browsers that have not yet implemented those specifications. It does this by first parsing your code into an AST and then replacing any nodes that don’t exist in the target JavaScript version with functionally equivalent nodes. It can then use this modified AST to generate new code that does the same thing as the source code but uses older syntax. ESLint allows developers to define rules about syntax and style in a project, which are then enforced automatically. It does this by parsing code into an AST and searching for nodes or groups of nodes that violate the rules. Using ESLint is a great way to keep the code in your project consistent and readable even when many developers are involved.
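To show what "searching for nodes that violate the rules" looks like in practice, here is a sketch of a custom rule in the object format ESLint uses: a meta description plus a create(context) function that returns visitor methods keyed by node type. ESLint already ships a built-in no-var rule, so this one (named noVarRule here, a hypothetical name) only illustrates the shape. Outside ESLint, the rule can be exercised with a fake context that records reports.

```javascript
// A minimal ESLint-style rule that reports every `var` declaration.
// ESLint calls create(context) once per file, then calls the returned
// visitor methods for each node of the matching type as it walks the AST.
const noVarRule = {
  meta: {
    type: 'suggestion',
    docs: { description: 'Disallow var declarations' },
    schema: [], // this rule takes no options
  },
  create(context) {
    return {
      VariableDeclaration(node) {
        if (node.kind === 'var') {
          context.report({ node, message: 'Use let or const instead of var.' });
        }
      },
    };
  },
};

// Exercise the rule without ESLint: a fake context collects reports, and we
// call the visitor directly on hand-written nodes.
const reports = [];
const visitor = noVarRule.create({ report: descriptor => reports.push(descriptor) });
visitor.VariableDeclaration({ type: 'VariableDeclaration', kind: 'var', declarations: [] });
visitor.VariableDeclaration({ type: 'VariableDeclaration', kind: 'const', declarations: [] });
console.log(reports.length); // prints 1
```

Inside a real project, the rule would be registered through an ESLint plugin, and ESLint itself would do the parsing and traversal.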

So your tools make use of ASTs, but you don’t develop tools; you develop applications. That may be true, but there is value in understanding how your tools work in case things go wrong, and being able to build your own tools when doing so would save you time gives you a serious advantage. Granted, most of the time you’ll want to stick to convention so that things are easy for other developers to understand, but the more projects you tackle, the more likely you are to come across a scenario where a project-specific tool could save a huge amount of time. You’ll also have the opportunity to contribute to the tools you use every day and make them work better for you and your team. In the end, treating your JavaScript tooling as something you can shape to your needs, rather than as a black box, will only make your life as an application developer easier.

An example use case

To demonstrate a possible use of ASTs in web development, I’ve written a component tree visualizer for React. This project is intended to grow into a visual editor for React components, but for now it only allows you to view the component hierarchy of your project in your browser.

The project consists of a front end and a backend, both of which you are free to look over and run in one of your own React projects. Certain limitations may prevent some of your components from being added to the tree, but for most apps it will do just fine.

You can find all the code concerning the component tree generation in src/componentTreeGenerator.js in the backend repo.

The high level approach that I took for the tree generator can be outlined as follows:

I. Find all the React components in the target project, and store the file path to each one
A. Testing whether a file is a React component
1. Check whether the file exports a class that extends React.Component
2. Check whether the file exports a function with a single argument, props
II. Iterate over the components and generate a component tree for each one. If a tree appears in another tree, remove it from the top level.
A. Tree generation
1. Find other React components from the project that are imported by the current component
2. Find the current component’s render function
3. If a component is rendered by this component, make a recursive call to the tree generator and add the result to the current node’s children
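Step I.A of the outline can be approximated with a few checks on the top-level nodes of a parsed file. The helpers below (exportsReactComponent and friends) are hypothetical and are not the code from componentTreeGenerator.js; they are one possible sketch of the two heuristics, run here against hand-written stand-ins for @babel/parser output.

```javascript
// Hypothetical helpers for step I.A: heuristics that decide whether a parsed
// module (its ast.program.body) exports something that looks like a component.

// Matches a `React.Component` or bare `Component` superclass.
function isReactComponentSuperClass(superClass) {
  if (!superClass) return false;
  if (superClass.type === 'Identifier') return superClass.name === 'Component';
  return (
    superClass.type === 'MemberExpression'
    && superClass.object.type === 'Identifier'
    && superClass.object.name === 'React'
    && superClass.property.type === 'Identifier'
    && superClass.property.name === 'Component'
  );
}

// Matches a function (declaration, expression or arrow) whose only
// parameter is an identifier named `props`.
const FUNCTION_TYPES = ['FunctionDeclaration', 'FunctionExpression', 'ArrowFunctionExpression'];
function isPropsFunction(node) {
  return (
    Boolean(node)
    && FUNCTION_TYPES.includes(node.type)
    && node.params.length === 1
    && node.params[0].type === 'Identifier'
    && node.params[0].name === 'props'
  );
}

// Only inline exports are handled; `export default App;` or `export { App };`
// referring to an earlier declaration would need an extra lookup.
function exportsReactComponent(programBody) {
  return programBody.some(node => {
    if (node.type !== 'ExportDefaultDeclaration' && node.type !== 'ExportNamedDeclaration') {
      return false;
    }
    const decl = node.declaration;
    if (!decl) return false;
    if (decl.type === 'ClassDeclaration' || decl.type === 'ClassExpression') {
      return isReactComponentSuperClass(decl.superClass);
    }
    if (decl.type === 'VariableDeclaration') {
      // e.g. export const Header = (props) => ...
      return decl.declarations.some(d => isPropsFunction(d.init));
    }
    return isPropsFunction(decl);
  });
}

// Hand-written AST bodies standing in for @babel/parser output.
const classComponent = [{
  type: 'ExportDefaultDeclaration',
  declaration: {
    type: 'ClassDeclaration',
    id: { type: 'Identifier', name: 'App' },
    superClass: {
      type: 'MemberExpression',
      object: { type: 'Identifier', name: 'React' },
      property: { type: 'Identifier', name: 'Component' },
    },
    body: { type: 'ClassBody', body: [] },
  },
}];
const arrowComponent = [{
  type: 'ExportNamedDeclaration',
  declaration: {
    type: 'VariableDeclaration',
    kind: 'const',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'Header' },
      init: { type: 'ArrowFunctionExpression', params: [{ type: 'Identifier', name: 'props' }] },
    }],
  },
}];
const utilityModule = [{
  type: 'ExportNamedDeclaration',
  declaration: {
    type: 'FunctionDeclaration',
    id: { type: 'Identifier', name: 'add' },
    params: [{ type: 'Identifier', name: 'a' }, { type: 'Identifier', name: 'b' }],
  },
}];

console.log(exportsReactComponent(classComponent)); // true
console.log(exportsReactComponent(arrowComponent)); // true
console.log(exportsReactComponent(utilityModule));  // false
```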

To complete the lower-level steps, such as checking whether a given file exports a class that extends React.Component, we should first parse the file into an AST to help us make sense of it. Writing our own parser would be quite a challenge, so it’s best to start with an existing one. Luckily, Babel provides its parser in the module @babel/parser, and it’s very simple to use. Take a look at the getFileAST function in componentTreeGenerator.js.

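  // Imports shown here so the snippet runs on its own.
  const { readFile } = require('fs/promises');
  const babelParser = require('@babel/parser');

  // Read a source file and parse it into a Babel AST.
  async function getFileAST(filePath) {
    const code = await readFile(filePath, 'utf8');

    return babelParser.parse(code, {
      sourceType: 'module', // allow import/export statements
      plugins: ['jsx'],     // allow JSX syntax in React components
    });
  }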

All we have to do is read our file into a string and pass it to the Babel parser with a few options. Setting sourceType to 'module' lets the parser accept import and export statements, and since we’re dealing with React components, we also include the jsx plugin.

While I’m not going to walk through every step of the tree generation process, I’ll break down the first step so that you get the idea and can continue on your own if you’d like. For this step, our goal is to find the other React components from the target project that are imported by the component we’re currently analyzing. We start by observing that the project’s own components will (in most cases) be imported using relative paths, whereas external modules and components will be imported by package name, as in import React from 'react'. Relative paths always start with a dot (./ or ../), so we need to look for import declarations whose source is a path that starts with a dot.

After parsing the file into an AST, we will see that the top-level node has the type File and contains a program property. The value of this property is another node, of type Program, which has a body property. The program body is an array of all the top-level statements in our program, and luckily, import declarations can only appear at the top level of a module. That means we can get the local imports in the file with some very simple code:

  function getLocalImports(programBody) {
    return programBody.filter(node => (
      node.type === 'ImportDeclaration'
      && node.source
      && node.source.value.length > 0
      && node.source.value[0] === '.'
    ));
  }

We filter all the top-level nodes so that we keep only the ones that are import declarations with a non-null, non-empty source that begins with a dot. Not so hard, right?
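Here is a small usage sketch of getLocalImports on a hand-written stand-in for ast.program.body (Babel represents each import source as a StringLiteral node). The resolveImportPath helper is hypothetical: it shows how a relative source could be turned into a path on disk using Node’s path module, but it does not try file extensions such as .js or .jsx, or index files, which a real tool would need to do.

```javascript
const path = require('path');

// The filter from above, repeated so this example runs on its own.
function getLocalImports(programBody) {
  return programBody.filter(node => (
    node.type === 'ImportDeclaration'
    && node.source
    && node.source.value.length > 0
    && node.source.value[0] === '.'
  ));
}

// Hypothetical helper: resolve a relative import source against the directory
// of the importing file. Extensions and index files are NOT resolved here.
function resolveImportPath(importerPath, source) {
  return path.resolve(path.dirname(importerPath), source);
}

// Hand-written stand-in for ast.program.body of a file containing:
//   import React from 'react';
//   import Header from './Header';
//   import { formatDate } from '../utils/formatDate';
// (import specifiers are omitted because the filter ignores them)
const importDecl = source => ({
  type: 'ImportDeclaration',
  specifiers: [],
  source: { type: 'StringLiteral', value: source },
});
const programBody = [importDecl('react'), importDecl('./Header'), importDecl('../utils/formatDate')];

const localSources = getLocalImports(programBody).map(node => node.source.value);
console.log(localSources); // [ './Header', '../utils/formatDate' ]
console.log(localSources.map(source => resolveImportPath('/app/src/components/App.js', source)));
```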

If you decide to do your own experiments with ASTs, it’s important to have a good way to visualize and explore some example ASTs so that you understand what your code will need to do. Since the Babel parser outputs a plain JavaScript object, you can easily write it to a file as JSON and then use any JSON viewer to explore the AST.
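A minimal sketch of that workflow is shown below. The astToJSON name is hypothetical, and dropping the start, end and loc keys is an assumption made purely for readability; it relies on the fact that JSON.stringify omits any property for which the replacer function returns undefined. A hand-written node stands in for the object returned by the parser.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Keys that hold position bookkeeping rather than structure. Dropping them is
// a readability choice, not something the parser requires.
const POSITION_KEYS = new Set(['start', 'end', 'loc']);

// Pretty-print an AST as JSON. Returning undefined from the replacer removes
// the property, so position keys disappear at every level of the tree.
function astToJSON(ast) {
  return JSON.stringify(ast, (key, value) => (POSITION_KEYS.has(key) ? undefined : value), 2);
}

// Hand-written node for the identifier in `const myVar = 30`; with
// @babel/parser you would pass the object returned by parse() instead.
const identifierNode = {
  type: 'Identifier',
  name: 'myVar',
  start: 6,
  end: 11,
  loc: { start: { line: 1, column: 6 }, end: { line: 1, column: 11 } },
};

const json = astToJSON(identifierNode);
console.log(json);

// Write the result to a temporary file that any JSON viewer can open.
const outFile = path.join(os.tmpdir(), 'ast.json');
fs.writeFileSync(outFile, json);
```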

I hope you found this post helpful or at least interesting. I plan to write another post about code generation using ASTs after I make more progress on my React component editor. Happy coding!