3. Handmade Interpreter - Parsingby Hasib

Lecture thumbnail 0:00 / 0:00 In the previous part of our handmade interpreter.

What we did is we made a lexer.

So a lexer is something which takes a string input and turns it into a sequence of tokens.

And now we have to turn the sequence of tokens into some object oriented representation of the expression.

Now if you think about this expression, it’s really a binary expression.

So for example, 13 plus four is a binary operation.

It has the left hand side, which is 13.

It has the right hand side, which is four.

It has some operator in the middle, which is plus.

And then we have a binary expression here in the minus composed of two binary expressions on the left

and right hand side.

So you can see it’s kind of a recursive definition.

And we’re going to define all the object oriented structures for actually representing these kinds of

expressions.

We’re going to begin by defining an interface.

This interface is going to be called I element.

And the point of this interface is to define some operation which is common to all the types of expressions.

So what is common to both addition as well as individual tokens?

Because we’re going to have individual tokens like 13 represented as literals, and we’re also going

to have binary expressions like 13 plus four.

And the common thing between a single literal as well as an expression is that they all evaluate to

a numeric result.

So we’re going to have in the interface a property get called value.

So here is a getter for property called value and everyone has to implement this.

Now we’re only going to have two types of elements in our case, the first is an integer like 14, for

example.

So we’re going to make a class called Integer, and this class is going to implement this I element

interface.

Now the whole thing about an integer is it just keeps a number so we can get rid of this here, we can

get rid of all this stuff and we can just put a getter in here and then we can initialize the value

in the constructor as well.

And that’s it.

That’s all that an integer actually does.

What’s more interesting, of course, is a binary operation because in this case we have to specify

whether it’s addition or multiplication, and we have to specify the left and right hand side.

So I’m going to make a class here called Binary Operation, which is also going to be an I element.

I element like so.

And let’s once again implement the interface.

But this time around, things are going to be a bit more exciting because essentially what we have is,

first of all, we have to define the type.

So I’m going to make an enum here called Type, and we have two types of binary operations.

We have addition and subtraction depending on whether we have the plus or minus.

Now of course, we have the type public type, my type.

Notice I cannot make it type.

Unfortunately, I cannot call it type.

It has to be my type or the type or something like that.

In addition, we have the left and right hand side of this binary operation so public I element and

we have left and right.

Okay, so I’m not going to make a constructor here.

We’re going to keep the default constructor because remember, when you are at the very top level,

when you are just interpreting the expression at the very top level, you don’t know what the left and

right hand sides are.

You don’t know what the type is.

So we need a default constructor.

I’m going to omit having any constructors so that I get an automatically generated default constructor

for me.

Now the key thing here is how to calculate the value.

How do we calculate this value?

Now we know what kind of expression it is.

It’s either addition or subtraction, and we know the left and right hand sides and left and right hand

sides are I elements, which means they have values of their own.

So essentially we can do a switch here.

We can do a switch on my type.

Like.

So that’s an enum.

So we can alt enter and generate the missing labels.

Now in the case of addition, we return left dot value plus right dot value.

And in the case of subtraction, well, you’ve guessed it, it’s left value minus right dot value,

and that’s pretty much it.

So we’ve now supported two types of nodes in our abstract syntax tree.

We have the binary operation which can be either addition or subtraction, and we have an integer which

just keeps a number.

Notice it’s an integer, not a string.

We’re not keeping the string representation, although we could keep that as well in the abstract syntax

tree if we wanted to, but we’re going to ignore it in this case.

So now the question is, well, given that we have a list of tokens, how do we turn it into a top level

binary expression which contains subexpressions and subexpressions and so on and so forth, this is

actually difficult to pronounce.

Okay, so let’s implement the parse method that’s going to be a bit more complicated than what we have

in the Lex method.

So we’re going to have a static method which returns an element obviously, and it’s going to be called

parse.

And what it does is it takes a set of tokens.

So an I list of tokens or maybe you want an I read only list of tokens.

So we’re going to keep it read only so that nobody tries to modify this list.

So here is a list of tokens.

And now we have to go through every single one.

And depending on what kind of token it is, we perform a certain operation.

So we have Var result and the result is obviously a binary operation.

In our case, if you look at this expression, you can see that there is a top level minus here.

So that’s a binary operation.

So we’re going to have new binary operation using that default constructor that was auto generated for

us.

And then we will also have a Boolean flag indicating whether the left hand side of this binary operation

has already been initialized or not.

So we’ll have bool have left hand side equals false to begin with.

Okay, so now I’m going to make a for loop where I go through each of the tokens.

So from zero to tokens dot count, I have this counter I and I’m going to get the token itself.

Var token equals tokens at I like so and then I will check the token type.

So I will switch on the token dot my type and I’m going to once again alt enter and generate the switch

labels.

So here are the labels, the kind of tokens we can get.

Our integer plus minus left, parentheses and right parentheses.

Although I’m going to get rid of right parentheses, you’ll see why in a moment.

So in the case when we have an integer, so we have an integer token, remember the integer token keeps

the actual value.

As a string, we have to parse it.

So we have to say var integer equals new integer.

So that’s our AST element.

And here we have to parse, we have to say int, dot parse, and we take the token and we take from

the token the actual text.

So we parse the text, we make a new integer.

And then of course we need to decide where this integer goes.

So if we have, if we don’t have the left hand side of the expression, then we put it as the left hand

side of the expression.

So we say result dot left, result, dot left is equal to integer.

And then we have the right hand, left hand side rather.

So now we have the left hand side.

Otherwise if we already have the left hand side, then we said the right hand side.

So we say result dot right equals integer.

There we go.

So this is how we interpret an integer coming in.

Now for the plus and minus, things are really easy because all we have to do is we have to set the

type of the result.

So we say result dot my type equals and then we do addition here.

So if it’s a plus, it’s obviously addition and let’s put a break in here.

And similarly, in the case of a minus, it’s subtraction.

So let’s get rid of this and add subtraction in here instead.

Okay.

So here is the challenging part.

What happens when you hit an opening parentheses?

Now, what I’m going to do is I’m going to parse until I find the closing parentheses and then I’ll

take the sub expression and parse it recursively.

So I’ll define a new variable called J with an initial value of I, and then I make a loop where there

is no initialization.

But then I go through all the values of J in the number of tokens.

So I’ll go through each of the tokens using the index.

J So plus plus J here.

And I say, Well, is it a closing bracket?

Is it the right parenthesis?

If tokens at J dot my type is in fact equal to the right parentheses, then what I’m going to do is

I’m going to break out of this loop like so.

And now my j variable points to the end of the loop, so it points to the closing round bracket.

And now what I can do is I can make a sub expression, which is a parse of a sub list starting at the

opening bracket and terminating at the closing bracket.

So I can say var sub expression.

Equals and then I can get the tokens, I can skip the initial number of tokens up to the current one,

and that has to include the opening bracket as well.

So that would be I plus one because the token at I happens to be the opening bracket.

And then of course I have to take the number of tokens at J minus I and also minus one because I don’t

want the closing bracket either.

So it’s J minus I and minus one like so.

Okay so we have this I enumerable, I’m going to turn it into a list and now we have a subexpression

list.

So once again a list of tokens and that list can be parsed.

So we can say var element equals and we parse that Subexpression we parse that Subexpression And then

of course we perform the same operation that we did here.

So we check whether we have the left hand side.

I’m going to do it like this.

So we check if we have the left hand side and if we don’t, then instead of assigning integer, we assign

the element.

Otherwise we assign the element here, assign the element here.

And I’m going to put a break.

And of course, we already have the default or rather, no, let’s keep the default as is.

So essentially we’re not handling right parentheses because in the left parentheses we find the right

parentheses anyway and we handle it.

We kind of skip it.

And the way to skip it, by the way, should be down here below.

So we set I equal to the value of J, thereby continuing this iteration of all the different tokens.

And here towards the end we return the result.

So we return the result and we can actually try and look at the structure of what it is that we are

parsing.

So I’m going to say var parsed equals parse the tokens like so I’ll put a break point here.

So let’s put F9 in here and let’s actually do a debug as opposed to just executing this.

And we’re going to take a look at what all the tokens have actually been parsed into, provided of course

that I typed the program correctly, which I hope I did.

All right.

So we are now in a breakpoint and we have passed our expression.

So let’s take a look at what we’re getting.

So the parse expression is a binary expression, which has the left and right hand sides, and the type

is subtraction.

Just like above you can see minus the left part is addition, which is of a left integer and a right

integer.

So the left integer has the value 13, the right integer has the value four So everything is correct.

We have this nice layered structure of expressions, and what we can do now is we can actually evaluate

this because remember, we define this interface.

We defined an interface called I element, which yields an integral value, and then the integer just

yields its own value, whereas a binary expression performs the appropriate calculation.

So we can now calculate the value of this expression after having interpreted this expression and we

can output it to the command line.

So let’s write line and I’m going to say that the input is equal to parsed dot value.

That’s it.

Now let’s execute this.

And here is the final result.

13 plus four, 17.

12 plus one is 13.

17 -13 is in fact equal to four.

So that’s it.

We’ve used the interpreter design pattern and handcrafted both a lexer as well as a parser to interpret

a numeric expression and evaluate its result.

Play Play Stop Play Start Play information alert

Src Pro

Explorer

3. Handmade Interpreter - Parsing

Graph View

Backlinks