0:00 / 0:00 We are going to start rolling our own interpreter for a very simple language consisting of simple mathematical

expressions.

So we’re just going to have brackets, numbers, pluses and minuses, and we’re going to try and parse

those expressions and evaluate them as well.

So I have split this whole implementation into two parts, and in the first part we’re going to build

the lexer.

So the lexer is something which operates on individual tokens and you might be forgiven for thinking

that a single token maps to a single character, but that’s not exactly the case.

And also we need more information about the tokens as well.

So that’s what we’re going to start with effectively.

So let’s begin by defining some sort of token class.

So I’m going to define a class called Token, and that’s going to be mainly an enumeration of the kind

of tokens that we can have.

So the kind of expressions we’re going to be parsing are something like the following.

So we’re going to have input similar to, let’s say, 13 plus four.

I forgot the double quotes here, 13 plus four -12 plus one.

So something like this.

So we need to be able to split this string into separate tokens.

So we’re going to have a token for the opening bracket.

We’re going to have a token for 13.

Notice, it’s not a single character.

We’re keeping the characters which are related to a number together.

Then there’s going to be a token plus, then the token for and so on and so forth.

So as you may have guessed, we need to write some sort of string processing algorithm to split this

into separate tokens.

So speaking of the tokens, the way you would typically define them is as follows.

First of all, you can have an enumeration of the different token types.

So here I’ll have an enum called Type, and the kind of types you can have are integer plus minus left,

parentheses and right parenthesis.

So these are the token types and we can have a public type, my type here.

We can also have the actual text inside the token because that’s what we’re going to need later on to

actually evaluate the whole thing so we can have public string text.

Once again, I’m using public fields just for the sake of simplicity.

Let me make a constructor for this token type and I will once again simplify this part.

And then what we’re going to do is we’re going to override two string.

So let’s make an override of two string.

And here I’m just going to return essentially the text with back quotes in it.

So just take the text.

It should be a capital C and return it in backwards.

So this way we have a token, we have a selection of types that’s going to be set here for my type and

we’re going to have the token text fairly obvious stuff.

So the real problem here is how to get this list of tokens from this input string here.

And for that we’re going to define a method.

So here we’re going to have a method called Lex, and this is the Lexing part of interpretation.

And for this method we’re going to return a list of tokens.

All this stuff is going to be a static method, by the way.

So static list of tokens is going to be called Lex and it would take a string as input.

Okay.

So the idea is that generally if we find something like an opening bracket or a plus or a minus, that’s

easy.

The only real problem is joining the integer parts together and making sure that one and three do not

become separate tokens but become a single token.

So let me show you how this works.

So we’re going to go through a for loop going through each of the characters.

So for input length, we’re going to go through each of the characters and we’re going to switch input

at this particular character, input at Position I and we’re going to check what the input is.

Now remember, we’re returning a list of tokens.

And by the way, this probably should be a list systems collection, generic import here.

So we need a result which is going to be a new list of tokens.

You can use an enumerable here if you want.

You can yield return things and at the end we will return this result.

Okay, so we switch on the input and for example, when you meet a plus, so when the character is a

plus, you say result dot add and you add a new token where the token type is plus and the actual text

is also a plus.

So this is how you implement the lexing for a plus.

And in exactly the same way you would implement the Lexing for minus and the opening and closing parentheses.

So let me paste in all of this code.

So as you can see for all of these, we just add a new token to the result.

We specify the type and we specify the text.

Nothing particularly original here.

I don’t think the only real problem is here in the default case because the default case is where you’ve

hit.

You’ve hit an integer.

You.

Have hit a single digit, but you want to group the digits together.

So how do you do it?

Well, you make a stringbuilder.

Var SB equals new stringbuilder and you initialize the Stringbuilder by the way.

Stringbuilder.

You initialize the Stringbuilder with a character you’re currently on.

So we’re currently on a number already.

So we’re going to take this number which is input at I, we’re going to turn it into a string and that’s

going to be the initial state of the Stringbuilder And then of course we go through all the other characters.

So we say int J equals I plus one J is less than input dot length plus plus J.

And what we do is we say, okay, if this character is a digit, then we continue adding it to the stringbuilder.

So input at J.

If it’s a digit, then we add it to the Stringbuilder.

So we say SB dot append and we say input at J.

And then of course we increment I because remember I is that outer variable and we’re kind of going

through the digits.

So we need to increment I in addition to incrementing J otherwise we are done.

Otherwise we essentially need to take everything that we have in the stringbuilder and turn it into

an actual number.

So we say result dot add and we make a new token where the token type is integer and the actual text

is SB dot to string.

So we finally materialize the stringbuilder this way.

So we put a break here and in addition, we should have a break here as well.

Okay, so this is how you actually index things and we can now grab this text, we can perform the Lexing

operation.

So here I’m going to say var tokens equals Lex the input, please.

And then I’m going to write line and I’m going to take all the tokens.

And by the way, for write line I need using static system console here.

So for write line, I’m going to take the tokens and I’m going to actually let’s do it this way, let’s

do string dot join, I’ll use a Tab character and we’ll take all the tokens and we’ll see if the tokens

are the correct tokens that we are expecting.

So as soon as this compiles, hooray.

So as you can see, we’re getting the right result.

We’re getting the opening parentheses, then 13 plus four closing parentheses and so on and so forth.

So we have now completed the first part of the overall interpretation process.

We’ve turned an input, which is an ordinary string, just a sequence of characters.

We’ve turned it into a sequence of tokens, and from this on we can take the sequence of tokens and

turn it into something object oriented, nicely structured, possibly recursive.

So that’s what we’re going to look at next.

Play Stop Play Play Start Play information alert