slideList is[slidei, slide1,slide1a, slide2,slide2a, slide2b, slide2c, slide2d, slide2e, slide2f, slideii, slide3,slide4,slide5,slide6,slide7,slide8,slide9,slide10,slide11,slide11a,slide12,slide13,slide14,slide15,slide16,slide17,slide18,slide19,slide20,slide21,slide22,slide23]; slidei is Slide("

Hui Zhu's JS-EDEN parsing model: a tutorial

Traditional parsers are often implemented using parser generator tools, such as the UNIX utilities Flex and Bison. The full details of the parsing techniques used by Flex and Bison are beyond the scope of this tutorial, but the general principles will be illustrated by making an EM construal. Before tracing the stages in the development of the construal, we first demonstrate it in operation in a simple application: transforming an arithmetic expression in '+' and '*' into postfix (\"Reverse Polish\") notation.

To follow the demonstration, it is helpful to first dispose the panels on the screen so that you can conveniently access this presentation and the EDEN Input Window together with the HTML5 Canvases default, view_3, view_4 and view_5.

"); slide1 is Slide("

Exploring and understanding the JS-EDEN Parsing model

The JS-EDEN parsing model is an EM construal of a traditional parser. Making the construal exposes the whole process of parsing in detail. In its fully developed form the construal can also be used to parse given strings according to a specified grammar.

Because the model serves as a parser written in JS-EDEN, it enables users to write a model in a new notation without using any external compiler. It also benefits students of parsing theory who can gain understanding of the process of parsing through exploring the model.

Before tracing the construction of the parsing model in detail, we first illustrate key features of the complete model.

"); slide1a is Slide("

Exercising the completed parsing model

When Flex and Bison are used to generate a parser for a grammar, a crucial step is the construction of a finite state machine (FSM) that is based on the grammar.

The graph on the default HTML5 Canvas is a visualisation of the FSM that is generated from the following simple expression grammar:

E → E+T
E → T


T → T*F
T → F

F → (E)
F → id

In this grammar, the terminal symbol id refers either to a specific numerical value or to an identifier which has a numerical value associated with it. The non-terminal symbol E stands for an arbitrary arithmetic expression in + and * in which the operands are numbers or identifiers with an associated numerical value.

Note that this grammar (which is taken from teaching resources published at the University of Alaska Fairbanks at the url http://www.cs.uaf.edu/~cs631/aho/y.output) is rather ill-suited to automated parsing. It is however quite a good example for the purposes of EM construal.

"); slide2 is Slide("

The FSM derived from the grammar

The FSM depicted on the default HTML5 Canvas is derived from a textual representation that can be produced by Flex and Bison. To open an HTML view to display

together with you can execute the following JS-EDEN code:

include(\"models/parsingEMPE/funcGrammarHTML.js-e\");

The y.output file spells out the way in which the 13 states of the FSM are linked by transitions associated with the terminal and non-terminal symbols in the grammar. These edges are labelled by their corresponding symbols, and are also colour coded accordingly. The initial state of the FSM is state 0 and the final state is state 7. The FSM makes transitions from state to state as the parse proceeds and in the process builds up a path within the FSM. The edges of this path are displayed in black.

"); slide2a is Slide("

Exercising the JS-EDEN parsing construal

We shall illustrate parsing using the construal with a simple example: the conversion of the expression a*(b+3*(b+c)) into reverse Polish notation. To follow the parsing process step-by-step, dispose the panels on the display so that the finite state machine and the Canvas panels view_3, view_4 and vew_5 are visible. To initialise the parse, then execute:

inputstr = \"a*(b+3*(b+c))\";

You will notice that the input string is registered in view_5 and (after lexical analysis) is represented as a string of tokens of the form \"i*(i+i*(i+i))\", where i here represents an instance of the ID terminal symbol.

The parser reads the input string from left to right. There are two basic operations - shift and reduce. A shift operation transfers the next symbol from the input string to the stack that is depicted in view_4. A reduce operation replaces the top segment of the stack by a single non-terminal symbol in accordance with the grammar rules set out in view_3. These operations can be performed manually by pressing the Next Step and Apply Rule buttons, which correspond to shift and reduce respectively.

"); slide2b is Slide("

Performing the parse manually

With the Input String in place, you begin the parse by pressing the Next Step button. This reads the first symbol from the input, which is the identifier 'a'. You will see that this identifier appears on the stack in view_4, together with its associated semantic value, which for the purpose of this translation exercise is the string \"a\". You will also notice that a path in the FSM is initiated - it comprises a single edge, labelled by the token ID, that represents a transition from the initial state 0 to the state 1.

The idea behind the parsing method is that a derivation tree for the input string is developed incrementally by identifying the top segment of the stack with the RHS of a grammar rule, and replacing this by the non-terminal symbol on the LHS ('a reduction'). The semantic values associated with the top segment of the stack are simultaneously processed ('syntax-directed translation'). If the parse is to succeed then the entire input string must be processed in such a way that on termination the stack consists of a single symbol - the non-terminal E, which represents a well-formed arithmetic expression.

"); slide2c is Slide ("

Completing the manual parse

When the first input has been read, there are several possible reductions that can be performed. For instance, we could apply rule 6 to reduce ID to F, then apply rule 4 to reduce F to T, then apply rule 2 to T to get E. Given that the next symbol in the input is '*', reference to the Grammar Rules in view_3 shows that there is only one way in which the top segment of the stack can go on to form the RHS of a grammar rule once the '*' has been read. This can only happen through a reduction using rule 3 for which the RHS takes the form 'T*F'. On that basis the initial reduction of the ID must generate a T by invoking rule 6 followed by rule 4, but not applying rule 2.

Following the above reasoning, it follows that the first steps of the manual parse are: Apply Rule 6, Apply Rule 4 then press Next Step. As you can see from inspection of the input, the reduction using rule 3 alluded to above can't in fact take place until the final symbol in the string is read. The next point in the parse at which a reduction is possible is when the leftmost occurrence of the identifier b is read. The procedure to complete the parsing from this point is left as an exercise to the reader.

"); slide2d is Slide("

Further observations on the manual parsing process

There is at all times a direct correspondence between symbols on the stack and the path so far traversed in the FSM. Since the input string, and the stack, can have unbounded length the path necessarily can include cycles - as you will have observed in parsing the specified input.

You can interpret the shift and reduce operations in the FSM. A shift corresponds to a transition from one state to another in direct response to reading a terminal symbol. A reduction is associated with snipping a segment off the end of the path and replacing it by a single edge labelled by a non-terminal symbol. (This replacement process is associated with the 'go to' constructs in the description of the FSM in y.output.) The current state of the FSM is always the node at the end of the path.

What makes the grammar badly designed for automatic parsing is what makes it a more interesting subject in manual parsing - the fact that there are often choices to be made between possible shift and reduce actions. These are referred to as 'shift-reduce' and reduce-reduce' conflicts.

"); slide2e is Slide("

Automating the parsing

When you have successfully carried out the parse manually, you will recognise a simple set of rules that you can follow in order to automate the process. The parsing model is set up in such a way that by setting up a boolean flag 'isautomatic' you can invoke a procedure called 'automatic' that you specify yourself.

A suitable 'automatic' procedure for our example grammar - already loaded in the model, but available here for modification in case you wish to experiment - is specified on the next slide.

When you have inspected the automatic procedure you can invoke automatic exection and watch the parsing process being replayed. To toggle between manual and automatic execution, you execute the following code: isautomatic = 1- isautomatic;

In this way, you can in fact intervene in the automated parsing process if you wish.

"); slide2f is Slide ("

Automating the parsing procedure

speed=2000; proc automatic: k,InputTextt,isautomatic { if((currrule!=\"\")&&(isautomatic==1)){ after(speed){ if((currrule==2)&&(nextsymbol==\"*\")) k++; else if((currstate==11)&&(nextsymbol!=\"*\")){ evalrule(1); applyrule(1);} else if ((currstate==11)&&(nextsymbol==\"*\")){ k++; } else if(currstate==12){ evalrule(3); applyrule(3);} else { evalrule(currrule); applyrule(currrule); } } } if((currrule==\"\")&&(k<=InputText#)&&(isautomatic==1)) { after(speed){ k++; } } }

"); slideii is Slide("

How the JS-EDEN parsing construal was constructed

Having seen how the JS-EDEN parsing model works, you are now in a position to examine its construction in more detail. Following standard EM practice, the construction will be traced from beginning to end. For this purpose, it will be convenient to restart JS-EDEN and initialise by cutting and pasting the following code into the Input Window:

include(\"models/parsingEMPE/newrun.js-e\");

This will remove the definitions that specify the parsing model and restore you to the same point in this presentation.

Though the reconstruction of the parsing construal is quite detailed and can be carried out simply by following through the presentation, there are some additional definitions etc that are not explicitly exposed. To examine these, it would be necessary to refer to the source of the model itself within the online directory of models associated with the JS-EDEN master.

"); slide3 is Slide("

How to set up the Model and Grammar information

To record the structure of the FSM we use a list called 'transitions' with 13 elements because the finite state machine has 13 states. Since the states in the FSM are indexed from 0, it is convenient to record the transitions associated with state i in the i-th entry of the list for i in the range 1 to 12 and use the 13th entry to record transitions in state 0. When there is no meaningful transition defined for a given state-symbol pair we introduce a transition to a dummy state with index '100'.

For example, from the grammar, we see that in state 4, reading a *, we should go to state 9. This piece of information will be stored in the list element transitions[4] - itself a list - as a triple [4,\"*\",9]. Likewise the transition from state 0 to state 5 in response to an F is recorded in transitions[13] as a triple [13, \"F\",5].

Constructing the list of transitions manually is tiresome work. To avoid this, we could use a UNIX script file for automatic translation of the y.output file. An example of such a file (to be run in sed environment) can be found here:

mkeden file

The mkeden file will transform the grammar information into an EDDI table from which it is easy to derive an EDEN list with 13 states. This list can be displayed in an HTML view by executing the following code: include(\"models/parsingEMPE/TransitionsPicture.js-e\"); "); slide4 is Slide("

Setting up the parsing model environment

To set up the parsing model, execute the code below:

include(\"models/parsingEMPE/parsingModelRun.js-e\");

Now we have a plain canvas. In order to understand how the traditional parser works, the next thing to do is to make a visualisation of the finite state machine on the canvas.

We are going to use the graph tool in JS-Eden. Before starting adding something to the canvas, we need to set values for two observables: numberOfNodes (number of states) and numberOfSyms (number of symbols featuring in the grammar).

numberOfNodes=13; numberOfSyms=8; "); slide5 is Slide("

Visualisation of the finite state machine: Nodes and Labels

addNode(mygraph,\"s0\");
The red node on the canvas is a graph representation of node 0 (state 0). Of course, we want to have more:
addNode(mygraph,\"s1\"); addNode(mygraph,\"s2\"); addNode(mygraph,\"s3\");

We will add a label to each node to distinguish the nodes.

makeNodeLabel(0); makeNodeLabel(1); makeNodeLabel(2); makeNodeLabel(3);

Following this procedure, we can draw all the 13 nodes on the canvas: addNode(mygraph,\"s4\");makeNodeLabel(4); addNode(mygraph,\"s5\");makeNodeLabel(5); addNode(mygraph,\"s6\");makeNodeLabel(6); addNode(mygraph,\"s7\");makeNodeLabel(7); addNode(mygraph,\"s8\");makeNodeLabel(8); addNode(mygraph,\"s9\");makeNodeLabel(9); addNode(mygraph,\"s10\");makeNodeLabel(10); addNode(mygraph,\"s11\");makeNodeLabel(11); addNode(mygraph,\"s12\");makeNodeLabel(12);

"); slide6 is Slide("

Visualisation of the finite state machine: Edges

Edges to represent transitions between states can be added to the canvas. addEdge(mygraph,\"s0\",\"s4\");
We can see an arrowed line pointing from source(s1) to target(s4), which indicates there is an available transition from state 0 to state 4.
A label for an edge designates the symbol to be read in order to carry out this transition. From the grammar, we know that the symbol for transition from state 0 to state 4 is \"T\". makeEdgeLabel(0,4);

Consequently, in accordance with the actual transitions from the grammar, the rest of the edges can be shown on the canvas: addEdge(mygraph,\"s0\",\"s5\"); makeEdgeLabel(0,5); addEdge(mygraph,\"s0\",\"s3\"); makeEdgeLabel(0,3); addEdge(mygraph,\"s0\",\"s2\"); makeEdgeLabel(0,2); addEdge(mygraph,\"s0\",\"s1\"); makeEdgeLabel(0,1); addEdge(mygraph,\"s2\",\"s1\"); makeEdgeLabel(2,1); addEdge(mygraph,\"s2\",\"s2\"); makeEdgeLabel(2,2); addEdge(mygraph,\"s2\",\"s6\"); makeEdgeLabel(2,6); addEdge(mygraph,\"s2\",\"s4\"); makeEdgeLabel(2,4); addEdge(mygraph,\"s2\",\"s5\"); makeEdgeLabel(2,5); addEdge(mygraph,\"s3\",\"s7\"); makeEdgeLabel(3,7); addEdge(mygraph,\"s3\",\"s8\"); makeEdgeLabel(3,8); addEdge(mygraph,\"s4\",\"s9\"); makeEdgeLabel(4,9); addEdge(mygraph,\"s6\",\"s8\"); makeEdgeLabel(6,8); addEdge(mygraph,\"s6\",\"s10\"); makeEdgeLabel(6,10); addEdge(mygraph,\"s8\",\"s11\"); makeEdgeLabel(8,11); addEdge(mygraph,\"s8\",\"s5\"); makeEdgeLabel(8,5); addEdge(mygraph,\"s8\",\"s1\"); makeEdgeLabel(8,1); addEdge(mygraph,\"s8\",\"s2\"); makeEdgeLabel(8,2); addEdge(mygraph,\"s9\",\"s1\"); makeEdgeLabel(9,1); addEdge(mygraph,\"s9\",\"s2\"); makeEdgeLabel(9,2); addEdge(mygraph,\"s9\",\"s12\"); makeEdgeLabel(9,12); addEdge(mygraph,\"s11\",\"s9\"); makeEdgeLabel(11,9);

"); slide7 is Slide("

Laying out the Graph

In addition to displaying nodes and edges on the canvas, you can also move them by clicking. A procedure is required to move each node on the canvas: proc mouseMove_s1: mousePressed { if(((mouseX-ps1x)*(mouseX-ps1x)<100 )&&((mouseY-ps1y)*(mouseY-ps1y)<100)) { if(mousePressed) { ps1x is mouseX; ps1y is mouseY; } else{ ps1x = mouseX; ps1y = mouseY; } } }
Now s1 is movable. To move other nodes is routine. Just put the name of the node which you'd like to move in place of s1 in the above code.
However the whole process is rather slow and tedious! If you wish, you can try to use the 'definition factory' (also available in JS-Eden in the emile variant - see Lab 10 from CS405 in 2012) to save time. In this session, we won't go further with the use of definition factory, you can explore it yourself.

The code to make all the nodes movable is specified here: include(\"models/parsingEMPE/moveNodes.js-e\");

"); slide8 is Slide("

Presenting the graph

Previous slides explain how to store the grammar information in a JS-Eden list. In addition, we also want this information to be accessible to the learner in a window that they can refer to when trying to parse a string manually.

parsingrules is a list which stores the grammar information. It differs from transitions, in that each of its entries is one line in the grammar in a human-readable form and does not store any information about the finite state machine. parsingrules is [line1,line2,line3,line4,line5,line6,line7]; line1=\"$accept: E $end\"; line2=\"1 E: E+T\"; line3=\"2 | T\"; line4 =\"3 T: T*F\"; line5 =\"4 | F\"; line6 =\"5 F: (E) \"; line7 =\"6 | ID \";
In the window named \"CANVAS HTML5 view_3\", you will see the grammar information displayed.

In order to make the graph easier to interpret, it is also convenient to associate a default colour with each edge. This colour is to be determined by the label of the edge.

"); slide9 is Slide("

Modelling the state of the parser dynamically

The graph of the finite state machine is intended to represent the state of parser at each stage. To highlight the edges being traversed on the current path, we are going to make the traversed edges black whilst others retain their default color.

In order to do this, we need a function which returns how many times each edge has been traversed and observables to represent the colours of the edges. We then define edge colours by dependency.

Suppose we have an observable cols0s1, which is the color of the edge s0s1 - what kind of dependency do we want?
We want the color of the edge dependent on the number of times it has been traversed. Obviously, we need another observable of edge s0s1 for the number of times it has been traversed.

ctxy is a function, which takes three parameters: the start point and end point of an edge and the currpath (observable for current path being traversed), and returns the number of times that this edge has been traversed.
So if we make the colour of the edge depend on the number of times it has been traversed, the colours of the edges together will clearly show the path. For edge s0s1:

cts0s1 is ctxy(0,1,currpath); cols0s1 is cts0s1>0 ? \"black\": defaultcolour(0,1,transitions[13]); "); slide10 is Slide("

Counting the number of traversals of each edge

To define the colours of all the edges, we first set up an observable to record how many times it is being traversed in the current path:

cts0s1 is ctxy(0,1,currpath);cts0s3 is ctxy(0,3,currpath); cts0s2 is ctxy(0,2,currpath);cts0s4 is ctxy(0,4,currpath); cts0s5 is ctxy(0,5,currpath);cts2s1 is ctxy(2,1,currpath); cts2s6 is ctxy(2,6,currpath);cts2s2 is ctxy(2,2,currpath); cts2s4 is ctxy(2,4,currpath);cts2s5 is ctxy(2,5,currpath); cts3s7 is ctxy(3,7,currpath);cts3s8 is ctxy(3,8,currpath); cts4s9 is ctxy(4,9,currpath);cts6s8 is ctxy(6,8,currpath); cts6s10 is ctxy(6,10,currpath);cts8s1 is ctxy(8,1,currpath); cts8s5 is ctxy(8,5,currpath);cts8s2 is ctxy(8,2,currpath); cts8s11 is ctxy(8,11,currpath);cts9s1 is ctxy(9,1,currpath); cts9s2 is ctxy(9,2,currpath);cts9s12 is ctxy(9,12,currpath); cts11s9 is ctxy(11,9,currpath); "); slide11 is Slide("

Specifying the colour of each edge

cols0s1 is cts0s1>0 ? \"black\": defaultcolour(0,1,transitions[13]); cols0s2 is cts0s2>0 ? \"black\": defaultcolour(0,2,transitions[13]); cols0s3 is cts0s3>0 ? \"black\": defaultcolour(0,3,transitions[13]); cols0s4 is cts0s4>0 ? \"black\": defaultcolour(0,4,transitions[13]); cols0s5 is cts0s5>0 ? \"black\": defaultcolour(0,5,transitions[13]); cols2s1 is cts2s1>0 ? \"black\": defaultcolour(2,1,transitions[2]); cols2s2 is cts2s2>0 ? \"black\": defaultcolour(2,2,transitions[2]); cols2s4 is cts2s4>0 ? \"black\": defaultcolour(2,4,transitions[2]); cols2s5 is cts2s5>0 ? \"black\": defaultcolour(2,5,transitions[2]); cols2s6 is cts2s6>0 ? \"black\": defaultcolour(2,6,transitions[2]); cols3s7 is cts3s7>0 ? \"black\": defaultcolour(3,7,transitions[3]); cols3s8 is cts3s8>0 ? \"black\": defaultcolour(3,8,transitions[3]); cols4s9 is cts4s9>0 ? \"black\": defaultcolour(4,9,transitions[4]); cols6s8 is cts6s8>0 ? \"black\": defaultcolour(6,8,transitions[6]); cols6s10 is cts6s10>0 ? \"black\": defaultcolour(6,10,transitions[6]); cols8s1 is cts8s1>0 ? \"black\": defaultcolour(8,1,transitions[8]); cols8s2 is cts8s2>0 ? \"black\": defaultcolour(8,2,transitions[8]); cols8s5 is cts8s5>0 ? \"black\": defaultcolour(8,15,transitions[8]); cols8s11 is cts8s11>0 ? \"black\": defaultcolour(8,11,transitions[8]); cols9s1 is cts9s1>0 ? \"black\": defaultcolour(9,1,transitions[9]); cols9s2 is cts9s2>0 ? \"black\": defaultcolour(9,2,transitions[9]); cols9s12 is cts9s12>0 ? \"black\": defaultcolour(9,12,transitions[9]); cols11s9 is cts11s9>0 ? \"black\": defaultcolour(11,9,transitions[11]); "); slide11a is Slide("

Refining the visualisation

The use of the graph prototype saves much work in setting up the FSM, but there are still ways in which it has to be improved for readability. Having set up the FSM visualisation as above, it is convenient to layout the graph and modify the display of the nodes to make it easier to read. To this end, you can include the following definitions: include(\"models/parsingEMPE/fsminit.js-e\");

These definitions make adjustments to the value of observables that have been generated using the graph prototype. The observables concerned specify the explicit coordinates of points (e.g. ps0), the background colour of each of the nodes (e.g. cols0, cols1), and the position of the labels of selected nodes (e.g. labs10).

Note how this process of modifying the product of a prototyping activity in accordance with Empirical Modelling principles highlights the important distinction between tools that are developed with prototype-based rather than class-based object-orientation in mind. This is an important respect in which JavaScript is well-suited to EM.

"); slide12 is Slide("

Adding the parsing functionality

The key observables in the parsing process construal are:

currpath - stores the content on the state stack as a list.
currstack - current content on the stack as a list.
currrule - the number of the grammar rule the parser determines that all the items on the right-hand side of which have been seen(the production by which the parser reduces).
currrstate - current state of the parser (the last element of the currpath, or the top element on the stack state).

nextsymbol - represents the look ahead token.

isreduction - reads the transformations table and decides if the parser should reduce in the current state. If there is a shift-reduce conflict, it always chooses to shift.

InputText - the input string, stays the same all the time.
InputTextt - same as the input string at the beginning, but changes according to the reduction.

k - the index of the current symbol in the InputText (how far we have read in the InputText).
j - the index of the current symbol in the InputTextt (how far we have read in the InputTextt).
The observables k and j are used to control how many tokens the parser has read from the input. The values of k and j are increased by 1 every time the \"next step\" button is clicked. "); slide13 is Slide("

Making dependencies

The observables whose values are subject to be changed directly by an agent action, rather than being defined by a dependency relation, are InputText and k.
This network of dependency is derived from the operation of 'real' traditional parsers:
- The input and how far the parser has read from the beginning defines the look ahead token;
- The shift action pops a token and a state onto the stack;
- Based on its current state, the model decides whether it needs a look ahead token to carry out the next action, or whether it should reduce.
The observable InputTextt has the same initial value as InputText and changes when reduction occurs. An index j is used to record how far the parser has read in the InputTextt. Like InputText, InputTextt is independent of other observables and the observables currstack, currpath, currstate,etc depend on it. The dependencies between key observables are: k=0; InputText=\"\"; InputTextt = InputText//\"$\"; InputTextEnd = oneline(InputText)//\"$\"; j is InputTextt#-nextpart#; currstack is makestack(stringtolist(substr(InputTextt,1,j))); currpath is makepath(currstack); currstate is findstate(currpath); nextsymbol is strplus(substr(InputTextEnd,k+1,k+1),currstate); nextpart is substr(InputTextEnd,k+1,InputTextEnd#);

"); slide14 is Slide("

Lexical Analysis

The lexical analysis part of parsing models in JS-EDEN is different from the lexical analysis of the traditional parser. This is done by using the function subpatt, which is an example of embedding JavaScript in EDEN: func substpatt { ${{ var th = arguments[0]; var repatt = arguments[1]; var reattr = arguments[2]; var repstr = arguments[3]; return th.replace(RegExp(repatt, reattr), repstr); }}$; }

This function recognizes regular expressions and the set of tokens is defined by regular expressions.

If an input string is given, the model should find IDs and exprs, and mark them as identifiers.

proc makeinputtext : inputstr{ InputText=inputstr; InputText = substpatt(InputText,\"[A-Za-z][a-zA-Z0-9]*\",\"g\",\"i\"); InputText = substpatt(InputText,\"[0-9]+\",\"g\",\"i\"); InputTextEnd = oneline(InputText)//\"$\"; InputTextt=InputText//\"$\"; } "); slide15 is Slide("

Input values

Using the subspatt function, we substitute values in the input by lexified tokens. And we want to keep a record of the original input values, and store them into a list.

The record of the input values, can be used in the last step of parsing - syntax directed translation. So we make a function named makeinputvals. This function will be different for different grammars. It makes uses of the function findpatt (which finds and returns a regular string):

func findpatt { ${{ var th = arguments[0]; var repatt = arguments[1]; var reattr = arguments[2]; var result = th.match(RegExp(repatt, reattr)); if(result==null) return \"\"; else return result; }}$; }

and analyzes the input to many cases. strvals stores the values of the input string to a list. Below is the definition for strvals.

strvals is findpatt(inputstr,\"[A-Za-z][a-zA-Z0-9]*|[1-9][0-9]*\",\"g\");

Can you make a makeinputvals function for the simple expression grammar? For test purposes you can include the function via the Input Window. A suitable specification is given on the next slide.

"); slide16 is Slide(" func makeinputvals { para input, val; auto i,j,result,found,vals; vals=val.slice(0); j=1;result=[]; for(i=1;i<=input#;i++) { if(input[i]==\"i\") { for(j=1;j<=vals#;j++) { if (findpatt(vals[j],\"[A-Za-z][a-zA-Z0-9]*\",\"g\") != \"\") { result=result//vals[j]; vals[j]=\"\"; break; } else if (findpatt(vals[j],\"[0-9]+\",\"g\") != \"\") { result=result//vals[j]; vals[j]=\"\"; break; } } }else {result=result//[\"\"];} } return result; } "); slide17 is Slide("

Shift and Reduce

When the value of k increases 1, the parsing model reads the next token and shifts. In some certain points of the parsing, the parsing model should reduce according to the grammar rules. This means that it should change some thing on the top of the stack. So to do reduction, we make functions which takes InputTextt as parameter, and returns the changed value of InputTextt.

In grammar, we have 6 reduction rules:

$accept: E $end\";
1 E: E+T;
2 | T;
3 T: T*F;
4 | F;
5 F: (E) ;
6 | ID ;

The substitution associated with a reduction using rule 6 is as follows:

func substituteID { para cstack; auto result; if (currstack#-1>0) result=substr(cstack,1,currstack#-1)//\"F\"//nextpart; else result=\"F\"//nextpart; return result; } "); slide18 is Slide("

Implementing the reduction rules ...

func substituteEplusT { para cstack; auto result; if (currstack#-3>0) result=substr(cstack,1,currstack#-3)//\"E\"//nextpart; else result=\"E\"//nextpart; return result; } func substituteT { para cstack; auto result; if (currstack#-1>0) result=substr(cstack,1,currstack#-1)//\"E\"//nextpart; else result=\"E\"//nextpart; return result; } func substituteTmultiF { para cstack; auto result; if (currstack#-3>0) result=substr(cstack,1,currstack#-3)//\"T\"//nextpart; else result=\"T\"//nextpart; return result; } "); slide19 is Slide("

Implementing the reduction rules ...

func substituteF{ para cstack; auto result; if(currstack#-1>0) result=substr(cstack,1,currstack#-1)//\"T\"//nextpart; else result =\"T\"//nextpart; return result; } func substituteEbrackets{ para cstack; auto result; if (currstack#-3>0) result=substr(cstack,1,currstack#-3)//\"F\"//nextpart; else result=\"F\"//nextpart; return result; } "); slide20 is Slide("

Implementing the reduction rules

We also need a function to decide which of the substitute function should be called: func applyrule { para ruleno; if (ruleno==1) execute(\"InputTextt=substituteEplusT(InputTextt);\"); else if (ruleno==2) execute(\"InputTextt=substituteT(InputTextt);\"); else if (ruleno==3) execute(\"InputTextt=substituteTmultiF(InputTextt);\"); else if (ruleno==4) execute(\"InputTextt=substituteF(InputTextt);\"); else if (ruleno==5) execute(\"InputTextt=substituteEbrackets(InputTextt);\"); else if (ruleno==6) execute(\"InputTextt=substituteID(InputTextt);\"); else execute(\"InputTextt=InputTextt;\"); } "); slide21 is Slide("

Syntax Directed Translation

In parallel with the observables InputTextt and currstack, we have observables inputvals and stackvals to record the values of the input. When the parsing model reduces by rule 1, the inputvals is not changed. But when it reduces by rule 2, the inputvals changes as it combines the eight elements to one entry.

func substituteEplusT2 { para vals; auto calc, result; calc=j; if (calc-3>0) result=sublist(vals,1,calc-3)//[vals[calc-2]//\" \"//vals[calc]//\"+\"]//nextvals; else result=[vals[calc-2]//\" \"//vals[calc]//\"+\"]//nextvals; return result; } func substituteTmultiF2 { para vals; auto calc, result; calc=j; if (calc-3>0) result=sublist(vals,1,calc-3)//[vals[calc-2]//\" \"//vals[calc]//\"*\"]//nextvals; else result=[vals[calc-2]//\" \"//vals[calc]//\"*\"]//nextvals; return result; } "); slide22 is Slide(" func substituteEbrackets2 { para vals; auto calc, result; calc=j; if (calc-3>0) result=sublist(vals,1,calc-3)//[vals[calc-1]]//nextvals; else result=[vals[calc-1]]//nextvals; return result; } proc evalrule{ para ruleno; if ((ruleno==2)||(ruleno==4)||(ruleno==6)) execute(\"inputtvals=inputtvals;\"); else if (ruleno==1) execute(\"inputtvals=substituteEplusT2(inputtvals);\"); else if (ruleno==3) execute(\"inputtvals=substituteTmultiF2(inputtvals);\"); else if (ruleno ==5) execute(\"inputtvals=substituteEbrackets2(inputtvals);\"); } "); slide23 is Slide("

Parsing Input

Finally we have set up the environment for parsing. We can try to give some input string and the parser will do the lexical analysis and produce an input text. inputstr=\"3+5*8*(6+5)\";

The above is an input string accepted by the parser. You can also try some input strings which are not accepted:

inputstr=\"(a**3)\";

and

inputstr=\"p+*9\";

");