mirror of https://github.com/acidanthera/audk.git
2050 lines
81 KiB
Plaintext
2050 lines
81 KiB
Plaintext
|
======================================================================
|
||
|
|
||
|
CHANGES_SUMMARY.TXT
|
||
|
|
||
|
A QUICK overview of changes from 1.33 in reverse order
|
||
|
|
||
|
A summary of additions rather than bug fixes and minor code changes.
|
||
|
|
||
|
Numbers refer to items in CHANGES_FROM_133*.TXT
|
||
|
which may contain additional information.
|
||
|
|
||
|
DISCLAIMER
|
||
|
|
||
|
The software and these notes are provided "as is". They may include
|
||
|
typographical or technical errors and their authors disclaims all
|
||
|
liability of any kind or nature for damages due to error, fault,
|
||
|
defect, or deficiency regardless of cause. All warranties of any
|
||
|
kind, either express or implied, including, but not limited to, the
|
||
|
implied warranties of merchantability and fitness for a particular
|
||
|
purpose are disclaimed.
|
||
|
|
||
|
======================================================================
|
||
|
|
||
|
#258. You can specify a user-defined base class for your parser
|
||
|
|
||
|
The base class must constructor must have a signature similar to
|
||
|
that of ANTLRParser.
|
||
|
|
||
|
#253. Generation of block preamble (-preamble and -preamble_first)
|
||
|
|
||
|
The antlr option -preamble causes antlr to insert the code
|
||
|
BLOCK_PREAMBLE at the start of each rule and block.
|
||
|
|
||
|
The antlr option -preamble_first is similar, but inserts the
|
||
|
code BLOCK_PREAMBLE_FIRST(PreambleFirst_123) where the symbol
|
||
|
PreambleFirst_123 is equivalent to the first set defined by
|
||
|
the #FirstSetSymbol described in Item #248.
|
||
|
|
||
|
#248. Generate symbol for first set of an alternative
|
||
|
|
||
|
rr : #FirstSetSymbol(rr_FirstSet) ( Foo | Bar ) ;
|
||
|
|
||
|
#216. Defer token fetch for C++ mode
|
||
|
|
||
|
When the ANTLRParser class is built with the pre-processor option
|
||
|
ZZDEFER_FETCH defined, the fetch of new tokens by consume() is deferred
|
||
|
until LA(i) or LT(i) is called.
|
||
|
|
||
|
#215. Use reset() to reset DLGLexerBase
|
||
|
#188. Added pccts/h/DLG_stream_input.h
|
||
|
#180. Added ANTLRParser::getEofToken()
|
||
|
#173. -glms for Microsoft style filenames with -gl
|
||
|
#170. Suppression for predicates with lookahead depth >1
|
||
|
|
||
|
Consider the following grammar with -ck 2 and the predicate in rule
|
||
|
"a" with depth 2:
|
||
|
|
||
|
r1 : (ab)* "@"
|
||
|
;
|
||
|
|
||
|
ab : a
|
||
|
| b
|
||
|
;
|
||
|
|
||
|
a : (A B)? => <<p(LATEXT(2))>>? A B C
|
||
|
;
|
||
|
|
||
|
b : A B C
|
||
|
;
|
||
|
|
||
|
Normally, the predicate would be hoisted into rule r1 in order to
|
||
|
determine whether to call rule "ab". However it should *not* be
|
||
|
hoisted because, even if p is false, there is a valid alternative
|
||
|
in rule b. With "-mrhoistk on" the predicate will be suppressed.
|
||
|
|
||
|
If "-info p" command line option is present the following information
|
||
|
will appear in the generated code:
|
||
|
|
||
|
while ( (LA(1)==A)
|
||
|
#if 0
|
||
|
|
||
|
Part (or all) of predicate with depth > 1 suppressed by alternative
|
||
|
without predicate
|
||
|
|
||
|
pred << p(LATEXT(2))>>?
|
||
|
depth=k=2 ("=>" guard) rule a line 8 t1.g
|
||
|
tree context:
|
||
|
(root = A
|
||
|
B
|
||
|
)
|
||
|
|
||
|
The token sequence which is suppressed: ( A B )
|
||
|
The sequence of references which generate that sequence of tokens:
|
||
|
|
||
|
1 to ab r1/1 line 1 t1.g
|
||
|
2 ab ab/1 line 4 t1.g
|
||
|
3 to b ab/2 line 5 t1.g
|
||
|
4 b b/1 line 11 t1.g
|
||
|
5 #token A b/1 line 11 t1.g
|
||
|
6 #token B b/1 line 11 t1.g
|
||
|
|
||
|
#endif
|
||
|
|
||
|
A slightly more complicated example:
|
||
|
|
||
|
r1 : (ab)* "@"
|
||
|
;
|
||
|
|
||
|
ab : a
|
||
|
| b
|
||
|
;
|
||
|
|
||
|
a : (A B)? => <<p(LATEXT(2))>>? (A B | D E)
|
||
|
;
|
||
|
|
||
|
b : <<q(LATEXT(2))>>? D E
|
||
|
;
|
||
|
|
||
|
|
||
|
In this case, the sequence (D E) in rule "a" which lies behind
|
||
|
the guard is used to suppress the predicate with context (D E)
|
||
|
in rule b.
|
||
|
|
||
|
while ( (LA(1)==A || LA(1)==D)
|
||
|
#if 0
|
||
|
|
||
|
Part (or all) of predicate with depth > 1 suppressed by alternative
|
||
|
without predicate
|
||
|
|
||
|
pred << q(LATEXT(2))>>?
|
||
|
depth=k=2 rule b line 11 t2.g
|
||
|
tree context:
|
||
|
(root = D
|
||
|
E
|
||
|
)
|
||
|
|
||
|
The token sequence which is suppressed: ( D E )
|
||
|
The sequence of references which generate that sequence of tokens:
|
||
|
|
||
|
1 to ab r1/1 line 1 t2.g
|
||
|
2 ab ab/1 line 4 t2.g
|
||
|
3 to a ab/1 line 4 t2.g
|
||
|
4 a a/1 line 8 t2.g
|
||
|
5 #token D a/1 line 8 t2.g
|
||
|
6 #token E a/1 line 8 t2.g
|
||
|
|
||
|
#endif
|
||
|
&&
|
||
|
#if 0
|
||
|
|
||
|
pred << p(LATEXT(2))>>?
|
||
|
depth=k=2 ("=>" guard) rule a line 8 t2.g
|
||
|
tree context:
|
||
|
(root = A
|
||
|
B
|
||
|
)
|
||
|
|
||
|
#endif
|
||
|
|
||
|
(! ( LA(1)==A && LA(2)==B ) || p(LATEXT(2)) ) {
|
||
|
ab();
|
||
|
...
|
||
|
|
||
|
#165. (Changed in MR13) option -newAST
|
||
|
|
||
|
To create ASTs from an ANTLRTokenPtr antlr usually calls
|
||
|
"new AST(ANTLRTokenPtr)". This option generates a call
|
||
|
to "newAST(ANTLRTokenPtr)" instead. This allows a user
|
||
|
to define a parser member function to create an AST object.
|
||
|
|
||
|
#161. (Changed in MR13) Switch -gxt inhibits generation of tokens.h
|
||
|
|
||
|
#158. (Changed in MR13) #header causes problem for pre-processors
|
||
|
|
||
|
A user who runs the C pre-processor on antlr source suggested
|
||
|
that another syntax be allowed. With MR13 such directives
|
||
|
such as #header, #pragma, etc. may be written as "\#header",
|
||
|
"\#pragma", etc. For escaping pre-processor directives inside
|
||
|
a #header use something like the following:
|
||
|
|
||
|
\#header
|
||
|
<<
|
||
|
\#include <stdio.h>
|
||
|
>>
|
||
|
|
||
|
#155. (Changed in MR13) Context behind predicates can suppress
|
||
|
|
||
|
With -mrhoist enabled the context behind a guarded predicate can
|
||
|
be used to suppress other predicates. Consider the following grammar:
|
||
|
|
||
|
r0 : (r1)+;
|
||
|
|
||
|
r1 : rp
|
||
|
| rq
|
||
|
;
|
||
|
rp : <<p LATEXT(1)>>? B ;
|
||
|
rq : (A)? => <<q LATEXT(1)>>? (A|B);
|
||
|
|
||
|
In earlier versions both predicates "p" and "q" would be hoisted into
|
||
|
rule r0. With MR12c predicate p is suppressed because the context which
|
||
|
follows predicate q includes "B" which can "cover" predicate "p". In
|
||
|
other words, in trying to decide in r0 whether to call r1, it doesn't
|
||
|
really matter whether p is false or true because, either way, there is
|
||
|
a valid choice within r1.
|
||
|
|
||
|
#154. (Changed in MR13) Making hoist suppression explicit using <<nohoist>>
|
||
|
|
||
|
A common error, even among experienced pccts users, is to code
|
||
|
an init-action to inhibit hoisting rather than a leading action.
|
||
|
An init-action does not inhibit hoisting.
|
||
|
|
||
|
This was coded:
|
||
|
|
||
|
rule1 : <<;>> rule2
|
||
|
|
||
|
This is what was meant:
|
||
|
|
||
|
rule1 : <<;>> <<;>> rule2
|
||
|
|
||
|
With MR13, the user can code:
|
||
|
|
||
|
rule1 : <<;>> <<nohoist>> rule2
|
||
|
|
||
|
The following will give an error message:
|
||
|
|
||
|
rule1 : <<nohoist>> rule2
|
||
|
|
||
|
If the <<nohoist>> appears as an init-action rather than a leading
|
||
|
action an error message is issued. The meaning of an init-action
|
||
|
containing "nohoist" is unclear: does it apply to just one
|
||
|
alternative or to all alternatives ?
|
||
|
|
||
|
#151a. Addition of ANTLRParser::getLexer(), ANTLRTokenStream::getLexer()
|
||
|
|
||
|
You must manually cast the ANTLRTokenStream to your program's
|
||
|
lexer class. Because the name of the lexer's class is not fixed.
|
||
|
Thus it is impossible to incorporate it into the DLGLexerBase
|
||
|
class.
|
||
|
|
||
|
#151b.(Changed in MR12) ParserBlackBox member getLexer()
|
||
|
|
||
|
#150. (Changed in MR12) syntaxErrCount and lexErrCount now public
|
||
|
|
||
|
#149. (Changed in MR12) antlr option -info o (letter o for orphan)
|
||
|
|
||
|
If there is more than one rule which is not referenced by any
|
||
|
other rule then all such rules are listed. This is useful for
|
||
|
alerting one to rules which are not used, but which can still
|
||
|
contribute to ambiguity.
|
||
|
|
||
|
#148. (Changed in MR11) #token names appearing in zztokens,token_tbl
|
||
|
|
||
|
One can write:
|
||
|
|
||
|
#token Plus ("+") "\+"
|
||
|
#token RP ("(") "\("
|
||
|
#token COM ("comment begin") "/\*"
|
||
|
|
||
|
The string in parenthesis will be used in syntax error messages.
|
||
|
|
||
|
#146. (Changed in MR11) Option -treport for locating "difficult" alts
|
||
|
|
||
|
It can be difficult to determine which alternatives are causing
|
||
|
pccts to work hard to resolve an ambiguity. In some cases the
|
||
|
ambiguity is successfully resolved after much CPU time so there
|
||
|
is no message at all.
|
||
|
|
||
|
A rough measure of the amount of work being peformed which is
|
||
|
independent of the CPU speed and system load is the number of
|
||
|
tnodes created. Using "-info t" gives information about the
|
||
|
total number of tnodes created and the peak number of tnodes.
|
||
|
|
||
|
Tree Nodes: peak 1300k created 1416k lost 0
|
||
|
|
||
|
It also puts in the generated C or C++ file the number of tnodes
|
||
|
created for a rule (at the end of the rule). However this
|
||
|
information is not sufficient to locate the alternatives within
|
||
|
a rule which are causing the creation of tnodes.
|
||
|
|
||
|
Using:
|
||
|
|
||
|
antlr -treport 100000 ....
|
||
|
|
||
|
causes antlr to list on stdout any alternatives which require the
|
||
|
creation of more than 100,000 tnodes, along with the lookahead sets
|
||
|
for those alternatives.
|
||
|
|
||
|
The following is a trivial case from the ansi.g grammar which shows
|
||
|
the format of the report. This report might be of more interest
|
||
|
in cases where 1,000,000 tuples were created to resolve the ambiguity.
|
||
|
|
||
|
-------------------------------------------------------------------------
|
||
|
There were 0 tuples whose ambiguity could not be resolved
|
||
|
by full lookahead
|
||
|
There were 157 tnodes created to resolve ambiguity between:
|
||
|
|
||
|
Choice 1: statement/2 line 475 file ansi.g
|
||
|
Choice 2: statement/3 line 476 file ansi.g
|
||
|
|
||
|
Intersection of lookahead[1] sets:
|
||
|
|
||
|
IDENTIFIER
|
||
|
|
||
|
Intersection of lookahead[2] sets:
|
||
|
|
||
|
LPARENTHESIS COLON AMPERSAND MINUS
|
||
|
STAR PLUSPLUS MINUSMINUS ONESCOMPLEMENT
|
||
|
NOT SIZEOF OCTALINT DECIMALINT
|
||
|
HEXADECIMALINT FLOATONE FLOATTWO IDENTIFIER
|
||
|
STRING CHARACTER
|
||
|
-------------------------------------------------------------------------
|
||
|
|
||
|
#143. (Changed in MR11) Optional ";" at end of #token statement
|
||
|
|
||
|
Fixes problem of:
|
||
|
|
||
|
#token X "x"
|
||
|
|
||
|
<<
|
||
|
parser action
|
||
|
>>
|
||
|
|
||
|
Being confused with:
|
||
|
|
||
|
#token X "x" <<lexical action>>
|
||
|
|
||
|
#142. (Changed in MR11) class BufFileInput subclass of DLGInputStream
|
||
|
|
||
|
Alexey Demakov (demakov@kazbek.ispras.ru) has supplied class
|
||
|
BufFileInput derived from DLGInputStream which provides a
|
||
|
function lookahead(char *string) to test characters in the
|
||
|
input stream more than one character ahead.
|
||
|
The class is located in pccts/h/BufFileInput.* of the kit.
|
||
|
|
||
|
#140. #pred to define predicates
|
||
|
|
||
|
+---------------------------------------------------+
|
||
|
| Note: Assume "-prc on" for this entire discussion |
|
||
|
+---------------------------------------------------+
|
||
|
|
||
|
A problem with predicates is that each one is regarded as
|
||
|
unique and capable of disambiguating cases where two
|
||
|
alternatives have identical lookahead. For example:
|
||
|
|
||
|
rule : <<pred(LATEXT(1))>>? A
|
||
|
| <<pred(LATEXT(1))>>? A
|
||
|
;
|
||
|
|
||
|
will not cause any error messages or warnings to be issued
|
||
|
by earlier versions of pccts. To compare the text of the
|
||
|
predicates is an incomplete solution.
|
||
|
|
||
|
In 1.33MR11 I am introducing the #pred statement in order to
|
||
|
solve some problems with predicates. The #pred statement allows
|
||
|
one to give a symbolic name to a "predicate literal" or a
|
||
|
"predicate expression" in order to refer to it in other predicate
|
||
|
expressions or in the rules of the grammar.
|
||
|
|
||
|
The predicate literal associated with a predicate symbol is C
|
||
|
or C++ code which can be used to test the condition. A
|
||
|
predicate expression defines a predicate symbol in terms of other
|
||
|
predicate symbols using "!", "&&", and "||". A predicate symbol
|
||
|
can be defined in terms of a predicate literal, a predicate
|
||
|
expression, or *both*.
|
||
|
|
||
|
When a predicate symbol is defined with both a predicate literal
|
||
|
and a predicate expression, the predicate literal is used to generate
|
||
|
code, but the predicate expression is used to check for two
|
||
|
alternatives with identical predicates in both alternatives.
|
||
|
|
||
|
Here are some examples of #pred statements:
|
||
|
|
||
|
#pred IsLabel <<isLabel(LATEXT(1))>>?
|
||
|
#pred IsLocalVar <<isLocalVar(LATEXT(1))>>?
|
||
|
#pred IsGlobalVar <<isGlobalVar(LATEXT(1)>>?
|
||
|
#pred IsVar <<isVar(LATEXT(1))>>? IsLocalVar || IsGlobalVar
|
||
|
#pred IsScoped <<isScoped(LATEXT(1))>>? IsLabel || IsLocalVar
|
||
|
|
||
|
I hope that the use of EBNF notation to describe the syntax of the
|
||
|
#pred statement will not cause problems for my readers (joke).
|
||
|
|
||
|
predStatement : "#pred"
|
||
|
CapitalizedName
|
||
|
(
|
||
|
"<<predicate_literal>>?"
|
||
|
| "<<predicate_literal>>?" predOrExpr
|
||
|
| predOrExpr
|
||
|
)
|
||
|
;
|
||
|
|
||
|
predOrExpr : predAndExpr ( "||" predAndExpr ) * ;
|
||
|
|
||
|
predAndExpr : predPrimary ( "&&" predPrimary ) * ;
|
||
|
|
||
|
predPrimary : CapitalizedName
|
||
|
| "!" predPrimary
|
||
|
| "(" predOrExpr ")"
|
||
|
;
|
||
|
|
||
|
What is the purpose of this nonsense ?
|
||
|
|
||
|
To understand how predicate symbols help, you need to realize that
|
||
|
predicate symbols are used in two different ways with two different
|
||
|
goals.
|
||
|
|
||
|
a. Allow simplification of predicates which have been combined
|
||
|
during predicate hoisting.
|
||
|
|
||
|
b. Allow recognition of identical predicates which can't disambiguate
|
||
|
alternatives with common lookahead.
|
||
|
|
||
|
First we will discuss goal (a). Consider the following rule:
|
||
|
|
||
|
rule0: rule1
|
||
|
| ID
|
||
|
| ...
|
||
|
;
|
||
|
|
||
|
rule1: rule2
|
||
|
| rule3
|
||
|
;
|
||
|
|
||
|
rule2: <<isX(LATEXT(1))>>? ID ;
|
||
|
rule3: <<!isX(LATEXT(1)>>? ID ;
|
||
|
|
||
|
When the predicates in rule2 and rule3 are combined by hoisting
|
||
|
to create a prediction expression for rule1 the result is:
|
||
|
|
||
|
if ( LA(1)==ID
|
||
|
&& ( isX(LATEXT(1) || !isX(LATEXT(1) ) ) { rule1(); ...
|
||
|
|
||
|
This is inefficient, but more importantly, can lead to false
|
||
|
assumptions that the predicate expression distinguishes the rule1
|
||
|
alternative with some other alternative with lookahead ID. In
|
||
|
MR11 one can write:
|
||
|
|
||
|
#pred IsX <<isX(LATEXT(1))>>?
|
||
|
|
||
|
...
|
||
|
|
||
|
rule2: <<IsX>>? ID ;
|
||
|
rule3: <<!IsX>>? ID ;
|
||
|
|
||
|
During hoisting MR11 recognizes this as a special case and
|
||
|
eliminates the predicates. The result is a prediction
|
||
|
expression like the following:
|
||
|
|
||
|
if ( LA(1)==ID ) { rule1(); ...
|
||
|
|
||
|
Please note that the following cases which appear to be equivalent
|
||
|
*cannot* be simplified by MR11 during hoisting because the hoisting
|
||
|
logic only checks for a "!" in the predicate action, not in the
|
||
|
predicate expression for a predicate symbol.
|
||
|
|
||
|
*Not* equivalent and is not simplified during hoisting:
|
||
|
|
||
|
#pred IsX <<isX(LATEXT(1))>>?
|
||
|
#pred NotX <<!isX(LATEXT(1))>>?
|
||
|
...
|
||
|
rule2: <<IsX>>? ID ;
|
||
|
rule3: <<NotX>>? ID ;
|
||
|
|
||
|
*Not* equivalent and is not simplified during hoisting:
|
||
|
|
||
|
#pred IsX <<isX(LATEXT(1))>>?
|
||
|
#pred NotX !IsX
|
||
|
...
|
||
|
rule2: <<IsX>>? ID ;
|
||
|
rule3: <<NotX>>? ID ;
|
||
|
|
||
|
Now we will discuss goal (b).
|
||
|
|
||
|
When antlr discovers that there is a lookahead ambiguity between
|
||
|
two alternatives it attempts to resolve the ambiguity by searching
|
||
|
for predicates in both alternatives. In the past any predicate
|
||
|
would do, even if the same one appeared in both alternatives:
|
||
|
|
||
|
rule: <<p(LATEXT(1))>>? X
|
||
|
| <<p(LATEXT(1))>>? X
|
||
|
;
|
||
|
|
||
|
The #pred statement is a start towards solving this problem.
|
||
|
During ambiguity resolution (*not* predicate hoisting) the
|
||
|
predicates for the two alternatives are expanded and compared.
|
||
|
Consider the following example:
|
||
|
|
||
|
#pred Upper <<isUpper(LATEXT(1))>>?
|
||
|
#pred Lower <<isLower(LATEXT(1))>>?
|
||
|
#pred Alpha <<isAlpha(LATEXT(1))>>? Upper || Lower
|
||
|
|
||
|
rule0: rule1
|
||
|
| <<Alpha>>? ID
|
||
|
;
|
||
|
|
||
|
rule1:
|
||
|
| rule2
|
||
|
| rule3
|
||
|
...
|
||
|
;
|
||
|
|
||
|
rule2: <<Upper>>? ID;
|
||
|
rule3: <<Lower>>? ID;
|
||
|
|
||
|
The definition of #pred Alpha expresses:
|
||
|
|
||
|
a. to test the predicate use the C code "isAlpha(LATEXT(1))"
|
||
|
|
||
|
b. to analyze the predicate use the information that
|
||
|
Alpha is equivalent to the union of Upper and Lower,
|
||
|
|
||
|
During ambiguity resolution the definition of Alpha is expanded
|
||
|
into "Upper || Lower" and compared with the predicate in the other
|
||
|
alternative, which is also "Upper || Lower". Because they are
|
||
|
identical MR11 will report a problem.
|
||
|
|
||
|
-------------------------------------------------------------------------
|
||
|
t10.g, line 5: warning: the predicates used to disambiguate rule rule0
|
||
|
(file t10.g alt 1 line 5 and alt 2 line 6)
|
||
|
are identical when compared without context and may have no
|
||
|
resolving power for some lookahead sequences.
|
||
|
-------------------------------------------------------------------------
|
||
|
|
||
|
If you use the "-info p" option the output file will contain:
|
||
|
|
||
|
+----------------------------------------------------------------------+
|
||
|
|#if 0 |
|
||
|
| |
|
||
|
|The following predicates are identical when compared without |
|
||
|
| lookahead context information. For some ambiguous lookahead |
|
||
|
| sequences they may not have any power to resolve the ambiguity. |
|
||
|
| |
|
||
|
|Choice 1: rule0/1 alt 1 line 5 file t10.g |
|
||
|
| |
|
||
|
| The original predicate for choice 1 with available context |
|
||
|
| information: |
|
||
|
| |
|
||
|
| OR expr |
|
||
|
| |
|
||
|
| pred << Upper>>? |
|
||
|
| depth=k=1 rule rule2 line 14 t10.g |
|
||
|
| set context: |
|
||
|
| ID |
|
||
|
| |
|
||
|
| pred << Lower>>? |
|
||
|
| depth=k=1 rule rule3 line 15 t10.g |
|
||
|
| set context: |
|
||
|
| ID |
|
||
|
| |
|
||
|
| The predicate for choice 1 after expansion (but without context |
|
||
|
| information): |
|
||
|
| |
|
||
|
| OR expr |
|
||
|
| |
|
||
|
| pred << isUpper(LATEXT(1))>>? |
|
||
|
| depth=k=1 rule line 1 t10.g |
|
||
|
| |
|
||
|
| pred << isLower(LATEXT(1))>>? |
|
||
|
| depth=k=1 rule line 2 t10.g |
|
||
|
| |
|
||
|
| |
|
||
|
|Choice 2: rule0/2 alt 2 line 6 file t10.g |
|
||
|
| |
|
||
|
| The original predicate for choice 2 with available context |
|
||
|
| information: |
|
||
|
| |
|
||
|
| pred << Alpha>>? |
|
||
|
| depth=k=1 rule rule0 line 6 t10.g |
|
||
|
| set context: |
|
||
|
| ID |
|
||
|
| |
|
||
|
| The predicate for choice 2 after expansion (but without context |
|
||
|
| information): |
|
||
|
| |
|
||
|
| OR expr |
|
||
|
| |
|
||
|
| pred << isUpper(LATEXT(1))>>? |
|
||
|
| depth=k=1 rule line 1 t10.g |
|
||
|
| |
|
||
|
| pred << isLower(LATEXT(1))>>? |
|
||
|
| depth=k=1 rule line 2 t10.g |
|
||
|
| |
|
||
|
| |
|
||
|
|#endif |
|
||
|
+----------------------------------------------------------------------+
|
||
|
|
||
|
The comparison of the predicates for the two alternatives takes
|
||
|
place without context information, which means that in some cases
|
||
|
the predicates will be considered identical even though they operate
|
||
|
on disjoint lookahead sets. Consider:
|
||
|
|
||
|
#pred Alpha
|
||
|
|
||
|
rule1: <<Alpha>>? ID
|
||
|
| <<Alpha>>? Label
|
||
|
;
|
||
|
|
||
|
Because the comparison of predicates takes place without context
|
||
|
these will be considered identical. The reason for comparing
|
||
|
without context is that otherwise it would be necessary to re-evaluate
|
||
|
the entire predicate expression for each possible lookahead sequence.
|
||
|
This would require more code to be written and more CPU time during
|
||
|
grammar analysis, and it is not yet clear whether anyone will even make
|
||
|
use of the new #pred facility.
|
||
|
|
||
|
A temporary workaround might be to use different #pred statements
|
||
|
for predicates you know have different context. This would avoid
|
||
|
extraneous warnings.
|
||
|
|
||
|
The above example might be termed a "false positive". Comparison
|
||
|
without context will also lead to "false negatives". Consider the
|
||
|
following example:
|
||
|
|
||
|
#pred Alpha
|
||
|
#pred Beta
|
||
|
|
||
|
rule1: <<Alpha>>? A
|
||
|
| rule2
|
||
|
;
|
||
|
|
||
|
rule2: <<Alpha>>? A
|
||
|
| <<Beta>>? B
|
||
|
;
|
||
|
|
||
|
The predicate used for alt 2 of rule1 is (Alpha || Beta). This
|
||
|
appears to be different than the predicate Alpha used for alt1.
|
||
|
However, the context of Beta is B. Thus when the lookahead is A
|
||
|
Beta will have no resolving power and Alpha will be used for both
|
||
|
alternatives. Using the same predicate for both alternatives isn't
|
||
|
very helpful, but this will not be detected with 1.33MR11.
|
||
|
|
||
|
To properly handle this the predicate expression would have to be
|
||
|
evaluated for each distinct lookahead context.
|
||
|
|
||
|
To determine whether two predicate expressions are identical is
|
||
|
difficult. The routine may fail to identify identical predicates.
|
||
|
|
||
|
The #pred feature also compares predicates to see if a choice between
|
||
|
alternatives which is resolved by a predicate which makes the second
|
||
|
choice unreachable. Consider the following example:
|
||
|
|
||
|
#pred A <<A(LATEXT(1)>>?
|
||
|
#pred B <<B(LATEXT(1)>>?
|
||
|
#pred A_or_B A || B
|
||
|
|
||
|
r : s
|
||
|
| t
|
||
|
;
|
||
|
s : <<A_or_B>>? ID
|
||
|
;
|
||
|
t : <<A>>? ID
|
||
|
;
|
||
|
|
||
|
----------------------------------------------------------------------------
|
||
|
t11.g, line 5: warning: the predicate used to disambiguate the
|
||
|
first choice of rule r
|
||
|
(file t11.g alt 1 line 5 and alt 2 line 6)
|
||
|
appears to "cover" the second predicate when compared without context.
|
||
|
The second predicate may have no resolving power for some lookahead
|
||
|
sequences.
|
||
|
----------------------------------------------------------------------------
|
||
|
|
||
|
#132. (Changed in 1.33MR11) Recognition of identical predicates in alts
|
||
|
|
||
|
Prior to 1.33MR11, there would be no ambiguity warning when the
|
||
|
very same predicate was used to disambiguate both alternatives:
|
||
|
|
||
|
test: ref B
|
||
|
| ref C
|
||
|
;
|
||
|
|
||
|
ref : <<pred(LATEXT(1)>>? A
|
||
|
|
||
|
In 1.33MR11 this will cause the warning:
|
||
|
|
||
|
warning: the predicates used to disambiguate rule test
|
||
|
(file v98.g alt 1 line 1 and alt 2 line 2)
|
||
|
are identical and have no resolving power
|
||
|
|
||
|
----------------- Note -----------------
|
||
|
|
||
|
This is different than the following case
|
||
|
|
||
|
test: <<pred(LATEXT(1))>>? A B
|
||
|
| <<pred(LATEXT(1)>>? A C
|
||
|
;
|
||
|
|
||
|
In this case there are two distinct predicates
|
||
|
which have exactly the same text. In the first
|
||
|
example there are two references to the same
|
||
|
predicate. The problem represented by this
|
||
|
grammar will be addressed later.
|
||
|
|
||
|
|
||
|
#127. (Changed in 1.33MR11)
|
||
|
|
||
|
Count Syntax Errors Count DLG Errors
|
||
|
------------------- ----------------
|
||
|
|
||
|
C++ mode ANTLRParser:: DLGLexerBase::
|
||
|
syntaxErrCount lexErrCount
|
||
|
C mode zzSyntaxErrCount zzLexErrCount
|
||
|
|
||
|
The C mode variables are global and initialized to 0.
|
||
|
They are *not* reset to 0 automatically when antlr is
|
||
|
restarted.
|
||
|
|
||
|
The C++ mode variables are public. They are initialized
|
||
|
to 0 by the constructors. They are *not* reset to 0 by the
|
||
|
ANTLRParser::init() method.
|
||
|
|
||
|
Suggested by Reinier van den Born (reinier@vnet.ibm.com).
|
||
|
|
||
|
#126. (Changed in 1.33MR11) Addition of #first <<...>>
|
||
|
|
||
|
The #first <<...>> inserts the specified text in the output
|
||
|
files before any other #include statements required by pccts.
|
||
|
The only things before the #first text are comments and
|
||
|
a #define ANTLR_VERSION.
|
||
|
|
||
|
Requested by and Esa Pulkkinen (esap@cs.tut.fi) and Alexin
|
||
|
Zoltan (alexin@inf.u-szeged.hu).
|
||
|
|
||
|
#124. A Note on the New "&&" Style Guarded Predicates
|
||
|
|
||
|
I've been asked several times, "What is the difference between
|
||
|
the old "=>" style guard predicates and the new style "&&" guard
|
||
|
predicates, and how do you choose one over the other" ?
|
||
|
|
||
|
The main difference is that the "=>" does not apply the
|
||
|
predicate if the context guard doesn't match, whereas
|
||
|
the && form always does. What is the significance ?
|
||
|
|
||
|
If you have a predicate which is not on the "leading edge"
|
||
|
it is cannot be hoisted. Suppose you need a predicate that
|
||
|
looks at LA(2). You must introduce it manually. The
|
||
|
classic example is:
|
||
|
|
||
|
castExpr :
|
||
|
LP typeName RP
|
||
|
| ....
|
||
|
;
|
||
|
|
||
|
typeName : <<isTypeName(LATEXT(1))>>? ID
|
||
|
| STRUCT ID
|
||
|
;
|
||
|
|
||
|
The problem is that isTypeName() isn't on the leading edge
|
||
|
of typeName, so it won't be hoisted into castExpr to help
|
||
|
make a decision on which production to choose.
|
||
|
|
||
|
The *first* attempt to fix it is this:
|
||
|
|
||
|
castExpr :
|
||
|
<<isTypeName(LATEXT(2))>>?
|
||
|
LP typeName RP
|
||
|
| ....
|
||
|
;
|
||
|
|
||
|
Unfortunately, this won't work because it ignores
|
||
|
the problem of STRUCT. The solution is to apply
|
||
|
isTypeName() in castExpr if LA(2) is an ID and
|
||
|
don't apply it when LA(2) is STRUCT:
|
||
|
|
||
|
castExpr :
|
||
|
(LP ID)? => <<isTypeName(LATEXT(2))>>?
|
||
|
LP typeName RP
|
||
|
| ....
|
||
|
;
|
||
|
|
||
|
In conclusion, the "=>" style guarded predicate is
|
||
|
useful when:
|
||
|
|
||
|
a. the tokens required for the predicate
|
||
|
are not on the leading edge
|
||
|
b. there are alternatives in the expression
|
||
|
selected by the predicate for which the
|
||
|
predicate is inappropriate
|
||
|
|
||
|
If (b) were false, then one could use a simple
|
||
|
predicate (assuming "-prc on"):
|
||
|
|
||
|
castExpr :
|
||
|
<<isTypeName(LATEXT(2))>>?
|
||
|
LP typeName RP
|
||
|
| ....
|
||
|
;
|
||
|
|
||
|
typeName : <<isTypeName(LATEXT(1))>>? ID
|
||
|
;
|
||
|
|
||
|
So, when do you use the "&&" style guarded predicate ?
|
||
|
|
||
|
The new-style "&&" predicate should always be used with
|
||
|
predicate context. The context guard is in ADDITION to
|
||
|
the automatically computed context. Thus it useful for
|
||
|
predicates which depend on the token type for reasons
|
||
|
other than context.
|
||
|
|
||
|
The following example is contributed by Reinier van den Born
|
||
|
(reinier@vnet.ibm.com).
|
||
|
|
||
|
+-------------------------------------------------------------------------+
|
||
|
| This grammar has two ways to call functions: |
|
||
|
| |
|
||
|
| - a "standard" call syntax with parens and comma separated args |
|
||
|
| - a shell command like syntax (no parens and spacing separated args) |
|
||
|
| |
|
||
|
| The former also allows a variable to hold the name of the function, |
|
||
|
| the latter can also be used to call external commands. |
|
||
|
| |
|
||
|
| The grammar (simplified) looks like this: |
|
||
|
| |
|
||
|
| fun_call : ID "(" { expr ("," expr)* } ")" |
|
||
|
| /* ID is function name */ |
|
||
|
| | "@" ID "(" { expr ("," expr)* } ")" |
|
||
|
| /* ID is var containing fun name */ |
|
||
|
| ; |
|
||
|
| |
|
||
|
| command : ID expr* /* ID is function name */ |
|
||
|
| | path expr* /* path is external command name */ |
|
||
|
| ; |
|
||
|
| |
|
||
|
| path : ID /* left out slashes and such */ |
|
||
|
| | "@" ID /* ID is environment var */ |
|
||
|
| ; |
|
||
|
| |
|
||
|
| expr : .... |
|
||
|
| | "(" expr ")"; |
|
||
|
| |
|
||
|
| call : fun_call |
|
||
|
| | command |
|
||
|
| ; |
|
||
|
| |
|
||
|
| Obviously the call is wildly ambiguous. This is more or less how this |
|
||
|
| is to be resolved: |
|
||
|
| |
|
||
|
| A call begins with an ID or an @ followed by an ID. |
|
||
|
| |
|
||
|
| If it is an ID and if it is an ext. command name -> command |
|
||
|
| if followed by a paren -> fun_call |
|
||
|
| otherwise -> command |
|
||
|
| |
|
||
|
| If it is an @ and if the ID is a var name -> fun_call |
|
||
|
| otherwise -> command |
|
||
|
| |
|
||
|
| One can implement these rules quite neatly using && predicates: |
|
||
|
| |
|
||
|
| call : ("@" ID)? && <<isVarName(LT(2))>>? fun_call |
|
||
|
| | (ID)? && <<isExtCmdName>>? command |
|
||
|
| | (ID "(")? fun_call |
|
||
|
| | command |
|
||
|
| ; |
|
||
|
| |
|
||
|
| This can be done better, so it is not an ideal example, but it |
|
||
|
| conveys the principle. |
|
||
|
+-------------------------------------------------------------------------+
|
||
|
|
||
|
#122. (Changed in 1.33MR11) Member functions to reset DLG in C++ mode
|
||
|
|
||
|
void DLGFileReset(FILE *f) { input = f; found_eof = 0; }
|
||
|
void DLGStringReset(DLGChar *s) { input = s; p = &input[0]; }
|
||
|
|
||
|
Supplied by R.A. Nelson (cowboy@VNET.IBM.COM)
|
||
|
|
||
|
#119. (Changed in 1.33MR11) Ambiguity aid for grammars
|
||
|
|
||
|
The user can ask for additional information on ambiguities reported
|
||
|
by antlr to stdout. At the moment, only one ambiguity report can
|
||
|
be created in an antlr run.
|
||
|
|
||
|
This feature is enabled using the "-aa" (Ambiguity Aid) option.
|
||
|
|
||
|
The following options control the reporting of ambiguities:
|
||
|
|
||
|
-aa ruleName Selects reporting by name of rule
|
||
|
-aa lineNumber Selects reporting by line number
|
||
|
(file name not compared)
|
||
|
|
||
|
-aam Selects "multiple" reporting for a token
|
||
|
in the intersection set of the
|
||
|
alternatives.
|
||
|
|
||
|
For instance, the token ID may appear dozens
|
||
|
of times in various paths as the program
|
||
|
explores the rules which are reachable from
|
||
|
the point of an ambiguity. With option -aam
|
||
|
every possible path the search program
|
||
|
encounters is reported.
|
||
|
|
||
|
Without -aam only the first encounter is
|
||
|
reported. This may result in incomplete
|
||
|
information, but the information may be
|
||
|
sufficient and much shorter.
|
||
|
|
||
|
-aad depth Selects the depth of the search.
|
||
|
The default value is 1.
|
||
|
|
||
|
The number of paths to be searched, and the
|
||
|
size of the report can grow geometrically
|
||
|
with the -ck value if a full search for all
|
||
|
contributions to the source of the ambiguity
|
||
|
is explored.
|
||
|
|
||
|
The depth represents the number of tokens
|
||
|
in the lookahead set which are matched against
|
||
|
the set of ambiguous tokens. A depth of 1
|
||
|
means that the search stops when a lookahead
|
||
|
sequence of just one token is matched.
|
||
|
|
||
|
A k=1 ck=6 grammar might generate 5,000 items
|
||
|
in a report if a full depth 6 search is made
|
||
|
with the Ambiguity Aid. The source of the
|
||
|
problem may be in the first token and obscured
|
||
|
by the volume of data - I hesitate to call
|
||
|
it information.
|
||
|
|
||
|
When the user selects a depth > 1, the search
|
||
|
is first performed at depth=1 for both
|
||
|
alternatives, then depth=2 for both alternatives,
|
||
|
etc.
|
||
|
|
||
|
Sample output for rule grammar in antlr.g itself:
|
||
|
|
||
|
+---------------------------------------------------------------------+
|
||
|
| Ambiguity Aid |
|
||
|
| |
|
||
|
| Choice 1: grammar/70 line 632 file a.g |
|
||
|
| Choice 2: grammar/82 line 644 file a.g |
|
||
|
| |
|
||
|
| Intersection of lookahead[1] sets: |
|
||
|
| |
|
||
|
| "\}" "class" "#errclass" "#tokclass" |
|
||
|
| |
|
||
|
| Choice:1 Depth:1 Group:1 ("#errclass") |
|
||
|
| 1 in (...)* block grammar/70 line 632 a.g |
|
||
|
| 2 to error grammar/73 line 635 a.g |
|
||
|
| 3 error error/1 line 894 a.g |
|
||
|
| 4 #token "#errclass" error/2 line 895 a.g |
|
||
|
| |
|
||
|
| Choice:1 Depth:1 Group:2 ("#tokclass") |
|
||
|
| 2 to tclass grammar/74 line 636 a.g |
|
||
|
| 3 tclass tclass/1 line 937 a.g |
|
||
|
| 4 #token "#tokclass" tclass/2 line 938 a.g |
|
||
|
| |
|
||
|
| Choice:1 Depth:1 Group:3 ("class") |
|
||
|
| 2 to class_def grammar/75 line 637 a.g |
|
||
|
| 3 class_def class_def/1 line 669 a.g |
|
||
|
| 4 #token "class" class_def/3 line 671 a.g |
|
||
|
| |
|
||
|
| Choice:1 Depth:1 Group:4 ("\}") |
|
||
|
| 2 #token "\}" grammar/76 line 638 a.g |
|
||
|
| |
|
||
|
| Choice:2 Depth:1 Group:5 ("#errclass") |
|
||
|
| 1 in (...)* block grammar/83 line 645 a.g |
|
||
|
| 2 to error grammar/93 line 655 a.g |
|
||
|
| 3 error error/1 line 894 a.g |
|
||
|
| 4 #token "#errclass" error/2 line 895 a.g |
|
||
|
| |
|
||
|
| Choice:2 Depth:1 Group:6 ("#tokclass") |
|
||
|
| 2 to tclass grammar/94 line 656 a.g |
|
||
|
| 3 tclass tclass/1 line 937 a.g |
|
||
|
| 4 #token "#tokclass" tclass/2 line 938 a.g |
|
||
|
| |
|
||
|
| Choice:2 Depth:1 Group:7 ("class") |
|
||
|
| 2 to class_def grammar/95 line 657 a.g |
|
||
|
| 3 class_def class_def/1 line 669 a.g |
|
||
|
| 4 #token "class" class_def/3 line 671 a.g |
|
||
|
| |
|
||
|
| Choice:2 Depth:1 Group:8 ("\}") |
|
||
|
| 2 #token "\}" grammar/96 line 658 a.g |
|
||
|
+---------------------------------------------------------------------+
|
||
|
|
||
|
For a linear lookahead set ambiguity (where k=1 or for k>1 but
|
||
|
when all lookahead sets [i] with i<k all have degree one) the
|
||
|
reports appear in the following order:
|
||
|
|
||
|
for (depth=1 ; depth <= "-aad depth" ; depth++) {
|
||
|
for (alternative=1; alternative <=2 ; alternative++) {
|
||
|
while (matches-are-found) {
|
||
|
group++;
|
||
|
print-report
|
||
|
};
|
||
|
};
|
||
|
};
|
||
|
|
||
|
For reporting a k-tuple ambiguity, the reports appear in the
|
||
|
following order:
|
||
|
|
||
|
for (depth=1 ; depth <= "-aad depth" ; depth++) {
|
||
|
while (matches-are-found) {
|
||
|
for (alternative=1; alternative <=2 ; alternative++) {
|
||
|
group++;
|
||
|
print-report
|
||
|
};
|
||
|
};
|
||
|
};
|
||
|
|
||
|
This is because matches are generated in different ways for
|
||
|
linear lookahead and k-tuples.
|
||
|
|
||
|
#117. (Changed in 1.33MR10) new EXPERIMENTAL predicate hoisting code
|
||
|
|
||
|
The hoisting of predicates into rules to create prediction
|
||
|
expressions is a problem in antlr. Consider the following
|
||
|
example (k=1 with -prc on):
|
||
|
|
||
|
start : (a)* "@" ;
|
||
|
a : b | c ;
|
||
|
b : <<isUpper(LATEXT(1))>>? A ;
|
||
|
c : A ;
|
||
|
|
||
|
Prior to 1.33MR10 the code generated for "start" would resemble:
|
||
|
|
||
|
while {
|
||
|
if (LA(1)==A &&
|
||
|
(!LA(1)==A || isUpper())) {
|
||
|
a();
|
||
|
}
|
||
|
};
|
||
|
|
||
|
This code is wrong because it makes rule "c" unreachable from
|
||
|
"start". The essence of the problem is that antlr fails to
|
||
|
recognize that there can be a valid alternative within "a" even
|
||
|
when the predicate <<isUpper(LATEXT(1))>>? is false.
|
||
|
|
||
|
In 1.33MR10 with -mrhoist the hoisting of the predicate into
|
||
|
"start" is suppressed because it recognizes that "c" can
|
||
|
cover all the cases where the predicate is false:
|
||
|
|
||
|
while {
|
||
|
if (LA(1)==A) {
|
||
|
a();
|
||
|
}
|
||
|
};
|
||
|
|
||
|
With the antlr "-info p" switch the user will receive information
|
||
|
about the predicate suppression in the generated file:
|
||
|
|
||
|
--------------------------------------------------------------
|
||
|
#if 0
|
||
|
|
||
|
Hoisting of predicate suppressed by alternative without predicate.
|
||
|
The alt without the predicate includes all cases where
|
||
|
the predicate is false.
|
||
|
|
||
|
WITH predicate: line 7 v1.g
|
||
|
WITHOUT predicate: line 7 v1.g
|
||
|
|
||
|
The context set for the predicate:
|
||
|
|
||
|
A
|
||
|
|
||
|
The lookahead set for the alt WITHOUT the semantic predicate:
|
||
|
|
||
|
A
|
||
|
|
||
|
The predicate:
|
||
|
|
||
|
pred << isUpper(LATEXT(1))>>?
|
||
|
depth=k=1 rule b line 9 v1.g
|
||
|
set context:
|
||
|
A
|
||
|
tree context: null
|
||
|
|
||
|
Chain of referenced rules:
|
||
|
|
||
|
#0 in rule start (line 5 v1.g) to rule a
|
||
|
#1 in rule a (line 7 v1.g)
|
||
|
|
||
|
#endif
|
||
|
--------------------------------------------------------------
|
||
|
|
||
|
A predicate can be suppressed by a combination of alternatives
|
||
|
which, taken together, cover a predicate:
|
||
|
|
||
|
start : (a)* "@" ;
|
||
|
|
||
|
a : b | ca | cb | cc ;
|
||
|
|
||
|
b : <<isUpper(LATEXT(1))>>? ( A | B | C ) ;
|
||
|
|
||
|
ca : A ;
|
||
|
cb : B ;
|
||
|
cc : C ;
|
||
|
|
||
|
Consider a more complex example in which "c" covers only part of
|
||
|
a predicate:
|
||
|
|
||
|
start : (a)* "@" ;
|
||
|
|
||
|
a : b
|
||
|
| c
|
||
|
;
|
||
|
|
||
|
b : <<isUpper(LATEXT(1))>>?
|
||
|
( A
|
||
|
| X
|
||
|
);
|
||
|
|
||
|
c : A
|
||
|
;
|
||
|
|
||
|
Prior to 1.33MR10 the code generated for "start" would resemble:
|
||
|
|
||
|
while {
|
||
|
if ( (LA(1)==A || LA(1)==X) &&
|
||
|
(! (LA(1)==A || LA(1)==X) || isUpper()) {
|
||
|
a();
|
||
|
}
|
||
|
};
|
||
|
|
||
|
With 1.33MR10 and -mrhoist the predicate context is restricted to
|
||
|
the non-covered lookahead. The code resembles:
|
||
|
|
||
|
while {
|
||
|
if ( (LA(1)==A || LA(1)==X) &&
|
||
|
(! (LA(1)==X) || isUpper()) {
|
||
|
a();
|
||
|
}
|
||
|
};
|
||
|
|
||
|
With the antlr "-info p" switch the user will receive information
|
||
|
about the predicate restriction in the generated file:
|
||
|
|
||
|
--------------------------------------------------------------
|
||
|
#if 0
|
||
|
|
||
|
Restricting the context of a predicate because of overlap
|
||
|
in the lookahead set between the alternative with the
|
||
|
semantic predicate and one without
|
||
|
Without this restriction the alternative without the predicate
|
||
|
could not be reached when input matched the context of the
|
||
|
predicate and the predicate was false.
|
||
|
|
||
|
WITH predicate: line 11 v4.g
|
||
|
WITHOUT predicate: line 12 v4.g
|
||
|
|
||
|
The original context set for the predicate:
|
||
|
|
||
|
A X
|
||
|
|
||
|
The lookahead set for the alt WITHOUT the semantic predicate:
|
||
|
|
||
|
A
|
||
|
|
||
|
The intersection of the two sets
|
||
|
|
||
|
A
|
||
|
|
||
|
The original predicate:
|
||
|
|
||
|
pred << isUpper(LATEXT(1))>>?
|
||
|
depth=k=1 rule b line 15 v4.g
|
||
|
set context:
|
||
|
A X
|
||
|
tree context: null
|
||
|
|
||
|
The new (modified) form of the predicate:
|
||
|
|
||
|
pred << isUpper(LATEXT(1))>>?
|
||
|
depth=k=1 rule b line 15 v4.g
|
||
|
set context:
|
||
|
X
|
||
|
tree context: null
|
||
|
|
||
|
#endif
|
||
|
--------------------------------------------------------------
|
||
|
|
||
|
The bad news about -mrhoist:
|
||
|
|
||
|
(a) -mrhoist does not analyze predicates with lookahead
|
||
|
depth > 1.
|
||
|
|
||
|
(b) -mrhoist does not look past a guarded predicate to
|
||
|
find context which might cover other predicates.
|
||
|
|
||
|
For these cases you might want to use syntactic predicates.
|
||
|
When a semantic predicate fails during guess mode the guess
|
||
|
fails and the next alternative is tried.
|
||
|
|
||
|
Limitation (a) is illustrated by the following example:
|
||
|
|
||
|
start : (stmt)* EOF ;
|
||
|
|
||
|
stmt : cast
|
||
|
| expr
|
||
|
;
|
||
|
cast : <<isTypename(LATEXT(2))>>? LP ID RP ;
|
||
|
|
||
|
expr : LP ID RP ;
|
||
|
|
||
|
This is not much different from the first example, except that
|
||
|
it requires two tokens of lookahead context to determine what
|
||
|
to do. This predicate is NOT suppressed because the current version
|
||
|
is unable to handle predicates with depth > 1.
|
||
|
|
||
|
A predicate can be combined with other predicates during hoisting.
|
||
|
In those cases the depth=1 predicates are still handled. Thus,
|
||
|
in the following example the isUpper() predicate will be suppressed
|
||
|
by line #4 when hoisted from "bizarre" into "start", but will still
|
||
|
be present in "bizarre" in order to predict "stmt".
|
||
|
|
||
|
start : (bizarre)* EOF ; // #1
|
||
|
// #2
|
||
|
bizarre : stmt // #3
|
||
|
| A // #4
|
||
|
;
|
||
|
|
||
|
stmt : cast
|
||
|
| expr
|
||
|
;
|
||
|
|
||
|
cast : <<isTypename(LATEXT(2))>>? LP ID RP ;
|
||
|
|
||
|
expr : LP ID RP ;
|
||
|
| <<isUpper(LATEXT(1))>>? A
|
||
|
|
||
|
Limitation (b) is illustrated by the following example of a
|
||
|
context guarded predicate:
|
||
|
|
||
|
rule : (A)? <<p>>? // #1
|
||
|
(A // #2
|
||
|
|B // #3
|
||
|
) // #4
|
||
|
| <<q>> B // #5
|
||
|
;
|
||
|
|
||
|
Recall that this means that when the lookahead is NOT A then
|
||
|
the predicate "p" is ignored and it attempts to match "A|B".
|
||
|
Ideally, the "B" at line #3 should suppress predicate "q".
|
||
|
However, the current version does not attempt to look past
|
||
|
the guard predicate to find context which might suppress other
|
||
|
predicates.
|
||
|
|
||
|
In some cases -mrhoist will lead to the reporting of ambiguities
|
||
|
which were not visible before:
|
||
|
|
||
|
start : (a)* "@";
|
||
|
a : bc | d;
|
||
|
bc : b | c ;
|
||
|
|
||
|
b : <<isUpper(LATEXT(1))>>? A;
|
||
|
c : A ;
|
||
|
|
||
|
d : A ;
|
||
|
|
||
|
In this case there is a true ambiguity in "a" between "bc" and "d"
|
||
|
which can both match "A". Without -mrhoist the predicate in "b"
|
||
|
is hoisted into "a" and there is no ambiguity reported. However,
|
||
|
with -mrhoist, the predicate in "b" is suppressed by "c" (as it
|
||
|
should be) making the ambiguity in "a" apparent.
|
||
|
|
||
|
The motivations for these changes were hoisting problems reported
|
||
|
by Reinier van den Born (reinier@vnet.ibm.com) and several others.
|
||
|
|
||
|
#113. (Changed in 1.33MR10) new context guarded pred: (g)? && <<p>>? expr
|
||
|
|
||
|
The existing context guarded predicate:
|
||
|
|
||
|
rule : (guard)? => <<p>>? expr
|
||
|
| next_alternative
|
||
|
;
|
||
|
|
||
|
generates code which resembles:
|
||
|
|
||
|
if (lookahead(expr) && (!guard || pred)) {
|
||
|
expr()
|
||
|
} else ....
|
||
|
|
||
|
This is not suitable for some applications because it allows
|
||
|
expr() to be invoked when the predicate is false. This is
|
||
|
intentional because it is meant to mimic automatically computed
|
||
|
predicate context.
|
||
|
|
||
|
The new context guarded predicate uses the guard information
|
||
|
differently because it has a different goal. Consider:
|
||
|
|
||
|
rule : (guard)? && <<p>>? expr
|
||
|
| next_alternative
|
||
|
;
|
||
|
|
||
|
The new style of context guarded predicate is equivalent to:
|
||
|
|
||
|
rule : <<guard==true && pred>>? expr
|
||
|
| next_alternative
|
||
|
;
|
||
|
|
||
|
It generates code which resembles:
|
||
|
|
||
|
if (lookahead(expr) && guard && pred) {
|
||
|
expr();
|
||
|
} else ...
|
||
|
|
||
|
Both forms of guarded predicates severely restrict the form of
|
||
|
the context guard: it can contain no rule references, no
|
||
|
(...)*, no (...)+, and no {...}. It may contain token and
|
||
|
token class references, and alternation ("|").
|
||
|
|
||
|
Addition for 1.33MR11: in the token expression all tokens must
|
||
|
be at the same height of the token tree:
|
||
|
|
||
|
(A ( B | C))? && ... is ok (all height 2)
|
||
|
(A ( B | ))? && ... is not ok (some 1, some 2)
|
||
|
(A B C D | E F G H)? && ... is ok (all height 4)
|
||
|
(A B C D | E )? && ... is not ok (some 4, some 1)
|
||
|
|
||
|
This restriction is required in order to properly compute the lookahead
|
||
|
set for expressions like:
|
||
|
|
||
|
rule1 : (A B C)? && <<pred>>? rule2 ;
|
||
|
rule2 : (A|X) (B|Y) (C|Z);
|
||
|
|
||
|
This addition was suggested by Rienier van den Born (reinier@vnet.ibm.com)
|
||
|
|
||
|
#109. (Changed in 1.33MR10) improved trace information
|
||
|
|
||
|
The quality of the trace information provided by the "-gd"
|
||
|
switch has been improved significantly. Here is an example
|
||
|
of the output from a test program. It shows the rule name,
|
||
|
the first token of lookahead, the call depth, and the guess
|
||
|
status:
|
||
|
|
||
|
exit rule gusxx {"?"} depth 2
|
||
|
enter rule gusxx {"?"} depth 2
|
||
|
enter rule gus1 {"o"} depth 3 guessing
|
||
|
guess done - returning to rule gus1 {"o"} at depth 3
|
||
|
(guess mode continues - an enclosing guess is still active)
|
||
|
guess done - returning to rule gus1 {"Z"} at depth 3
|
||
|
(guess mode continues - an enclosing guess is still active)
|
||
|
exit rule gus1 {"Z"} depth 3 guessing
|
||
|
guess done - returning to rule gusxx {"o"} at depth 2 (guess mode ends)
|
||
|
enter rule gus1 {"o"} depth 3
|
||
|
guess done - returning to rule gus1 {"o"} at depth 3 (guess mode ends)
|
||
|
guess done - returning to rule gus1 {"Z"} at depth 3 (guess mode ends)
|
||
|
exit rule gus1 {"Z"} depth 3
|
||
|
line 1: syntax error at "Z" missing SC
|
||
|
...
|
||
|
|
||
|
Rule trace reporting is controlled by the value of the integer
|
||
|
[zz]traceOptionValue: when it is positive tracing is enabled,
|
||
|
otherwise it is disabled. Tracing during guess mode is controlled
|
||
|
by the value of the integer [zz]traceGuessOptionValue. When
|
||
|
it is positive AND [zz]traceOptionValue is positive rule trace
|
||
|
is reported in guess mode.
|
||
|
|
||
|
The values of [zz]traceOptionValue and [zz]traceGuessOptionValue
|
||
|
can be adjusted by subroutine calls listed below.
|
||
|
|
||
|
Depending on the presence or absence of the antlr -gd switch
|
||
|
the variable [zz]traceOptionValueDefault is set to 0 or 1. When
|
||
|
the parser is initialized or [zz]traceReset() is called the
|
||
|
value of [zz]traceOptionValueDefault is copied to [zz]traceOptionValue.
|
||
|
The value of [zz]traceGuessOptionValue is always initialzed to 1,
|
||
|
but, as noted earlier, nothing will be reported unless
|
||
|
[zz]traceOptionValue is also positive.
|
||
|
|
||
|
When the parser state is saved/restored the value of the trace
|
||
|
variables are also saved/restored. If a restore causes a change in
|
||
|
reporting behavior from on to off or vice versa this will be reported.
|
||
|
|
||
|
When the -gd option is selected, the macro "#define zzTRACE_RULES"
|
||
|
is added to appropriate output files.
|
||
|
|
||
|
C++ mode
|
||
|
--------
|
||
|
int traceOption(int delta)
|
||
|
int traceGuessOption(int delta)
|
||
|
void traceReset()
|
||
|
int traceOptionValueDefault
|
||
|
|
||
|
C mode
|
||
|
--------
|
||
|
int zzTraceOption(int delta)
|
||
|
int zzTraceGuessOption(int delta)
|
||
|
void zzTraceReset()
|
||
|
int zzTraceOptionValueDefault
|
||
|
|
||
|
The argument "delta" is added to the traceOptionValue. To
|
||
|
turn on trace when inside a particular rule one:
|
||
|
|
||
|
rule : <<traceOption(+1);>>
|
||
|
(
|
||
|
rest-of-rule
|
||
|
)
|
||
|
<<traceOption(-1);>>
|
||
|
; /* fail clause */ <<traceOption(-1);>>
|
||
|
|
||
|
One can use the same idea to turn *off* tracing within a
|
||
|
rule by using a delta of (-1).
|
||
|
|
||
|
An improvement in the rule trace was suggested by Sramji
|
||
|
Ramanathan (ps@kumaran.com).
|
||
|
|
||
|
#108. A Note on Deallocation of Variables Allocated in Guess Mode
|
||
|
|
||
|
NOTE
|
||
|
------------------------------------------------------
|
||
|
This mechanism only works for heap allocated variables
|
||
|
------------------------------------------------------
|
||
|
|
||
|
The rewrite of the trace provides the machinery necessary
|
||
|
to properly free variables or undo actions following a
|
||
|
failed guess.
|
||
|
|
||
|
The macro zzUSER_GUESS_HOOK(guessSeq,zzrv) is expanded
|
||
|
as part of the zzGUESS macro. When a guess is opened
|
||
|
the value of zzrv is 0. When a longjmp() is executed to
|
||
|
undo the guess, the value of zzrv will be 1.
|
||
|
|
||
|
The macro zzUSER_GUESS_DONE_HOOK(guessSeq) is expanded
|
||
|
as part of the zzGUESS_DONE macro. This is executed
|
||
|
whether the guess succeeds or fails as part of closing
|
||
|
the guess.
|
||
|
|
||
|
The guessSeq is a sequence number which is assigned to each
|
||
|
guess and is incremented by 1 for each guess which becomes
|
||
|
active. It is needed by the user to associate the start of
|
||
|
a guess with the failure and/or completion (closing) of a
|
||
|
guess.
|
||
|
|
||
|
Guesses are nested. They must be closed in the reverse
|
||
|
of the order that they are opened.
|
||
|
|
||
|
In order to free memory used by a variable during a guess
|
||
|
a user must write a routine which can be called to
|
||
|
register the variable along with the current guess sequence
|
||
|
number provided by the zzUSER_GUESS_HOOK macro. If the guess
|
||
|
fails, all variables tagged with the corresponding guess
|
||
|
sequence number should be released. This is ugly, but
|
||
|
it would require a major rewrite of antlr 1.33 to use
|
||
|
some mechanism other than setjmp()/longjmp().
|
||
|
|
||
|
The order of calls for a *successful* guess would be:
|
||
|
|
||
|
zzUSER_GUESS_HOOK(guessSeq,0);
|
||
|
zzUSER_GUESS_DONE_HOOK(guessSeq);
|
||
|
|
||
|
The order of calls for a *failed* guess would be:
|
||
|
|
||
|
zzUSER_GUESS_HOOK(guessSeq,0);
|
||
|
zzUSER_GUESS_HOOK(guessSeq,1);
|
||
|
zzUSER_GUESS_DONE_HOOK(guessSeq);
|
||
|
|
||
|
The default definitions of these macros are empty strings.
|
||
|
|
||
|
Here is an example in C++ mode. The zzUSER_GUESS_HOOK and
|
||
|
zzUSER_GUESS_DONE_HOOK macros and myGuessHook() routine
|
||
|
can be used without change in both C and C++ versions.
|
||
|
|
||
|
----------------------------------------------------------------------
|
||
|
<<
|
||
|
|
||
|
#include "AToken.h"
|
||
|
|
||
|
typedef ANTLRCommonToken ANTLRToken;
|
||
|
|
||
|
#include "DLGLexer.h"
|
||
|
|
||
|
int main() {
|
||
|
|
||
|
{
|
||
|
DLGFileInput in(stdin);
|
||
|
DLGLexer lexer(&in,2000);
|
||
|
ANTLRTokenBuffer pipe(&lexer,1);
|
||
|
ANTLRCommonToken aToken;
|
||
|
P parser(&pipe);
|
||
|
|
||
|
lexer.setToken(&aToken);
|
||
|
parser.init();
|
||
|
parser.start();
|
||
|
};
|
||
|
|
||
|
fclose(stdin);
|
||
|
fclose(stdout);
|
||
|
return 0;
|
||
|
}
|
||
|
|
||
|
>>
|
||
|
|
||
|
<<
|
||
|
char *s=NULL;
|
||
|
|
||
|
#undef zzUSER_GUESS_HOOK
|
||
|
#define zzUSER_GUESS_HOOK(guessSeq,zzrv) myGuessHook(guessSeq,zzrv);
|
||
|
#undef zzUSER_GUESS_DONE_HOOK
|
||
|
#define zzUSER_GUESS_DONE_HOOK(guessSeq) myGuessHook(guessSeq,2);
|
||
|
|
||
|
void myGuessHook(int guessSeq,int zzrv) {
|
||
|
if (zzrv == 0) {
|
||
|
fprintf(stderr,"User hook: starting guess #%d\n",guessSeq);
|
||
|
} else if (zzrv == 1) {
|
||
|
free (s);
|
||
|
s=NULL;
|
||
|
fprintf(stderr,"User hook: failed guess #%d\n",guessSeq);
|
||
|
} else if (zzrv == 2) {
|
||
|
free (s);
|
||
|
s=NULL;
|
||
|
fprintf(stderr,"User hook: ending guess #%d\n",guessSeq);
|
||
|
};
|
||
|
}
|
||
|
|
||
|
>>
|
||
|
|
||
|
#token A "a"
|
||
|
#token "[\t \ \n]" <<skip();>>
|
||
|
|
||
|
class P {
|
||
|
|
||
|
start : (top)+
|
||
|
;
|
||
|
|
||
|
top : (which) ? <<fprintf(stderr,"%s is a which\n",s); free(s); s=NULL; >>
|
||
|
| other <<fprintf(stderr,"%s is an other\n",s); free(s); s=NULL; >>
|
||
|
; <<if (s != NULL) free(s); s=NULL; >>
|
||
|
|
||
|
which : which2
|
||
|
;
|
||
|
|
||
|
which2 : which3
|
||
|
;
|
||
|
which3
|
||
|
: (label)? <<fprintf(stderr,"%s is a label\n",s);>>
|
||
|
| (global)? <<fprintf(stderr,"%s is a global\n",s);>>
|
||
|
| (exclamation)? <<fprintf(stderr,"%s is an exclamation\n",s);>>
|
||
|
;
|
||
|
|
||
|
label : <<s=strdup(LT(1)->getText());>> A ":" ;
|
||
|
|
||
|
global : <<s=strdup(LT(1)->getText());>> A "::" ;
|
||
|
|
||
|
exclamation : <<s=strdup(LT(1)->getText());>> A "!" ;
|
||
|
|
||
|
other : <<s=strdup(LT(1)->getText());>> "other" ;
|
||
|
|
||
|
}
|
||
|
----------------------------------------------------------------------
|
||
|
|
||
|
This is a silly example, but illustrates the idea. For the input
|
||
|
"a ::" with tracing enabled the output begins:
|
||
|
|
||
|
----------------------------------------------------------------------
|
||
|
enter rule "start" depth 1
|
||
|
enter rule "top" depth 2
|
||
|
User hook: starting guess #1
|
||
|
enter rule "which" depth 3 guessing
|
||
|
enter rule "which2" depth 4 guessing
|
||
|
enter rule "which3" depth 5 guessing
|
||
|
User hook: starting guess #2
|
||
|
enter rule "label" depth 6 guessing
|
||
|
guess failed
|
||
|
User hook: failed guess #2
|
||
|
guess done - returning to rule "which3" at depth 5 (guess mode continues
|
||
|
- an enclosing guess is still active)
|
||
|
User hook: ending guess #2
|
||
|
User hook: starting guess #3
|
||
|
enter rule "global" depth 6 guessing
|
||
|
exit rule "global" depth 6 guessing
|
||
|
guess done - returning to rule "which3" at depth 5 (guess mode continues
|
||
|
- an enclosing guess is still active)
|
||
|
User hook: ending guess #3
|
||
|
enter rule "global" depth 6 guessing
|
||
|
exit rule "global" depth 6 guessing
|
||
|
exit rule "which3" depth 5 guessing
|
||
|
exit rule "which2" depth 4 guessing
|
||
|
exit rule "which" depth 3 guessing
|
||
|
guess done - returning to rule "top" at depth 2 (guess mode ends)
|
||
|
User hook: ending guess #1
|
||
|
enter rule "which" depth 3
|
||
|
.....
|
||
|
----------------------------------------------------------------------
|
||
|
|
||
|
Remember:
|
||
|
|
||
|
(a) Only init-actions are executed during guess mode.
|
||
|
(b) A rule can be invoked multiple times during guess mode.
|
||
|
(c) If the guess succeeds the rule will be called once more
|
||
|
without guess mode so that normal actions will be executed.
|
||
|
This means that the init-action might need to distinguish
|
||
|
between guess mode and non-guess mode using the variable
|
||
|
[zz]guessing.
|
||
|
|
||
|
#101. (Changed in 1.33MR10) antlr -info command line switch
|
||
|
|
||
|
-info
|
||
|
|
||
|
p - extra predicate information in generated file
|
||
|
|
||
|
t - information about tnode use:
|
||
|
at the end of each rule in generated file
|
||
|
summary on stderr at end of program
|
||
|
|
||
|
m - monitor progress
|
||
|
prints name of each rule as it is started
|
||
|
flushes output at start of each rule
|
||
|
|
||
|
f - first/follow set information to stdout
|
||
|
|
||
|
0 - no operation (added in 1.33MR11)
|
||
|
|
||
|
The options may be combined and may appear in any order.
|
||
|
For example:
|
||
|
|
||
|
antlr -info ptm -CC -gt -mrhoist on mygrammar.g
|
||
|
|
||
|
#100a. (Changed in 1.33MR10) Predicate tree simplification
|
||
|
|
||
|
When the same predicates can be referenced in more than one
|
||
|
alternative of a block large predicate trees can be formed.
|
||
|
|
||
|
The difference that these optimizations make is so dramatic
|
||
|
that I have decided to use it even when -mrhoist is not selected.
|
||
|
|
||
|
Consider the following grammar:
|
||
|
|
||
|
start : ( all )* ;
|
||
|
|
||
|
all : a
|
||
|
| d
|
||
|
| e
|
||
|
| f
|
||
|
;
|
||
|
|
||
|
a : c A B
|
||
|
| c A C
|
||
|
;
|
||
|
|
||
|
c : <<AAA(LATEXT(2))>>?
|
||
|
;
|
||
|
|
||
|
d : <<BBB(LATEXT(2))>>? B C
|
||
|
;
|
||
|
|
||
|
e : <<CCC(LATEXT(2))>>? B C
|
||
|
;
|
||
|
|
||
|
f : e X Y
|
||
|
;
|
||
|
|
||
|
In rule "a" there is a reference to rule "c" in both alternatives.
|
||
|
The length of the predicate AAA is k=2 and it can be followed in
|
||
|
alternative 1 only by (A B) while in alternative 2 it can be
|
||
|
followed only by (A C). Thus they do not have identical context.
|
||
|
|
||
|
In rule "all" the alternatives which refer to rules "e" and "f" allow
|
||
|
elimination of the duplicate reference to predicate CCC.
|
||
|
|
||
|
The table below summarized the kind of simplification performed by
|
||
|
1.33MR10. In the table, X and Y stand for single predicates
|
||
|
(not trees).
|
||
|
|
||
|
(OR X (OR Y (OR Z))) => (OR X Y Z)
|
||
|
(AND X (AND Y (AND Z))) => (AND X Y Z)
|
||
|
|
||
|
(OR X (... (OR X Y) ... )) => (OR X (... Y ... ))
|
||
|
(AND X (... (AND X Y) ... )) => (AND X (... Y ... ))
|
||
|
(OR X (... (AND X Y) ... )) => (OR X (... ... ))
|
||
|
(AND X (... (OR X Y) ... )) => (AND X (... ... ))
|
||
|
|
||
|
(AND X) => X
|
||
|
(OR X) => X
|
||
|
|
||
|
In a test with a complex grammar for a real application, a predicate
|
||
|
tree with six OR nodes and 12 leaves was reduced to "(OR X Y Z)".
|
||
|
|
||
|
In 1.33MR10 there is a greater effort to release memory used
|
||
|
by predicates once they are no longer in use.
|
||
|
|
||
|
#100b. (Changed in 1.33MR10) Suppression of extra predicate tests
|
||
|
|
||
|
The following optimizations require that -mrhoist be selected.
|
||
|
|
||
|
It is relatively easy to optimize the code generated for predicate
|
||
|
gates when they are of the form:
|
||
|
|
||
|
(AND X Y Z ...)
|
||
|
or (OR X Y Z ...)
|
||
|
|
||
|
where X, Y, Z, and "..." represent individual predicates (leaves) not
|
||
|
predicate trees.
|
||
|
|
||
|
If the predicate is an AND the contexts of the X, Y, Z, etc. are
|
||
|
ANDed together to create a single Tree context for the group and
|
||
|
context tests for the individual predicates are suppressed:
|
||
|
|
||
|
--------------------------------------------------
|
||
|
Note: This was incorrect. The contexts should be
|
||
|
ORed together. This has been fixed. A more
|
||
|
complete description is available in item #152.
|
||
|
---------------------------------------------------
|
||
|
|
||
|
Optimization 1: (AND X Y Z ...)
|
||
|
|
||
|
Suppose the context for Xtest is LA(1)==LP and the context for
|
||
|
Ytest is LA(1)==LP && LA(2)==ID.
|
||
|
|
||
|
Without the optimization the code would resemble:
|
||
|
|
||
|
if (lookaheadContext &&
|
||
|
!(LA(1)==LP && LA(1)==LP && LA(2)==ID) ||
|
||
|
( (! LA(1)==LP || Xtest) &&
|
||
|
(! (LA(1)==LP || LA(2)==ID) || Xtest)
|
||
|
)) {...
|
||
|
|
||
|
With the -mrhoist optimization the code would resemble:
|
||
|
|
||
|
if (lookaheadContext &&
|
||
|
! (LA(1)==LP && LA(2)==ID) || (Xtest && Ytest) {...
|
||
|
|
||
|
Optimization 2: (OR X Y Z ...) with identical contexts
|
||
|
|
||
|
Suppose the context for Xtest is LA(1)==ID and for Ytest
|
||
|
the context is also LA(1)==ID.
|
||
|
|
||
|
Without the optimization the code would resemble:
|
||
|
|
||
|
if (lookaheadContext &&
|
||
|
! (LA(1)==ID || LA(1)==ID) ||
|
||
|
(LA(1)==ID && Xtest) ||
|
||
|
(LA(1)==ID && Ytest) {...
|
||
|
|
||
|
With the -mrhoist optimization the code would resemble:
|
||
|
|
||
|
if (lookaheadContext &&
|
||
|
(! LA(1)==ID) || (Xtest || Ytest) {...
|
||
|
|
||
|
Optimization 3: (OR X Y Z ...) with distinct contexts
|
||
|
|
||
|
Suppose the context for Xtest is LA(1)==ID and for Ytest
|
||
|
the context is LA(1)==LP.
|
||
|
|
||
|
Without the optimization the code would resemble:
|
||
|
|
||
|
if (lookaheadContext &&
|
||
|
! (LA(1)==ID || LA(1)==LP) ||
|
||
|
(LA(1)==ID && Xtest) ||
|
||
|
(LA(1)==LP && Ytest) {...
|
||
|
|
||
|
With the -mrhoist optimization the code would resemble:
|
||
|
|
||
|
if (lookaheadContext &&
|
||
|
(zzpf=0,
|
||
|
(LA(1)==ID && (zzpf=1) && Xtest) ||
|
||
|
(LA(1)==LP && (zzpf=1) && Ytest) ||
|
||
|
!zzpf) {
|
||
|
|
||
|
These may appear to be of similar complexity at first,
|
||
|
but the non-optimized version contains two tests of each
|
||
|
context while the optimized version contains only one
|
||
|
such test, as well as eliminating some of the inverted
|
||
|
logic (" !(...) || ").
|
||
|
|
||
|
Optimization 4: Computation of predicate gate trees
|
||
|
|
||
|
When generating code for the gates of predicate expressions
|
||
|
antlr 1.33 vanilla uses a recursive procedure to generate
|
||
|
"&&" and "||" expressions for testing the lookahead. As each
|
||
|
layer of the predicate tree is exposed a new set of "&&" and
|
||
|
"||" expressions on the lookahead are generated. In many
|
||
|
cases the lookahead being tested has already been tested.
|
||
|
|
||
|
With -mrhoist a lookahead tree is computed for the entire
|
||
|
lookahead expression. This means that predicates with identical
|
||
|
context or context which is a subset of another predicate's
|
||
|
context disappear.
|
||
|
|
||
|
This is especially important for predicates formed by rules
|
||
|
like the following:
|
||
|
|
||
|
uppperCaseVowel : <<isUpperCase(LATEXT(1))>>? vowel;
|
||
|
vowel: : <<isVowel(LATEXT(1))>>? LETTERS;
|
||
|
|
||
|
These predicates are combined using AND since both must be
|
||
|
satisfied for rule upperCaseVowel. They have identical
|
||
|
context which makes this optimization very effective.
|
||
|
|
||
|
The affect of Items #100a and #100b together can be dramatic. In
|
||
|
a very large (but real world) grammar one particular predicate
|
||
|
expression was reduced from an (unreadable) 50 predicate leaves,
|
||
|
195 LA(1) terms, and 5500 characters to an (easily comprehensible)
|
||
|
3 predicate leaves (all different) and a *single* LA(1) term.
|
||
|
|
||
|
#98. (Changed in 1.33MR10) Option "-info p"
|
||
|
|
||
|
When the user selects option "-info p" the program will generate
|
||
|
detailed information about predicates. If the user selects
|
||
|
"-mrhoist on" additional detail will be provided explaining
|
||
|
the promotion and suppression of predicates. The output is part
|
||
|
of the generated file and sandwiched between #if 0/#endif statements.
|
||
|
|
||
|
Consider the following k=1 grammar:
|
||
|
|
||
|
start : ( all ) * ;
|
||
|
|
||
|
all : ( a
|
||
|
| b
|
||
|
)
|
||
|
;
|
||
|
|
||
|
a : c B
|
||
|
;
|
||
|
|
||
|
c : <<LATEXT(1)>>?
|
||
|
| B
|
||
|
;
|
||
|
|
||
|
b : <<LATEXT(1)>>? X
|
||
|
;
|
||
|
|
||
|
Below is an excerpt of the output for rule "start" for the three
|
||
|
predicate options (off, on, and maintenance release style hoisting).
|
||
|
|
||
|
For those who do not wish to use the "-mrhoist on" option for code
|
||
|
generation the option can be used in a "diagnostic" mode to provide
|
||
|
valuable information:
|
||
|
|
||
|
a. where one should insert null actions to inhibit hoisting
|
||
|
b. a chain of rule references which shows where predicates are
|
||
|
being hoisted
|
||
|
|
||
|
======================================================================
|
||
|
Example of "-info p" with "-mrhoist on"
|
||
|
======================================================================
|
||
|
#if 0
|
||
|
|
||
|
Hoisting of predicate suppressed by alternative without predicate.
|
||
|
The alt without the predicate includes all cases where the
|
||
|
predicate is false.
|
||
|
|
||
|
WITH predicate: line 11 v36.g
|
||
|
WITHOUT predicate: line 12 v36.g
|
||
|
|
||
|
The context set for the predicate:
|
||
|
|
||
|
B
|
||
|
|
||
|
The lookahead set for alt WITHOUT the semantic predicate:
|
||
|
|
||
|
B
|
||
|
|
||
|
The predicate:
|
||
|
|
||
|
pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g
|
||
|
|
||
|
set context:
|
||
|
B
|
||
|
tree context: null
|
||
|
|
||
|
Chain of referenced rules:
|
||
|
|
||
|
#0 in rule start (line 1 v36.g) to rule all
|
||
|
#1 in rule all (line 3 v36.g) to rule a
|
||
|
#2 in rule a (line 8 v36.g) to rule c
|
||
|
#3 in rule c (line 11 v36.g)
|
||
|
|
||
|
#endif
|
||
|
&&
|
||
|
#if 0
|
||
|
|
||
|
pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g
|
||
|
|
||
|
set context:
|
||
|
X
|
||
|
tree context: null
|
||
|
|
||
|
#endif
|
||
|
======================================================================
|
||
|
Example of "-info p" with the default -prc setting ( "-prc off")
|
||
|
======================================================================
|
||
|
#if 0
|
||
|
|
||
|
OR
|
||
|
pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g
|
||
|
|
||
|
set context:
|
||
|
nil
|
||
|
tree context: null
|
||
|
|
||
|
pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g
|
||
|
|
||
|
set context:
|
||
|
nil
|
||
|
tree context: null
|
||
|
|
||
|
#endif
|
||
|
======================================================================
|
||
|
Example of "-info p" with "-prc on" and "-mrhoist off"
|
||
|
======================================================================
|
||
|
#if 0
|
||
|
|
||
|
OR
|
||
|
pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g
|
||
|
|
||
|
set context:
|
||
|
B
|
||
|
tree context: null
|
||
|
|
||
|
pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g
|
||
|
|
||
|
set context:
|
||
|
X
|
||
|
tree context: null
|
||
|
|
||
|
#endif
|
||
|
======================================================================
|
||
|
|
||
|
#60. (Changed in 1.33MR7) Major changes to exception handling
|
||
|
|
||
|
There were significant problems in the handling of exceptions
|
||
|
in 1.33 vanilla. The general problem is that it can only
|
||
|
process one level of exception handler. For example, a named
|
||
|
exception handler, an exception handler for an alternative, or
|
||
|
an exception for a subrule always went to the rule's exception
|
||
|
handler if there was no "catch" which matched the exception.
|
||
|
|
||
|
In 1.33MR7 the exception handlers properly "nest". If an
|
||
|
exception handler does not have a matching "catch" then the
|
||
|
nextmost outer exception handler is checked for an appropriate
|
||
|
"catch" clause, and so on until an exception handler with an
|
||
|
appropriate "catch" is found.
|
||
|
|
||
|
There are still undesirable features in the way exception
|
||
|
handlers are implemented, but I do not have time to fix them
|
||
|
at the moment:
|
||
|
|
||
|
The exception handlers for alternatives are outside the
|
||
|
block containing the alternative. This makes it impossible
|
||
|
to access variables declared in a block or to resume the
|
||
|
parse by "falling through". The parse can still be easily
|
||
|
resumed in other ways, but not in the most natural fashion.
|
||
|
|
||
|
This results in an inconsistentcy between named exception
|
||
|
handlers and exception handlers for alternatives. When
|
||
|
an exception handler for an alternative "falls through"
|
||
|
it goes to the nextmost outer handler - not the "normal
|
||
|
action".
|
||
|
|
||
|
A major difference between 1.33MR7 and 1.33 vanilla is
|
||
|
the default action after an exception is caught:
|
||
|
|
||
|
1.33 Vanilla
|
||
|
------------
|
||
|
In 1.33 vanilla the signal value is set to zero ("NoSignal")
|
||
|
and the code drops through to the code following the exception.
|
||
|
For named exception handlers this is the "normal action".
|
||
|
For alternative exception handlers this is the rule's handler.
|
||
|
|
||
|
1.33MR7
|
||
|
-------
|
||
|
In 1.33MR7 the signal value is NOT automatically set to zero.
|
||
|
|
||
|
There are two cases:
|
||
|
|
||
|
For named exception handlers: if the signal value has been
|
||
|
set to zero the code drops through to the "normal action".
|
||
|
|
||
|
For all other cases the code branches to the nextmost outer
|
||
|
exception handler until it reaches the handler for the rule.
|
||
|
|
||
|
The following macros have been defined for convenience:
|
||
|
|
||
|
C/C++ Mode Name
|
||
|
--------------------
|
||
|
(zz)suppressSignal
|
||
|
set signal & return signal arg to 0 ("NoSignal")
|
||
|
(zz)setSignal(intValue)
|
||
|
set signal & return signal arg to some value
|
||
|
(zz)exportSignal
|
||
|
copy the signal value to the return signal arg
|
||
|
|
||
|
I'm not sure why PCCTS make a distinction between the local
|
||
|
signal value and the return signal argument, but I'm loathe
|
||
|
to change the code. The burden of copying the local signal
|
||
|
value to the return signal argument can be given to the
|
||
|
default signal handler, I suppose.
|
||
|
|
||
|
#53. (Explanation for 1.33MR6) What happens after an exception is caught ?
|
||
|
|
||
|
The Book is silent about what happens after an exception
|
||
|
is caught.
|
||
|
|
||
|
The following code fragment prints "Error Action" followed
|
||
|
by "Normal Action".
|
||
|
|
||
|
test : Word ex:Number <<printf("Normal Action\n");>>
|
||
|
exception[ex]
|
||
|
catch NoViableAlt:
|
||
|
<<printf("Error Action\n");>>
|
||
|
;
|
||
|
|
||
|
The reason for "Normal Action" is that the normal flow of the
|
||
|
program after a user-written exception handler is to "drop through".
|
||
|
In the case of an exception handler for a rule this results in
|
||
|
the exection of a "return" statement. In the case of an
|
||
|
exception handler attached to an alternative, rule, or token
|
||
|
this is the code that would have executed had there been no
|
||
|
exception.
|
||
|
|
||
|
The user can achieve the desired result by using a "return"
|
||
|
statement.
|
||
|
|
||
|
test : Word ex:Number <<printf("Normal Action\n");>>
|
||
|
exception[ex]
|
||
|
catch NoViableAlt:
|
||
|
<<printf("Error Action\n"); return;>>
|
||
|
;
|
||
|
|
||
|
The most powerful mechanism for recovery from parse errors
|
||
|
in pccts is syntactic predicates because they provide
|
||
|
backtracking. Exceptions allow "return", "break",
|
||
|
"consumeUntil(...)", "goto _handler", "goto _fail", and
|
||
|
changing the _signal value.
|
||
|
|
||
|
#41. (Added in 1.33MR6) antlr -stdout
|
||
|
|
||
|
Using "antlr -stdout ..." forces the text that would
|
||
|
normally go to the grammar.c or grammar.cpp file to
|
||
|
stdout.
|
||
|
|
||
|
#40. (Added in 1.33MR6) antlr -tab to change tab stops
|
||
|
|
||
|
Using "antlr -tab number ..." changes the tab stops
|
||
|
for the grammar.c or grammar.cpp file. The number
|
||
|
must be between 0 and 8. Using 0 gives tab characters,
|
||
|
values between 1 and 8 give the appropriate number of
|
||
|
space characters.
|
||
|
|
||
|
#34. (Added to 1.33MR1) Add public DLGLexerBase::set_line(int newValue)
|
||
|
|
||
|
Previously there was no public function for changing the line
|
||
|
number maintained by the lexer.
|
||
|
|
||
|
#28. (Added to 1.33MR1) More control over DLG header
|
||
|
|
||
|
Version 1.33MR1 adds the following directives to PCCTS
|
||
|
for C++ mode:
|
||
|
|
||
|
#lexprefix <<source code>>
|
||
|
|
||
|
Adds source code to the DLGLexer.h file
|
||
|
after the #include "DLexerBase.h" but
|
||
|
before the start of the class definition.
|
||
|
|
||
|
#lexmember <<source code>>
|
||
|
|
||
|
Adds source code to the DLGLexer.h file
|
||
|
as part of the DLGLexer class body. It
|
||
|
appears immediately after the start of
|
||
|
the class and a "public: statement.
|
||
|
|