======================================================================

CHANGES_SUMMARY.TXT

A QUICK overview of changes from 1.33 in reverse order

A summary of additions rather than bug fixes and minor code changes.

Numbers refer to items in CHANGES_FROM_133*.TXT which may contain
additional information.

DISCLAIMER

The software and these notes are provided "as is". They may include
typographical or technical errors and their authors disclaim all
liability of any kind or nature for damages due to error, fault,
defect, or deficiency regardless of cause. All warranties of any
kind, either express or implied, including, but not limited to, the
implied warranties of merchantability and fitness for a particular
purpose are disclaimed.

======================================================================

#258. You can specify a user-defined base class for your parser

The constructor of the base class must have a signature similar to
that of ANTLRParser.

#253. Generation of block preamble (-preamble and -preamble_first)

The antlr option -preamble causes antlr to insert the code
BLOCK_PREAMBLE at the start of each rule and block.

The antlr option -preamble_first is similar, but inserts the code
BLOCK_PREAMBLE_FIRST(PreambleFirst_123) where the symbol
PreambleFirst_123 is equivalent to the first set defined by the
#FirstSetSymbol described in Item #248.

#248. Generate symbol for first set of an alternative

rr : #FirstSetSymbol(rr_FirstSet) ( Foo | Bar ) ;

#216. Defer token fetch for C++ mode

When the ANTLRParser class is built with the pre-processor option
ZZDEFER_FETCH defined, the fetch of new tokens by consume() is
deferred until LA(i) or LT(i) is called.

#215. Use reset() to reset DLGLexerBase

#188. Added pccts/h/DLG_stream_input.h

#180. Added ANTLRParser::getEofToken()

#173. -glms for Microsoft style filenames with -gl

#170. Suppression for predicates with lookahead depth >1

Consider the following grammar with -ck 2 and the predicate in rule
"a" with depth 2:

r1 : (ab)* "@" ;
ab : a | b ;
a : (A B)? => <<p(LATEXT(2))>>? A B C ;
b : A B C ;

Normally, the predicate would be hoisted into rule r1 in order to
determine whether to call rule "ab". However, it should *not* be
hoisted because, even if p is false, there is a valid alternative
in rule b. With "-mrhoistk on" the predicate will be suppressed.

If the "-info p" command line option is present, the following
information will appear in the generated code:

while ( (LA(1)==A)
#if 0
Part (or all) of predicate with depth > 1 suppressed by alternative
without predicate
pred << p(LATEXT(2))>>?
depth=k=2 ("=>" guard) rule a line 8 t1.g
tree context:
(root = A
B
)
The token sequence which is suppressed: ( A B )
The sequence of references which generate that sequence of tokens:
1 to ab r1/1 line 1 t1.g
2 ab ab/1 line 4 t1.g
3 to b ab/2 line 5 t1.g
4 b b/1 line 11 t1.g
5 #token A b/1 line 11 t1.g
6 #token B b/1 line 11 t1.g
#endif

A slightly more complicated example:

r1 : (ab)* "@" ;
ab : a | b ;
a : (A B)? => <<p(LATEXT(2))>>? (A B | D E)
;
b : < >? B ;
rq : (A)? => < >? X
| < >? X
;
The #pred statement is a start towards solving this problem.
During ambiguity resolution (*not* predicate hoisting) the
predicates for the two alternatives are expanded and compared.
Consider the following example:
#pred Upper < >? // #1
(A // #2
|B // #3
) // #4
| < >? expr
The existing context guarded predicate:
rule : (guard)? => < >? expr
| next_alternative
;
generates code which resembles:
if (lookahead(expr) && (!guard || pred)) {
    expr();
} else ....
This is not suitable for some applications because it allows
expr() to be invoked when the predicate is false. This is
intentional because it is meant to mimic automatically computed
predicate context.
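The difference can be seen in a small truth-table sketch. `oldStyleSelects` restates the condition from the pseudo-code above; `strictSelects` is a hypothetical stricter rule in which guard and predicate must both hold. Whether that matches the new `&&` form exactly is an assumption here, and both function names are illustrative only.

```cpp
// Condition from the existing "=>" context guarded predicate: when the
// guard context does not match, the predicate is skipped and the
// alternative is still taken.
bool oldStyleSelects(bool lookaheadMatches, bool guard, bool pred) {
    return lookaheadMatches && (!guard || pred);
}

// Hypothetical stricter condition, shown only for comparison: the
// alternative is taken only when the guard matches and the predicate holds.
bool strictSelects(bool lookaheadMatches, bool guard, bool pred) {
    return lookaheadMatches && guard && pred;
}
```

With the lookahead matching, a false guard and a false predicate, oldStyleSelects returns true while strictSelects returns false. That is exactly the case in which expr() is invoked although the predicate is false.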
The new context guarded predicate uses the guard information
differently because it has a different goal. Consider:
rule : (guard)? && < >? expr
| next_alternative
;
The new style of context guarded predicate is equivalent to:
rule : <>? D E
;
In this case, the sequence (D E) in rule "a" which lies behind
the guard is used to suppress the predicate with context (D E)
in rule b.
while ( (LA(1)==A || LA(1)==D)
#if 0
Part (or all) of predicate with depth > 1 suppressed by alternative
without predicate
pred << q(LATEXT(2))>>?
depth=k=2 rule b line 11 t2.g
tree context:
(root = D
E
)
The token sequence which is suppressed: ( D E )
The sequence of references which generate that sequence of tokens:
1 to ab r1/1 line 1 t2.g
2 ab ab/1 line 4 t2.g
3 to a ab/1 line 4 t2.g
4 a a/1 line 8 t2.g
5 #token D a/1 line 8 t2.g
6 #token E a/1 line 8 t2.g
#endif
&&
#if 0
pred << p(LATEXT(2))>>?
depth=k=2 ("=>" guard) rule a line 8 t2.g
tree context:
(root = A
B
)
#endif
(! ( LA(1)==A && LA(2)==B ) || p(LATEXT(2)) ) {
ab();
...
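The suppression rule of item #170 can be modelled as a comparison of token sequences: a depth-k predicate is not hoisted when some alternative without a predicate can produce the same first k tokens, because that alternative remains a valid choice even if the predicate is false. This is a simplified sketch, not antlr's internal representation; the names `Seq`, `coversContext` and `isSuppressed` are hypothetical, tokens are plain strings, and guards and tree contexts are ignored.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A token sequence, e.g. {"A", "B", "C"}.
using Seq = std::vector<std::string>;

// True when the first k tokens of `alt` equal the predicate context `ctx`
// (ctx is expected to hold exactly k tokens).
bool coversContext(const Seq& alt, const Seq& ctx, std::size_t k) {
    if (ctx.size() != k || alt.size() < k) return false;
    for (std::size_t i = 0; i < k; ++i) {
        if (alt[i] != ctx[i]) return false;
    }
    return true;
}

// A predicate with depth-k context `ctx` is suppressed (not hoisted) when
// any alternative that carries no predicate covers the same context.
bool isSuppressed(const Seq& ctx, const std::vector<Seq>& unpredicatedAlts,
                  std::size_t k) {
    for (const Seq& alt : unpredicatedAlts) {
        if (coversContext(alt, ctx, k)) return true;
    }
    return false;
}
```

In the first example the predicate context is ( A B ) and rule b, which has no predicate, starts with A B, so the predicate is suppressed. In the second example the (D E) path of rule a plays the same role for predicate q, since it lies outside the (A B) guard.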
#165. (Changed in MR13) option -newAST
To create ASTs from an ANTLRTokenPtr, antlr usually calls
"new AST(ANTLRTokenPtr)". This option generates a call
to "newAST(ANTLRTokenPtr)" instead. This allows a user
to define a parser member function to create an AST object.
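A minimal sketch of why a parser member function is convenient. All names here (`Token`, `TokenPtr`, `AST`, `MyParser`, `createdNodes`) are hypothetical stand-ins, not the PCCTS classes; the point is only that generated code calling a member function lets the user's parser decide how nodes are created (here it simply counts them).

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for a token and a token smart pointer.
struct Token {
    std::string text;
};
using TokenPtr = std::shared_ptr<Token>;

// Hypothetical tree node built from a token.
struct AST {
    explicit AST(TokenPtr t) : token(std::move(t)) {}
    TokenPtr token;
};

class MyParser {
public:
    // Called where "new AST(tok)" would otherwise appear. As a member
    // function it can use parser state, e.g. to count or pool nodes.
    // The caller owns the returned node, as with a plain "new".
    AST* newAST(TokenPtr tok) {
        ++createdNodes;
        return new AST(std::move(tok));
    }
    int createdNodes = 0;
};
```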
#161. (Changed in MR13) Switch -gxt inhibits generation of tokens.h
#158. (Changed in MR13) #header causes problem for pre-processors
A user who runs the C pre-processor on antlr source suggested
that another syntax be allowed. With MR13, directives
such as #header, #pragma, etc. may be written as "\#header",
"\#pragma", etc. To escape pre-processor directives inside
a #header, use something like the following:
\#header
<<
\#include
>? (A|B);
In earlier versions, both predicates "p" and "q" would be hoisted into
rule r0. With MR12c, predicate p is suppressed because the context which
follows predicate q includes "B" which can "cover" predicate "p". In
other words, in trying to decide in r0 whether to call r1, it doesn't
really matter whether p is false or true because, either way, there is
a valid choice within r1.
#154. (Changed in MR13) Making hoist suppression explicit using <
> B // #5
;
Recall that this means that when the lookahead is NOT A then
the predicate "p" is ignored and it attempts to match "A|B".
Ideally, the "B" at line #3 should suppress predicate "q".
However, the current version does not attempt to look past
the guard predicate to find context which might suppress other
predicates.
In some cases -mrhoist will lead to the reporting of ambiguities
which were not visible before:
start : (a)* "@";
a : bc | d;
bc : b | c ;
b : <
getText());>> A ":" ;
global : <getText());>> A "::" ;
exclamation : <getText());>> A "!" ;
other : <getText());>> "other" ;
}
----------------------------------------------------------------------
This is a silly example, but illustrates the idea. For the input
"a ::" with tracing enabled the output begins:
----------------------------------------------------------------------
enter rule "start" depth 1
enter rule "top" depth 2
User hook: starting guess #1
enter rule "which" depth 3 guessing
enter rule "which2" depth 4 guessing
enter rule "which3" depth 5 guessing
User hook: starting guess #2
enter rule "label" depth 6 guessing
guess failed
User hook: failed guess #2
guess done - returning to rule "which3" at depth 5 (guess mode continues
- an enclosing guess is still active)
User hook: ending guess #2
User hook: starting guess #3
enter rule "global" depth 6 guessing
exit rule "global" depth 6 guessing
guess done - returning to rule "which3" at depth 5 (guess mode continues
- an enclosing guess is still active)
User hook: ending guess #3
enter rule "global" depth 6 guessing
exit rule "global" depth 6 guessing
exit rule "which3" depth 5 guessing
exit rule "which2" depth 4 guessing
exit rule "which" depth 3 guessing
guess done - returning to rule "top" at depth 2 (guess mode ends)
User hook: ending guess #1
enter rule "which" depth 3
.....
----------------------------------------------------------------------
Remember:
(a) Only init-actions are executed during guess mode.
(b) A rule can be invoked multiple times during guess mode.
(c) If the guess succeeds, the rule will be called once more
without guess mode so that normal actions will be executed.
This means that the init-action might need to distinguish
between guess mode and non-guess mode using the variable
[zz]guessing.
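Point (c) can be simulated outside antlr. In this sketch `guessing` plays the role of [zz]guessing and `ruleWithInitAction` stands for a rule whose init-action has a side effect; all names are hypothetical, and `runWithGuesses` only imitates the call pattern (one or more calls in guess mode, then one normal call). Checking the flag keeps the side effect from running once per guess.

```cpp
// Hypothetical parser state; `guessing` stands in for [zz]guessing.
struct Parser {
    bool guessing = false;
    int sideEffects = 0;  // e.g. symbol-table insertions

    // A rule whose init-action must not repeat its side effect in guess mode.
    void ruleWithInitAction() {
        // init-action: it also runs during guess mode, so test the flag first
        if (!guessing) {
            ++sideEffects;
        }
        // ... matching of the rule's tokens would follow here ...
    }
};

// Imitates (b) and (c): the rule is tried `guesses` times in guess mode,
// then called once more without guess mode.
int runWithGuesses(int guesses) {
    Parser p;
    p.guessing = true;
    for (int i = 0; i < guesses; ++i) {
        p.ruleWithInitAction();
    }
    p.guessing = false;
    p.ruleWithInitAction();
    return p.sideEffects;
}
```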
#101. (Changed in 1.33MR10) antlr -info command line switch
-info
p - extra predicate information in generated file
t - information about tnode use:
at the end of each rule in generated file
summary on stderr at end of program
m - monitor progress
prints name of each rule as it is started
flushes output at start of each rule
f - first/follow set information to stdout
0 - no operation (added in 1.33MR11)
The options may be combined and may appear in any order.
For example:
antlr -info ptm -CC -gt -mrhoist on mygrammar.g
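Because each -info letter selects an independent feature, a combined argument such as "ptm" can be decoded letter by letter, in any order. This sketch only illustrates that idea; the flag names and `decodeInfoOption` are hypothetical and are not taken from the antlr sources.

```cpp
#include <string>

// Hypothetical bit flags, one per -info letter listed above.
enum InfoFlags : unsigned {
    INFO_PRED    = 1u << 0,  // p: extra predicate information
    INFO_TNODE   = 1u << 1,  // t: tnode use
    INFO_MONITOR = 1u << 2,  // m: monitor progress
    INFO_FIRST   = 1u << 3,  // f: first/follow set information
};

// Returns the combined flags, or -1 for an unknown letter.
// '0' is accepted as a no-op.
long decodeInfoOption(const std::string& letters) {
    unsigned flags = 0;
    for (char c : letters) {
        switch (c) {
            case 'p': flags |= INFO_PRED; break;
            case 't': flags |= INFO_TNODE; break;
            case 'm': flags |= INFO_MONITOR; break;
            case 'f': flags |= INFO_FIRST; break;
            case '0': break;
            default:  return -1;
        }
    }
    return static_cast<long>(flags);
}
```

Since the flags are OR-ed together, "ptm" and "mtp" decode to the same value.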
#100a. (Changed in 1.33MR10) Predicate tree simplification
When the same predicate can be referenced in more than one
alternative of a block, large predicate trees can be formed.
The difference that these optimizations make is so dramatic
that I have decided to use them even when -mrhoist is not selected.
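One simplification that keeps such trees small is to drop repeated copies of the same predicate under an OR node, since p || q || p has the same value as p || q when predicates have no side effects. This sketch shows only that step; identifying predicates by their text and the name `dedupOrOperands` are assumptions, and antlr's own simplifications may differ.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Removes duplicate operands of an OR of predicates, keeping the first
// occurrence of each so the evaluation order is unchanged.
// Predicates are identified by their text, e.g. "p(LATEXT(1))".
std::vector<std::string> dedupOrOperands(const std::vector<std::string>& ops) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> out;
    for (const std::string& pred : ops) {
        if (seen.insert(pred).second) {  // true only for a first occurrence
            out.push_back(pred);
        }
    }
    return out;
}
```

Applied to the operands p, q, p, p it returns p, q.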
Consider the following grammar:
start : ( all )* ;
all : a
| d
| e
| f
;
a : c A B
| c A C
;
c : <