2009-07-17 11:10:31 +02:00
|
|
|
|
|
|
|
|
|
=======================================================
|
|
|
|
|
Known Problems In PCCTS - Last revised 14 November 1998
|
|
|
|
|
=======================================================
|
|
|
|
|
|
|
|
|
|
#17. The dlg fix for handling characters up to 255 is incorrect.
|
|
|
|
|
|
|
|
|
|
See item #207.
|
|
|
|
|
|
|
|
|
|
Reported by Frank Hartmann.
|
|
|
|
|
|
|
|
|
|
#16. A note about "&&" predicates (Mike Dimmick)
|
|
|
|
|
|
|
|
|
|
Mike Dimmick has pointed out a potential pitfall in the use of the
|
|
|
|
|
"&&" style predicate. Consider:
|
|
|
|
|
|
|
|
|
|
r0: (g)? => <<P>>? r1
|
|
|
|
|
| ...
|
|
|
|
|
;
|
|
|
|
|
r1: A | B;
|
|
|
|
|
|
|
|
|
|
If the context guard g is not a subset of the lookahead context for r1
|
|
|
|
|
(in other words g is neither A nor B) then the code may execute r1
|
|
|
|
|
even when the lookahead context is not satisfied. This is an error
|
2019-02-06 08:44:39 +01:00
|
|
|
|
by the person coding the grammar, and the error should be reported to
|
2009-07-17 11:10:31 +02:00
|
|
|
|
the user, but it isn't. expect. Some examples I've run seem to
|
|
|
|
|
indicate that such an error actually results in the rule becoming
|
|
|
|
|
unreachable.
|
|
|
|
|
|
|
|
|
|
When g is properly coded the code is correct, the problem is when g
|
|
|
|
|
is not properly coded.
|
|
|
|
|
|
|
|
|
|
A second problem reported by Mike Dimmick is that the test for a
|
|
|
|
|
failed validation predicate is equivalent to a test on the predicate
|
|
|
|
|
along. In other words, if the "&&" has not been hoisted then it may
|
|
|
|
|
falsely report a validation error.
|
|
|
|
|
|
|
|
|
|
#15. (Changed in MR23) Warning for LT(i), LATEXT(i) in token match actions
|
|
|
|
|
|
|
|
|
|
An bug (or at least an oddity) is that a reference to LT(1), LA(1),
|
|
|
|
|
or LATEXT(1) in an action which immediately follows a token match
|
|
|
|
|
in a rule refers to the token matched, not the token which is in
|
|
|
|
|
the lookahead buffer. Consider:
|
|
|
|
|
|
|
|
|
|
r : abc <<action alpha>> D <<action beta>> E;
|
|
|
|
|
|
|
|
|
|
In this case LT(1) in action alpha will refer to the next token in
|
|
|
|
|
the lookahead buffer ("D"), but LT(1) in action beta will refer to
|
|
|
|
|
the token matched by D - the preceding token.
|
|
|
|
|
|
|
|
|
|
A warning has been added which warns users about this when an action
|
|
|
|
|
following a token match contains a reference to LT(1), LA(1), or LATEXT(1).
|
|
|
|
|
|
|
|
|
|
This behavior should be changed, but it appears in too many programs
|
|
|
|
|
now. Another problem, perhaps more significant, is that the obvious
|
|
|
|
|
fix (moving the consume() call to before the action) could change the
|
|
|
|
|
order in which input is requested and output appears in existing programs.
|
|
|
|
|
|
|
|
|
|
This problem was reported, along with a fix by Benjamin Mandel
|
|
|
|
|
(beny@sd.co.il). However, I felt that changing the behavior was too
|
|
|
|
|
dangerous for existing code.
|
|
|
|
|
|
|
|
|
|
#14. Parsing bug in dlg
|
|
|
|
|
|
|
|
|
|
THM: I have been unable to reproduce this problem.
|
|
|
|
|
|
|
|
|
|
Reported by Rick Howard Mijenix Corporation (rickh@mijenix.com).
|
|
|
|
|
|
|
|
|
|
The regular expression parser (in rexpr.c) fails while
|
|
|
|
|
trying to parse the following regular expression:
|
|
|
|
|
|
|
|
|
|
{[a-zA-Z]:}(\\\\[a-zA-Z0-9]*)+
|
|
|
|
|
|
|
|
|
|
See my comment in the following excerpt from rexpr.c:
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* <regExpr> ::= <andExpr> ( '|' {<andExpr>} )*
|
|
|
|
|
*
|
|
|
|
|
* Return -1 if syntax error
|
|
|
|
|
* Return 0 if none found
|
|
|
|
|
* Return 1 if a regExrp was found
|
|
|
|
|
*/
|
|
|
|
|
static
|
|
|
|
|
regExpr(g)
|
|
|
|
|
GraphPtr g;
|
|
|
|
|
{
|
|
|
|
|
Graph g1, g2;
|
|
|
|
|
|
|
|
|
|
if ( andExpr(&g1) == -1 )
|
|
|
|
|
{
|
|
|
|
|
return -1;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
while ( token == '|' )
|
|
|
|
|
{
|
|
|
|
|
int a;
|
|
|
|
|
next();
|
|
|
|
|
a = andExpr(&g2);
|
|
|
|
|
if ( a == -1 ) return -1; /* syntax error below */
|
|
|
|
|
else if ( !a ) return 1; /* empty alternative */
|
|
|
|
|
g1 = BuildNFA_AorB(g1, g2);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if ( token!='\0' ) return -1;
|
|
|
|
|
*****
|
|
|
|
|
***** It appears to fail here becuause token is 125 - the closing '}'
|
|
|
|
|
***** If I change it to:
|
|
|
|
|
***** if ( token!='\0' && token!='}' && token!= ')' ) return -1;
|
|
|
|
|
*****
|
|
|
|
|
***** It succeeds, but I'm not sure this is the corrrect approach.
|
|
|
|
|
*****
|
|
|
|
|
*g = g1;
|
|
|
|
|
return 1;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
#13. dlg reports an invalid range for: [\0x00-\0xff]
|
|
|
|
|
|
|
|
|
|
Diagnosed by Piotr Eljasiak (eljasiak@no-spam.zt.gdansk.tpsa.pl):
|
|
|
|
|
|
|
|
|
|
Fixed in MR16.
|
|
|
|
|
|
|
|
|
|
#12. Strings containing comment actions
|
|
|
|
|
|
|
|
|
|
Sequences that looked like C style comments appearing in string
|
|
|
|
|
literals are improperly parsed by antlr/dlg.
|
|
|
|
|
|
|
|
|
|
<< fprintf(out," /* obsolete */ ");
|
|
|
|
|
|
|
|
|
|
For this case use:
|
|
|
|
|
|
|
|
|
|
<< fprintf(out," \/\* obsolete \*\/ ");
|
|
|
|
|
|
|
|
|
|
Reported by K.J. Cummings (cummings@peritus.com).
|
|
|
|
|
|
|
|
|
|
#11. User hook for deallocation of variables on guess fail
|
|
|
|
|
|
|
|
|
|
The mechanism outlined in Item #108 works only for
|
|
|
|
|
heap allocated variables.
|
|
|
|
|
|
|
|
|
|
#10. Label re-initialization in ( X {y:Y} )*
|
|
|
|
|
|
|
|
|
|
If a label assignment is optional and appears in a
|
|
|
|
|
(...)* or (...)+ block it will not be reset to NULL
|
|
|
|
|
when it is skipped by a subsequent iteration.
|
|
|
|
|
|
|
|
|
|
Consider the example:
|
|
|
|
|
|
|
|
|
|
( X { y:Y })* Z
|
|
|
|
|
|
|
|
|
|
with input:
|
|
|
|
|
|
|
|
|
|
X Y X Z
|
|
|
|
|
|
|
|
|
|
The first time through the block Y will be matched and
|
|
|
|
|
y will be set to point to the token. On the second
|
|
|
|
|
iteration of the (...)* block there is no match for Y.
|
|
|
|
|
But y will not be reset to NULL, as the user might
|
|
|
|
|
expect, it will contain a reference to the Y that was
|
|
|
|
|
matched in the first iteration.
|
|
|
|
|
|
|
|
|
|
The work-around is to manually reset y:
|
|
|
|
|
|
|
|
|
|
( X << y = NULL; >> { y:Y } )* Z
|
|
|
|
|
|
|
|
|
|
or
|
|
|
|
|
|
|
|
|
|
( X ( y:Y | << y = NULL; >> /* epsilon */ ) )* Z
|
|
|
|
|
|
|
|
|
|
Reported by Jeff Vincent (JVincent@novell.com).
|
|
|
|
|
|
|
|
|
|
#9. PCCTAST.h PCCTSAST::setType() is a noop
|
|
|
|
|
|
|
|
|
|
#8. #tokdefs with ~Token and .
|
|
|
|
|
|
|
|
|
|
THM: I have been unable to reproduce this problem.
|
|
|
|
|
|
|
|
|
|
When antlr uses #tokdefs to define tokens the fields of
|
|
|
|
|
#errclass and #tokclass do not get properly defined.
|
|
|
|
|
When it subsequently attempts to take the complement of
|
|
|
|
|
the set of tokens (using ~Token or .) it can refer to
|
|
|
|
|
tokens which don't have names, generating a fatal error.
|
|
|
|
|
|
|
|
|
|
#7. DLG crashes on some invalid inputs
|
|
|
|
|
|
|
|
|
|
THM: In MR20 have fixed the most common cases.
|
|
|
|
|
|
2019-02-06 08:44:39 +01:00
|
|
|
|
The following token definition will cause DLG to crash.
|
2009-07-17 11:10:31 +02:00
|
|
|
|
|
|
|
|
|
#token "()"
|
|
|
|
|
|
|
|
|
|
Reported by Mengue Olivier (dolmen@bigfoot.com).
|
|
|
|
|
|
|
|
|
|
#6. On MS systems \n\r is treated as two new lines
|
|
|
|
|
|
|
|
|
|
Fixed.
|
|
|
|
|
|
|
|
|
|
#5. Token expressions in #tokclass
|
|
|
|
|
|
|
|
|
|
#errclass does not support TOK1..TOK2 or ~TOK syntax.
|
|
|
|
|
#tokclass does not support ~TOKEN syntax
|
|
|
|
|
|
|
|
|
|
A workaround for #errclass TOK1..TOK2 is to use a
|
|
|
|
|
#tokclass.
|
|
|
|
|
|
|
|
|
|
Reported by Dave Watola (dwatola@amtsun.jpl.nasa.gov)
|
|
|
|
|
|
|
|
|
|
#4. A #tokdef must appear "early" in the grammar file.
|
|
|
|
|
|
|
|
|
|
The "early" section of the grammar file is the only
|
|
|
|
|
place where the following directives may appear:
|
|
|
|
|
|
|
|
|
|
#header
|
|
|
|
|
#first
|
|
|
|
|
#tokdefs
|
|
|
|
|
#parser
|
|
|
|
|
|
|
|
|
|
Any other kind of statement signifiies the end of the
|
|
|
|
|
"early" section.
|
|
|
|
|
|
|
|
|
|
#3. Use of PURIFY macro for C++ mode
|
|
|
|
|
|
|
|
|
|
Item #93 of the CHANGES_FROM_1.33 describes the use of
|
|
|
|
|
the PURIFY macro to zero arguments to be passed by
|
|
|
|
|
upward inheritance.
|
|
|
|
|
|
|
|
|
|
#define PURIFY(r, s) memset((char *) &(r), '\0', (s));
|
|
|
|
|
|
|
|
|
|
This may not be the right thing to do for C++ objects that
|
|
|
|
|
have constructors. Reported by Bonny Rais (bonny@werple.net.au).
|
|
|
|
|
|
|
|
|
|
For those cases one should #define PURIFY to be an empty macro
|
|
|
|
|
in the #header or #first actions.
|
|
|
|
|
|
|
|
|
|
#2. Fixed in 1.33MR10 - See CHANGES_FROM_1.33 Item #80.
|
|
|
|
|
|
|
|
|
|
#1. The quality of support for systems with 8.3 file names leaves
|
|
|
|
|
much to be desired. Since the kit is distributed using the
|
|
|
|
|
long file names and the make file uses long file names it requires
|
|
|
|
|
some effort to generate. This will probably not be changed due
|
|
|
|
|
to the large number of systems already written using the long
|
|
|
|
|
file names.
|