.TH ANTLR 1 "September 1995" "ANTLR" "PCCTS Manual Pages"
.SH NAME
antlr \- ANother Tool for Language Recognition
.SH SYNTAX
.LP
\fBantlr\fR [\fIoptions\fR] \fIgrammar_files\fR
.SH DESCRIPTION
.PP
\fIAntlr\fP converts an extended form of context-free grammar into a
set of C functions which directly implement an efficient form of
deterministic recursive-descent LL(k) parser. Context-free grammars
may be augmented with predicates to allow semantics to influence
parsing; this allows a form of context-sensitive parsing. Selective
backtracking is also available to handle non-LL(k) and even
non-LALR(k) constructs. \fIAntlr\fP also produces a definition of a
lexer which can be automatically converted into C code for a DFA-based
lexer by \fIdlg\fR. Hence, \fIantlr\fR serves a function much like
that of \fIyacc\fR; however, it is notably more flexible and is more
integrated with a lexer generator (\fIantlr\fR directly generates
\fIdlg\fR code, whereas \fIyacc\fR and \fIlex\fR are given independent
descriptions). Unlike \fIyacc\fR, which accepts LALR(1) grammars,
\fIantlr\fR accepts LL(k) grammars in an extended BNF notation,
which eliminates the need for precedence rules.
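.PP
As a rough illustration of what a set of C functions implementing a
recursive-descent parser looks like, the following hand-written sketch
uses one C function per rule and a single symbol of lookahead. It is
not \fIantlr\fP output; the grammar, the function names (\fBexpr\fP,
\fBterm\fP, \fBparse\fP) and the error handling are assumptions made
only for this example.
.nf
```c
/* Hand-written sketch, NOT antlr output: one C function per grammar rule.
 * Hypothetical grammar:  expr : term ( '+' term )* ;   term : DIGIT ;
 */
#include <ctype.h>

static const char *input;   /* remaining input; *input is the lookahead symbol */
static int ok = 1;          /* cleared on the first syntax error */

static int term(void)       /* term : DIGIT ; */
{
    if (isdigit((unsigned char)*input))
        return *input++ - '0';
    ok = 0;                 /* expected a digit */
    return 0;
}

static int expr(void)       /* expr : term ( '+' term )* ; */
{
    int value = term();
    while (ok && *input == '+') {   /* lookahead decides whether to loop */
        input++;
        value += term();
    }
    return value;
}

/* Parse a whole string; returns the sum, or -1 on a syntax error. */
int parse(const char *s)
{
    int v;
    input = s;
    ok = 1;
    v = expr();
    return (ok && *input == 0) ? v : -1;
}
```
.fi
.PP
Under these assumptions, \fBparse("1+2+3")\fP returns 6 and
\fBparse("1+")\fP returns \-1.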
.PP
Like \fIyacc\fR grammars, \fIantlr\fR grammars can use
automatically-maintained symbol attribute values referenced as dollar
variables. Further, because \fIantlr\fR generates top-down parsers,
arbitrary values may be inherited from parent rules (passed like
function parameters). \fIAntlr\fP also has a mechanism for creating
and manipulating abstract syntax trees.
.PP
There are various other niceties in \fIantlr\fR, including the ability to
spread one grammar over multiple files or even multiple grammars in a single
file, the ability to generate a version of the grammar with actions stripped
out (for documentation purposes), and lots more.
.SH OPTIONS
.IP "\fB-ck \fIn\fR"
Use up to \fIn\fR symbols of lookahead when using compressed (linear
approximation) lookahead. This type of lookahead is very cheap to
compute and is attempted before full LL(k) lookahead, which is of
exponential complexity in the worst case. In general, the compressed
lookahead can be much deeper (e.g., \f(CW-ck 10\fP) than the full
lookahead (which usually must be less than 4).
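.IP
The difference between the two kinds of lookahead can be sketched in
C, assuming the usual description of linear approximation: one token
set per lookahead depth, each tested independently, instead of the
exact set of token sequences. The token names and the two-sequence
alternative below are hypothetical.
.nf
```c
/* Hypothetical alternative that can start with exactly (A,B) or (C,D). */
enum { A, B, C, D };

/* Full LL(2): test the exact set of two-token sequences. */
int full_ll2(int t1, int t2)
{
    return (t1 == A && t2 == B) || (t1 == C && t2 == D);
}

/* Linear approximation: one set per depth, tested independently.
 * Depth 1 is {A, C}; depth 2 is {B, D}. */
int approx_ll2(int t1, int t2)
{
    int depth1 = (t1 == A || t1 == C);
    int depth2 = (t2 == B || t2 == D);
    return depth1 && depth2;
}
```
.fi
.IP
Both tests accept (A,B) and (C,D), but the approximation also accepts
(A,D), which the exact test rejects. The approximation is cheaper to
compute and store, at the price of occasionally failing to separate
alternatives that the full test could tell apart.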
.IP \fB-CC\fP
Generate C++ output from both ANTLR and DLG.
.IP \fB-cr\fP
Generate a cross-reference for all rules. For each rule, print a list
of all other rules that reference it.
.IP \fB-e1\fP
Ambiguities/errors shown in low detail (default).
.IP \fB-e2\fP
Ambiguities/errors shown in more detail.
.IP \fB-e3\fP
Ambiguities/errors shown in excruciating detail.
.IP "\fB-fe\fP \fIfile\fP"
Rename \fBerr.c\fP to \fIfile\fP.
.IP "\fB-fh\fP \fIfile\fP"
Rename \fBstdpccts.h\fP header (turns on \fB-gh\fP) to \fIfile\fP.
.IP "\fB-fl\fP \fIfile\fP"
Rename lexical output, \fBparser.dlg\fP, to \fIfile\fP.
.IP "\fB-fm\fP \fIfile\fP"
Rename file with lexical mode definitions, \fBmode.h\fP, to \fIfile\fP.
.IP "\fB-fr\fP \fIfile\fP"
Rename file which remaps globally visible symbols, \fBremap.h\fP, to \fIfile\fP.
.IP "\fB-ft\fP \fIfile\fP"
Rename \fBtokens.h\fP to \fIfile\fP.
.IP \fB-ga\fP
Generate ANSI-compatible code (default case). This has not been
rigorously tested to be ANSI X3J11 C compliant, but it is close. The
normal output of \fIantlr\fP is currently compilable under K&R C,
ANSI C, and C++; this option does nothing because \fIantlr\fP
generates a bunch of #ifdef's to do the right thing depending on the
language.
.IP \fB-gc\fP
Indicates that \fIantlr\fP should generate no C code, i.e., only
perform analysis on the grammar.
.IP \fB-gd\fP
C code is inserted in each of the \fIantlr\fR generated parsing functions to
provide for user-defined handling of a detailed parse trace. The inserted
code consists of calls to the user-supplied macros or functions called
\fBzzTRACEIN\fR and \fBzzTRACEOUT\fP. The only argument is a
\fIchar *\fR pointing to a C-style string which is the grammar rule
recognized by the current parsing function. If no definition is given
for the trace functions, upon rule entry and exit, a message will be
printed indicating that a particular rule has been entered or exited.
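.IP
A minimal sketch of user-supplied trace hooks follows. Only the names
\fBzzTRACEIN\fP and \fBzzTRACEOUT\fP and their single string argument
come from the description above; the helper functions, the counters
and the two stand-in rule functions are hypothetical, and real
generated functions will look different.
.nf
```c
#include <stdio.h>

int trace_depth = 0;    /* current rule nesting depth */
int trace_events = 0;   /* number of entries and exits seen so far */

static void trace_enter(const char *rule)
{
    trace_events++;
    trace_depth++;
    fputs("enter ", stdout);
    puts(rule);
}

static void trace_exit(const char *rule)
{
    trace_events++;
    trace_depth--;
    fputs("exit  ", stdout);
    puts(rule);
}

/* User-supplied definitions of the two hooks; the argument is the rule name. */
#define zzTRACEIN(rule)  trace_enter(rule)
#define zzTRACEOUT(rule) trace_exit(rule)

/* Hypothetical stand-ins for generated rule functions. */
static void term(void)
{
    zzTRACEIN("term");
    /* ... token matching would happen here ... */
    zzTRACEOUT("term");
}

void expr(void)
{
    zzTRACEIN("expr");
    term();
    zzTRACEOUT("expr");
}
```
.fi
.IP
Calling \fBexpr()\fP once in this sketch reports four events (enter
expr, enter term, exit term, exit expr) and leaves the depth at zero.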
.IP \fB-ge\fP
Generate an error class for each non-terminal.
.IP \fB-gh\fP
Generate \fBstdpccts.h\fP for non-ANTLR-generated files to include.
This file contains all defines needed to describe the type of parser
generated by \fIantlr\fP (e.g., how much lookahead is used and whether
or not trees are constructed) and contains the \fBheader\fP action
specified by the user.
.IP \fB-gk\fP
Generate parsers that delay lookahead fetches until needed. Without
this option, \fIantlr\fP generates parsers which always have \fIk\fP
tokens of lookahead available.
.IP \fB-gl\fP
Generate line info about grammar actions in C parser of the form
\fB#\ \fIline\fP\ "\fIfile\fP"\fR which makes error messages from
the C/C++ compiler make more sense as they will \(lqpoint\(rq into the
grammar file, not the resulting C file. Debugging is easier as well,
because you will step through the grammar, not the C file.
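.IP
The effect of such line information can be seen in a small C sketch.
It uses the standard \fB#line\fP spelling of the directive; the file
name \fBexpr.g\fP, the line number and the function names are
hypothetical.
.nf
```c
/* From here on, the compiler numbers lines as if they came from expr.g,
 * starting at line 42. */
#line 42 "expr.g"
int action_line(void) { return __LINE__; }          /* seen as line 42 */
const char *action_file(void) { return __FILE__; }  /* seen as "expr.g" */
```
.fi
.IP
Any diagnostic or debugger location for these two functions is then
reported against \fBexpr.g\fP rather than against the C file that
actually contains them.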
.IP \fB-gs\fR
Do not generate sets for token expression lists; instead generate a
\fB||\fP-separated sequence of \fBLA(1)==\fItoken_number\fR. The
default is to generate sets.
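.IP
The two styles of test can be compared in a short C sketch. The token
numbers, the bit-set layout and the function names are assumptions for
illustration only; the sets actually generated by \fIantlr\fP may be
laid out differently.
.nf
```c
/* Hypothetical token numbers; a real parser takes them from tokens.h. */
enum { TOK_PLUS = 1, TOK_MINUS = 2, TOK_STAR = 3, TOK_ID = 4 };

/* Style selected by -gs: a chain of comparisons on the lookahead token. */
int is_addop_chain(int la1)
{
    return la1 == TOK_PLUS || la1 == TOK_MINUS;
}

/* Default style: membership test in a bit set, one bit per token number. */
static const unsigned char addop_set[1] = {
    (1u << TOK_PLUS) | (1u << TOK_MINUS)
};

int is_addop_set(int la1)
{
    if (la1 < 0 || la1 >= 8 * (int)sizeof addop_set)
        return 0;                       /* outside the set's range */
    return (addop_set[la1 >> 3] >> (la1 & 7)) & 1;
}
```
.fi
.IP
Both functions give the same answer for every token number; the chain
grows with the number of tokens in the list, while the set test stays
a single lookup.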
.IP \fB-gt\fP
Generate code for Abstract-Syntax Trees.
.IP \fB-gx\fP
Do not create the lexical analyzer files (dlg-related). This option
should be given when the user wishes to provide a customized lexical
analyzer. It may also be used in \fImake\fR scripts to cause only the
parser to be rebuilt when a change not affecting the lexical structure
is made to the input grammars.
.IP "\fB-k \fIn\fR"
Set k of LL(k) to \fIn\fR; i.e., set the number of tokens of lookahead
(default is 1).
.IP "\fB-o\fP \fIdir\fP"
Directory where output files should go (default="."). This is very
nice for keeping the source directory clear of ANTLR and DLG spawn.
.IP \fB-p\fP
The complete grammar, collected from all input grammar files and
stripped of all comments and embedded actions, is listed to
\fBstdout\fP. This is intended to aid in viewing the entire grammar
as a whole and to eliminate the need to keep actions concisely stated
so that the grammar is easier to read. Hence, it is preferable to
embed even complex actions directly in the grammar, rather than to
call them as subroutines, since the subroutine call overhead will be
saved.
.IP \fB-pa\fP
This option is the same as \fB-p\fP except that the output is
annotated with the first sets determined from grammar analysis.
.IP "\fB-prc on\fR"
Turn on the computation and hoisting of predicate context.
.IP "\fB-prc off\fR"
Turn off the computation and hoisting of predicate context. This
option makes 1.10 behave like the 1.06 release with option \fB-pr\fR
on. Context computation is off by default.
.IP "\fB-rl \fIn\fR"
Limit the maximum number of tree nodes used by grammar analysis to
\fIn\fP. Occasionally, \fIantlr\fP is unable to analyze a grammar
submitted by the user. This rare situation can only occur when the
grammar is large and the amount of lookahead is greater than one. A
nonlinear analysis algorithm is used by PCCTS to handle the general
case of LL(k) parsing. The average complexity of analysis, however, is
near linear due to some fancy footwork in the implementation which
reduces the number of calls to the full LL(k) algorithm. If this limit
is reached, an error message is displayed which indicates
the grammar construct being analyzed when \fIantlr\fP hit a
non-linearity. Use this option if \fIantlr\fP seems to go out to
lunch and your disk starts thrashing; try \fIn\fP=10000 to start. Once
the offending construct has been identified, try to remove the
ambiguity that \fIantlr\fP was trying to overcome with large lookahead
analysis. The introduction of (...)? backtracking blocks eliminates
some of these problems, because \fIantlr\fP does not analyze alternatives
that begin with (...)? (it simply backtracks, if necessary, at run
time).
.IP \fB-w1\fR
Set low warning level. Do not warn if semantic predicates and/or
(...)? blocks are assumed to cover ambiguous alternatives.
.IP \fB-w2\fR
Ambiguous parsing decisions yield warnings even if semantic predicates
or (...)? blocks are used. Warn if predicate context is computed and
the semantic predicates do not completely disambiguate alternative
productions.
.IP \fB-\fR
Read grammar from standard input and generate \fBstdin.c\fP as the
parser file.
.SH "SPECIAL CONSIDERATIONS"
.PP
\fIAntlr\fP works... we think. There is no implicit guarantee of
anything. We reserve no \fBlegal\fP rights to the software known as
the Purdue Compiler Construction Tool Set (PCCTS); PCCTS is in the
public domain. An individual or company may do whatever they wish
with source code distributed with PCCTS or the code generated by
PCCTS, including the incorporation of PCCTS, or its output, into
commercial software. We encourage users to develop software with
PCCTS. However, we do ask that credit be given to us for developing
PCCTS. By "credit", we mean that if you incorporate our source code
into one of your programs (commercial product, research project, or
otherwise), you acknowledge this fact somewhere in the
documentation, research report, etc. If you like PCCTS and have
developed a nice tool with the output, please mention that you
developed it using PCCTS. As long as these guidelines are followed,
we expect to continue enhancing this system and expect to make other
tools available as they are completed.
.SH FILES
.IP *.c
output C parser.
.IP *.cpp
output C++ parser when C++ mode is used.
.IP \fBparser.dlg\fP
output \fIdlg\fR lexical analyzer.
.IP \fBerr.c\fP
token string array, error sets and error support routines. Not used in
C++ mode.
.IP \fBremap.h\fP
file that redefines all globally visible parser symbols. The use of
the #parser directive creates this file. Not used in
C++ mode.
.IP \fBstdpccts.h\fP
list of definitions needed by C files, not generated by PCCTS, that
reference PCCTS objects. This is not generated by default. Not used in
C++ mode.
.IP \fBtokens.h\fP
output \fI#defines\fR for tokens used and function prototypes for
functions generated for rules.
.SH "SEE ALSO"
.LP
dlg(1), pccts(1)