# Test262 maintenance rationale
Explanations behind the practices promoted by the project maintainers.
## Vestigial tests
Test262 has been maintained for many years, and the practices used to write
tests have evolved alongside the needs of its consumers. When conventions
change, old tests are typically updated to accommodate the new practices. That
doesn't always happen, though, and one doesn't have to look very far to find
examples of tests which contradict the preferred patterns.

For instance:

- tests which express expectations with `throw` statements inside of
  conditional statements rather than with the assertion API implemented by the
  harness files (though this explicitness will always be desirable when
  asserting the semantics of conditional statements and `throw` statements
  themselves)
- tests with file names derived from section numbers in the 5th edition of
  ECMA262, e.g. `built-ins/Array/15.4.5-1.js`
- tests which validate multiple behaviors, using elaborate comment blocks to
  designate sections
- tests which use deprecated harness functions, e.g. `verifyEnumerable`,
  `verifyConfigurable`, and `verifyWritable`

Since existing tests do not necessarily reflect the project's current
best practices, it's especially important for test authors to familiarize
themselves with the contribution guidelines.
## Test generation
The project includes a software tool for generating test material from abstract
templates. The generator was designed to promote uniformity of coverage,
particularly for parts of the grammar that are used in many productions
(specifically: the destructuring assignment patterns introduced in the 6th
edition of ECMA262).

This tool makes it easy to introduce enormous numbers of tests. Introducing
more tests is not a goal unto itself, though. Test262 prioritizes coherence in
its coverage of the specification, and it recognizes that the value of tests,
as measured by their likelihood to identify defects, varies.

For these reasons, the maintainers urge restraint in the application of the
test generation tool.
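
The arithmetic behind that caution is simple: if every case is expanded once
per template, the number of generated files is the product of the two. The
following is a toy illustration and not the project's generator; the `expand`
function, the `{{name}}` placeholder syntax, and the case and template names
are all invented for this sketch.

```javascript
// Toy expansion: every case is combined with every template.
// All names here are hypothetical; this is not the Test262 generation tool.
const templates = {
  "assignment-expr": "({{pattern}} = {{value}});",
  "for-of-head": "for ({{pattern}} of [{{value}}]) ;",
};

const cases = {
  "array-elision": { pattern: "[, x]", value: "[1, 2]" },
  "array-rest": { pattern: "[...x]", value: "[1, 2]" },
  "object-shorthand": { pattern: "{ x }", value: "{ x: 1 }" },
};

function expand(templates, cases) {
  const files = [];
  for (const [caseName, values] of Object.entries(cases)) {
    for (const [templateName, body] of Object.entries(templates)) {
      files.push({
        name: `${templateName}-${caseName}.js`,
        // Substitute each {{key}} placeholder with the case's value.
        source: body.replace(/\{\{(\w+)\}\}/g, (_, key) => values[key]),
      });
    }
  }
  return files;
}

const generated = expand(templates, cases);
console.log(generated.length); // 3 cases x 2 templates = 6
```

In this sketch, each additional template adds one file per case, and each
additional case adds one file per template.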
## File structure
For practical reasons, tests are organized in a tree structure according to the
conventions of modern file systems. Unfortunately, this structure is not
expressive enough to model the semantics of a rich and evolving programming
language like ECMAScript. This means the common and crucial task of coverage
assessment will likely always be a challenge, but strong conventions around
file organization can help.

Tests for syntax-derived operations are organized according to the language
grammar, with directories used to describe non-terminals. For example, tests
for [the `if` statement](https://tc39.es/ecma262/#sec-if-statement) are located
in [the `test/language/statements/if`
directory](https://github.com/tc39/test262/tree/main/test/language/statements/if),
and tests for [the `instanceof`
operator](https://tc39.es/ecma262/#sec-relational-operators) are located in
[the `test/language/expressions/instanceof`
directory](https://github.com/tc39/test262/tree/main/test/language/expressions/instanceof).

Tests for built-in APIs are organized within [the `test/built-ins`
directory](https://github.com/tc39/test262/tree/main/test/built-ins) according to
the identifiers by which they can be accessed. There, directories describe the
sequence of properties used to reach each built-in from the global scope. For
example, tests for [the `Array.prototype.reduce`
method](https://tc39.es/ecma262/#sec-array.prototype.reduce) are located in
[the `test/built-ins/Array/prototype/reduce`
directory](https://github.com/tc39/test262/tree/main/test/built-ins/Array/prototype/reduce),
while tests for [the `isNaN`
function](https://tc39.es/ecma262/#sec-isnan-number) are located in [the
`test/built-ins/isNaN`
directory](https://github.com/tc39/test262/tree/main/test/built-ins/isNaN).

Built-ins which are defined only in [the ECMA-402
specification](https://tc39.es/ecma402/) follow a similar naming convention
within [the `test/intl402`
directory](https://github.com/tc39/test262/tree/main/test/intl402).

[The `test/annexB`
directory](https://github.com/tc39/test262/tree/main/test/annexB) holds tests
for the semantics described by [Annex B of
ECMA262](https://tc39.es/ecma262/#sec-additional-ecmascript-features-for-web-browsers).
The conventions for syntax-derived operations and built-in APIs as described
above are also applied within this directory.

[The `test/harness`
directory](https://github.com/tc39/test262/tree/main/test/harness) stores tests
for the "harness" files which Test262 maintains to assist in test writing.

Directories are not generally applied beyond these limits; further
differentiation is instead achieved through structured file names which follow
ad-hoc conventions. This organization balances the need to group tests
logically with the need to discover tests.

Many consumers use file names as a way to compare test results across revisions
and between implementations. For this reason, test files are rarely
re-organized after being accepted.
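
The convention for built-in APIs is mechanical enough to express in a few
lines. The helper below is hypothetical (Test262 does not ship a
`builtInTestDirectory` function) and handles only dotted, string-named property
paths; it is meant purely to illustrate how a path from the global scope
corresponds to a directory.

```javascript
// Hypothetical helper: maps the property path of a built-in, as written from
// the global scope, to the directory where its tests are expected to live.
function builtInTestDirectory(propertyPath, root = "test/built-ins") {
  return [root, ...propertyPath.split(".")].join("/");
}

console.log(builtInTestDirectory("Array.prototype.reduce"));
// test/built-ins/Array/prototype/reduce

console.log(builtInTestDirectory("isNaN"));
// test/built-ins/isNaN
```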
## Regression tests
It is possible to write tests for semantics which, while not explicitly
specified by ECMA262, are nonetheless valid according to the normative text.
Such tests are welcome in Test262, but their fitness is not a given. Consumers
from many constituencies value the coherence and consistency of the test suite,
and tests which disallow arbitrary extraneous behavior can degrade those
qualities. Because Test262 is not maintained as a repository of regression
tests, contributions which include these kinds of tests will be weighed against
their likelihood of identifying error in a plurality of implementations.
## Large tests
Test262 tests are typically very focused. The vast majority exercise just one
algorithm step/grammar production, and some are even more granular than that!

Some test contributors are uncomfortable splitting their work across files like
this. It's certainly unlike the practices that are common in modern application
development. In those settings, many tests are often grouped into the same file
and separated by function boundaries.

To limit complexity, Test262 doesn't use the same approach as a typical
application test suite. The guidance of "one test per file" means that
consuming Test262 is relatively easy; there is no "test runner" API for
consumers to implement, and interpreting results is likewise straightforward.
It also lowers the barrier to entry for new contributors since there is no API
to learn.
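
As a rough illustration of how little machinery "one test per file" demands,
the sketch below treats a test as a string of source text and reports a pass
when evaluating it completes without throwing. It is deliberately incomplete
and is not a conforming runner: a real consumer must also supply the harness
files a test relies on and honor the test's metadata. The `runTest` name and
the shape of its result are invented for this example.

```javascript
// One file, one test, one result: evaluate the source and record whether it
// completed normally. `runTest` is a hypothetical name, not a Test262 API.
// `new Function` is used for brevity; it evaluates the text as a function
// body, which is a simplification of how real test files are executed.
function runTest(file, source) {
  try {
    new Function(source)();
    return { file, pass: true };
  } catch (error) {
    return { file, pass: false, message: String(error) };
  }
}

const results = [
  runTest(
    "passing.js",
    "if (typeof isNaN !== 'function') { throw new Error('missing isNaN'); }"
  ),
  runTest("failing.js", "throw new Error('deliberate failure');"),
];

for (const result of results) {
  console.log(`${result.pass ? "PASS" : "FAIL"} ${result.file}`);
}
// PASS passing.js
// FAIL failing.js
```

Because each file yields exactly one result, there is no nested suite
structure to interpret and no registration API for the test to call.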
## Syntax tests
When testing a syntactic feature of the language, it can be tempting to write
tests which verify that some bit of source text does *not* produce a syntax
error. Contributors should try to push beyond verifying only the lack of a
syntax error because almost all such tests also have observable semantics. It's
often better for a test to assert that the expected semantics are followed,
even when they may already be covered elsewhere.

However, this is not always desirable because verifying semantics invariably
requires inserting still more code, and that additional code may degrade the
tests' precision for verifying syntax.

When considering this tension, be aware that TC39 maintains [a separate project
called test262-parser-tests](https://github.com/tc39/test262-parser-tests).
This project was partially motivated by a desire to offer test material that
isn't (and perhaps cannot be) related to any specific grammar production. The
availability of that project may inform decisions about if, where, and how to
include tests for syntax in Test262.
## Avoiding abstraction
Contributors will occasionally suggest introducing new abstractions to reduce
duplication in tests. The maintainers set a relatively high bar for such
enhancements, both due to their many drawbacks and due to the aspects of
standards testing which limit their benefit.

The drawbacks to abstraction include:

- it degrades the tests by introducing unrelated semantics
- it discourages contributors by requiring them to learn more
- it frustrates implementers by making it harder to understand what's being
  tested and what has failed

One of abstraction's common motivations is its tendency to reduce maintenance
costs by limiting duplication. TC39 has a very high standard for compatibility
between revisions of ECMA262. This gives us a certain assurance in Test262 that
maintainers of other test suites do not enjoy: Test262's tests are very rarely
invalidated. The project takes advantage of this by using a more declarative,
readable, and verbose style.

Abstraction has other motivations, so there will always be room for it to some
extent. When the benefits of a specific proposal outweigh the drawbacks, then
it should be well documented and also well-tested. Test262 maintains tests for
its "harness" abstractions in [a dedicated directory within the test suite
itself](https://github.com/tc39/test262/tree/main/test/harness).