Prior to this patch, the CircleCI continuous integration environment was
configured to present test failures in a negative light, displaying red
cross-marks and reporting that "some checks were not successful" on
commits and GitHub Pull Requests which included them.
The passing/failing status of tests does not influence their
desirability for Test262. (In practice, engines very commonly fail
newly-contributed tests.)
Although these conflicting interpretations do not technically interfere
with the maintainers' ability to merge new contributions, they do create
confusion: many contributors have interpreted the UI as a rejection of
their work.
In addition, this behavior made it impossible to distinguish between
benign test failures and disruptive infrastructural problems (e.g. an
engine crash).
Reconfigure the continuous integration environment to accept passing and
failing tests equally, and to only report a problem when the Test262
project's testing infrastructure behaves unexpectedly.
The file is derived from the same-named file for SpiderMonkey, so I've
kept the MPL license info.
The next commits use this script to generate language tag mappings data.
Verify that every test file which references a harness file using the
"includes" directive also contains at least one reference to a value
defined in the harness file.
To support this check, extend each harness file with a list of values
which it defines.
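A minimal sketch of such a check, assuming the test's metadata has
already been parsed and that each harness file's defined values are
available as a mapping (all names here are illustrative):

    def check_includes(test_source, includes, harness_defines):
        # "includes" lists harness file names from the test's metadata;
        # "harness_defines" maps each harness file name to the list of
        # values that file declares itself to define.
        for file_name in includes:
            defined_values = harness_defines[file_name]
            if not any(value in test_source for value in defined_values):
                yield 'includes "%s" but references none of its values' % file_name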
The command-line interface to Python's "unittest" module allows users to
run tests by name, e.g.
$ test.py TestLinter.test_foo
If a test's name includes a period, it cannot be referenced in this
way. Rename the tests to omit the filename extension so that they may be
referenced from the command line.
Prior to this commit, the tests for the linter partially depended on
project files. This coupling risked churn in the tests: if the needs of
the project changed (e.g. a harness file was modified), then the linter
tests would need to be updated as well. It also made the tests more
difficult to understand because the input was both larger and more
complicated than necessary to exercise the relevant functionality.
Execute the tests in the context of the fixture directory and introduce
minimal support files that serve the same purpose as the corresponding
project files.
* Generation: Use Python 3-compatible imports.
* Generation: Use range() instead of xrange().
* Generation: Use list comprehensions instead of map().
* Generation: Explicitly use bytes in the Test class.
* Generation: Run unit tests on Python 3 as well.
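For illustration, the general shape of these changes (simplified, not
the project's actual code):

    # Python 2 only:
    #     squares = map(lambda n: n * n, xrange(4))
    # Portable across Python 2 and 3:
    squares = [n * n for n in range(4)]
    expected = b'test body'  # explicit bytes rather than str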
The double-dot notation selects only the commits that are not reachable
from the left side of the double dot; when the right side is omitted,
HEAD is assumed. For example, `git log master..` lists the commits on
the current branch that are not on master. This allows us to fetch the
correct list of new or modified files in the current branch, and makes
it possible to run the script locally even if `TRAVIS_PULL_REQUEST` is
not set.
Currently, it results in an error:
tools/scripts/ci_build.sh: line 2: [: !=: unary operator expected
Update the lint.py script to work with pip 10+: pip.req was moved to
pip._internal.req in version 10, and the existing code only works with
pip versions up to and including 9.0.3.
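A minimal sketch of the version-tolerant import, assuming the script
only needs the parse_requirements helper:

    try:
        # pip 10 and later
        from pip._internal.req import parse_requirements
    except ImportError:
        # pip 9.0.3 and earlier
        from pip.req import parse_requirements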
The document fragments used by the ECMAScript specification do not
conform to any particular pattern beyond the grammar defined by the URL
standard [1]. Relax the linting rule to enforce a simplified version of
that grammar.
[1] https://url.spec.whatwg.org/#fragment-state
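As an illustration only (this particular pattern is an assumption, not
necessarily the rule the project adopted), a simplified check might
accept any non-empty run of printable ASCII characters:

    import re

    # Simplified stand-in for the URL standard's fragment grammar:
    # one or more printable ASCII characters.
    FRAGMENT_PATTERN = re.compile(r'^[!-~]+$')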
Early errors may result from parsing the source text of a test file, but
they may also result from parsing some other source text as referenced
through the ES2015 module syntax. The latter form of early error is not
necessarily detectable by ECMAScript parsers, however. Because of this,
the label "early" is not sufficiently precise for all Test262 consumers
to correctly interpret all tests.
Update the "phase" name of "early" to "parse" for all those negative
tests that describe errors resulting from parsing of the file's source
text directly. A forthcoming commit will update the remaining tests to
use a "phase" name that is more specific to module resolution.
Previously, test consumers were encouraged to insert a `throw` statement
as the first statement of tests for early errors. This recommendation
made tests harder to consume, and as an optional transformation,
consumers may have ignored it or simply been unaware it was made. By
explicitly including such a `throw` statement, the tests become more
literal, making them easier to consume and more transparent in their
expectations.
Document the expectation that all tests for early errors include an
explicit `throw` statement. Extend the linting script to verify this, so
that contributors are automatically notified of violations and future
contributions satisfy this expectation.
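For reference, the statement conventionally placed at the top of such
tests is:

    throw "Test262: This statement should not be evaluated.";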
A recent commit introduced a document that enumerated acceptable values
for the test "features" metadata tag. However, this list was incomplete,
and maintaining it placed extra burden on the project owners.
Restructure the document into a machine-readable format. Add entries for
all previously-omitted values. Add in-line documentation with
recommendations for maintenance of the file. Extend the project's
linting tool to validate tests according to the document's contents.
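A hypothetical excerpt of the restructured document, pairing each
permitted value with a comment for maintainers:

    # https://github.com/tc39/proposal-bigint
    BigInt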
This script is intended to identify common test file formatting errors
prior to their acceptance into the project. It is designed to support
future extensions for additional validation rules.
In order to promote readability of the generated test material, the
test generation tool may insert whitespace when the context of a given
expanded variable calls for it. Avoid inserting such whitespace within
literal values that span multiple lines.
Since the argument is required, mark it as such. This approach gives
the user a much nicer error message than the generic "not enough args"
message.
When inspecting previously-generated files, a new `Test` instance
should be used. This avoids overwriting the in-memory representation of
the latest test, and allows previously-existing test files to be
partially updated according to subsequent changes in their respective
source/case files.
In expecting "case directories" to contain a sub-directory named
"default", the test generation tool is unable to generate tests for
features where a directory named "default" is not appropriate.
Modify the heuristic that identifies "case directories" to use a more
fundamental aspect (i.e. the existence of at least one "case" file).
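A minimal sketch of the revised heuristic, assuming case files are
identified by a ".case" filename extension:

    import os

    def is_case_directory(path):
        # A directory qualifies if it contains at least one case file,
        # whether or not a "default" sub-directory is present.
        return any(name.endswith('.case') for name in os.listdir(path))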
For asynchronous tests, the contract between test file and test runner
is implicit: runners are expected to inspect the source code for
references to a global `$DONE` identifier.
Promote a more explicit contract between test file and test runner by
introducing a new frontmatter "tag", `async`. This brings asynchronous
test configuration in-line with other configuration mechanisms and also
provides a more natural means of test filtering.
The modifications to the test files were made programmatically using
the `grep` and `sed` utilities:
$ grep "\$DONE" test/ -r --files-with-match --null | \
xargs -0 sed -i 's/^\(flags:\s*\)\[/\1[async, /g'
$ grep "\$DONE" test/ -rl --null | \
xargs -0 grep -E '^flags:' --files-without-match --null | \
xargs -0 sed -i 's/^---\*\//flags: [async]\n---*\//'
When executing multiple tests in parallel, each "child" thread would
write to the process's standard output buffer immediately upon test
completion. Because thread execution order and instruction interleaving
are non-deterministic, this made it possible for characters to be
emitted out-of-order.
When extended to support multiple concurrent threads, the runner was
outfitted with a "log lock" dedicated to sharing access to the output
file (when applicable). Re-use this lock when writing to standard out,
ensuring proper ordering of test result messages.
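A minimal sketch of the idea (the names are illustrative, not the
runner's actual API):

    import sys
    import threading

    log_lock = threading.Lock()

    def report(result_text):
        # Acquire the same lock that guards the output file so result
        # lines from concurrent workers never interleave.
        with log_lock:
            sys.stdout.write(result_text)
            sys.stdout.flush()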
A recent extension to the test runner introduced support for running
tests in parallel using multi-threading. Following this, the runner
would incorrectly emit the "final report" before all individual test
results.
In order to emit the "final report" at the end of the output stream, the
parent thread would initialize all children and wait for availability of
a "log lock" shared by all children.
According to the documentation on the "threading" module's Lock object
[1]:
> When more than one thread is blocked in acquire() waiting for the state
> to turn to unlocked, only one thread proceeds when a release() call
> resets the state to unlocked; which one of the waiting threads proceeds
> is not defined, and may vary across implementations.
This means the primitive cannot be used by the parent thread to reliably
detect completion of all child threads.
Update the parent to maintain a reference for each child thread, and to
explicitly wait for every child thread to complete before emitting the
final result.
[1] https://docs.python.org/2/library/threading.html#lock-objects
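A minimal sketch of the corrected sequencing, using Thread.join() in
place of the shared lock (the helper names are hypothetical):

    import threading

    threads = [threading.Thread(target=run_chunk, args=(chunk,))
               for chunk in chunks]
    for thread in threads:
        thread.start()
    # join() reliably blocks until the given child has finished, unlike
    # waiting on a shared Lock, whose wake-up order is unspecified.
    for thread in threads:
        thread.join()
    emit_final_report()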
Adds a `-j`/`--workers-count` parameter to `tools/packaging/test262.py`, defaulting to `[number of cores] - 1`.
Speeds up running the test suite by roughly 3x on my 4-core machine with the SpiderMonkey shell. This could certainly be optimized further by appending test results to per-thread lists and merging them at the end, but it's better than nothing.
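A sketch of how such a default might be wired up (illustrative only,
not the script's actual argument handling):

    import argparse
    import multiprocessing

    parser = argparse.ArgumentParser()
    parser.add_argument('-j', '--workers-count', type=int,
                        default=max(multiprocessing.cpu_count() - 1, 1),
                        help='number of tests to run in parallel')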