Previously there was one runner per test target (mostly VMs). This had
a few limitations:
- multiple tests that ran on the same target (eg multiple build
configs) were serialized on availability of that runner.
- it needed manual balancing of VMs over host machines.
To address this, make VMs that use ephemeral disks (ie most of them)
all use a pool of runners with the "libvirt" label. This requires that
we distinguish between "host" and "target" for those. Native runners
and VMs with persistent disks (eg the constantly-updated snapshot ones)
specify the same host and target.
This should improve test throughput.
Add "compat-tests" to the default TEST_TARGET so we can override as
necessary. Override TEST_TARGET for Cygwin as the tests don't currently
compile there.
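
As a rough sketch of the TEST_TARGET selection described above. The
function name select_test_target and the config names are hypothetical,
and the "tests" target listed alongside "compat-tests" is an assumption
about what the default contains:

```shell
# Hypothetical helper: pick the make targets to run for a test config.
select_test_target() {
    config=$1
    # Assumed default. Only "compat-tests" is stated in the text above;
    # "tests" is an assumption.
    target="tests compat-tests"
    case "$config" in
    cygwin*)
        # The compat tests don't currently compile on Cygwin, so drop them.
        target="tests"
        ;;
    esac
    echo "$target"
}
```

For example, "select_test_target cygwin-release" prints only the assumed
base target, while any other config name gets the default.
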
The default job timeout of 360 minutes (6h) is not enough to complete
the regress tests on some of the slow VMs, depending on the load on the
host. Increase it to 600 minutes (10h).
This also moves the Cygwin package install from the workflow file to
setup_ci.sh so that we can install different sets of Cygwin packages
for different test configs.
In addition to installing the requisite Cygwin packages, we also need to
explicitly invoke "sh" for steps that run other scripts since the runner
environment doesn't understand #! paths.
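
A minimal illustration of the explicit "sh" invocation, assuming a
hypothetical wrapper named run_script. The real workflow steps may
simply prefix each script path with "sh":

```shell
# Hypothetical wrapper: run a script through sh explicitly instead of
# relying on its #! line (or its execute bit) being honoured.
run_script() {
    script=$1
    shift
    sh "$script" "$@"
}
```

Because the interpreter is named on the command line, this works even
where the #! path cannot be resolved.
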
Ubuntu 22.04 defaults to private home dirs which prevents "nobody"
running ssh-add during the agent-getpeereid test. Check for this and
add the necessary permissions.
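
One way such a check could look, as a sketch only. The helper name
ensure_traversable is hypothetical, and it assumes that search (execute)
permission for "other" on the home directory is what "nobody" is
missing; the actual change may test for or grant different permissions:

```shell
# Hypothetical helper: make sure users outside the owner and group can
# traverse a directory, adding the permission only if it is missing.
ensure_traversable() {
    dir=$1
    # ls -ld prints the mode string first, eg "drwxr-x---". Keep only
    # those 10 characters so any trailing ACL marker is ignored.
    mode=$(ls -ld "$dir" | cut -c1-10)
    case "$mode" in
    *x|*t)
        # "other" already has search permission; nothing to do.
        ;;
    *)
        chmod o+x "$dir"
        ;;
    esac
}
```

It would be called as ensure_traversable "$HOME" before the test runs.
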
Valgrind doesn't let ssh exec ssh-keysign (because it's setuid) so skip
it during the Valgrind-based tests.
See https://bugs.kde.org/show_bug.cgi?id=119404 for a discussion of this
(ironically there the problematic binary was ssh(1) back when it could
still be setuid).
If a previous run on a physical runner has failed to clean up, the next
run will fail because it'll try to check out the code to a broken
directory mount. Make cleanup the first step.