as they would be re-connected in Resume() (HA).
Previously, they stayed connected during pause and were connected X+1 times
after X split-brains (the same data was written X+1 times).
m_DataBuffer may be modified concurrently while StatsFunc() is called, so
it's unsafe to call size() on it. As write access to m_DataBuffer is already
synchronized by only modifying it from the single work queue thread, instead of
adding a mutex, this commit adds a new std::atomic_size_t which is additionally
updated when modifying m_DataBuffer and can safely be accessed in StatsFunc().
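The counter approach can be sketched in C++ as follows, assuming a buffer that is only modified on one work queue thread. `BufferedWriter`, `Append()`, `TakeAll()`, `GetPendingCount()` and `m_DataBufferSize` are hypothetical names for illustration. Only `m_DataBuffer` and the use of `std::atomic_size_t` come from the commit message.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical writer: m_DataBuffer is only modified on the work queue thread,
// while GetPendingCount() may be called from any thread (e.g. a stats function).
class BufferedWriter {
public:
    // Work queue thread only.
    void Append(std::string point) {
        m_DataBuffer.push_back(std::move(point));
        m_DataBufferSize.store(m_DataBuffer.size()); // publish the new size
    }

    // Work queue thread only.
    std::vector<std::string> TakeAll() {
        std::vector<std::string> data;
        data.swap(m_DataBuffer);
        m_DataBufferSize.store(0);
        return data;
    }

    // Safe from any thread: reads only the atomic counter, never m_DataBuffer.
    std::size_t GetPendingCount() const {
        return m_DataBufferSize.load();
    }

private:
    std::vector<std::string> m_DataBuffer;
    std::atomic_size_t m_DataBufferSize{0};
};
```

The counter is updated after each modification. A stats reader may therefore briefly see a value that lags the buffer. That is acceptable for statistics and avoids taking a mutex on every write.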
There is no explicit synchronization of access to m_DataBuffer, which is fine
if it is only accessed from the single-threaded work queue. However, Stop() also
called Flush() from another thread, leading to concurrent write access to
m_DataBuffer, which could result in a crash due to a use-after-free or double free.
Changes in this commit:
* Flush() is renamed to FlushWQ() to show that it should only be called from
the work queue. Additionally, it now asserts that it is running on the work
queue.
* Visibility of some data members is changed from protected to private. No
other classes have to access these at the moment. This change prevents
accidental concurrent access from derived classes in the future.
* Stop() now flushes by posting FlushWQ() to the work queue and joining it.
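The following is a minimal sketch of the changed Stop() logic under stated assumptions. `WorkQueue` is a hypothetical single-threaded queue, not Icinga 2's own `WorkQueue` class. `Write()`, `SentCount()` and `m_Sent` are illustrative stand-ins, and sending to the backend is replaced by copying into a vector. `FlushWQ()` asserts that it runs on the worker thread, and `Stop()` posts it to the queue and waits for it to complete rather than flushing on the calling thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal single-threaded work queue (hypothetical, not Icinga 2's WorkQueue).
class WorkQueue {
public:
    WorkQueue() : m_Thread([this] { Run(); }) {}

    // Lets already queued tasks finish, then joins the worker thread.
    ~WorkQueue() {
        {
            std::lock_guard<std::mutex> lock(m_Mutex);
            m_Stopping = true;
        }
        m_CV.notify_one();
        m_Thread.join();
    }

    void Enqueue(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_Mutex);
            m_Tasks.push_back(std::move(task));
        }
        m_CV.notify_one();
    }

    // True if the caller is running on this queue's worker thread.
    bool IsWorkerThread() const { return std::this_thread::get_id() == m_Thread.get_id(); }

private:
    void Run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_Mutex);
                m_CV.wait(lock, [this] { return m_Stopping || !m_Tasks.empty(); });
                if (m_Tasks.empty())
                    return; // stopping and fully drained
                task = std::move(m_Tasks.front());
                m_Tasks.pop_front();
            }
            task();
        }
    }

    std::mutex m_Mutex;
    std::condition_variable m_CV;
    std::deque<std::function<void()>> m_Tasks;
    bool m_Stopping = false;
    std::thread m_Thread; // declared last so the members above exist before it starts
};

class Writer {
public:
    // Callable from any thread; the buffer itself is only touched on the queue.
    void Write(std::string point) {
        m_Queue.Enqueue([this, point] { m_DataBuffer.push_back(point); });
    }

    // Posts FlushWQ() to the work queue and blocks until it has run.
    void Stop() {
        auto done = std::make_shared<std::promise<void>>();
        std::future<void> finished = done->get_future();
        m_Queue.Enqueue([this, done] {
            FlushWQ();
            done->set_value();
        });
        finished.wait();
    }

    std::size_t SentCount() const { return m_Sent.size(); } // read only after Stop()

private:
    // Must only run on the work queue thread.
    void FlushWQ() {
        assert(m_Queue.IsWorkerThread());
        // Stand-in for sending the data: copy it into m_Sent.
        m_Sent.insert(m_Sent.end(), m_DataBuffer.begin(), m_DataBuffer.end());
        m_DataBuffer.clear();
    }

    std::vector<std::string> m_DataBuffer;
    std::vector<std::string> m_Sent;
    WorkQueue m_Queue; // declared last: destroyed, and thus joined, first
};
```

The promise is held by a `shared_ptr` captured in the task, so its shared state stays alive until the worker has finished with it, even after `Stop()` has returned.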
Boost.Beast changed the signature of the previously used generic `set`
method so that it no longer accepts integer types. However, there is
already a more specific method for setting the Content-Length header, so
use that one instead.
What does this change?
* Removal of spaces used for formatting
These could be found by using `grep -r -l -P '^\t+ +[^*]'`
* Removal of trailing whitespace
* A few lines longer than 120 chars
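The three cleanups listed above can be checked line by line with a small C++ sketch. The `CheckLine()` helper and its messages are hypothetical. Its regular expression mirrors the grep pattern, which flags tabs followed by spaces unless the next character is `*` (the continuation of a block comment).

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical per-line checker for the style issues listed above.
std::vector<std::string> CheckLine(const std::string& line) {
    // Same pattern as the grep command: tabs, then spaces, then anything but '*'.
    static const std::regex mixedIndent(R"(^\t+ +[^*])");
    std::vector<std::string> issues;
    if (std::regex_search(line, mixedIndent))
        issues.push_back("spaces used for formatting");
    if (!line.empty() && (line.back() == ' ' || line.back() == '\t'))
        issues.push_back("trailing whitespace");
    // Counts bytes, so a tab counts as one character, not its display width.
    if (line.size() > 120)
        issues.push_back("longer than 120 chars");
    return issues;
}
```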
Rather than leaving stale connections around, we tried to poll for data coming
in from InfluxDB and time out if it didn't respond in a timely manner. This
introduced a race: the timeout triggers, a context switch occurs during which
data actually becomes available, and the TlsStream spins trying to
asynchronously notify that data is available, but the data never gets read.
Not only does this use up 100% of a core, but it also slowly starves the system
of handler threads, at which point metrics stop being delivered.
This removes the poll and timeout; any TLS socket errors should instead be
detected by TCP keep-alives.
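Relying on TCP keep-alives means they must be enabled on the socket. The following is a minimal POSIX sketch, assuming access to a raw socket descriptor. The helper names and the 60-second idle time are hypothetical, and the commit message does not say how or where keep-alives are configured. With Boost.Asio, the same option is available as `boost::asio::socket_base::keep_alive`.

```cpp
#include <cassert>
#include <cerrno>
#include <system_error>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Turns on TCP keep-alive for a socket so that the kernel probes idle
// connections and reports a dead peer as a socket error.
void EnableKeepAlive(int fd) {
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        throw std::system_error(errno, std::generic_category(), "setsockopt(SO_KEEPALIVE)");
#ifdef TCP_KEEPIDLE
    // Hypothetical tuning: start probing after 60 s of idleness (not portable).
    int idle = 60;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) != 0)
        throw std::system_error(errno, std::generic_category(), "setsockopt(TCP_KEEPIDLE)");
#endif
}

// Reports whether SO_KEEPALIVE is currently set on the socket.
bool IsKeepAliveEnabled(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) != 0)
        throw std::system_error(errno, std::generic_category(), "getsockopt(SO_KEEPALIVE)");
    return value != 0;
}
```

Probes are only sent after the connection has been idle for the configured time (the Linux default is 7200 seconds). A dead connection is therefore detected eventually, not immediately. `TCP_KEEPIDLE` is guarded with `#ifdef` because not every platform defines it.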
Fixes #5460 #5469