From ac4d1a0cb1e4ae4de8f864542e9a24406cb584fd Mon Sep 17 00:00:00 2001
From: ClementTsang <34804052+ClementTsang@users.noreply.github.com>
Date: Sun, 11 Aug 2024 18:22:41 -0400
Subject: [PATCH] add readme for data collection

---
 src/data_collection/README.md | 44 +++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100644 src/data_collection/README.md

diff --git a/src/data_collection/README.md b/src/data_collection/README.md
new file mode 100644
index 00000000..1d8a955a
--- /dev/null
+++ b/src/data_collection/README.md
@@ -0,0 +1,44 @@
+# Data Collection
+
+Data collection in bottom has two main components: **sources** and **collectors**.
+
+**Sources** are either libraries or system APIs that actually extract the data.
+These may map to multiple different operating systems. Examples are `sysinfo`,
+or `libc` bindings, or Linux-specific code.
+
+**Collectors** are _platform-specific_ (typically OS-specific), and can pull from
+different sources to get all the data needed, with some glue code in between. As
+such, sources should be written to be per-"job", and be divisible such that
+collectors can import specific code as needed.
+
+We can kinda visualize this with a quick-and-dirty diagram (note this is not accurate or up-to-date):
+
+```mermaid
+flowchart TB
+    subgraph sources
+        direction TB
+        linux
+        windows
+        macos
+        unix
+        sysinfo
+        freebsd
+    end
+    subgraph collectors
+        direction TB
+        Linux
+        Windows
+        macOS
+        FreeBSD
+    end
+    linux -..-> Linux
+    unix -..-> Linux
+    sysinfo -..-> Linux
+    windows -..-> Windows
+    sysinfo -..-> Windows
+    macos -..-> macOS
+    unix -..-> macOS
+    sysinfo -..-> macOS
+    freebsd -..-> FreeBSD
+    sysinfo -..-> FreeBSD
+```
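
For readers of the README added here, the source/collector split it describes could look roughly like the sketch below. This is only an illustration: every module, function, and struct name (`sources::cross_platform`, `sources::unix`, `Snapshot`, `collect`) is hypothetical and not taken from bottom's code, and the returned values are placeholders rather than real readings.

```rust
// Hypothetical sketch of the "sources" vs. "collectors" split.
// None of these names come from bottom's actual code.

/// Sources: each module does one "job" and may serve several platforms.
mod sources {
    /// Stand-in for a cross-platform library source (something like `sysinfo`).
    pub mod cross_platform {
        pub fn cpu_usage_percent() -> f32 {
            12.5 // placeholder value, not a real reading
        }
    }

    /// Stand-in for a Unix-family source shared by several collectors.
    #[cfg(unix)]
    pub mod unix {
        pub fn load_average() -> [f32; 3] {
            [0.5, 0.4, 0.3] // placeholder values
        }
    }
}

/// The data one collection pass produces.
#[derive(Debug, PartialEq)]
pub struct Snapshot {
    pub cpu_usage_percent: f32,
    pub load_average: Option<[f32; 3]>,
}

/// Collector for Unix-like targets: glue code that pulls from two sources.
#[cfg(unix)]
pub fn collect() -> Snapshot {
    Snapshot {
        cpu_usage_percent: sources::cross_platform::cpu_usage_percent(),
        load_average: Some(sources::unix::load_average()),
    }
}

/// Collector for every other target: only the cross-platform source applies.
#[cfg(not(unix))]
pub fn collect() -> Snapshot {
    Snapshot {
        cpu_usage_percent: sources::cross_platform::cpu_usage_percent(),
        load_average: None,
    }
}
```

Callers would invoke `collect()` without caring which sources back it on the current target. Keeping each source small and single-purpose is what lets a collector import only the pieces it needs.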