How does Splunk OpenTelemetry Collector (Agent) work as a Kubernetes DaemonSet?
The purpose of this article is to explain how this works, for curious minds. I will break the question down by its definitions.
- Splunk OpenTelemetry Collector is a “wrapper around” (not a fork of) the OpenTelemetry Collector. The purpose of the wrapper is to make the OpenTelemetry Collector easier to set up and use. Hence, the Splunk OpenTelemetry Collector is 100% upstream compatible.
- Agent mode means the Collector runs on every node, close to the applications it collects telemetry from (as opposed to Gateway mode, where it runs as a standalone service).
- A Kubernetes DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
How It Works
One of the purposes of the OpenTelemetry Collector (Agent) is to receive spans from applications.
Using this architecture diagram, we know that the application is in one pod and the OpenTelemetry Collector (Agent) is in another pod, with both pods on the same node. The app pod and the DaemonSet pod will always be available on the same Node, because a DaemonSet ensures that all Nodes run a copy of a Pod. Hence, whichever Node the application runs on, there will always be an OpenTelemetry Collector (Agent) on that Node.
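As a concrete illustration, here is a minimal sketch of a DaemonSet that runs a collector pod on every node. This is not the official Splunk Helm chart (which is the usual way to install the Splunk OpenTelemetry Collector); the image, names, and namespace are assumptions for illustration.

```yaml
# Minimal illustrative DaemonSet: the scheduler places one copy of this pod on every node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector-agent
  namespace: observability
spec:
  selector:
    matchLabels:
      app: otel-collector-agent
  template:
    metadata:
      labels:
        app: otel-collector-agent
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          ports:
            - name: otlp-grpc
              containerPort: 4317   # OTLP gRPC receiver
```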
Some possible patterns for communicating with Pods in a DaemonSet are:
- Push: Pods in the DaemonSet are configured to send updates to another service, such as a stats database. They do not have clients.
- NodeIP and Known Port: Pods in the DaemonSet can use a hostPort, so that the pods are reachable via the node IPs. Clients know the list of node IPs somehow, and know the port by convention.
- DNS: Create a headless service with the same pod selector, and then discover DaemonSets using the endpoints resource or retrieve multiple A records from DNS.
- Service: Create a service with the same Pod selector, and use the service to reach a daemon on a random node. (No way to reach a specific node.)
My favourite is pattern #2, i.e. NodeIP and Known Port, because this approach is simple to implement in deployment files and scales well across multiple namespaces.
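On the DaemonSet side, pattern #2 can be implemented with a hostPort on the collector container so that the OTLP receiver is reachable on the node IP. This is a sketch that extends the illustrative DaemonSet above; 4317 is the conventional OTLP gRPC port, and the names are assumptions.

```yaml
# Illustrative fragment of the collector DaemonSet pod spec:
# hostPort binds the container port to the same port on the node,
# so clients can reach the agent at <node IP>:4317.
containers:
  - name: otel-collector
    image: otel/opentelemetry-collector-contrib:latest
    ports:
      - name: otlp-grpc
        containerPort: 4317
        hostPort: 4317
        protocol: TCP
```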
Another favourite is pattern #4; it is also simple to implement. The downside is that pattern #4 reaches a daemon on a random node. I wonder whether this would work with tail-based sampling (at Splunk it is a full-fidelity, no-sampling approach, so it shouldn't be a concern).
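For pattern #4, a regular Service with the same pod selector as the DaemonSet is enough. A sketch, assuming the selector labels and port from the illustrative DaemonSet above:

```yaml
# Illustrative Service for pattern #4: clients send to the Service,
# which routes to a collector pod on some (not necessarily the local) node.
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: observability
spec:
  selector:
    app: otel-collector-agent
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
```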
How To Configure
Pattern #2: add the config below to the application pod or deployment YAML.
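Here is a sketch of what that config could look like, assuming the collector's OTLP gRPC receiver is exposed on the node at port 4317. The Downward API field `status.hostIP` is the standard way to obtain the node IP; the environment variable names are illustrative.

```yaml
# Illustrative container spec fragment for the application deployment.
# The Downward API exposes the node's IP to the container as an env var,
# which the application's OpenTelemetry exporter uses to reach the local agent.
env:
  - name: NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://$(NODE_IP):4317"
```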
The official guide on how to do it: Expose Pod Information to Containers Through Environment Variables.
Note: This technique was introduced in Kubernetes 1.7.
FAQ
Q: What configuration allows deployment pods to reach a DaemonSet pod using the Node IP address?
A: We could put our DaemonSet pods in the host network namespace with `hostNetwork: true`. This way our deployment pods can reach our DaemonSet pod on the Node IP.
Source: https://stackoverflow.com/a/50221453/3073280
Another source: https://stackoverflow.com/a/65059275/3073280
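A sketch of what that looks like in the DaemonSet pod spec (illustrative names and image; `dnsPolicy: ClusterFirstWithHostNet` keeps cluster DNS working when the pod uses the host network):

```yaml
# Illustrative fragment of the DaemonSet pod spec.
# hostNetwork: true places the collector pod in the node's network namespace,
# so anything it listens on (e.g. 4317) is reachable at <node IP>:4317.
spec:
  template:
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
```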
Q: Can the OpenTelemetry Collector (Agent) pod in one namespace work across multiple namespaces? Do I need to install the DaemonSet in every namespace?
A: No, you do not need to install the DaemonSet in every namespace; a single one works across namespaces. This is because the DaemonSet's privileges allow the OpenTelemetry Collector (Agent) pod to see all namespaces.
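As an illustration of those cluster-wide privileges, the collector's service account is typically bound to a ClusterRole along these lines. This is a sketch, not the exact role shipped with the Splunk Helm chart:

```yaml
# Illustrative ClusterRole: cluster-scoped read access lets the collector
# see and enrich telemetry with metadata from pods in any namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces", "nodes"]
    verbs: ["get", "list", "watch"]
```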
Disclaimer
My name is Jek. I am a Sales Engineer specialising in Splunk Observability Cloud. I wrote this to document my learning.
The postings on this site are my own and do not represent the position or opinions of Splunk Inc., or its affiliates.