Streaming Build Logs
July 11, 2022
product
announcement
preview
devops
The PreviewHQ product is a hosted ephemeral Preview Environment service. It includes internal build and deployment services, and the logs from those services need to be available to users.
The initial implementation of these log streams was based entirely on Kubernetes pod logs. On a request for logs, the application backend (sketched below):
- Queried for the name of the Kubernetes pod that ran that workflow step
- Created a Kubernetes client to tail the logs of that pod
- Filtered out any internal application logs
- Streamed each log line to the client
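For illustration, the original flow looked roughly like the sketch below. It uses the official Kubernetes Python client; the pod lookup, namespace, and internal-log marker are assumptions for the example, not the actual PreviewHQ code.

```python
from kubernetes import client, config, watch

INTERNAL_PREFIX = "[internal]"  # hypothetical marker for internal platform logs

def stream_step_logs(pod_name: str, namespace: str = "workflows"):
    """Tail a workflow pod's logs and drop internal lines before serving them."""
    config.load_incluster_config()  # requires in-cluster credentials and RBAC
    core_v1 = client.CoreV1Api()
    # Watch follows the pod's log stream line by line.
    for line in watch.Watch().stream(
        core_v1.read_namespaced_pod_log,
        name=pod_name,
        namespace=namespace,
    ):
        # Filtering happened here, in the consumer, rather than at the producer.
        if line.startswith(INTERNAL_PREFIX):
            continue
        yield line
```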
While this approach was relatively simple to implement, it had a number of core issues:
- Logs were only available for recent workflows. If a completed pod had been pruned by Kubernetes, its logs were no longer available; in other words, there was no long-term storage of logs
- Filtering out “internal” logs from the log stream was handled by the consumer application (the backend server) rather than the log producer (the build/deploy system)
- The solution depended on the availability of the Kubernetes control plane, which is generally a bad practice when it isn’t required
- Similarly, the solution depended on running within Kubernetes and on the implementation of the workflow engine
A New Path Forward
The main considerations for a new log system were to:
- Allow access to historical logs
- Remove the coupling of the log streaming system from the implementation of the workflow engine
- Avoid building additional internal services
At the center of this new approach is Redpanda, a Kafka-compatible streaming data platform.
The internal workflows were updated to use a new job-specific Python logger instance with a KafkaLoggingHandler sink. This allowed us to separate internal platform logs from user logs at the producer, and to publish to job-specific topics with deterministic names. On shutdown, a sentinel message is published to the topic to indicate the end of the job's log stream. Because this new logger instance also writes to stdout, it remains backwards compatible with the previous implementation.
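A minimal sketch of the producer side, assuming kafka-python; the handler implementation, topic naming scheme, and sentinel value are illustrative, and the actual KafkaLoggingHandler in use may differ:

```python
import logging
import sys

from kafka import KafkaProducer

END_OF_LOG_SENTINEL = b"__END_OF_LOG__"  # hypothetical end-of-stream marker

class KafkaLoggingHandler(logging.Handler):
    """Publish each log record to a job-specific topic."""

    def __init__(self, bootstrap_servers: str, topic: str):
        super().__init__()
        self.producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
        self.topic = topic

    def emit(self, record: logging.LogRecord) -> None:
        self.producer.send(self.topic, self.format(record).encode("utf-8"))

    def close(self) -> None:
        # Publish the sentinel so consumers know the job's log stream has ended.
        self.producer.send(self.topic, END_OF_LOG_SENTINEL)
        self.producer.flush()
        super().close()

def build_job_logger(job_id: str, bootstrap_servers: str) -> logging.Logger:
    logger = logging.getLogger(f"user-logs.{job_id}")
    logger.setLevel(logging.INFO)
    # User-facing logs go to a deterministic, job-specific topic...
    logger.addHandler(KafkaLoggingHandler(bootstrap_servers, f"job-logs.{job_id}"))
    # ...and also to stdout, preserving compatibility with the pod-log approach.
    logger.addHandler(logging.StreamHandler(sys.stdout))
    return logger
```

Internal platform logs go through a separate logger with no Kafka handler, which is what keeps them out of the user-facing stream at the producer rather than filtering them at read time.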
Now, on a request for logs, the application server creates a Kafka consumer and serves logs from the deterministic topic name. This trivially allows us to serve existing logs as well as stream real-time events published to the topic. For historical logs, we use Redpanda's Tiered Storage, which offloads older segments to cloud storage; when those are requested, Redpanda transparently downloads the required segments. This adds latency to requests for historical logs, but that is an accepted tradeoff for the simplicity it provides.
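On the serving side, a consumer reads the job's topic from the beginning and keeps following it until the sentinel arrives. A rough sketch, again assuming kafka-python and the illustrative conventions above (including one partition per job topic); Tiered Storage is transparent at this layer, since Redpanda fetches any offloaded segments before serving the read:

```python
from kafka import KafkaConsumer, TopicPartition

END_OF_LOG_SENTINEL = b"__END_OF_LOG__"  # must match the producer's sentinel

def serve_job_logs(job_id: str, bootstrap_servers: str):
    """Yield all existing log lines for a job, then follow real-time ones."""
    consumer = KafkaConsumer(
        bootstrap_servers=bootstrap_servers,
        enable_auto_commit=False,
    )
    # Assume one partition per job topic so log ordering is preserved.
    partition = TopicPartition(f"job-logs.{job_id}", 0)
    consumer.assign([partition])
    consumer.seek_to_beginning(partition)
    try:
        for message in consumer:  # blocks for new messages after catching up
            if message.value == END_OF_LOG_SENTINEL:
                break  # the job has finished; end the stream
            yield message.value.decode("utf-8")
    finally:
        consumer.close()
```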
This new approach is independent of the workflow engine implementation, provided that jobs sink their logs into the applicable Redpanda topic. It is also no longer dependent on the availability of the Kubernetes control plane, nor does the log-serving application require any Kubernetes RBAC permissions.
Future Improvements
These changes drastically reduce the effort of implementing future log-based features, such as:
- Log Exports
- External Log Forwarding
- Application Logs in Preview Environments