Ivan Koblyk, Andrii Y. Berko, Lviv Polytechnic National University, Information Systems and Networks Department, 12, S. Bandery str., Lviv, Ukraine
Abstract
The pursuit of efficiency, scalability, and agility has shaped the ever-evolving software development landscape. Over the years, the software development paradigm has undergone significant transformations, catalyzed by technological advances and the need for optimal resource utilization. From the early days of monolithic architectures to the contemporary era of microservices and dynamic development environments, the journey has been marked by continuous innovation and optimization. In dynamic software development environments powered by Kubernetes, the rapid creation and deletion of namespaces (virtual clusters) is a common occurrence. Without proper management, however, these namespaces accumulate, leading to resource wastage and operational inefficiencies. There is therefore a pressing need for intelligent systems that can monitor and clean up such environments based on predefined criteria.
This article focuses on developing an intelligent management system tailored for dynamic software development environments. Specifically, it describes a Python application that monitors and deletes namespaces within Kubernetes clusters based on predefined time-to-live (TTL) parameters. Additionally, the system provides analytics on the utilization and lifecycle of these environments, offering data on metrics such as the number of currently deployed environments.
Keywords
Kubernetes, software development environment, time-to-live, cluster, namespaces.
1. Introduction
The evolution of software development is a testament to the human quest for efficiency and progress. It dates back to the early days of computing when programs were crafted as monolithic architectures. In these architectures, entire applications were built as single units, characterized by tightly integrated components and a lack of modularization [1].
As computing technologies advanced, the limitations of monolithic architectures became apparent. Software systems grew increasingly complex, making maintenance and scalability challenging tasks. In response to these challenges, the concept of modularization began to emerge. Developers sought to break down monolithic applications into smaller, more manageable components, laying the groundwork for modern software engineering practices [2].
One significant milestone in the evolution of software development was the introduction of virtualization. Virtualization revolutionized the IT landscape by enabling the creation of multiple virtual environments on a single physical machine. This breakthrough technology allowed for better resource utilization, increased scalability, and enhanced flexibility in software deployment [3]. Virtualization paved the way for further innovations in software development, including the rise of containerization. Containers encapsulate applications and their dependencies, providing lightweight, portable, and consistent environments across different platforms. This technology offered numerous benefits, including faster deployment, improved isolation, and enhanced reproducibility of software environments [4].
The culmination of these advancements in software development is container orchestration, with Kubernetes emerging as the de facto standard. Kubernetes automates the deployment, scaling, and management of containerized applications, providing powerful tools for orchestrating complex microservices architectures [5].
Dynamic environments in Kubernetes revolve around the creation, management, and deletion of ephemeral namespaces and resources within Kubernetes clusters. This dynamic characteristic allows organizations to deploy and manage applications with greater flexibility and efficiency [6]. However, along with this dynamism come challenges, particularly in handling redundant or no longer used resources effectively.
The primary challenge within dynamic Kubernetes environments lies in the proliferation of resources due to the creation and deletion of ephemeral namespaces and resources. As applications are deployed and updated, ephemeral namespaces and resources are dynamically created to accommodate the workload. However, when applications are scaled down or updated, these ephemeral resources may not always be properly cleaned up, leading to the accumulation of redundant or obsolete resources over time [7]. The presence of redundant or no longer used resources can pose several challenges for organizations operating Kubernetes clusters. Firstly, it can result in resource wastage and increased infrastructure costs. Ephemeral namespaces and resources that are no longer needed consume valuable compute, storage, and networking resources, leading to inefficiencies in resource utilization and higher operational expenses.
Moreover, redundant resources can also impact the performance and reliability of Kubernetes clusters. As the number of redundant or unused ephemeral namespaces and resources increases, the cluster’s control plane may become overwhelmed, leading to increased latency and decreased responsiveness. This can adversely affect the overall performance of applications running on the cluster and degrade the user experience.
Furthermore, redundant or no longer used resources can introduce security risks to Kubernetes clusters. Unused namespaces and resources may contain vulnerabilities or misconfigurations that malicious actors could exploit [8]. Additionally, these resources may inadvertently expose sensitive data or APIs, leading to potential data breaches or compliance violations.
Managing redundant or unused resources in dynamic Kubernetes environments requires proactive monitoring and cleanup strategies. Organizations can leverage automation tools and policies to identify and remove unused namespaces and resources automatically, reducing the risk of resource sprawl and improving overall cluster efficiency [9].
The intelligent system introduced in this article aims to automate the management of resources within dynamic software development environments, including containers and other infrastructure components within a single Kubernetes namespace. By automatically deleting and updating dynamic environments, the system seeks to optimize resource utilization and reduce operational overhead. It also has an analytical component that can help to keep track of resources deployed on the cluster at a particular moment in time.
2. Review of existing solutions
A variety of solutions have emerged to address the challenge of managing ephemeral resources in Kubernetes clusters. Among them, we will focus on two of the most prevalent. The first, notable despite its recent emergence, is “k8s-cleaner”. Introduced only a few months ago, this tool offers a novel approach to automating the cleanup of unused namespaces and associated resources within Kubernetes environments [10].
With its focus on simplicity and automation, “k8s-cleaner” is a valuable addition to the Kubernetes ecosystem. It empowers administrators to manage and optimize resources within dynamic environments.
The key advantages of this tool are the ability to monitor and update specific resource types as well as whole namespaces, and a dry-run mode, which can be very useful when dealing with production infrastructure. However, one component missing from it, and useful for ephemeral environments in Kubernetes, is insight into environment statistics. Such information could help businesses estimate costs, resources, and effort.
The second option we will review is “ephemeral-namespace-operator”. Red Hat developed it to manage ephemeral namespaces inside its own cloud ecosystem [11].
This is already a limitation compared to the previous solution, which is cloud- and platform-agnostic. In addition, its scope is limited to monitoring namespaces only. Its key features include the concept of namespace pools: a platform-specific mechanism that ensures that, for certain applications, the corresponding dependency services are present out of the box inside a newly created namespace.
Another aspect missing in both systems is the ability to convert ephemeral resources into static ones. This functionality is useful when work on a particular feature has grown to the point where the environment needs to become a full-fledged static development environment.
So, after analyzing the existing analog systems, the following assumptions have been made about the functionalities that should be included in the developed intelligent system:
- The system should have the ability to set a TTL for a specific environment.
- There should be messaging platform integration to receive notifications about events happening with environments.
- There should be the ability to transform a dynamic environment into a static one.
- There should be a way to obtain statistics about the dynamic and static environments and a graphical representation of such statistics.
3. Methodology
It is important to note that the solutions reviewed in the previous section are implemented as Kubernetes operators and therefore operate inside the Kubernetes cluster through custom resource definitions. While this approach is recommended and convenient from the standpoint of user experience, the system proposed in this article operates only at the level of regular Kubernetes resources. This helps in understanding how the system works at a low level, without an extra layer of abstraction. The system can nevertheless be transformed into a Kubernetes operator later.
3.1. Overview of the proposed intelligent system
The developed system keeps track of newly deployed Kubernetes namespaces using the Helm package manager. Once a Helm release is deployed, the system automatically registers it for tracking. The user can provide their own TTL (time-to-live) for the new release. If no TTL is provided, the system assigns a default value automatically and notifies the user about the event. The system then periodically checks each currently deployed environment and its TTL value, decreasing it by one every cycle. Once the TTL reaches 0, the system deletes the environment from the cluster and sends a notification to the user. The user can also manually delete an environment (or a set of environments) via the graphical user interface, or convert an environment to a static one.
The user is notified after each of these events. Finally, there is an option to review the environments’ statistics for a particular moment: users can query the number of currently deployed dynamic and static environments and visualize this information over a specific period.
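The lifecycle described above can be sketched in Python. This is a minimal illustration, not the actual internals of ephemeral-tracker.py: the `Environment` record and the `register`/`tick` names are assumptions made for the example, and the default TTL of 8 is taken from the configuration used later in this article.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_TTL = 8  # hours; assigned when the user does not provide a TTL


@dataclass
class Environment:
    name: str
    ttl: Optional[int] = None   # None: the user did not specify a TTL
    is_static: bool = False


def register(env: Environment) -> List[str]:
    """Register a newly deployed release for tracking; returns notifications."""
    messages = []
    if env.ttl is None:
        env.ttl = DEFAULT_TTL
        messages.append(f"{env.name}: no TTL provided, defaulting to {DEFAULT_TTL}")
    return messages


def tick(environments: List[Environment]) -> List[str]:
    """One CronJob cycle: decrement every TTL, suspend environments that reach 0."""
    messages = []
    for env in environments:
        if env.is_static:
            continue            # static environments are never auto-deleted
        env.ttl -= 1
        if env.ttl <= 0:
            env.ttl = 0
            messages.append(f"{env.name}: TTL expired, environment suspended")
    return messages
```

In the real system each "tick" is one CronJob run, and the notifications would be delivered to the user via the messaging integration rather than returned as strings.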
3.2. Architecture and components
The environment manager is shipped as a Helm chart that contains everything necessary to run the system. Its main component is a Python script called ephemeral-tracker.py, which performs all the operations within the cluster. It runs periodically as a CronJob resource in Kubernetes; the schedule can be customized using crontab syntax. The recommended approach is to run it hourly. However, for testing and debugging purposes, we will use a configuration that runs it every minute, so that the results are visible immediately. The pod where the script is executed runs a custom Docker image, built specifically to have all the necessary dependencies installed, such as Python, helm, and kubectl. The script is mounted into the pod as a ConfigMap with executable permissions.
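A CronJob wired this way might look as follows. This is a hedged sketch rather than the chart's actual manifest: the image reference, resource names, and mount path are illustrative assumptions; only the CronJob/ConfigMap arrangement and the every-minute debug schedule come from the text.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ephemeral-tracker
spec:
  schedule: "* * * * *"          # every minute for debugging; "0 * * * *" hourly in production
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: ephemeral-tracker
          restartPolicy: Never
          containers:
            - name: tracker
              image: registry.example.com/ephemeral-tracker:latest  # hypothetical image name
              command: ["python", "/scripts/ephemeral-tracker.py"]
              volumeMounts:
                - name: script
                  mountPath: /scripts
          volumes:
            - name: script
              configMap:
                name: ephemeral-tracker-script
                defaultMode: 0755   # executable permissions for the mounted script
```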
The very same image is used by the Tekton pipeline, which allows users to update environments manually via the Tekton Dashboard. Another CI/CD tool could be used for this purpose, but Tekton is chosen here for simplicity because of its Kubernetes-native approach. Finally, there is a deployment that runs the Prometheus exporter script, which scrapes the environment metadata stored in Kubernetes secrets in a dedicated namespace called “ephemeral-environments-metadata.” Let’s focus on this part in more detail.
When the new environment is created (by new environment creation, we mean the new helm release installed on the cluster), within one CronJob cycle, its metadata is picked up by the ephemeral tracker and written into a Kubernetes secret. This data includes:
- The TTL (time-to-live) value in hours (or minutes, depending on the CronJob schedule);
- A boolean value called “isAutoSuspended”, indicating whether the release has been automatically suspended;
- The Helm chart name;
- The Helm chart version.
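Stored as a Kubernetes secret, this metadata could take roughly the following shape. The secret name and the exact key names are assumptions for illustration; only the four fields and the dedicated namespace are described in the text.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: frontend-test1                       # one secret per tracked release (illustrative naming)
  namespace: ephemeral-environments-metadata
type: Opaque
stringData:
  ttl: "20"                                  # remaining time-to-live, decremented each cycle
  isAutoSuspended: "false"
  chartName: "frontend"
  chartVersion: "1.2.3"                      # hypothetical version
```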
For every new cycle, the CronJob decreases the TTL value by one unit. When the value reaches 0, the environment is automatically suspended. The script can enforce minimum and maximum values for the TTL; in our case, the minimum is 8 and the maximum is 120. This can be modified according to the user’s needs.
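The min/max constraint amounts to a simple clamp; a sketch in Python, with the bounds from our configuration (the function name is illustrative):

```python
MIN_TTL = 8     # bounds used in our deployment; configurable per installation
MAX_TTL = 120


def clamp_ttl(requested: int) -> int:
    """Constrain a user-supplied TTL to the configured [MIN_TTL, MAX_TTL] range."""
    return max(MIN_TTL, min(MAX_TTL, requested))
```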
The way the ephemeral tracker knows which resources to track is based on a YAML config provided within the Helm values. Let’s take a look at the sample format:
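A plausible shape of this config, using the chart and environment names from the case study later in this article (the exact key layout is an assumption based on the description below):

```yaml
trackingCharts:
  - frontend            # Helm chart names whose releases are tracked
staticEnvironments:
  - frontend-uat        # permanent environments that are never auto-deleted
```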
The trackingCharts section lists the names of the Helm charts that are supposed to be tracked. The staticEnvironments section lists all the static environments that are meant to be permanent and not auto-deleted. Note that this list can also be expanded by manually setting an environment to static. When a user performs a “makeStatic” operation through the Tekton pipeline, the TTL value of the environment is set to -1. This prevents the environment from being deleted automatically in the next cron cycle, because the environment manager checks in one of its conditionals whether the TTL equals -1; if so, it ignores the environment and skips automatic updates.
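The -1 sentinel logic can be sketched as follows; the helper names are illustrative, not the tracker's actual functions:

```python
STATIC_TTL = -1  # sentinel written by the "makeStatic" Tekton action


def make_static(metadata: dict) -> dict:
    """Mark an environment's metadata as static so the cron cycle skips it."""
    metadata["ttl"] = STATIC_TTL
    return metadata


def should_auto_update(metadata: dict) -> bool:
    """The environment manager skips any environment whose TTL equals -1."""
    return metadata["ttl"] != STATIC_TTL
```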
All operations on cluster resources, such as namespace creation or deletion and updates to secrets and ConfigMaps, are performed using the kubectl binary. Authentication to the Kubernetes API is handled through the service accounts under which the pipeline or CronJob pods run.
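A minimal sketch of how a Python script can shell out to kubectl; the wrapper and its `execute` flag are assumptions for illustration (returning the argv when not executing keeps the helper testable outside a cluster):

```python
import subprocess


def kubectl(*args, execute=False):
    """Build (and optionally run) a kubectl command.

    Inside the cluster, kubectl authenticates via the pod's service account,
    so no explicit kubeconfig handling is needed here.
    """
    cmd = ["kubectl", *args]
    if not execute:
        return cmd  # return the argv instead of running it
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Example: deleting the namespace of a suspended environment.
delete_cmd = kubectl("delete", "namespace", "frontend-test2")
```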
The Prometheus exporter is exposed through a Kubernetes Service and connected to the Prometheus instance via a ServiceMonitor. The setup of Prometheus and Tekton for a Kubernetes cluster is a separate topic and will not be covered in this article.
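The exporter publishes two gauges, dynamic_environments and static_environments (queried in the case study below). A minimal sketch of rendering them in the Prometheus text exposition format; a production exporter would typically serve this over HTTP, for example via the prometheus_client library, and the function name here is illustrative:

```python
def render_metrics(dynamic_count: int, static_count: int) -> str:
    """Render the two environment gauges in the Prometheus text exposition format."""
    return "\n".join([
        "# HELP dynamic_environments Number of currently deployed dynamic environments",
        "# TYPE dynamic_environments gauge",
        f"dynamic_environments {dynamic_count}",
        "# HELP static_environments Number of currently deployed static environments",
        "# TYPE static_environments gauge",
        f"static_environments {static_count}",
    ]) + "\n"
```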
4. Case Study
Let’s now examine a real-world scenario. In this example, we’ll use the Kind tool, which lets you easily spin up a lightweight Kubernetes cluster using Docker containers as nodes [12]. The cluster will have one master and one worker node.
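This one-master, one-worker topology can be described with a standard Kind cluster config (passed, for example, as `kind create cluster --config kind.yaml`; the file name is arbitrary):

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane   # the "master" node
  - role: worker
```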
We will first deploy Prometheus and Tekton operators there; then, we’ll deploy a few instances of a helm chart with a simple frontend application. We’ll set the ephemeral-tracker config to track this chart, and we will also set the “frontend-uat” environment as static. Later, we will also try converting an existing dynamic environment to a static one manually through a pipeline. Finally, we will deploy an instance of “ephemeral-envs-manager” to the cluster to automatically track and update environments.
For this example, we have set a custom TTL value of 20 for one of the environments and left the rest unspecified, so that the system assigns them the default value of 8. Let’s check the output of the first CronJob run and verify this.
As we can see, the system reported the correct environment metadata, meaning it is working as expected. We currently have four dynamic and one static environment running on the cluster. Let’s try manually converting one of the dynamic environments to a static one. To do this, we open the Tekton Dashboard, go to the TaskRuns tab, and create a new TaskRun specifying the pipeline, action, and release name.
After completing the task successfully, we can check for a Slack notification about its status.
As we can see, the “frontend-test1” environment has been converted to static. We can also verify this by looking at the Prometheus data. The exporter publishes two types of metrics: dynamic_environments and static_environments. Let’s open the Prometheus web page and query these metrics to verify the results.
As we can see, the number of dynamic environments is 3, which confirms that the operation was successful: we initially created five new environments, four of which were dynamic. Now, let’s verify the number of static environments.
As we can see, the static environment number is 2, which is the number we expected to see after running the Tekton pipeline. Let’s now verify the last piece of the system’s functionality: environment cleanup.
Since frontend-test2 and frontend-test4 have the same TTL of 8, they will be terminated together. Let’s wait for the final cycle for these two environments and check if they are removed automatically.
The notification received in Slack states that frontend-test2 and frontend-test4 were suspended. Let’s verify this by checking the current namespaces on the cluster using the kubectl tool.
5. Conclusion
In this article, we have explored the implementation and capabilities of an intelligent system designed for managing dynamic software development environments, namely the Ephemeral Environment Manager. This system leverages advanced automation to enhance the efficiency and flexibility of development processes by automatically managing the lifecycle of environments based on time-to-live (TTL) parameters and providing manual intervention options through a Tekton pipeline. Additionally, the integration with Prometheus for real-time monitoring of static and dynamic environments showcases the system’s robustness in providing critical insights into resource utilization.
However, despite its effectiveness, there are opportunities for further enhancement to make the system even more versatile and powerful.
One potential improvement is the extension of the system’s tracking capabilities. Currently, the system primarily manages and monitors environments at the namespace level. Expanding this to track specific Kubernetes resources using labels would allow for more granular management and detailed insights. This finer level of detail would enable developers to pinpoint inefficiencies and optimize individual components within the broader environment.
References
1. Baddula, The Evolution of Software Architecture: Monolithic to Microservices, 2023. URL: https://medium.com/@phanindra208/the-evolution-of-software-architecture-monolithic-to-microservices-cb62fcd7aa94.
2. Semeraro, The Long and Winding Road to Microservices: The Evolution of Monolithic Architecture, 2023. URL: https://levelup.gitconnected.com/the-long-and-winding-road-to-microservices-the-evolution-of-mmonolithic-architecture-6cb69b5ab3bd.
3. Athaunda, The Rise of Virtualization in Modern IT Infrastructure, 2023. URL: https://medium.com/@imanathauda/the-rise-of-virtualization-in-modern-it-infrastructure-900e94f9038b.
4. Strotmann, Infographic: A Brief History of Containerization, 2016. URL: https://www.plesk.com/blog/business-industry/infographic-brief-history-linux-containerization/.
5. Wang, A Brief History of Google’s Kubernetes and Why It’s Fantastic, 2022. URL: https://medium.com/fstnetwork/a-brief-history-of-googles-kubernetes-and-why-it-s-fantastic-658ad4248e3.
6. Crosby, Dynamic Environments with Kubernetes, 2017. URL: https://blog.container-solutions.com/dynamic-environments-kubernetes.
7. DavidW, Managing Orphaned Resources in Kubernetes, 2024. URL: https://overcast.blog/managing-orphaned-resources-in-kubernetes-865e05490478.
8. Rozen, Top 10 Kubernetes Security Risks Every DevSecOps Pro Should Know, 2022. URL: https://www.darkreading.com/cyber-risk/top-10-kubernetes-security-risks-every-devsecops-needs-to-know.
9. Faruqui, Ephemeral Environments in Cloud Infrastructure: Use Cases and Benefits, 2023. URL: https://www.withcoherence.com/articles/ephemeral-environments-in-cloud-infrastructure-use-cases-and-benefits.
10. Drosopoulou, Optimize Kubernetes with K8s-Cleaner, 2024. URL: https://www.javacodegeeks.com/2024/02/optimize-kubernetes-with-k8s-cleaner.html.
11. RedHatInsights, Ephemeral Namespace Operator (ENO), 2024. URL: https://github.com/RedHatInsights/ephemeral-namespace-operator.
12. Pundkar, KIND: A Modern Tool for Running Kubernetes Locally, 2023. URL: https://medium.com/@dinesh.pundkar/kind-a-modern-tool-for-running-kubernetes-locally-fe78ae134562.