Kubernetes cluster size best practices

Kubernetes helps manage the lifecycle of hundreds of containers deployed in pods. It is highly distributed and its parts are dynamic. An implemented Kubernetes environment involves several systems with clusters and nodes that host hundreds of containers that are constantly being spun up and destroyed based on workloads. When dealing with a large pool of containerized applications and workloads in Kubernetes, it is important to be proactive with monitoring and debugging errors.

These errors are seen at the container, node, or cluster level. In the case of Kubernetes, logs allow you to track errors and even to fine-tune the performance of containers that host applications.

The first step is to understand how logs are generated. With Kubernetes, logs are sent to two streams — stdout and stderr. These streams of logs are written to a JSON file and this process is handled internally by Kubernetes. A best practice is to send all application logs to stdout and all error logs to stderr.
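As a rough illustration of the stdout/stderr split described above, the sketch below configures Python's standard `logging` module so that records below ERROR go to stdout and ERROR and above go to stderr. The logger name, the message format, and the helper name `build_logger` are assumptions made for this example, not something Kubernetes prescribes.

```python
import logging
import sys


class MaxLevelFilter(logging.Filter):
    """Let through only records strictly below a given level."""

    def __init__(self, max_level):
        super().__init__()
        self.max_level = max_level

    def filter(self, record):
        return record.levelno < self.max_level


def build_logger(name="app", out=None, err=None):
    """Return a logger that writes sub-ERROR records to `out` and ERROR+ to `err`."""
    out = out if out is not None else sys.stdout
    err = err if err is not None else sys.stderr

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.propagate = False     # keep records out of the root logger

    fmt = logging.Formatter("%(levelname)s %(message)s")

    out_handler = logging.StreamHandler(out)
    out_handler.addFilter(MaxLevelFilter(logging.ERROR))
    out_handler.setFormatter(fmt)

    err_handler = logging.StreamHandler(err)
    err_handler.setLevel(logging.ERROR)
    err_handler.setFormatter(fmt)

    logger.addHandler(out_handler)
    logger.addHandler(err_handler)
    return logger
```

Calling `build_logger().info("service started")` would then write to stdout, while `.error(...)` would write to stderr, so the container runtime captures each stream separately.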

Kubernetes recommends using sidecar containers to collect logs. The sidecar model helps to avoid exposing logs at the node level, and it gives you control over logs at the container level. The problem with this model, however, is that it works well for low-volume logging, but at scale, it can be a resource drain. The alternative is to use a logging agent that collects logs at the node level. This accounts for little overhead and ensures the logs are handled securely. Fluentd has emerged as the best option to aggregate Kubernetes logs at scale.

Opting for a managed K8s service like Platform9 even gives you the ease of a fully-managed Fluentd instance without you having to manually configure or maintain it.

Traditionally, with on-prem server-centric systems, application logs are stored in log files on the system. These files can be read in a defined location or moved to a central server. In Kubernetes, however, log files are lost when the pod is deleted, which can be an issue when you try to troubleshoot with part of the log data missing. Kubernetes recommends two options: send all logs to Elasticsearch, or use a third-party logging tool of your choice.

Here again, there is a choice to make. Each tool has its own role to play. As mentioned above, Fluentd aggregates and routes logs.

Kubernetes made a splash when it brought containerized app management to the world a few years back. Now, many of us are using it in production to deploy and manage apps at scale. Here are some of the most popular posts on our site about deploying and using Kubernetes.

Simple tasks get more complicated as you build services on Kubernetes. Using Namespaces, a sort of virtual cluster, can help with organization, security, and performance. This post shares tips on which Namespaces to use (and not to use), and on how to set them up, view them, and create resources within a Namespace.
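To make the idea of setting up a Namespace a little more concrete, here is a minimal sketch that builds a Namespace object in the shape of the core `v1` API and prints it as JSON. The function name `namespace_manifest`, the namespace name `team-a-dev`, and the label are hypothetical values chosen for illustration.

```python
import json


def namespace_manifest(name, labels=None):
    """Build a Namespace object in the core v1 API shape."""
    manifest = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name},
    }
    if labels:
        manifest["metadata"]["labels"] = dict(labels)
    return manifest


def to_json(manifest):
    """Serialize a manifest so it can be saved to a file."""
    return json.dumps(manifest, indent=2)


# Hypothetical namespace for one team's development environment.
example = namespace_manifest("team-a-dev", {"team": "a"})
```

Saving the output of `to_json(example)` to a file and passing it to `kubectl apply -f` is one way to create the Namespace, since kubectl accepts JSON as well as YAML.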

Use readiness and liveness probes for health checks. Managing large, distributed systems can be complicated, especially when something goes wrong. Kubernetes health checks are an easy way to make sure app instances are working. Creating custom health checks lets you tailor them to your environment. This blog post walks you through how and when to use readiness and liveness probes.

Keep control of your deployment with requests and limits. You still have to keep an eye on resources to make sure containers have enough to actually run.
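Returning to the readiness and liveness probes mentioned above, the following sketch shows one way an application might expose endpoints for them, using only Python's standard library. The paths `/healthz` and `/ready`, the port, and the `STATE` flag are assumptions for this example; the actual paths are whatever you configure in the probe definition.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set to True once startup work (cache warm-up, connections, ...) has finished.
STATE = {"ready": False}


def probe_status(path, ready):
    """Map a probe path to an HTTP status code."""
    if path == "/healthz":          # liveness: the process is up and responding
        return 200
    if path == "/ready":            # readiness: is it safe to send traffic?
        return 200 if ready else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = probe_status(self.path, STATE["ready"])
        self.send_response(status)
        self.end_headers()
        self.wfile.write(b"ok\n" if status == 200 else b"not ok\n")


def serve(port=8080):
    """Blocking call: run the probe endpoints until the process exits."""
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

An HTTP probe counts any status from 200 up to, but not including, 400 as success, so returning 503 from `/ready` keeps traffic away from the pod without restarting it.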

Learn more in this post about using requests and limits to stay firmly in charge of your Kubernetes resources. There are also a few different ways to connect to services running outside the cluster, such as external service endpoints or ConfigMaps.
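For the requests and limits discussed above, a small worked example can help show what "enough to actually run" means. The sketch below parses a subset of Kubernetes quantity strings (CPU as whole cores or millicores, memory with `Ki`, `Mi`, `Gi` suffixes) and checks whether a set of container requests fits a node. The helper names and the sample figures are illustrative assumptions, and the parser deliberately does not cover every quantity format.

```python
def parse_cpu(quantity):
    """Convert a CPU quantity such as "500m" or "2" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


_MEMORY_UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_memory(quantity):
    """Convert a memory quantity such as "256Mi" or "1Gi" to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[:-2]) * factor
    return int(quantity)  # plain number: already in bytes


def fits(node_cpu, node_memory, requests):
    """True if the summed requests fit within the node's allocatable resources."""
    cpu = sum(parse_cpu(r["cpu"]) for r in requests)
    memory = sum(parse_memory(r["memory"]) for r in requests)
    return cpu <= parse_cpu(node_cpu) and memory <= parse_memory(node_memory)
```

For instance, three containers each requesting `500m` CPU and `1Gi` memory fit on a node with `2` CPUs and `4Gi` allocatable, while five of them do not, because the CPU requests add up to 2500 millicores.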

Decide whether to run databases on Kubernetes. It can make life easier to use the same tools for databases and apps, and get the same benefits of repeatability and rapid spin-up. This post explains which databases are best run on Kubernetes, and how to get started when you decide to deploy.

Understand Kubernetes termination practices. All good things have to come to an end, even Kubernetes containers. The key to Kubernetes terminations, though, is that your application can handle them gracefully.
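One way an application might handle termination gracefully is sketched below. When a pod is terminated, its containers first receive SIGTERM and are only killed after the grace period expires, so the example records the signal and lets the in-flight work item finish before stopping. The function names and the work-loop structure are assumptions for illustration.

```python
import signal
import threading

shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Only record the request; the main loop decides when to stop."""
    shutdown_requested.set()


def install_handler():
    """Register the handler for SIGTERM (call once from the main thread)."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run(work_items, process):
    """Process items until the list is exhausted or a shutdown is requested."""
    done = []
    for item in work_items:
        if shutdown_requested.is_set():
            break                       # stop picking up new work
        done.append(process(item))      # the current item always completes
    return done
```

The design choice here is to keep the signal handler trivial and do the actual cleanup in normal control flow, which avoids running complex code inside the handler.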

This post walks through the steps of Kubernetes terminations and what you need to know to avoid any excessive downtime.

AKS provides flexibility in how you can run multi-tenant clusters and isolate resources.

To maximize your investment in Kubernetes, these multi-tenancy and isolation features should be understood and implemented. This best practices article focuses on isolation for cluster operators. Kubernetes provides features that let you logically isolate teams and workloads in the same cluster.

The goal should be to grant the fewest privileges possible, scoped to the resources each team needs. A Namespace in Kubernetes creates a logical isolation boundary.

Best practice guidance - Use logical isolation to separate teams and projects. Try to minimize the number of physical AKS clusters you deploy to isolate teams or applications.

With logical isolation, a single AKS cluster can be used for multiple workloads, teams, or environments. Kubernetes Namespaces form the logical isolation boundary for workloads and resources.

Logical separation of clusters usually provides a higher pod density than physically isolated clusters. There's less excess compute capacity that sits idle in the cluster. When combined with the Kubernetes cluster autoscaler, you can scale the number of nodes up or down to meet demands. This best practice approach to autoscaling lets you run only the number of nodes required and minimizes costs.
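The pod-density argument above can be illustrated with simple arithmetic. The sketch below compares the number of nodes needed when each team gets its own cluster with the number needed when all teams share one. It is a deliberate simplification: it assumes a fixed number of pods per node and ignores system pods and differing resource requests, and the team sizes used in the note are made-up figures.

```python
import math


def nodes_needed(pod_count, pods_per_node):
    """Nodes required to host pod_count pods at a given density."""
    return math.ceil(pod_count / pods_per_node)


def compare(team_pods, pods_per_node):
    """Return (nodes with one cluster per team, nodes with one shared cluster)."""
    separate = sum(nodes_needed(p, pods_per_node) for p in team_pods)
    shared = nodes_needed(sum(team_pods), pods_per_node)
    return separate, shared
```

With three hypothetical teams running 12, 7, and 5 pods at 10 pods per node, `compare([12, 7, 5], 10)` gives 4 nodes for separate clusters and 3 for a shared one, because partially filled nodes cannot be pooled across clusters.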

Kubernetes environments, in AKS or elsewhere, aren't completely safe for hostile multi-tenant usage. In a multi-tenant environment, multiple tenants work on a common, shared infrastructure. As a result, if not all tenants can be trusted, you need to do additional planning to keep one tenant from affecting the security and service of another.

Additional security features such as Pod Security Policy and more fine-grained role-based access controls (RBAC) for nodes make exploits more difficult.

However, for true security when running hostile multi-tenant workloads, a hypervisor is the only level of security that you should trust. The security domain for Kubernetes becomes the entire cluster, not an individual node. For these types of hostile multi-tenant workloads, you should use physically isolated clusters.

Best practice guidance - Minimize the use of physical isolation for each separate team or application deployment. Instead, use logical isolation, as discussed in the previous section. A common approach to cluster isolation is to use physically separate AKS clusters.

In this isolation model, teams or workloads are assigned their own AKS cluster. This approach often looks like the easiest way to isolate workloads or teams, but adds additional management and financial overhead.

You now have to maintain these multiple clusters, and have to individually provide access and assign permissions. You're also billed for all the individual nodes. Physically separate clusters usually have a low pod density.

As each team or workload has their own AKS cluster, the cluster is often over-provisioned with compute resources. Often, a small number of pods are scheduled on those nodes.

Unused capacity on the nodes can't be used for applications or services in development by other teams. These excess resources contribute to the additional costs in physically separate clusters. This article focused on cluster isolation.

GKE is easy to set up and use, but can get complex for large deployments or when you need to support enterprise requirements like security and compliance.

Read on to learn how to take your first steps with GKE, get important tips for daily operations and learn how to simplify enterprise deployments with Rancher. Kubernetes was created by Google to orchestrate its own containerized applications and workloads.

Google was also the first cloud vendor to provide a managed Kubernetes service, in the form of GKE. GKE is a managed, upstream Kubernetes service that you can use to automate many of your deployment, maintenance and management tasks. It integrates with a variety of Google cloud services and can be used with hybrid clouds via the Anthos service. Part of deciding whether GKE is right for you requires understanding the cost of the service.

The easiest way to estimate your costs is with the Google Cloud pricing calculator. The cluster management fee does not apply to Anthos clusters, however, and you do get one zonal cluster free. Billing is calculated on a per-second basis. At the end of the month, the total is rounded to the nearest cent. Your cost for worker nodes depends on which Compute Engine instances you choose to use. All instances have a one-minute minimum use cost and are billed per second.
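The billing rules described above (per-second billing, a one-minute minimum per instance, and a monthly total rounded to the nearest cent) can be worked through in a few lines. The hourly rate in the note below is a hypothetical figure chosen to keep the arithmetic simple; real prices come from the pricing calculator.

```python
def billable_seconds(runtime_seconds, minimum_seconds=60):
    """Per-second billing with a one-minute minimum per instance run."""
    if runtime_seconds <= 0:
        return 0
    return max(runtime_seconds, minimum_seconds)


def monthly_total(run_durations, hourly_rate):
    """Sum all runs, then round the month's total to the nearest cent."""
    seconds = sum(billable_seconds(d) for d in run_durations)
    return round(seconds * hourly_rate / 3600, 2)
```

For example, an instance that ran for 20 seconds is billed for 60, so `monthly_total([20, 3600], 0.36)` charges 3660 seconds at the hypothetical rate of 0.36 per hour, which comes to 0.37 after rounding.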

You are billed for each instance you use and continue to be charged until you delete your nodes.

Select your project and enable the API. The Cloud Shell is designed for quick startup and comes preinstalled with the kubectl and gcloud CLI tools.

The gcloud tool is used to manage cloud functions and kubectl is used to manage Kubernetes. If you want to use your local shell, just make sure to install these tools first. Clusters are composed of one or more masters and multiple worker nodes.

When creating nodes, you use virtual machine (VM) instances, which then host your applications and services. You can start with a simple, one-node cluster. However, note that a single-node cluster is not fit for production, so you should only use this cluster for testing. Once your cluster is created, you need to set up authentication credentials before you can interact with it.

Kubernetes (K8s) is an open-source container orchestration tool that can automatically scale, distribute, and handle faults on containers.

Originally created by Google and donated to the Cloud Native Computing Foundation, Kubernetes is widely used in production environments to run Docker containers in a fault-tolerant manner, although it also supports other container tools such as rkt. Security should be a top priority for any production system and must be even stricter when securing clusters, since they involve more moving parts that need to cooperate with one another.

Securing a simple system involves maintaining good practices and updated dependencies, but to secure an environment, whether clustered or not, the operator needs to evaluate the communications, images, operating system, and hardware issues.

Data breaches, denial-of-service attacks, stolen sensitive information, or simply downtime can largely be avoided with solid security policies. As an open-source system for automating the deployment, scaling, and management of containerized applications, Kubernetes impacts many runtime security functions. As with any open-source project, issues are rapidly discovered, but users must keep their software updated to avoid opportunistic attacks. In the following sections, we'll take a deep dive into some security practices that will help you avoid issues when deploying your own Kubernetes instance.

Clusters are a group of servers working together as a single system. As such, they are complex, need to be constantly updated and monitored and, as with any distributed system, can be more prone to failure. In addition to the typical security issues that involve any computer software, such as bad programming and out-of-date dependencies, clusters have their own specific security pitfalls. For example, a bad network configuration can expose the entire computing system to an unauthorized user; a single node with an outdated operating system can lead to a breach of all your machines; a system subjected to a DoS attack could lead to one or more machines being rendered unusable.

In this case, one of the most important tasks assigned to system administrators is to ensure the security of cluster installations while, at the same time, permitting easy access to legitimate users. We should divide the topic into three areas: the cluster, as a set of virtual machines that should be orchestrated and maintained, the container's structure and its communications, and the applications inside those containers.

Kubernetes is the world's most popular container orchestration tool and is here to stay. Still, in the Kubernetes environment, there are threats that may result in compromises and undesirable scenarios, including an elevation of privileges, exfiltration of sensitive data, a compromise of operations, or a breach of compliance policies. These issues can be mitigated by following strict security practices such as the following.

A cluster can be used for different environments and purposes: it can host services for several production products as well as for testing, staging, production, and so on. It is important to separate these into different namespaces so you can control access to the resources each service can reach. Namespaces create a network layer with resources within the same space.

Cloud computing is one of the most active and significant fields in modern computer science.

For both businesses and individuals, it allows the kind of interconnectedness and productivity gains that can completely transform the way that we work and live. In order to utilize this technology to its full potential, businesses need to carefully consider the exact setup that they use.

Compatibility is one of the biggest challenges that any dynamic IT system faces. In situations where new products, hardware, and software are regularly being introduced into the ecosystem, all it takes is one incompatible component to completely disrupt the workflow.

An elegant solution to this problem used to be to make use of virtual machines. Containers now fill much the same role; they behave like lightweight VMs that share the host's kernel. Docker is probably the best-known container platform for Linux, and Microsoft Azure has been expanding Windows capabilities in this regard too. A Kubernetes Pod refers to a group of containers which have been deployed on a single host. They can, therefore, work together more efficiently. This is a very powerful concept in container management and orchestration.

By adhering to the following best practices, you can utilize the massive potential behind Kubernetes to its fullest effect. Before you start looking around for base images, you should have a good idea of what it is you need to get out of your final setup in terms of functionality. If you only have a vague idea, try and refine it as much as possible before you begin searching for base images to use.

This will allow you to analyze potential packages in detail and make sure that they contain what you need, with as little excess as possible.

If the app you need is only 15 MB in size, it would be a waste of resources to use an image that bundles a far larger library. Of course, you will have to contend with some excess in most situations. However, the smaller the image you can use, the faster your container will build and the less space it will require. A smaller image often reduces the attack surface as well, which enhances your overall security.

For the most part, it makes sense for a Pod to run as an abstraction over one container only. However, you can run multiple containers in a Pod to tightly couple helper processes to your primary container, such as for log monitoring.
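A Pod that pairs a main container with a log-monitoring helper, as mentioned above, might be declared along the lines of the sketch below. It builds the Pod object as a Python dictionary in the core `v1` API shape, with both containers sharing an `emptyDir` volume. The container names, the mount path `/var/log/app`, and the function name are hypothetical choices for this example.

```python
import json


def pod_with_log_helper(name, app_image, helper_image):
    """Pod with one app container and one helper reading the same log volume."""
    mount = {"name": "logs", "mountPath": "/var/log/app"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # emptyDir lives as long as the Pod and is visible to both containers.
            "volumes": [{"name": "logs", "emptyDir": {}}],
            "containers": [
                {"name": "app", "image": app_image, "volumeMounts": [mount]},
                {
                    "name": "log-helper",
                    "image": helper_image,
                    # The helper only needs to read what the app writes.
                    "volumeMounts": [dict(mount, readOnly=True)],
                },
            ],
        },
    }


def to_json(manifest):
    return json.dumps(manifest, indent=2)
```

Because both containers are scheduled together on the same host and share the volume, the helper can follow the application's log files without any network hop.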

Running multiple containers in Pods is also a viable solution when using a Service Mesh to connect, manage, and secure microservices, as these will intercept all communication between the individual microservice components.

Many of the most common mistakes people make when using Kubernetes occur when they are selecting the base image to build their container from.

You could find yourself with the wrong version of the package you need, which will throw up numerous compatibility and functionality issues. Worse still, the image could contain malware, spyware, or dreaded ransomware. Any malicious content on a corporate network is cause for serious concern. When packages within your container are updated, the privileges of the root user are required.

Everyone running a Kubernetes cluster in production wants it to be reliable.

Many administrators implement a multi-master setup, but often this isn't enough to consider a cluster highly available. In a highly available system, if one part fails in any way, the system can recover without significant downtime.

Getting started

So how exactly can you achieve a highly available, highly reliable, and multi-master Kubernetes cluster? The Kubernetes control plane consists of the controller manager, scheduler, and API server. When running a highly available Kubernetes cluster, the first thing to focus on is running multiple replicas of these control plane components. Kubernetes is designed to handle brief control plane failures. Workloads will continue to run and be accessible on the worker nodes.

However, if worker nodes fail while the control plane is down, nothing is available to reschedule the work or to reconfigure routing within the cluster. This can cause workloads and services to be inaccessible. Google Kubernetes Engine's regional clusters run the machines that make up the clusters across Google Compute Engine zones.

On Google Compute Engine, regions are independent geographic areas, or a campus of data centers that consists of zones. Zones are deployment areas for cloud platform resources within a region. From an availability standpoint, each zone can be treated as a single failure domain. Currently on Google Compute Engine, you have a choice of 18 regions, 55 zones, and points of presence across 35 countries.

Once you have chosen the failure domains that you care about, you can run multiple replicas of the control plane across those domains. The idea behind replication is to run multiple copies of a process so that if one fails, another one can pick up the job.

If you're running multiple copies of the same workload, you need to define where those are running. For us that means different Google Compute Engine zones that are physically located in different places. For someone else that could mean simply running on different racks within a data center.
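To illustrate spreading replicas across failure domains, the sketch below assigns replicas to zones round-robin and then shows which replicas survive the loss of one zone. The zone names are placeholders, and the round-robin rule is just one simple placement strategy assumed for this example, not a description of how any particular scheduler works.

```python
from itertools import cycle


def spread(replicas, zones):
    """Assign replica indexes to zones round-robin."""
    if not zones:
        raise ValueError("at least one failure domain is required")
    zone_cycle = cycle(zones)
    return {i: next(zone_cycle) for i in range(replicas)}


def survivors(placement, failed_zone):
    """Replica indexes still running after one zone is lost."""
    return [r for r, zone in placement.items() if zone != failed_zone]
```

With three replicas over three placeholder zones, losing any single zone leaves two replicas running. With all three replicas in one zone, the same failure would leave none.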

Google Kubernetes Engine's regional clusters reduce the risk of a single zone failure taking down every replica, because the replicas run across Google Compute Engine zones. In fact, Google Kubernetes Engine itself is a global service.

In a reliable system, etcd needs to be able to handle failures without losing data. It's critical to keep etcd running because it functions as the brain of the entire cluster. Like the control plane components above, etcd can run with multiple replicas. However, etcd is a bit more complicated because it is stateful, so the data stored in one replica needs to match all the others.

A common pattern for running in a distributed system is active-passive replication.

A single instance is elected as the leader, while other instances wait for the leader to go down to take over. This is a simple form of leader election. This pattern works well for stateless components, such as the controller-manager, but it has reliability implications for stateful components, such as etcd. Etcd uses a quorum-based leader election algorithm that requires a strict majority of replicas to elect a leader.
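The strict-majority requirement can be made concrete with a little arithmetic. The sketch below computes the quorum size for a given number of voting members and the number of member failures the cluster can tolerate while still being able to elect a leader. The function names are chosen for this example.

```python
def quorum(members):
    """Smallest strict majority of `members` voting replicas."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members):
    """How many members can fail while a majority remains."""
    return members - quorum(members)
```

A 3-member cluster needs 2 votes and tolerates 1 failure, while a 5-member cluster needs 3 and tolerates 2. Going from 3 to 4 members raises the quorum to 3 without tolerating any extra failure, which is why odd cluster sizes are the usual choice.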

Cluster members elect a single leader, and all other members become followers.

You can never prevent or predict all the failures that could happen in a system, so plan for backups of etcd.

Ideally, backups run regularly and are stored somewhere that is isolated from where the cluster is running. It's critical to test the freshness of your periodic backups in addition to regularly testing the restore action from your backups.
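A freshness check of the kind described above could be as simple as comparing the timestamp of the newest backup against a maximum allowed age. The sketch below assumes you already have the backup timestamps available as timezone-aware `datetime` values; how you obtain them depends on where the backups are stored, and the 24-hour threshold in the note is only an example.

```python
from datetime import datetime, timedelta, timezone


def newest(backup_times):
    """Return the most recent backup timestamp, or None if there are none."""
    return max(backup_times, default=None)


def is_fresh(backup_times, max_age, now=None):
    """True if the newest backup is no older than max_age."""
    now = now if now is not None else datetime.now(timezone.utc)
    latest = newest(backup_times)
    return latest is not None and now - latest <= max_age
```

Calling `is_fresh(times, timedelta(hours=24))` from a scheduled job and alerting when it returns False covers the freshness half of the advice. Restore tests still need to be run separately.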

Kubernetes makes running your own applications with high availability easier, but it is not automatic. The first question to ask when configuring your application is whether you need to use a leader election in your application.

There is no one right answer, since different leader election algorithms might make sense for different applications. For example, etcd uses a quorum-based leader election algorithm, while the Kubernetes scheduler uses an active-passive leader election. Similar to what you do with the control plane components, you want to run multiple replicas of your own workload.
