What 2022 will bring for DevOps

What a year it was! I don't think I've ever watched so many Netflix series sitting at home, but I've also never read so many books. And how much I learned: a lot about technology and even more about myself. So many conferences took place after moving online, which, in my opinion, unfortunately did them no favors. Three Kubernetes versions were released, Terraform 1.0 arrived, and there were plenty of interesting, large cloud outages. A lot happened, I must say. But the most important thing for me was starting to publish this newsletter regularly, and that was an excellent decision!
This time I decided to take out my crystal ball and play Tomasz the fortune-teller 🙂 Here are my predictions for 2022. I'll focus on technology, DevOps, and my own plans.

Kubernetes isn't aging, it's maturing

It's no longer a novelty: Kubernetes is now the standard. The project is still developing vigorously and its API keeps maturing. There are no spectacular new features, and there won't be many in the versions released in 2022 either. That's the fate of good products: people stop talking about them because they become part of everyday life. The same happened with Linux. Once only a group of enthusiasts ran it, even on their desktops, and now it's everyday reality across the whole IT world. One of the important trends is the further modularization of Kubernetes and the moving of individual components out of the main codebase. This mainly concerns networking (already happening with CNI plugins), but increasingly also volume handling (CSI drivers instead of providers in the main code). This should further accelerate Kubernetes adoption on smaller clouds and in on-prem environments.

Terraform becomes the standard

Managers and directors can finally feel at ease watching their IT departments use Terraform to manage cloud services. After all, it's no longer some 0.x version but 1.0, released last year. Some may smile condescendingly, but sometimes even small details like this decide which tool gets chosen. Overall it was a great year for HashiCorp, which recently became a public company traded on a New York stock exchange. Terraform itself will become an even more widely used tool among DevOps people. As with Kubernetes, its adoption benefits from modularity: you can extend it by writing your own providers, host them on your own infrastructure, and use plenty of ready-made modules or write your own. I predict further growth in Terraform usage and a larger number of providers. I can't imagine a serious environment in 2022 being clicked together in a GUI instead of defined in code.

Everything as Code

Following Kubernetes and Terraform, it's easy to predict where we're heading: the Everything as Code concept. Not just Infrastructure as Code; everything will be described in code. This will translate into a need for even more automation. There will be even more operators for Kubernetes-based platforms and more and more services from public cloud providers, more Terraform providers will make it possible to manage them, and applications will be delivered entirely through CI/CD pipelines. More and more infrastructure knowledge and practice will be embedded in software operated through APIs. Go will reign here as the language that integrates most easily with these platforms (thanks, among other things, to native clients for Kubernetes, Terraform, various cloud platforms, and serverless) and works great in containers (static binaries and single-file container images). To a large extent this is already happening in modern organizations, and the rest will simply be catching up.

Security as a priority

Few serious organizations jump into new technologies without first analyzing their impact on security. Perhaps that's why so many of them are afraid of change and of introducing containers or Kubernetes: staying with something familiar seems safer. That's understandable as long as the technology in question is properly patched, but more and more often software vendors demand a wheelbarrow full of money for keeping such products alive. Secure platforms can be built on virtual machines as well as on public or private clouds, but increasingly it's Kubernetes that runs containerized applications. And whatever you choose, security always remains a concern. In 2022 we'll hear the term DevSecOps even more often: embedding security controls at the earliest possible stage (known as Shift Left). I can't imagine security being a process described in documents rather than in code. Tools such as OPA, Kyverno, and Vault will be needed even more.

Remote work (or at least hybrid)

The last two years have shown that remote work is possible and doesn't hurt an organization's effectiveness. That's great news for those who spend hours commuting or who live far from the hustle of big cities. They can now choose employers more freely and take part in interesting projects without moving to another city or country. This has already stirred up the job market quite a bit, opening up opportunities for DevOps engineers and others. For people like me, who sometimes need face-to-face contact, the hybrid model will remain. I personally believe we need it for better communication, even if it only means meeting once a week or even once a month. My experience shows that I communicate better in person: I pick up non-verbal cues and the other person's facial expressions more clearly, which helps me understand them better. On the other hand, I sometimes think that the silver lining of this pandemic misfortune is that COVID arrived at a time when technology could reduce its impact on our lives.

Growing demand for DevOps

One more thing about remote work and the pandemic. It turned out that those who were prepared for remote work are coping better, and that applies not only to companies but also to public offices. There are, however, countries and organizations where progress hasn't been fast enough. The pandemic exposed gaps in the preparedness of public offices and organizations, which now have to quickly make up for their shortcomings in digital transformation. And even when ready-made solutions exist, such as software built and used by similar institutions, all of it still has to be wired together.
And this brings us back to the role of DevOps. It's thanks to skilled engineers who can make use of the cloud, containers, all kinds of APIs, and traditional infrastructure (the cloud is often simply out of reach) that such an accelerated transformation has a chance to succeed. So chin up (or nose in the books), hands on the keyboards: it's time to roll up our sleeves and bring everyone else into the 21st century!

Cloudowski will grow

And not just because of all the cheesecake I devoured over the holidays! I'm talking about the ambitious plans I've set for myself for 2022. I can't reveal all of them, but I'll lift the veil a little. Here's what's going to happen on my side:

  • 🎙 I'll start publishing my own DevOps podcast – the first episodes are coming soon!
  • 📸 I'll show what goes on behind the scenes of my work – I've set up an Instagram account!
  • 📰 I'll create more content in Polish – more articles and more interesting videos on YouTube
  • 📚 I'll record and run more workshops on DevOps, Kubernetes, Terraform, and Cloud
  • ⎈ I'll release a refreshed version of my course “Kubernetes po polsku” (Kubernetes in Polish)
  • 💻 I'll run traditional training sessions to help others use DevOps tools and processes more effectively (the first few months are already booked)

That's just a slice of what I have planned. It's shaping up to be a very busy year.

And have you written down your plans yet? From my experience, plans written down by hand have a better chance of coming true. Ideally, carry that list with you or hang it somewhere visible. Nothing works as well as focusing on the things that matter to you.

A recipe for a bespoke on-prem Kubernetes cluster

So you want to build yourself a Kubernetes cluster? You have your reasons. Some may want to utilize the hardware they own, others may not fully trust these fancy cloud services, and still others simply want to have a choice and build a hybrid solution. There are a couple of products available that I've reviewed, but you've decided to build a platform from scratch. Again, there are myriad reasons why that might be a good idea, and also many that would convince you it's not worth your precious time. In this article, I will focus on providing a list of things to consider when starting a project to build a Kubernetes-based platform using only the most popular open source components.

Target groups

Before we jump into the technicalities, I want to describe three target groups that are referred to in the sections below.

  • Startups (SUP) – very small companies or the ones with basic needs; their focus is on using the basic Kubernetes API and facilitating services around it
  • Medium businesses (MBU) – medium companies which want to leverage Kubernetes to boost their growth and innovation; their focus is on building a scalable platform that is also easy to maintain and extend
  • Enterprises (ENT) – big companies with even bigger needs, scale, many policies, and regulations; they are the most demanding and are focused on repeatability, security, and scalability (in terms of the growing number of developers and teams working on their platform)

All these groups have different needs and thus they should build their platform in a slightly different way with different solutions applied to particular areas. I will refer to them using their abbreviations or as ALL when referring to all of them.

Installation

When to apply: Mandatory for ALL
Purpose: To have a robust and automated way of managing your cluster(s)

When deciding to install Kubernetes without any of the available distributions, you have a fairly limited choice of installers. You can use kubeadm directly or the more generic kubespray. The latter will help you not only install but also maintain your cluster (upgrades, node replacement, cluster configuration management). Both are universal and unaware of how cluster nodes are provisioned. If you want an automated solution that also handles provisioning of cluster nodes, Metal3 may be worth trying. It's still in the alpha stage, but it looks promising.
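To make the kubeadm route more concrete, here is a minimal sketch that builds a kubeadm ClusterConfiguration in Python and refuses to produce it when the pod and Service subnets overlap. That mistake is easy to make and hard to fix after installation. The API version `kubeadm.k8s.io/v1beta3`, the endpoint name, the Kubernetes version, and the CIDRs are assumptions for illustration, so check them against the kubeadm release you actually run. The output is JSON, which is a subset of YAML, so it should be usable as the file passed to `kubeadm init --config`.

```python
from __future__ import annotations

import ipaddress
import json


def kubeadm_cluster_config(
    kubernetes_version: str,
    control_plane_endpoint: str,
    pod_subnet: str,
    service_subnet: str,
) -> dict:
    """Build a minimal kubeadm ClusterConfiguration (assumed v1beta3 schema)."""
    pods = ipaddress.ip_network(pod_subnet)
    services = ipaddress.ip_network(service_subnet)
    # Overlapping ranges would make pod IPs and Service IPs ambiguous.
    if pods.overlaps(services):
        raise ValueError(f"pod subnet {pods} overlaps service subnet {services}")
    return {
        "apiVersion": "kubeadm.k8s.io/v1beta3",
        "kind": "ClusterConfiguration",
        "kubernetesVersion": kubernetes_version,
        # A stable DNS name or VIP in front of all control-plane nodes (hypothetical).
        "controlPlaneEndpoint": control_plane_endpoint,
        "networking": {
            "podSubnet": str(pods),
            "serviceSubnet": str(services),
        },
    }


if __name__ == "__main__":
    config = kubeadm_cluster_config(
        kubernetes_version="v1.23.0",
        control_plane_endpoint="k8s-api.example.internal:6443",
        pod_subnet="192.168.0.0/16",   # Calico's default IP pool
        service_subnet="10.96.0.0/12",  # kubeadm's default Service range
    )
    print(json.dumps(config, indent=2))
```

The same dictionary could be handed to whatever templating your automation already uses. The overlap check is the part worth keeping, because the CNI plugin's pool has to match `podSubnet`.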

If you want a better, more cloud-native way of managing your clusters that enables easy scaling, you may want to try the ClusterAPI project. It supports multiple cloud providers, but it can also be used in on-prem environments with the aforementioned Metal3, vSphere, or OpenStack.

One more thing worth noting here is the operating system used by cluster nodes. Since the future of CentOS seems unclear, Ubuntu is becoming the main building block for bespoke Kubernetes clusters. Some may prefer Flatcar Linux, a slim alternative that has replaced CoreOS.

Cluster autoscaler

When to apply: Highly recommended for ENT, optional for others
Purpose: Automatically scale your platform up and down

If you choose ClusterAPI, or your cluster nodes are managed through some other API (e.g. vSphere, OpenStack, etc.), then you should also use the cluster autoscaler component. It is an almost mandatory feature for ENT, but it can also be useful for MBU organizations. By treating nodes as ephemeral entities that can be easily replaced, removed, or added, you decrease maintenance costs.
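The core decision the cluster autoscaler makes can be illustrated with a toy model. It adds a node when a pending pod fits on no existing node, and it marks a node as a scale-down candidate when the resources requested on it fall below a utilization threshold. The value 0.5 is the autoscaler's default for `--scale-down-utilization-threshold`. This sketch looks only at CPU requests and checks each pending pod on its own. It ignores most of what the real component weighs (memory, pod disruption budgets, node groups, the waiting period before removal), so treat it as an explanation rather than a reimplementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    allocatable_millicpu: int
    # CPU requests (millicores) of the pods currently scheduled on this node.
    pod_requests: list[int] = field(default_factory=list)

    @property
    def utilization(self) -> float:
        return sum(self.pod_requests) / self.allocatable_millicpu

    def fits(self, request: int) -> bool:
        return sum(self.pod_requests) + request <= self.allocatable_millicpu


def needs_scale_up(nodes: list[Node], pending_requests: list[int]) -> bool:
    """True if some pending pod (checked on its own) fits on no existing node."""
    return any(not any(n.fits(r) for n in nodes) for r in pending_requests)


def scale_down_candidates(nodes: list[Node], threshold: float = 0.5) -> list[str]:
    """Names of nodes whose requested CPU is below the utilization threshold."""
    return [n.name for n in nodes if n.utilization < threshold]
```

For example, with one node at 3500m of 4000m requested and another at 500m, a pending pod requesting 3600m triggers a scale-up, while the second node is a scale-down candidate.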

Network CNI plugin

When to apply: Mandatory for ALL
Purpose: Connect containers with optional additional features such as encryption

Choosing the networking plugin is one of the decisions that must be made prudently, as it cannot be easily changed afterward. To keep things brief, I would shorten the list to two plugins: Calico and Cilium. Calico is older and maybe a little more mature, but Cilium looks very promising and uses the Linux kernel's BPF (eBPF). For a more detailed comparison, I would suggest reading this review of multiple plugins. Choose wisely and avoid CNI plugins without NetworkPolicy support: having a Kubernetes cluster without the possibility to implement firewall rules is a bad idea. Both Calico and Cilium support encryption, which is a nice thing to have, but Cilium is able to encrypt all the traffic (Calico encrypts only pod-to-pod).
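Why NetworkPolicy support matters is clearer once you see the semantics. A pod that no policy selects accepts all ingress traffic. As soon as any policy selects it, only traffic allowed by at least one of the selecting policies gets through. That is why a policy with an empty podSelector and no ingress rules acts as a namespace-wide default deny. The sketch below models only ingress rules with `matchLabels` pod selectors within a single namespace. Namespace selectors, ipBlock peers, and ports are deliberately left out, and the example labels are made up.

```python
from __future__ import annotations


def matches(selector: dict, labels: dict) -> bool:
    """matchLabels semantics: every key/value pair must be present on the pod."""
    return all(labels.get(k) == v for k, v in selector.get("matchLabels", {}).items())


def ingress_allowed(policies: list[dict], src_labels: dict, dst_labels: dict) -> bool:
    """Can a pod labelled src_labels reach a pod labelled dst_labels (same namespace)?"""
    # Policies that select the destination pod for ingress isolate it.
    selecting = [
        p for p in policies
        if "Ingress" in p["spec"].get("policyTypes", ["Ingress"])
        and matches(p["spec"]["podSelector"], dst_labels)
    ]
    if not selecting:
        return True  # no policy selects the pod: all ingress is allowed
    # Allowed if ANY rule of ANY selecting policy admits the source (policies are additive).
    for policy in selecting:
        for rule in policy["spec"].get("ingress", []):
            peers = rule.get("from", [])
            if not peers:
                return True  # a rule without "from" admits every source
            if any("podSelector" in peer and matches(peer["podSelector"], src_labels)
                   for peer in peers):
                return True
    return False
```

A CNI plugin without NetworkPolicy support simply ignores such objects, so every pod behaves as if the `selecting` list were empty.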

Ingress controller

When to apply: Mandatory for ALL
Purpose: Provide an easy and flexible way to expose web applications with optional advanced features

Ingress is a component that can be easily swapped out while the cluster is running. In fact, you can run multiple Ingress controllers side by side by leveraging IngressClass, introduced in Kubernetes 1.18. A comprehensive comparison can be found here, but I would limit the choice to a select few controllers depending on your needs.

For those looking for compatibility with other Kubernetes clusters (e.g. a hybrid solution), I would suggest starting with the most mature and battle-tested controller: the nginx ingress controller. The reason is simple: you need only the basic features described in the Ingress API, which every Ingress controller has to implement. That should cover 90% of cases, especially for the SUP group.
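As an illustration of sticking to the basic Ingress API, here is a small Python helper that builds a `networking.k8s.io/v1` Ingress bound to a specific controller through `ingressClassName`. The host, Service name, port, and class name `nginx` are assumptions. The class name must match an IngressClass object that exists in your cluster. The result can be printed as JSON, which `kubectl apply -f` accepts just like YAML.

```python
import json


def basic_ingress(name: str, host: str, service: str, port: int,
                  ingress_class: str = "nginx") -> dict:
    """Build a plain networking.k8s.io/v1 Ingress that uses only portable fields."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            # Selects which controller handles this Ingress (assumed class name).
            "ingressClassName": ingress_class,
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(basic_ingress("shop", "shop.example.com", "shop-web", 8080), indent=2))
```

Because nothing controller-specific (no annotations) is used, the same object should work unchanged if you later switch `ingress_class` to another controller.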

If more features are required (such as sophisticated HTTP routing, authentication, authorization, etc.), then the following options are the most promising:

  • Contour – it’s the only CNCF project that is in the Incubating maturity level group. And it’s based on Envoy which is the most flexible proxy available out there.
  • Ambassador – has nice features, but many of them are available in the paid version. And yes – it also uses Envoy.
  • HAProxy from HAProxy Technologies – for those who are familiar with HAProxy and want to leverage it to provide a robust Ingress controller
  • Traefik – they have an awesome logo and if you’ve been using it for some Docker load-balancing then you may find it really useful for Ingress as well

Monitoring

When to apply: Mandatory for ALL (unless an existing monitoring solution compatible with Kubernetes exists)
Purpose: Provide insights on cluster state for operations teams

There is one king here: just use Prometheus. The best approach is probably to use an operator that also installs Grafana alongside some predefined dashboards.

Logging

When to apply: Mandatory for ALL (unless a central logging solution is already in place)
Purpose: Provide insights on cluster state for operations teams

It's quite similar to monitoring: the majority of solutions are based on Elasticsearch, Fluentd, and Kibana. This suite has broad community support, and many problems have been solved and described thoroughly in many posts on the web. ALL should have a logging solution for their platforms, and the easiest way to implement it is to use an operator like this one or a Helm chart like this one based on Open Distro (an equivalent of Elasticsearch with a more lenient, open source license).
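Log shippers such as Fluentd read container logs from the node. With containerd or CRI-O as the runtime, those files use the CRI logging format: an RFC 3339 timestamp, the stream name, a `P` or `F` tag marking a partial or full line, and the message. Long lines are split into several `P` records that have to be stitched back together. The sketch below shows that parsing step in plain Python, assuming this format. It is roughly the work a shipper's CRI parser does before records reach Elasticsearch, not the code of any particular shipper.

```python
from __future__ import annotations


def parse_cri_lines(lines: list[str]) -> list[dict]:
    """Parse CRI-format container log lines, joining partial (P) chunks per stream."""
    records: list[dict] = []
    pending: dict[str, list[str]] = {}  # partial chunks buffered per stream
    for line in lines:
        # <timestamp> <stream> <P|F> <message>; the message itself may contain spaces.
        timestamp, stream, tag, message = line.rstrip("\n").split(" ", 3)
        pending.setdefault(stream, []).append(message)
        if tag == "F":  # full line: emit everything buffered for this stream
            records.append({
                "time": timestamp,
                "stream": stream,
                "log": "".join(pending.pop(stream)),
            })
    return records
```

On a node, the input would typically be lines read from files under /var/log/containers/. The record keeps the timestamp of the final chunk, which is a simplification.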

Tracing

When to apply: Optional for ALL
Purpose: Provide insights and additional metrics useful for application troubleshooting and performance tuning

Tracing is a feature that will be highly coveted in really big and complex environments. That's why ENT organizations should adopt it, and the best way is to implement it using Jaeger. It's one of the graduated CNCF projects, which only makes it more appealing: it has proven to be not only highly popular but also backed by a healthy community. Implementation requires some work on the application's part, but the service itself can be easily installed and maintained using this operator.
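The "work on the application's part" is mostly about propagating trace context between services, so that spans from different services end up in the same trace. The sketch below assumes your services use the W3C Trace Context header `traceparent`, which OpenTelemetry SDKs use by default. It shows the idea: keep the trace ID from the incoming request, start a fresh trace when there is no valid parent, and generate a new span ID for the outgoing call. In practice an instrumentation library does this for you.

```python
from __future__ import annotations

import re
import secrets

# Version 00: 32-hex trace-id, 16-hex parent span id, 2-hex trace flags.
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def child_traceparent(incoming: str | None) -> str:
    """Return the traceparent header for an outgoing call (W3C Trace Context)."""
    match = TRACEPARENT.match(incoming or "")
    if match and match.group(1) != "0" * 32 and match.group(2) != "0" * 16:
        trace_id, flags = match.group(1), match.group(3)  # continue the caller's trace
    else:
        trace_id, flags = secrets.token_hex(16), "01"  # start a new, sampled trace
    span_id = secrets.token_hex(8)  # new span for the outgoing request
    return f"00-{trace_id}-{span_id}-{flags}"
```

The shared trace ID is what lets a tracing backend such as Jaeger assemble spans from different services into a single trace view.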

Backup

When to apply: Mandatory for ENT, optional for the rest
Purpose: Apply the “Redundancy is not a backup solution” approach

ALL should remember that redundancy is not a backup solution. A properly implemented GitOps solution, where each change of the cluster state goes through a dedicated git repository, can simplify disaster recovery, but in many cases it's not enough. For those who plan to use persistent storage, I would recommend implementing Velero.
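With Velero, recurring backups are usually driven by a Schedule resource. It combines a cron expression with a backup template, and the template's TTL decides when Velero garbage-collects old backups. The sketch builds such a manifest in Python and computes when a given backup would expire. The namespace list, the cron expression, the 30-day TTL, and the assumption that Velero lives in the `velero` namespace are all illustrative.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def velero_schedule(name: str, cron: str, namespaces: list[str], ttl_hours: int) -> dict:
    """Build a velero.io/v1 Schedule manifest."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": "velero"},  # assumed install namespace
        "spec": {
            "schedule": cron,
            "template": {
                "includedNamespaces": namespaces,
                # Backups created from this template are deleted after the TTL.
                "ttl": f"{ttl_hours}h0m0s",
            },
        },
    }


def backup_expires_at(created: datetime, ttl_hours: int) -> datetime:
    """When a backup created at `created` falls out of retention."""
    return created + timedelta(hours=ttl_hours)


if __name__ == "__main__":
    nightly = velero_schedule("nightly", "0 2 * * *", ["shop", "billing"], ttl_hours=720)
    print(nightly["spec"]["template"]["ttl"])
    print(backup_expires_at(datetime(2022, 1, 1, 2, 0, tzinfo=timezone.utc), 720))
```

Keep in mind that the TTL defines how far back you can restore, so it should follow your recovery requirements rather than storage costs alone.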

Storage

When to apply: For ALL if stateful applications are planned
Purpose: Provide flexible storage for stateful applications and services

The easiest use case for Kubernetes is stateless applications that don't need any storage to keep their state. Most microservices use some external service (such as a database) that can be deployed outside of the cluster. If persistent storage is required, it can still be provided by existing solutions from outside the Kubernetes cluster. Many of them have drawbacks, though (e.g. the need to provision persistent volumes manually, less reliability and flexibility), and that's why keeping storage inside the cluster can be a viable and efficient alternative. I would limit the choices for such storage to the following projects:

Rook is the most popular one. When properly implemented (e.g. deployed on a dedicated cluster or on a dedicated node pool with monitoring, alerting, etc.), it can be a great way of providing storage for any kind of workload, even production databases (although this topic is still controversial and we all need time to get accustomed to running them this way).
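With storage running inside the cluster, applications no longer depend on manually provisioned volumes. They request a PersistentVolumeClaim that names a StorageClass, and the provisioner creates the volume on demand. The sketch builds such a claim. The StorageClass name `rook-ceph-block` follows Rook's example manifests but is an assumption here, since you choose the class name yourself when configuring Rook.

```python
def pvc(name: str, size_gi: int, storage_class: str = "rook-ceph-block") -> dict:
    """Build a PersistentVolumeClaim that triggers dynamic provisioning."""
    if size_gi <= 0:
        raise ValueError("size must be positive")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: mounted read-write by a single node (typical for block storage).
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }
```

The application only ever refers to the claim. Whether the volume comes from Ceph via Rook or from an external array is decided by the StorageClass, which makes it easier to change storage backends later.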

Security

This part is crucial for organizations that are focused on providing secure platforms for the most sensitive parts of their systems.

Non-root containers

When to apply: Mandatory for ENT and probably MBU
Purpose: Decrease the risk of exploitation of vulnerabilities found in applications or the operating system they use

OpenShift made a very brave and good decision by providing a default setting that forbids running containers under the root account. I think this setting should also be implemented by ALL organizations that want to increase the security level of workloads running on their Kubernetes clusters. It is quite easy to achieve by enabling the PodSecurityPolicy admission controller and applying proper rules. It doesn't even require an external project, and it's low-hanging fruit that should be mandatory for larger organizations. This, however, has consequences for which images can be used on a platform. Most “official” images available on Docker Hub run as root, but I can see this changing, and hopefully it will continue to change in the future.
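Whatever admission mechanism enforces it (PodSecurityPolicy here, Pod Security Admission in newer Kubernetes releases), the rule itself comes down to the pod's securityContext. The sketch below is a plain Python check, not any admission controller's API. It flags containers that do not guarantee a non-root user, meaning that neither the container nor the pod sets `runAsNonRoot: true`, or that the effective `runAsUser` is 0. As in Kubernetes, container-level settings override pod-level ones.

```python
from __future__ import annotations


def _effective(field: str, pod_ctx: dict, container_ctx: dict):
    # A container-level securityContext field overrides the pod-level one.
    if field in container_ctx:
        return container_ctx[field]
    return pod_ctx.get(field)


def root_capable_containers(pod: dict) -> list[str]:
    """Names of containers (including init containers) not guaranteed to run as non-root."""
    spec = pod["spec"]
    pod_ctx = spec.get("securityContext", {})
    offenders = []
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        ctx = c.get("securityContext", {})
        non_root = _effective("runAsNonRoot", pod_ctx, ctx)
        user = _effective("runAsUser", pod_ctx, ctx)
        if non_root is not True or user == 0:
            offenders.append(c["name"])
    return offenders
```

With `runAsNonRoot: true` set, the kubelet refuses to start a container whose image would run as UID 0. That is why images built to run as root stop working on such a platform.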

Enforcing policies with OpenPolicyAgent

When to apply: Mandatory for ENT, optional for others
Purpose: Enforce security and internal policies

Many organizations produce tons of security policies written down in documents. They are often enforced by processes and audited yearly or even less often. In many cases, they aren't adjusted to the real world and were created mostly to meet some requirements rather than to protect systems and ensure that best security practices are in place. It's time to start enforcing these policies at the API level, and that's where OpenPolicyAgent comes into play. It's probably not required for small organizations, but it's definitely mandatory for larger ones, where the risks are much higher. In such organizations, properly configured rules can:

  • prevent pulling images from untrusted container registries
  • prevent pulling images outside of a list of allowed container images
  • enforce the use of specific labels describing a project and its owner
  • enforce best practices that may have an impact on platform reliability (e.g. defining resource requests and limits, using liveness and readiness probes)
  • granularly restrict the use of the platform’s API (Kubernetes RBAC can’t be used to specify exceptions)
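OpenPolicyAgent rules are written in Rego, but the logic behind the rules listed above is easy to follow in any language. Here is a plain Python sketch of an admission-style check for three of them: trusted registries, required labels, and resource limits plus probes. It is not Rego, and it does not show how OPA or Gatekeeper is wired into the API server. The registry prefixes and label keys are hypothetical.

```python
from __future__ import annotations

TRUSTED_REGISTRIES = ("registry.example.internal/", "ghcr.io/example-org/")  # hypothetical
REQUIRED_LABELS = ("project", "owner")  # hypothetical label keys


def violations(pod: dict) -> list[str]:
    """Return human-readable policy violations for a Pod manifest."""
    problems = []
    labels = pod.get("metadata", {}).get("labels", {})
    for key in REQUIRED_LABELS:
        if key not in labels:
            problems.append(f"missing label '{key}'")
    for c in pod["spec"].get("containers", []):
        name, image = c["name"], c["image"]
        if not image.startswith(TRUSTED_REGISTRIES):
            problems.append(f"{name}: image '{image}' is not from a trusted registry")
        limits = c.get("resources", {}).get("limits", {})
        if "cpu" not in limits or "memory" not in limits:
            problems.append(f"{name}: cpu and memory limits are required")
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in c:
                problems.append(f"{name}: {probe} is required")
    return problems
```

An admission controller would reject the request when this list is non-empty and return the messages to the user, which is exactly where such rules are more effective than a document that says the same thing.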

Authentication

When to apply: Mandatory for ALL, though for some SUP it may be optional
Purpose: Provide a way for users to authenticate and be authorized on the platform

This is actually a mandatory component for all organizations. One thing that may surprise many is how Kubernetes treats authentication and how it relies on external sources to provide information about users. This means almost unlimited flexibility, but at the same time it adds more work and requires a few decisions to be made. To make it short: you probably want something like DEX, which acts as a proxy to your real Identity Provider (DEX supports many of these, including LDAP, SAML 2.0, and the most popular OIDC providers). To make it easier to use, you can add Gangway. The two projects are often used together.
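Under the hood, DEX issues OIDC ID tokens, and the API server is configured to trust them through its `--oidc-*` flags. The username and groups then come from claims inside the token. The sketch below builds those flags and shows which claims the API server would read from a token payload. The issuer URL, the client ID, and the choice of `email` and `groups` claims are assumptions. The decoding step skips signature verification on purpose, which the API server never does, and the mapping ignores username/group prefixes and the `email_verified` check.

```python
from __future__ import annotations

import base64
import json


def apiserver_oidc_flags(issuer: str, client_id: str,
                         username_claim: str = "email",
                         groups_claim: str = "groups") -> list[str]:
    """kube-apiserver flags that make it trust ID tokens from an OIDC issuer such as DEX."""
    return [
        f"--oidc-issuer-url={issuer}",
        f"--oidc-client-id={client_id}",
        f"--oidc-username-claim={username_claim}",
        f"--oidc-groups-claim={groups_claim}",
    ]


def unverified_claims(id_token: str) -> dict:
    """Decode a JWT payload WITHOUT checking its signature (illustration only)."""
    payload = id_token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def identity(claims: dict, username_claim: str = "email",
             groups_claim: str = "groups") -> tuple[str, list[str]]:
    """Username and groups as RBAC would see them (prefixes not modelled)."""
    return claims[username_claim], list(claims.get(groups_claim, []))
```

The groups claim is the practical link to RBAC: binding a Role to a group from your Identity Provider avoids managing individual users in Kubernetes at all.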

You may also consider Keycloak, an alternative that is more powerful but at the same time more complex and difficult to configure.

Better secret management

When to apply: Mandatory for ENT
Purpose: Provide a better and more secure way of handling confidential information on the platform

For smaller projects and organizations, encrypting Secrets in the repository where they are stored should be sufficient. Tools such as git-crypt, git-secret, or SOPS do a great job of securing these objects. I especially recommend the last one: SOPS is very universal and, combined with GPG, can be used to create a very robust solution. For larger organizations, I would recommend implementing HashiCorp Vault, which can be easily integrated with any Kubernetes cluster. It requires a bit of work, so using it for small clusters with few applications seems to make no sense. For those who have dozens or even hundreds of credentials or other confidential data to store, Vault can make life easier. It offers auditing, built-in versioning, seamless integration, and its killer feature: dynamic secrets. By implementing access to external services (e.g. various cloud providers, LDAP, RabbitMQ, SSH, and database servers) using credentials created on demand with a short lifetime, you raise the security of your platform to a different level.
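The value of dynamic secrets comes from the lease model. Every consumer gets its own credentials with a short TTL, and those credentials stop working when the lease expires or is revoked. The toy class below illustrates that model only. It is a hypothetical in-memory stand-in, not Vault's API or client library, and real secrets engines also create and drop the credentials in the target system (a database, a cloud account) for you.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass


@dataclass
class Lease:
    lease_id: str
    username: str
    password: str
    expires_at: float


class ToyDynamicSecrets:
    """Hypothetical in-memory model of short-lived, per-consumer credentials."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be demonstrated without waiting
        self.leases: dict[str, Lease] = {}

    def issue(self, role: str) -> Lease:
        lease = Lease(
            lease_id=secrets.token_hex(8),
            username=f"{role}-{secrets.token_hex(4)}",  # a unique user per consumer
            password=secrets.token_urlsafe(24),
            expires_at=self.clock() + self.ttl,
        )
        self.leases[lease.lease_id] = lease
        return lease

    def is_valid(self, lease_id: str) -> bool:
        lease = self.leases.get(lease_id)
        return lease is not None and self.clock() < lease.expires_at

    def revoke(self, lease_id: str) -> None:
        self.leases.pop(lease_id, None)
```

Because each consumer gets a distinct username, audit logs in the target system show exactly which workload used a credential, and a leaked password is useful only until its lease ends.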

Security audits

When to apply: Mandatory for ENT and MBU
Purpose: Get more information on potential security breaches

When handling a big environment, especially one that needs to comply with security standards, providing a way to report suspicious activity is one of the most important requirements. Setting up auditing for Kubernetes is quite easy, and it can be enhanced with more granular information on specific events generated not by API components but by containers running on the cluster. The project that brings these additional features is Falco. It's really amazing how powerful this tool is: it uses the Linux kernel's internal API to trace all activity of a container, such as access to files, sending or receiving network traffic, access to the Kubernetes API, and many, many more. The built-in rules already provide some useful information, but they need to be adjusted to specific needs to eliminate false positives and to trigger alerts when unusual activity is discovered on the cluster.
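Kubernetes auditing is driven by an audit Policy (`audit.k8s.io/v1`) whose rules are evaluated in order. The first rule that matches a request decides its level: None, Metadata, Request, or RequestResponse. The sketch models that first-match behaviour for a small example policy. Matching is simplified to verbs, resources, and namespaces, and the rules are illustrative assumptions that you would tune for your own cluster.

```python
from __future__ import annotations

# Example policy, evaluated top to bottom (illustrative rules, not a recommendation).
POLICY = [
    # Read-only requests for events are noisy: don't log them.
    {"level": "None", "verbs": ["get", "list", "watch"], "resources": ["events"]},
    # Secrets and ConfigMaps: record who touched them, never the payload.
    {"level": "Metadata", "resources": ["secrets", "configmaps"]},
    # Full request and response bodies for changes in production namespaces.
    {"level": "RequestResponse", "verbs": ["create", "update", "patch", "delete"],
     "namespaces": ["prod"]},
    # Catch-all rule.
    {"level": "Metadata"},
]


def audit_level(policy: list[dict], verb: str, resource: str, namespace: str) -> str:
    for rule in policy:
        if "verbs" in rule and verb not in rule["verbs"]:
            continue
        if "resources" in rule and resource not in rule["resources"]:
            continue
        if "namespaces" in rule and namespace not in rule["namespaces"]:
            continue
        return rule["level"]  # first matching rule wins
    return "None"  # requests matching no rule are not logged
```

Ordering matters: deleting a Secret in `prod` is logged at Metadata level because the Secrets rule comes before the production rule, which keeps secret values out of the audit log.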

Container images security scanning

When to apply: Mandatory for ALL
Purpose: Prevent running containers with critical vulnerabilities

Platform security mostly comes down to vulnerabilities in the containers running on it. That's why it is so important to ensure that the images used to run these containers are scanned for the most critical vulnerabilities. This can be achieved in two ways: by scanning the images in a container registry, or by including an additional step in the CI/CD pipeline used for deployment.

It's worth considering keeping container images outside of the cluster and relying on existing container registries such as Docker Hub, Amazon ECR, Google GCR, or Azure ACR. Yes, even when building an on-prem environment, it is sometimes just easier to use a service from a public cloud provider. This is especially beneficial for smaller organizations that don't want to invest too much time in building a container registry but still want to provide a proper level of security and reliability.

There is one major player in the on-prem container registries market that should be considered when building such a service. It’s Harbor which has plenty of features, including security scanning, mirroring of other registries, and replication that allows adding more nines to its availability SLO. Harbor has a built-in Trivy scanner that works pretty well and is able to find vulnerabilities on the operating system level and also in the application packages.

Trivy can also be used as a standalone tool in a CI/CD pipeline to scan the container image built by one of the stages. This one-line command might save you from serious trouble, as many would be surprised by the number of critical vulnerabilities that exist even in the official Docker images.
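Trivy can fail a pipeline stage by itself (for example with `--exit-code 1 --severity CRITICAL`). If you need custom rules, such as an allowlist of findings your team has reviewed, you can post-process its JSON report (`--format json`) instead. The sketch assumes the report layout of recent Trivy versions: a top-level `Results` list whose entries carry a `Vulnerabilities` list, which may be missing or null when nothing was found. The allowlist entry is a placeholder.

```python
from __future__ import annotations

import json
import sys

ACCEPTED_CVES = {"CVE-0000-0000"}  # placeholder allowlist of reviewed findings


def blocking_findings(report: dict, severities=("CRITICAL",)) -> list[str]:
    """List 'target: CVE (package)' entries that should fail the build."""
    found = []
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if (vuln.get("Severity") in severities
                    and vuln.get("VulnerabilityID") not in ACCEPTED_CVES):
                found.append(f'{result.get("Target")}: {vuln["VulnerabilityID"]} '
                             f'({vuln.get("PkgName")})')
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gate.py trivy-report.json
    with open(sys.argv[1]) as fh:
        problems = blocking_findings(json.load(fh))
    for line in problems:
        print(line)
    sys.exit(1 if problems else 0)
```

In a pipeline, the scan step would write the report to a file and this script would run right after it. A non-zero exit code stops the deployment stage.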

Extra addons

On top of the basic Kubernetes features, there are some interesting addons that extend them.

User-friendly interface

When to apply: Mandatory for ENT and MBU
Purpose: Allow less experienced users to use the platform

Who doesn't like a nice GUI that gives a quick overview of what's going on with your cluster and the applications running on it? Even I crave such interfaces, although I spend most of my time on the command line or in my editor. When designed properly, these interfaces can speed up administration and simply make working with a Kubernetes environment much more pleasant. The “official” Kubernetes dashboard project is very basic, and it's not the tool I would recommend for beginners, as it may actually scare people off instead of drawing them to Kubernetes. I still believe that OpenShift's web console is one of the best, but unfortunately it cannot be easily installed on any Kubernetes cluster. If it were possible, it would definitely be my first choice. Octant looks like an interesting project that is extensible, and there are already useful plugins available (e.g. Aqua Security Starboard). It's more of a platform than a simple web console, and it doesn't actually run inside a cluster but on a workstation. The other contestant in the UI category is Lens. It's also a standalone application. It works pretty well and shows nice graphs when Prometheus is installed on the cluster.

Service mesh

When to apply: Optional for ALL
Purpose: Enable more advanced traffic management, more security and flexibility for the applications running on the platform

Before any project name appears here, there's a fundamental question that needs to be asked: do you really need a service mesh for your applications? I wouldn't recommend it for organizations that are just starting their journey with cloud native workloads. An additional layer can make the not-so-trivial management of containers even more complex and difficult. Maybe you want to use a service mesh only to encrypt traffic? Consider a proper CNI plugin that brings this feature transparently. Maybe advanced deployment strategies seem like a good idea, but did you know that even the basic nginx Ingress controller supports canary releases? Introduce a service mesh only when you really need a specific feature (e.g. multi-cluster communication, traffic policies, circuit breakers, etc.). Most readers would probably be better off without a service mesh, and for those prepared for the additional effort that comes with increased complexity, the choice is limited to a few solutions. The first and most obvious one is Istio. The other one I can recommend is Consul Connect from HashiCorp. The former is also the most popular and is often provided as an add-on in Kubernetes services in the cloud. The latter seems much simpler and is easier to use. It's also a part of Consul, and together they enable the creation and management of multi-cluster environments.
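To back up the point about canary releases without a service mesh: ingress-nginx supports them through annotations on a second Ingress that points at the new version of the application. The sketch derives such a canary Ingress from an existing one. The annotation names `nginx.ingress.kubernetes.io/canary` and `nginx.ingress.kubernetes.io/canary-weight` come from ingress-nginx. The Service names and the weight used in the usage note are assumptions.

```python
import copy


def canary_ingress(stable: dict, canary_service: str, weight_percent: int) -> dict:
    """Copy a stable Ingress into an ingress-nginx canary that gets a share of traffic."""
    if not 0 <= weight_percent <= 100:
        raise ValueError("weight must be between 0 and 100")
    canary = copy.deepcopy(stable)  # leave the stable object untouched
    meta = canary["metadata"]
    meta["name"] = f'{meta["name"]}-canary'
    meta.setdefault("annotations", {}).update({
        "nginx.ingress.kubernetes.io/canary": "true",
        # Annotation values are strings: percentage of requests sent to the canary.
        "nginx.ingress.kubernetes.io/canary-weight": str(weight_percent),
    })
    # Same hosts and paths as the stable Ingress, but a different backend Service.
    for rule in canary["spec"]["rules"]:
        for path in rule["http"]["paths"]:
            path["backend"]["service"]["name"] = canary_service
    return canary
```

For example, `canary_ingress(stable, "shop-v2", 10)` would send roughly 10% of requests for the stable Ingress's host to `shop-v2`. Raising the weight step by step, then switching the stable Ingress and deleting the canary, completes the rollout.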

External DNS

When to apply: Optional for ALL, recommended for dynamic environments
Purpose: Decrease the operational work involved with managing new DNS entries

Smaller environments will probably not need many DNS records for external access via load balancer or ingress services. For larger and more dynamic ones, having a dedicated service that manages these DNS records may save a lot of time. That service is external-dns, and it can be configured to manage records on most DNS services available in the cloud, as well as on traditional DNS servers such as BIND. This addon works best with the next one, which adds TLS certificates to your web applications.
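Conceptually, external-dns runs a reconciliation loop. It collects the hostnames your Services and Ingresses ask for (for example through the `external-dns.alpha.kubernetes.io/hostname` annotation), compares them with the records in the DNS zone, and applies the difference. It only deletes or overwrites records it owns, which it tracks with TXT ownership records. The sketch below models just the planning step with plain dictionaries. Ownership is reduced to a set of names, which is a simplification.

```python
from __future__ import annotations


def plan(desired: dict[str, str], current: dict[str, str],
         owned: set[str]) -> dict[str, list]:
    """Compute DNS changes; desired/current map hostname -> target (IP or hostname)."""
    changes: dict[str, list] = {"create": [], "update": [], "delete": []}
    for name, target in sorted(desired.items()):
        if name not in current:
            changes["create"].append((name, target))
        elif current[name] != target and name in owned:
            changes["update"].append((name, target))
    for name in sorted(current):
        # Never delete records that were created by someone else.
        if name not in desired and name in owned:
            changes["delete"].append(name)
    return changes
```

The ownership check is what makes it safe to point external-dns at a zone that also contains manually managed records, such as mail servers.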

Cert-manager

When to apply: Optional for ALL, recommended for dynamic environments
Purpose: Get trusted SSL certificates for free!

Do you still want to pay for your SSL/TLS certificates? Thanks to Let's Encrypt, you don't need to. But free certificates are just one of Let's Encrypt's advantages, and its use has been growing rapidly over the past few years. The main reason it should at least be considered as part of a modern Kubernetes platform is how easy it is to automate. There's a dedicated operator called cert-manager that makes the whole process of requesting and renewing certificates very quick and transparent to applications. Having trusted certificates saves a lot of time and trouble for those who manage many web services exposed externally, including test environments. Just ask anyone who has had to inject custom certificate authority certificates into dozens of places to make all the components talk to each other. And cert-manager can be used for internal Kubernetes components as well. It's one of my favourite addons, and I hope many will appreciate it as much as I do.
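A typical cert-manager setup boils down to a Certificate resource. It references an issuer (for example a ClusterIssuer configured for Let's Encrypt) and names the Secret where the signed certificate and key will be stored. The sketch builds such a resource and computes when it would be renewed. It assumes renewal two-thirds of the way through the certificate's lifetime, which for Let's Encrypt's 90-day certificates means 30 days before expiry. The issuer and host names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def certificate(name: str, dns_names: list[str], issuer: str = "letsencrypt-prod") -> dict:
    """Build a cert-manager.io/v1 Certificate requesting a certificate for dns_names."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": name},
        "spec": {
            "secretName": f"{name}-tls",  # Secret where cert-manager stores tls.crt/tls.key
            "dnsNames": dns_names,
            "issuerRef": {"name": issuer, "kind": "ClusterIssuer"},  # hypothetical issuer
        },
    }


def renewal_time(not_before: datetime, not_after: datetime) -> datetime:
    """Renew two-thirds into the validity period (assumed default behaviour)."""
    return not_after - (not_after - not_before) / 3


if __name__ == "__main__":
    start = datetime(2022, 1, 1, tzinfo=timezone.utc)
    print(renewal_time(start, start + timedelta(days=90)))  # 60 days after issuance
```

Applications only mount or reference the Secret, so renewals happen without any change on their side. That is the "transparent to applications" part.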

Additional cluster metrics

When to apply: Mandatory for ALL
Purpose: Get more insights and enable autoscaling

There are two additional components that should be installed on clusters used in production: metrics-server and kube-state-metrics. The first is required for the built-in autoscaler (HorizontalPodAutoscaler) to work, as metrics-server exposes resource metrics (CPU and memory) gathered from the kubelets on each node. The second generates metrics about the state of Kubernetes objects, such as deployments, pods, and nodes. I can't imagine working with a production cluster that lacks these features, as well as the events that should be part of standard security review processes and alerting systems.
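Since metrics-server is what feeds the HorizontalPodAutoscaler, it helps to know what the HPA does with those numbers. The documented core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), and nothing changes while the ratio stays within a tolerance (0.1 by default). The sketch implements just that formula. The min/max bounds stand in for the values you would set on the HPA object, and stabilization windows and missing-metric handling are left out.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Core HPA scaling rule: ceil(current * currentMetric / targetMetric), clamped."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: don't scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 3 replicas averaging 90% CPU against a 50% target give ceil(3 × 1.8) = 6 replicas. At 52% the HPA leaves the deployment alone because the ratio is within the tolerance.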

GitOps management

When to apply: Optional for ALL, recommended for ENT
Purpose: Decrease the operational work involved with cluster management

It is not that popular yet, but cluster and environment management is going to be an important topic, especially for larger organizations with dozens of clusters and namespaces and hundreds of developers working on them. Management techniques that use git repositories as the source of truth are known as GitOps, and they leverage the declarative nature of Kubernetes. It looks like ArgoCD has become a major player in this area, and installing it on the cluster may bring many benefits not only for the teams responsible for maintenance but also for the security of the whole platform.
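In Argo CD, the unit of GitOps is an Application resource. It points at a path in a git repository and at a destination cluster and namespace. With automated sync enabled, Argo CD keeps the live state equal to what is in git. The sketch builds such a manifest. The repository URL, path, and target namespace are hypothetical, the `argocd` namespace for the Application itself is an assumption about your installation, and `https://kubernetes.default.svc` is the address of the cluster Argo CD runs in.

```python
def argocd_application(name: str, repo_url: str, path: str, namespace: str,
                       revision: str = "HEAD") -> dict:
    """Build an Argo CD Application that continuously syncs a git path to the cluster."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},  # assumed install namespace
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "targetRevision": revision, "path": path},
            "destination": {"server": "https://kubernetes.default.svc", "namespace": namespace},
            "syncPolicy": {
                # prune: delete resources removed from git; selfHeal: revert manual drift.
                "automated": {"prune": True, "selfHeal": True},
            },
        },
    }
```

`selfHeal` is where the security benefit comes from. A manual `kubectl edit` on a managed resource is reverted, so every lasting change has to go through the reviewed git history.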

Conclusion

The aforementioned projects do not even begin to exhaust the subject of solutions available for Kubernetes. This list merely shows how many possibilities are out there, how rich the Kubernetes ecosystem is, and finally how quickly it evolves. Some may also be surprised by how many features required for running production workloads are missing from standard Kubernetes. Even the many Kubernetes-as-a-Service offerings available on major cloud platforms lack most of these features, let alone clusters built from scratch for on-prem environments. This shows how difficult the process of building a bespoke Kubernetes platform can become, but at the same time, those who manage to put it all together can be assured that their creation will take their organization to the next level of automation, reliability, security, and flexibility. For the rest, there's another, easier path: using a Kubernetes-based product that has most of these features built in.

Which Kubernetes distribution to choose for on-prem environments?

Most people think that Kubernetes was designed to bring more features and more abstraction layers to cloud environments. Well, I think the biggest benefits can be achieved in on-premise environments, because of the big gap between those environments and the ones that can be easily created in the cloud. This opens up many excellent opportunities for organizations that, for some reason, choose to stay outside of the public cloud. To leverage Kubernetes on on-premise hardware, one of the biggest decisions to be made is which software platform to use for Kubernetes. According to the official listing of available Kubernetes distributions, there are dozens of options. If you look closely at them, however, only a few are viable, as many of them are either inactive or have been merged with other projects (e.g. Pivotal Kubernetes Service merged with VMware Tanzu). I expect that three to five of these distributions will eventually prevail in the next two years, each targeting its own niche market segment. Let's have a look at those that have stayed in the game and can be used as a foundation for a highly automated on-premise platform.

1. OpenShift

I’ll start with the obvious and probably the best choice there is – OpenShift Container Platform. I’ve written about this product many times and there’s still no Kubernetes distribution on the market that is as rich in features. This also brings its biggest disadvantage – a price that for some is just too high. OpenShift is Red Hat’s flagship product and it is targeted at enterprises. Of course Red Hat sells it to medium and even small companies, but the main target group is big enterprises with big budgets. It has also become a platform for Red Hat’s other products and for other vendors’ services, which are easily installable and available at https://www.operatorhub.io/. OpenShift can be installed in the cloud, but it is in on-premise environments that it shows its most powerful features. Almost every piece of it is highly automated, which enables easy maintenance of clusters (installation, upgrades and scaling), rapid deployment of supplementary services (databases, service mesh) and platform configuration. No other distribution has achieved that level of automation. OpenShift is also the most complete solution, with integrated logging, monitoring and CI/CD (although Red Hat is still switching from Jenkins to the Tekton engine, which is not as feature-rich yet).
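
For illustration, installing an operator from the catalog on OpenShift 4 usually comes down to a single Subscription object handled by the Operator Lifecycle Manager (OLM). The sketch below assumes the default community-operators catalog in the openshift-marketplace namespace; the package name and channel are hypothetical placeholders, so check the real values in your cluster’s catalog.

# A minimal sketch of an OLM Subscription; package name and channel are hypothetical
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: example-operator
  namespace: openshift-operators        # namespace with a cluster-wide OperatorGroup on OpenShift 4
spec:
  name: example-operator                # hypothetical package name from the catalog
  channel: stable                       # hypothetical update channel
  source: community-operators           # catalog source shipped with OpenShift 4
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic        # let OLM apply new versions without manual approval

Once OLM resolves the Subscription, it installs the operator and, with automatic approval, keeps it updated within the chosen channel. That is a large part of the automation described above.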

When to choose OpenShift

  • If you have a big budget – money can’t bring happiness, but it can buy you the best Kubernetes distribution, so why hesitate?
  • If you want to have the easiest and smoothest experience with Kubernetes – a user-friendly web console that is second to none and comprehensive documentation.
  • You don’t plan to scale rapidly but you need a bulletproof solution – OpenShift can be great even for small environments, and as long as they don’t grow it can be financially reasonable
  • Your organization has few DevOps/Ops people – OpenShift is less demanding from a maintenance perspective and may help to overcome problems with finding highly skilled Kubernetes and infrastructure experts
  • The systems that your organization builds are complex – in cases where the development and deployment processes require a lot of additional services, there’s no better way to create and maintain them in on-premise environments than by using operators (and buying additional support for them if needed)
  • If you need support (?) – I’ve put it here just for the sake of providing some reasonable justification for the high price of an OpenShift subscription, but unfortunately many customers are not satisfied with the level of product support and thus it’s not the biggest advantage here

When to avoid OpenShift

  • All you need is the Kubernetes API – maybe all these fancy features are just superfluous and a plain Kubernetes distribution is enough, provided that you have a team of skilled people who can build and maintain it
  • If your budget is tight – that’s obvious, but many believe they can somehow overcome the high price of OpenShift by efficiently bin packing their workloads on smaller clusters or by getting a real bargain when ordering their subscriptions (I guess it’s possible, but only for really big orders covering hundreds of nodes)
  • Your organization is an avid supporter of open source projects and avoids any potential vendor lock-in – although OpenShift includes Kubernetes and can be fully compatible with other Kubernetes distributions, there are some areas where vendor lock-in can occur (e.g. reliance on built-in operators and their APIs)

2. OKD

Back in the day Red Hat used an upstream-downstream strategy for product development, where open source upstream projects were free to use and the downstream commercial products were built on top of them and heavily dependent on them. That changed with OpenShift 4: its open source equivalent – OKD – was released months after OpenShift had been redesigned with help from the CoreOS team (Red Hat acquired CoreOS in 2018). So OKD is an open source version of OpenShift, and it’s free. It’s a strategy Red Hat has been using for years – attract people, get them accustomed to the free (upstream) versions and give them an experience very similar to the paid products. The only differences are, of course, the lack of support and a few features that are available in OpenShift only. This is the key factor to consider when deciding on a Kubernetes platform – does your organization need support or can it get by without it? Things got a little more complicated after Red Hat (which owns the CentOS project) announced that CentOS 8 would cease to exist in the form that had been known for years. CentOS has been widely used by many companies as a free version of RHEL (Red Hat Enterprise Linux), but that has now changed, and we don’t know what IBM will do with OKD (I suspect it was their business decision to pull the plug on CentOS). There’s a risk that OKD will no longer be developed either, or at least that it will not resemble OpenShift as closely as it does now. For now, being still very similar to OpenShift, OKD can also be considered one of the best Kubernetes platforms for on-premise installations.

When to choose OKD

  • You don’t care about Red Hat add-ons, but still need a highly automated platform – OKD can still bring your environment to a completely different level by leveraging operators and built-in services (e.g. logging, monitoring)
  • You don’t need support, because you have really smart people with Kubernetes skills – either you pay Red Hat for its support or you build an internal team that acts as 1st, 2nd and 3rd line of support (not to mention the vast resources available on the web)
  • You plan to run internal workloads only, without exposing them outside – Red Hat brags about providing a curated list of container images, while OKD relies on the community’s work on security patches, which causes some delays; for some this can be an acceptable risk, especially if the platform is used internally
  • You need a Kubernetes distribution that is user-friendly – the web console in OKD is almost identical to the one in OpenShift, which I already described as second to none; it often helps less experienced users, and even more experienced ones can perform daily tasks faster thanks to all the information gathered in a concise form
  • You want to decrease the costs of OpenShift by using OKD for testing environments only – this idea seems reasonable from an economic point of view and, if planned and executed well, it makes sense; there are some caveats though (e.g. it is against the Red Hat license to use most of their container images there)

When to avoid OKD

  • Plain Kubernetes is all you need – with all these features comes complexity that may be just not what your organization needs and you’d be better off with some simpler Kubernetes distribution
  • You expect quick fixes and patches – don’t get me wrong, they do seem to be delivered, but it’s not guaranteed and relies solely on the community (e.g. for OpenShift Origin 3, the predecessor of OKD, some container images used internally by the platform weren’t updated for months, whereas OpenShift provided updates fairly quickly)
  • You need a stable and predictable platform – nobody expected that CentOS 8 would stop being an equivalent of RHEL, and similar decisions by IBM executives could affect OKD; there’s a risk that at some point all OKD users will have no choice but to migrate to another solution

3. Rancher

After Rancher was acquired by SUSE, a new chapter opened for this niche player on the market. Although SUSE already had its own Kubernetes solution, it’s likely that it will keep only a single offering of that type, and it’s going to be Rancher. Basically, Rancher offers easy management of multiple Kubernetes clusters, which can be provisioned manually and imported into the Cluster Manager management panel, or provisioned by Rancher using its own Kubernetes distribution. It’s called RKE – Rancher Kubernetes Engine – and it can be installed on most major cloud providers, but also on vSphere. Managing multiple clusters with Rancher is very easy, and combined with plenty of authentication options it makes a really compelling solution for those who plan to manage hybrid, multi-cluster, or even multi-cloud environments. I think that Rancher has initiated many interesting projects, including K3s (a lightweight Kubernetes distribution with a simplified control plane, targeted at edge computing), RKE (the aforementioned Kubernetes distribution), and Longhorn (distributed storage). You can see they are in the middle of an intensive development cycle, even by looking at Rancher’s inconsistent UI, which is divided into two parts: Cluster Manager, with a fresh look and a decent list of options, and Cluster Explorer, which is less pleasant but offers more insights. Let’s hope they will continue improving Rancher and RKE so that it becomes an even more compelling Kubernetes platform for on-premise environments.
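
To give a feel for RKE, a cluster is described in a single cluster.yml file that lists the nodes and their roles, and it is then created with rke up. The sketch below assumes RKE version 1 with SSH access to the nodes (the user must be able to run Docker there); the addresses, user name and key path are hypothetical.

# cluster.yml – a minimal RKE sketch; addresses, user and key path are hypothetical
nodes:
  - address: 192.168.10.11
    user: rancher
    role: [controlplane, etcd]    # runs the Kubernetes control plane and etcd
    ssh_key_path: ~/.ssh/id_rsa
  - address: 192.168.10.12
    user: rancher
    role: [worker]                # runs regular workloads
    ssh_key_path: ~/.ssh/id_rsa
  - address: 192.168.10.13
    user: rancher
    role: [worker]
    ssh_key_path: ~/.ssh/id_rsa

Running rke up --config cluster.yml should provision the cluster and write a kubeconfig file next to cluster.yml. Such a cluster can then be imported into Rancher’s Cluster Manager like any other.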

When to choose Rancher

  • If you already have VMware vSphere – Rancher makes it very easy to spawn new on-premise clusters by leveraging vSphere API
  • If you plan to maintain many clusters (all on-premise, hybrid or multi-cloud) – it’s just easier to manage them from a single place where you log in using unified credentials (it’s very easy to set up authentication against various services)
  • You focus on platform maintenance more than on features supporting development – with a nice integrated backup solution, a CIS benchmark engine and only a few developer-focused solutions (I think their CI/CD solution was put there just for marketing purposes – it’s barely usable) it’s just more appealing to infrastructure teams
  • If you really need paid support for your Kubernetes environment – Rancher provides support for its product, including its own Kubernetes distribution (RKE) as well as custom installations; when it comes to price, it’s a mystery that will be revealed when you contact Sales
  • You need browser-based access to your environment – with the built-in shell it’s very easy to access cluster resources without configuring anything on a local machine

When to avoid Rancher

  • You don’t care about fancy features – although there are significantly fewer features in Rancher than in OpenShift or OKD, it is still more than just a nice UI, and some may find these features redundant and get by without them
  • You’re interested in more mature products – Rancher has been in active development over the past few months and will probably be redesigned at some point, just like it happened with OpenShift (versions 3 and 4 are very different)
  • You don’t plan or need to use multiple clusters – maybe one is enough?

4. VMware Tanzu

The last contender is Tanzu from VMware, the biggest on-premise virtualization software vendor. When they announced Project Pacific I knew it was going to be huge. And it is. Tanzu is a set of products that leverage Kubernetes and integrate it with vSphere. The product that manages Kubernetes clusters is called Tanzu Kubernetes Grid (TKG) and it’s just the beginning of the Tanzu offering. There’s Tanzu Mission Control for managing multiple clusters, Tanzu Observability for.. observability, Tanzu Service Mesh for.. yes, it’s their service mesh, and many more. For anyone familiar with enterprise offerings it may resemble any other product suite from a giant like IBM, Oracle and so on. Let’s be honest here – Tanzu is not for those interested in “some” Kubernetes, it’s for enterprises accustomed to enterprise products and everything that comes with them (e.g. sales, support, software that can be downloaded only by authorized users, etc.). And it’s especially designed for those whose infrastructure is based on the VMware ecosystem – it’s a perfect addition that meets the requirements of development teams within an organization, but also addresses the concerns of operations teams with the same tools they have known for over a decade now. When it comes to features, they are pretty standard – easy authentication, cluster scaling, build services based on buildpacks, networking integrated with VMware NSX, storage integrated with vSphere – wait, it’s starting to sound like the feature list of another vSphere add-on. I guess it is an add-on. For those looking for fancy features I suggest waiting a bit longer for VMware to come up with new Tanzu products (or to acquire another company from the cloud native world, like they did with Bitnami).

When to choose Tanzu

  • When your company already uses VMware vSphere – just contact your VMware sales guy, who will prepare an offer for you, and the team that takes care of your infrastructure will do the rest
  • If you don’t plan to deploy anything outside of your own infrastructure – although VMware tries to be a hybrid provider by enabling integration with AWS or GCP, it will stay focused on the on-premise market, where it’s undeniably the leader
  • If you wish to use multiple clusters – Tanzu enables easy creation of Kubernetes clusters that can be assigned to development teams
  • If you need support – it’s an enterprise product with enterprise support

When to avoid Tanzu

  • If you don’t already have vSphere in your organization – you need vSphere and its ecosystem, of which Tanzu is a part, to start working with VMware’s Kubernetes services; otherwise it will cost you a lot more time and resources to install it just to leverage them
  • When you need more features integrated with the platform – although Tanzu provides interesting features (my favourite is Tanzu Build Service), it still lacks some distinguishing ones (although some are provided for you to install on your own from Solutions Hub) that would make it more appealing

Conclusion

I have chosen these four solutions for Kubernetes on-premise platform because I believe they provide a real alternative to custom-built clusters. These products make it easier to build and maintain production clusters, but also in many cases help to speed up the development process and provide insights for the deployment process as well. So here’s what I would do if I were to choose one:

  • if I had a big budget I would go with OpenShift, as it’s just the best
  • if I had a big budget and already existing VMware vSphere infrastructure I would consider Tanzu
  • if I had skilled Kubernetes people in my organization and I wanted to have an easy way to manage my clusters (provisioned manually) without vSphere I would choose Rancher (and optionally I would buy support for those clusters when going to prod)
  • if I had skilled Kubernetes people in my organization and I would like to use these fancy OpenShift features I would go with OKD, as it’s the best alternative to custom-built Kubernetes cluster

That’s not all. Of course you can build your own Kubernetes cluster and it’s a path that is chosen by many organizations. There are many caveats and conditions that need to be met (e.g. scale of such endeavour, type of workloads to be deployed on it) for this to succeed. But that’s a different story which I hope to cover in some other article.

How to modify containers without rebuilding their image

Containers are a beautiful piece of technology that ease the development of modern applications and also the maintenance of modern environments. One thing that draws many people to them is how they reduce the time required to set up a service, or a whole environment, with everything included. It is possible mainly because there are so many container images available and ready to use. You will probably need to build your own container images with your applications, but many containers in your environment will use prebuilt images prepared by someone else. It’s especially worth considering for software that is provided by the software vendor or a trusted group of developers like it has been done in the case of “official” images published on Docker Hub. In both cases, it makes your life easier by letting someone else take care of updates, packaging new versions, and making sure it works.
But what if you want to change something in those images? Maybe it’s a minor change or something bigger that is specific to your particular usage of the service. Your first instinct may be to rebuild that image. This, however, brings some overhead – these images will have to be published and rebuilt whenever new upstream versions are released, and you lose most of the benefits that come with those prebuilt versions.
There is an alternative to that – actually, I found four of them which I will describe below. These solutions will allow you to keep all the benefits and adjust the behaviour of running containers in a seamless way.

Method 1 – init-containers

Init-containers were created to provide additional functionality to the main container (or containers) defined in a Pod. They are executed before the main container and can use a different container image. If any of them fails, the main container will not start. Their logs can be easily retrieved, so troubleshooting is fairly simple – they are fetched just like the logs of any other container defined in a Pod, by providing the container’s name. This method is quite popular for services such as databases, which use it to initialize and configure themselves based on configuration parameters.

Example

The following example uses a dedicated empty volume for storing data initialized by an init-container. In this specific case, it’s just a simple “echo” command, but in a real-world scenario, this can be a script that does something more complex.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx-init
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      initContainers:
        - name: prepare-webpage
          image: busybox:1.28
          command: ["sh", "-c"]
          args: [
              "set -x;
              echo '<h2>Page prepared by an init container</h2>' > /web/index.html;
              echo 'Init finished successfully'
              ",
            ]
          volumeMounts:
            - mountPath: /web
              name: web
      containers:
        - image: nginx:1.19
          name: nginx
          volumeMounts:
            - mountPath: /usr/share/nginx/html/
              name: web
          ports:
            - containerPort: 80
              name: http
      volumes:
        - name: web
          emptyDir: {}

Method 2 – post-start hook

A postStart hook can be used to execute some action just after the main container starts. It can be either a command executed inside the container or an HTTP request sent to a defined endpoint. In most cases, it would probably be a shell script. The Pod stays in the ContainerCreating state until this script ends. It can be tricky to debug, since the hook’s output does not end up in the container logs (a failing hook is only reported as a FailedPostStartHook event). There are more caveats, and this should be used only for simple, non-invasive actions. The best feature of this method is that the script is executed when the service in the main container starts, so it can be used to interact with that service (e.g. by executing some API requests). With a proper readinessProbe configuration, this can be a nice way of initializing the application before any requests are allowed.

Example

In the following example a post-start hook executes the echo command, but again – this can be anything that uses the same set of files available on the container filesystem in order to perform some sort of initialization.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx-hook
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - image: nginx:1.19
          name: nginx
          ports:
            - containerPort: 80
              name: http
          lifecycle:
            postStart:
              exec:
                command:
                  [
                    "sh",
                    "-c",
                    "sleep 5;set -x; echo '<h2>Page prepared by a PostStart hook</h2>' > /usr/share/nginx/html/index.html",
                  ]
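
Below is a sketch of how the readinessProbe mentioned above could be combined with the postStart hook, assuming the same nginx:1.19 image. Kubernetes does not mark the container as running until the hook completes, and the probe additionally requires nginx to answer on port 80. As a result, no Service traffic reaches the Pod before the page is prepared. Only the container section differs from the Deployment above, and the probe timings are arbitrary example values.

      containers:
        - image: nginx:1.19
          name: nginx
          ports:
            - containerPort: 80
              name: http
          lifecycle:
            postStart:
              exec:
                command:
                  [
                    "sh",
                    "-c",
                    "sleep 5; echo '<h2>Page prepared by a PostStart hook</h2>' > /usr/share/nginx/html/index.html",
                  ]
          readinessProbe:
            httpGet:
              path: /index.html
              port: http              # refers to the named container port above
            initialDelaySeconds: 5    # example value
            periodSeconds: 5          # example value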

Method 3 – sidecar container

This method leverages the concept of the Pod where multiple containers run at the same time sharing IPC and network kernel namespaces. It’s been widely used in the Kubernetes ecosystem by projects such as Istio, Consul Connect, and many others. The assumption here is that all containers run simultaneously which makes it a little bit tricky to use a sidecar container to modify the behaviour of the main container. But it’s doable and it can be used to interact with the running application or a service. I’ve been using this feature with the Jenkins helm chart where there’s a sidecar container responsible for reading ConfigMap objects with Configuration-as-Code config entries.

Example

Nothing new here, just the “echo” command with a little caveat. The restartPolicy of a Deployment’s Pod must be Always, so a sidecar that exits after finishing its actions would simply be restarted over and over. That’s why it keeps running with a simple infinite while loop. In more advanced cases this would rather be some small daemon (or a loop that checks some state) that runs like a service.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx-sidecar
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - image: nginx:1.19
          name: nginx
          volumeMounts:
            - mountPath: /usr/share/nginx/html/
              name: web
          ports:
            - containerPort: 80
              name: http
        - name: prepare-webpage
          image: busybox:1.28
          command: ["sh", "-c"]
          args: [
              "set -x;
              echo '<h2>Page prepared by a sidecar container</h2>' > /web/index.html;
              while :;do sleep 9999;done
              ",
            ]
          volumeMounts:
            - mountPath: /web
              name: web
      volumes:
        - name: web
          emptyDir: {}
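
As a sketch of the “loop that checks some state” variant, the sidecar below rewrites the page every 30 seconds instead of sleeping forever. It is meant to replace the prepare-webpage container in the Deployment above and assumes the same busybox image and the shared web volume. The container name and the 30-second interval are arbitrary example values.

        # Hypothetical replacement for the prepare-webpage container above (a sketch)
        - name: refresh-webpage
          image: busybox:1.28
          command: ["sh", "-c"]
          args:
            - |
              set -x
              while :; do
                # Regenerate the page and append the current time so each refresh is visible
                echo '<h2>Page refreshed by a sidecar container</h2>' > /web/index.html.tmp
                date >> /web/index.html.tmp
                # Rename in one step so nginx never serves a half-written file
                mv /web/index.html.tmp /web/index.html
                sleep 30
              done
          volumeMounts:
            - mountPath: /web
              name: web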

Method 4 – entrypoint

The last method uses the same container image and is similar to the postStart hook, except that it runs before the main app or service. As you probably know, every container image has an ENTRYPOINT defined (explicitly or implicitly), and we can leverage it to execute arbitrary scripts. Many official images already do this, and in this method we will just prepend our own script to modify the behaviour of the main container. In more advanced scenarios you could actually provide a modified version of the original entrypoint file.

Example

This method is a little bit more complex and involves creating a ConfigMap with the content of a script that is executed before the main entrypoint. Our script for modifying the nginx entrypoint is embedded in the following ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: scripts
data:
  prestart-script.sh: |-
    #!/usr/bin/env bash

    echo '<h2>Page prepared by a script executed before entrypoint container</h2>' > /usr/share/nginx/html/index.html

    exec /docker-entrypoint.sh nginx -g "daemon off;" # it's "ENTRYPOINT CMD" extracted from the main container image definition

One very important thing is the last line with exec. It runs the original entrypoint script, replacing the script’s shell process instead of starting a child process, and it must match the entrypoint exactly as it is defined in the Dockerfile. In this case it also requires the additional arguments that are defined in CMD.

Now let’s define the Deployment object:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx-script
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - image: nginx:1.19
          name: nginx
          command: ["bash", "-c", "/scripts/prestart-script.sh"]
          ports:
            - containerPort: 80
              name: http
          volumeMounts:
            - mountPath: /scripts
              name: scripts
      volumes:
        - name: scripts
          configMap:
            name: scripts
            defaultMode: 0755 # <- this is important

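That is pretty straightforward – we override the entrypoint with command, and we must also make sure our script is mounted with proper permissions (hence defaultMode needs to be defined). Keep in mind that setting command without args also discards the CMD defined in the image, which is why the script passes nginx -g "daemon off;" explicitly.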
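
Some official images offer a hook for exactly this purpose, which removes the need to copy the original ENTRYPOINT and CMD into your script. As far as I know, the official nginx images from 1.19 onwards run executable *.sh files found in /docker-entrypoint.d/ before starting nginx. The sketch below assumes that behaviour, so verify it for your image tag. It mounts a single script there with subPath, which keeps the scripts already shipped in that directory in place. The ConfigMap, volume and Deployment names are hypothetical.

apiVersion: v1
kind: ConfigMap
metadata:
  name: entrypoint-scripts        # hypothetical name
data:
  40-prepare-webpage.sh: |-
    #!/bin/sh
    # No exec here: the image's own entrypoint runs this script and then starts nginx
    echo '<h2>Page prepared by a /docker-entrypoint.d script</h2>' > /usr/share/nginx/html/index.html
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx-entrypoint-d        # hypothetical name
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - image: nginx:1.19
          name: nginx              # no command override: the original ENTRYPOINT and CMD are kept
          ports:
            - containerPort: 80
              name: http
          volumeMounts:
            - mountPath: /docker-entrypoint.d/40-prepare-webpage.sh
              subPath: 40-prepare-webpage.sh   # mount one file and keep the image's other scripts
              name: entrypoint-scripts
      volumes:
        - name: entrypoint-scripts
          configMap:
            name: entrypoint-scripts
            defaultMode: 0755      # the entrypoint skips scripts that are not executable

The trade-off is that this only works for images that implement such a directory convention, whereas overriding command works with any image.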

Comparison table

Here’s the table that summarizes the differences between the aforementioned methods:

Conclusion

Containers are about reusability, and often it’s much easier to make small adjustments without rebuilding the whole container image and taking over the responsibility of publishing and maintaining it. It’s just an implementation of the KISS principle.

4 ways to manage Kubernetes resources

Kubectl is the new ssh

When I started my adventure with Linux systems, the first tool I had to get to know was ssh. Oh man, what a wonderful and powerful piece of software it is! You can not only log in to your servers and copy files, but also create VPNs, bypass firewalls with a SOCKS proxy and port-forwarding rules, and much more. With Kubernetes, however, this tool is used mostly for node maintenance, provided that you still need to manage nodes at all and haven’t switched to CoreOS or another immutable node type. For everything else, you use kubectl, which is the new ssh. If you don’t call the API directly, then you probably use kubectl in some form and feed it plenty of yaml files. Let’s face it – this is what managing a Kubernetes environment looks like nowadays. You create those beautiful, lengthy text files with definitions of the resources you wish Kubernetes to create, then magic happens and you’re the hero of the day. Unless you want to create not one but tens or hundreds of them with different configurations. And that’s when things get complicated.

Simplicity vs. flexibility

For basic scenarios, simple yaml files can be sufficient. However, with the growth of your environment, the number of resources and configurations grows. You may start noticing how much more time it takes to create a new instance of your app, reconfigure the ones that are running already or share it with the community or with your customers wishing to customize it to their needs.
Currently, I find the following ways to be the most commonly used:

  • Plain yaml files
  • Kustomize
  • Helm Charts
  • Operators

They can all be used to manage your resources, yet they differ in many ways. One of the distinguishing factors is complexity, which also determines how much effort it takes to learn, use and maintain a particular method. On the other hand, that effort might pay off in the long run when you really need to create complex configurations. You can observe this relationship in the following diagram:

Flexibility vs. Complexity

So there’s a trade-off between how much flexibility you want to have and how simple it can be. For some, simplicity wins, and for others it’s just not enough. Let’s have a closer look at these four ways and see in which cases they fit best.

1. Keep it simple with plain yamls

I’ve always told people attending my courses that by learning Kubernetes they become yaml programmers. It might sound silly, but in reality, basic usage of Kubernetes comes down to writing definitions of objects in plain yaml. Of course, you have to know two things – first, what you want to create, and second, the Kubernetes API, which is the foundation of these yaml files.
After you’ve learned how to write yaml files, you can just use kubectl to send them to Kubernetes and your job is done. No parameters, no templates, no figuring out how to change things in a fancy way. If you want to create an additional instance of your application or of the whole environment, you just copy and paste. Of course, there will be some duplication here, but it’s the price you pay for simplicity. Besides, for a couple of instances it’s not a big deal, and most organizations can probably live with this imperfect solution, at least at the beginning of their journey when they are not as big as they wish to be.
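
To illustrate that price, here is what copy-and-paste looks like for two hypothetical environments. The same ConfigMap is kept twice, and every shared change has to be made in both copies by hand. The file paths, names and values are made-up examples.

# staging/configmap.yaml (hypothetical)
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: staging
data:
  LOG_LEVEL: debug          # differs between environments
  CACHE_TTL_SECONDS: "60"   # identical, but duplicated
---
# production/configmap.yaml (hypothetical)
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
data:
  LOG_LEVEL: info           # differs between environments
  CACHE_TTL_SECONDS: "60"   # identical, but duplicated

Applying each copy is still a one-liner (kubectl apply -f staging/configmap.yaml), which is exactly why this approach works well while the number of copies stays small.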

When to use:

  • For projects with fewer than 4 configurations/instances of their apps or environments
  • For small startups
  • For bigger companies starting their first Kubernetes projects (e.g. as a part of PoC)
  • For individuals learning Kubernetes API

When to avoid:

  • For organizations and projects releasing their products or services for Kubernetes environments
  • In projects where each instance varies significantly and requires a lot of adjustments

2. Customize a bit with Kustomize

Kustomize is a project maintained by one of the official Kubernetes SIGs (Special Interest Groups). It is built around the concept of inheritance: you start from a base set of Kubernetes resources defined in.. yaml files. That’s right – you cannot escape from them! This time, however, Kustomize lets you apply any changes you want to your already existing set of resources. To put it simply, Kustomize can be treated as a Kubernetes-specific patch tool. It lets you override any part of your yaml files and offers additional features, including the following:

  • Changing repositories, names, and tags for container images
  • Generating ConfigMap objects directly from files, with hash suffixes ensuring that a Deployment triggers a new rollout when they change
  • Using kustomize cli to modify configurations on the fly (useful in CI/CD pipelines)
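
Here is a minimal sketch of the base/overlay idea with these features. It assumes a recent standalone kustomize, because the patches field is not available in the older versions embedded in kubectl. The directory layout, the Deployment named web in the base, the prefix and the registry name are hypothetical.

# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml          # hypothetical Deployment "web" using image nginx:1.19 and ConfigMap "web-config"
configMapGenerator:
  - name: web-config         # generated as web-config-<hash>; references in the Deployment are rewritten
    files:
      - index.html
---
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
namePrefix: prod-            # every resource name gets this prefix
images:
  - name: nginx                                # matches the image name used in the base
    newName: registry.example.com/mirror/nginx # hypothetical private registry
    newTag: "1.19"
patches:
  - target:
      kind: Deployment
      name: web
    patch: |-
      - op: add
        path: /spec/replicas
        value: 3

The overlay can then be rendered with kustomize build overlays/production and piped to kubectl apply -f -. Note that the base stays untouched, and each new environment is just another small overlay directory.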

Since version 1.14 it has been built into the kubectl binary (kubectl apply -k), which makes it easy to start with. Unfortunately, new features are added much faster to the standalone kustomize project and its release cycle doesn’t sync up with the official releases of kubectl binaries. Thus, I highly recommend using the standalone version rather than kubectl’s built-in functionality.
According to its creators, it encourages you to use Kubernetes API directly without creating another artificial abstraction layer.

When to use:

  • For projects with fewer than 10 configurations/instances that don’t require too many parameters
  • For startups starting to grow, but still using Kubernetes internally (i.e. without the need to publish manifests as a part of their products)
  • For anyone who knows Kubernetes API and feels comfortable with using it directly

When to avoid:

  • If your environments or instances differ by as much as 30-50%, because you’ll end up rewriting most of your manifests with patches
  • In the same cases as with plain yamls

3. Powerful Helm Charts for advanced scenarios

If you haven’t seen Helm Hub yet, I recommend checking it out and looking for your favorite software; if it’s a popular open-source project, I’m pretty sure it’s there. With the release of Helm 3, most of Helm’s flaws have been fixed. The biggest one was the Tiller component, which is no longer required, and that makes Helm a really great tool for your deployments. For OpenShift users it can also be a great relief, since OpenShift’s templating system is just too simple (I’m trying to avoid the word terrible, but it is).
Most people who start using Helm to deploy these ready-made services soon start writing their own Charts for their applications and for almost everything they deploy on Kubernetes. It might be a good idea for really complex configurations, but in most cases it’s just overkill. When you don’t publish your Charts to a registry (and soon even to container registries) and just use them for their templating feature (with Helm 3 it is finally possible without downloading the Chart’s source code), you might be better off with Kustomize.
For advanced scenarios, however, Helm is the way to go. It can be the single tool you use to release your applications for other teams to deploy to their environments. Your customers can do the same with a single command – literally just helm upgrade YOURRELEASE YOURCHART – to deploy a newer version of your app. All you need to do in order to achieve this simplicity is “just”:

  • write Chart templates in a way that would handle all these cases and configuration variants
  • create and maintain the whole release process with CI/CD pipeline, testing, and publishing

Many examples on Helm Hub show how complex software can be packed into a Chart to make installation a trivial process and customization much more accessible, especially for end-users who don’t want to get into too much detail. I myself use many Helm Charts to install software and consider Helm one of the most important projects in the Kubernetes ecosystem.
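To illustrate how one Chart can serve many configuration variants, here is a minimal Python sketch of the idea behind Helm values: customer-specific overrides are layered recursively over the Chart’s defaults. It illustrates the concept only and is not Helm’s actual code (Helm handles extra cases, such as removing a key by setting it to null); the values themselves are hypothetical.

```python
from copy import deepcopy
from typing import Any


def merge_values(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `overrides` recursively layered over `defaults`.

    Nested dicts are merged key by key; any other value in `overrides`
    (scalars, lists) replaces the default outright. Inputs are not modified.
    """
    result = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_values(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


# Hypothetical chart defaults and two customer-specific variants.
chart_defaults = {
    "replicaCount": 1,
    "image": {"repository": "example/app", "tag": "1.0.0"},
    "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
}
small_customer = {"image": {"tag": "1.1.0"}}
big_customer = {"replicaCount": 5, "resources": {"limits": {"cpu": "2"}}}

if __name__ == "__main__":
    print(merge_values(chart_defaults, small_customer))
    print(merge_values(chart_defaults, big_customer))
```

Each customer only has to state what differs from the defaults, which is what keeps a single Chart manageable across many installations.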

When to use:

  • For big projects with more than 10 configurations/instances that have many variants and parameters
  • For projects that are published on the Internet to make them easy to install

When to avoid:

  • If your applications are not that complex and you don’t need to publish them anywhere
  • If you don’t plan to maintain CI/CD for the release process, because maintaining Charts without pipelines is just time-consuming
  • If you don’t know Kubernetes API in-depth yet

4. Automated bots (operators) at your service

Now the final one: the most sophisticated and, for some, superfluous. In fact, it’s a design pattern proposed by CoreOS (now Red Hat) that combines Kubernetes features like Custom Resource Definitions with custom logic, called controllers, embedded in software that runs directly on Kubernetes and uses its internal API. It is widely used in the OpenShift ecosystem, and Red Hat has been promoting it since the release of OpenShift 4 as the best way to create services on OpenShift. They even provide an operator for customizing OpenShift’s web interface. That’s what I call an abstraction layer! Everything there is controlled with yaml handled by dozens of custom operators, which embed the whole logic.
To put it simply, an operator is the equivalent of a cloud service like Amazon RDS, GCP Cloud Pub/Sub or Azure Cosmos DB. You build an operator to provide a consistent, simple way to install and maintain (including upgrades) your application in an “-as-a-Service” way on any Kubernetes platform using its native API. It not only provides the highest level of automation but also allows you to include complex logic such as built-in monitoring, seamless upgrades, self-healing and autoscaling. Once again – all you need to do is provide a definition in yaml format, and the operator takes care of the rest.
“It looks awesome!” one could say. Many think it should and will be the preferred way of delivering applications. I cannot agree with that statement. If you’re a software vendor providing your application to hundreds of customers (even internally), then this is the way to go. Otherwise, writing operators can be too complex and time-consuming, especially if you want to follow best practices, use Golang and provide an easy upgrade path (which can get tricky).
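The core of every operator is a reconcile loop: read the desired state from a custom resource, observe what is actually running and act on the difference. The sketch below shows that logic in plain Python, assuming a hypothetical database resource with only replicas and version fields; a real operator would read and write these objects through the Kubernetes API using a framework such as kopf or kubebuilder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseSpec:
    """Desired state, as a user would declare it in a hypothetical custom resource."""
    replicas: int
    version: str


@dataclass(frozen=True)
class ObservedState:
    """What is actually running, as the controller would read it from the cluster."""
    replicas: int
    version: str
    healthy_replicas: int


def reconcile(spec: DatabaseSpec, observed: ObservedState) -> list[str]:
    """Compare desired and observed state and return the actions needed to converge.

    The logic only asks "what is different right now?", never "what changed?",
    so it can be re-run safely after any event or on a timer.
    """
    actions: list[str] = []
    if observed.version != spec.version:
        # Seamless upgrades: move running replicas to the requested version.
        actions.append(f"upgrade {observed.version} -> {spec.version}")
    if observed.healthy_replicas < observed.replicas:
        # Self-healing: replace replicas that fail their health checks.
        unhealthy = observed.replicas - observed.healthy_replicas
        actions.append(f"replace {unhealthy} unhealthy replica(s)")
    if observed.replicas < spec.replicas:
        actions.append(f"scale up by {spec.replicas - observed.replicas}")
    elif observed.replicas > spec.replicas:
        actions.append(f"scale down by {observed.replicas - spec.replicas}")
    return actions
```

An empty list means the cluster already matches the yaml definition, and the operator has nothing to do.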

I found the following projects to be very helpful in developing and maintaining Operators:

  • kubebuilder – one of the first operator frameworks for Go developers, the most powerful and the most complex one
  • kopf – a framework for developing operators in Python
  • KUDO – write operators in a declarative way
  • operator-sdk – a framework from CoreOS (now Red Hat) for writing operators in Go, Ansible and Helm
  • operator-lifecycle-manager – a must-have for anyone interested in getting serious with operators and their lifecycle (installation, maintenance, upgrades)

When to use:

  • If you need to create your own service (e.g. YourProduct-as-a-Service) available on Kubernetes
  • If you plan to add additional features to your service (e.g. monitoring, autoscaling, autohealing, analytics)
  • If you’re a software vendor providing your software for Kubernetes platforms
  • If you want to develop software installed on OpenShift and be a part of its ecosystem (e.g. publish your software on operatorhub.io, their “app marketplace”)

When to avoid:

  • For simple applications
  • For other applications when Helm Chart with some semi-complex templates will do
  • When no extra automation is needed or it can be accomplished with a simple configuration of the existing components

Conclusion

Each of the methods and tools I have described suits organizations at a different point of their journey with Kubernetes. For standard use-cases simple yamls may be sufficient, and with more applications Kustomize can be a great enhancement of this approach. When things get serious and applications get more complex, Helm Charts present a perfect balance between complexity and flexibility. I can recommend Operators for vendors delivering their applications on Kubernetes in a similar way to cloud services, and definitely for those who plan to provide them to enterprise customers using OpenShift.

Why Vault and Kubernetes is the perfect couple

The (not so) secret flaws of Kubernetes Secrets

When you start learning and using Kubernetes, you discover that there is a special object called Secret that is designed for storing various kinds of confidential data. However, when you find out that it is very similar to the ConfigMap object and that its data is only base64-encoded, not encrypted (encryption at rest is optional), you may start wondering: is it really secure? Especially since you use the same API and the same credentials to interact with it. This, combined with a rather simple RBAC model, can create many potential risks. Most people stick with one of the three default roles for regular users – view, edit, and admin – with view as the only one that forbids viewing Secret objects. You need to be very careful when assigning roles to users or creating your own custom RBAC roles. But again, this is not that easy either, since RBAC rules can only whitelist API requests – it is not possible to create exceptions (i.e. blacklists) without an external mechanism such as Open Policy Agent.
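The whitelist-only nature of RBAC is easy to see in a simplified model. This Python sketch ignores API groups, resource names and role aggregation; it only shows that rules are purely additive, so a request is denied by default and allowed as soon as any rule matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A simplified RBAC PolicyRule: it can only *grant* verbs on resources."""
    verbs: frozenset[str]
    resources: frozenset[str]

    def matches(self, verb: str, resource: str) -> bool:
        verb_ok = "*" in self.verbs or verb in self.verbs
        resource_ok = "*" in self.resources or resource in self.resources
        return verb_ok and resource_ok


def is_allowed(rules: list[Rule], verb: str, resource: str) -> bool:
    """Default deny; any single matching rule allows the request.

    There is no "deny" rule type, so once a wildcard grants access to
    Secrets, adding another rule cannot carve out an exception.
    """
    return any(rule.matches(verb, resource) for rule in rules)


# A view-like role lists allowed resources explicitly and leaves Secrets out.
view_like = [Rule(frozenset({"get", "list", "watch"}), frozenset({"pods", "configmaps"}))]
# A role written with a wildcard silently includes Secrets.
wildcard = [Rule(frozenset({"get"}), frozenset({"*"}))]
```

The only way to express “everything except Secrets” here is to enumerate every other resource, which is why policy engines such as Open Policy Agent are used for deny rules.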

Managing Secrets is Hard

On top of that, managing Secret object definitions (e.g. yaml files) is not an easy task. Where should you store them before sending them to your Kubernetes cluster – in a git repo? Outside of it? Who should have access to view and modify them? What about encryption – should they be encrypted with a single key shared by trusted team members or with gpg (e.g. git-secret, git-crypt)?
One thing is for sure – it is hard to maintain Secret object definitions in the same way as other Kubernetes objects. You can try to come up with your own way of protecting them, auditing changes and other important things you’re not even aware of, but why reinvent the wheel when there’s something better? Much, MUCH better.

HashiCorp Vault to the Rescue

Now some may say I am a HashiCorp fanboy, which… might be partially true 🙂 I love not only their products but, even more, their approach to managing infrastructure and the fact that most features they provide are available in open source versions.
It is no surprise that the best product they have on offer (in terms of commercial success) is Vault. It is a project designed to help you store and securely access your confidential data. It is designed for this purpose only and has many excellent features, among which you will also find many that are specific to Kubernetes environments.

Best features of Vault

I’m not going to list out all of the features – they are available in the official documentation. Let me focus on the most important ones and the ones also related to Kubernetes.

One security-dedicated service with enterprise features

The fact that it’s a central place where you store all your confidential data may be alarming at first, but Vault offers many interesting functionalities that should remove any doubts about its security capabilities. One of them is the concept of unsealing Vault after a start or restart. It is based on Shamir’s Secret Sharing, which splits the key protecting Vault’s data into multiple shares that should be owned and protected by different people; a minimum number of them is needed to unseal Vault. This definitely decreases the chance of interfering or tampering with stored data, as the whole process imposes transparency on such actions.
Of course, there are also audit logs, high availability and access control defined with well-documented policies.
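To see why no single person can unseal Vault alone, here is a minimal Python sketch of Shamir’s Secret Sharing: the secret becomes the constant term of a random polynomial, each share is a point on it, and any `threshold` points recover the secret via Lagrange interpolation. The sketch works over a prime field with integer secrets for readability; Vault’s own implementation splits the key byte by byte over a different field (GF(2^8)), and by default `vault operator init` creates 5 shares with a threshold of 3.

```python
import secrets

# A Mersenne prime larger than any secret we split; all arithmetic is mod PRIME.
PRIME = 2**127 - 1


def split_secret(secret: int, shares: int, threshold: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be smaller than PRIME")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x: int) -> int:
        # Horner's method, evaluated mod PRIME.
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc

    # Share holders get points at x = 1..shares (x = 0 would reveal the secret).
    return [(x, poly(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation of the polynomial at x = 0 over the prime field."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total
```

With fewer than `threshold` shares, interpolation yields an unrelated number, so a minority of key holders learns nothing about the key.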

Ability to store various types of data

The first thing people want to store in places like Vault is passwords, probably because we use them most often. However, if you want to deploy Vault only for this purpose, you should reconsider, because it’s much more powerful; using it just for passwords is like driving a Ferrari in 1st gear only. Vault has many secrets engines designed for different kinds of data. The basic one – KV (Key-Value) – can be used to store arbitrary data with advanced versioning. Vault can also act as your PKI or as a generator of Time-based One-Time Passwords (TOTP, similar to Google Authenticator). But that’s not all. In my opinion, the real power of Vault lies in dynamic secrets.
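For the curious, the codes behind TOTP are simple to compute. This Python sketch implements HOTP (RFC 4226) and TOTP (RFC 6238) with HMAC-SHA1 using only the standard library; it shows the algorithm a TOTP engine relies on and does not talk to Vault.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over an 8-byte big-endian counter, then truncation."""
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def totp(key: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP whose counter is the number of `step`-second periods."""
    now = time.time() if at is None else at
    return hotp(key, int(now // step), digits)
```

In practice the shared key is usually exchanged as a base32 string inside a QR code; decode it with `base64.b32decode` before passing it in.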

Forget your passwords with dynamic secrets

It’s my personal opinion, and I think many people will agree with me, that dynamic secrets are the best feature of Vault. If there were a single reason for me to invest my time and resources in implementing Vault in my organization, that would be it. Dynamic secrets change the way you handle authentication. Instead of configuring static passwords, you let Vault create logins and passwords on the fly, on demand, and with a limited lifetime. I love the fact that Vault rotates not only users’ passwords but administrators’ as well, because, let’s be honest, how often do you change your database password, and when was the last time you did it?
Vault can manage access to your services instead of only storing static credentials, and this is a game-changer. It can manage access to databases, clouds (AWS, Azure, Google), and many other services.
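The lifecycle of dynamic secrets can be modeled in a few lines. This is a toy Python sketch, not Vault code: every request gets unique, random credentials with a TTL, and expired ones are revoked. The `v-<role>-<random>` username format and the injectable clock are illustrative assumptions; a real secrets engine would also create and drop the user in the database.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Lease:
    username: str
    password: str
    expires_at: float


@dataclass
class DynamicSecretsBroker:
    """Toy model of dynamic secrets: every request gets fresh credentials with a TTL."""
    ttl_seconds: float
    clock: Callable[[], float] = time.time
    active: dict[str, Lease] = field(default_factory=dict)

    def issue(self, role: str) -> Lease:
        """Create unique credentials for one consumer; nothing is shared or reused."""
        username = f"v-{role}-{secrets.token_hex(4)}"
        lease = Lease(username, secrets.token_urlsafe(24), self.clock() + self.ttl_seconds)
        self.active[username] = lease  # a real engine would CREATE the DB user here
        return lease

    def revoke_expired(self) -> list[str]:
        """Revoke every lease whose TTL has passed and return the revoked usernames."""
        now = self.clock()
        expired = [u for u, lease in self.active.items() if lease.expires_at <= now]
        for username in expired:
            del self.active[username]  # a real engine would DROP the DB user here
        return expired
```

Because every credential expires on its own, a leaked password is only useful until its lease runs out, which is exactly the rotation nobody remembers to do by hand.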

No vendor lock-in

There are various cloud services available that provide similar, though less feature-rich, functionality. Google recently announced their Secret Manager, AWS has Parameter Store, and Azure offers Key Vault. If you’re looking for a way to avoid vendor lock-in and keep your infrastructure portable, multi-cloud enabled and feature-rich, then Vault will satisfy your needs. Let’s not forget one more important thing – not every organization uses the cloud, and since Vault can be installed anywhere, it suits these environments perfectly too.

Multiple authentication engines with excellent Kubernetes support

In order to get access to credentials stored in Vault, you need to authenticate yourself, and you have plenty of authentication methods to choose from. You can use a simple username and password or TLS certificates, but also your existing accounts from GitHub, LDAP, OIDC, most cloud providers and many others. These authentication methods can be used by people in your organization and also by your applications. However, when designing access for your systems, you may find other methods more suitable. AppRole is dedicated to those scenarios: it is a generic method for any application, regardless of the platform it runs on. When you deploy your applications on Kubernetes, you will be better off with the native Kubernetes auth method. It can be used directly by your application, your custom sidecar or Vault Agent.
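Here is what the Kubernetes auth method looks like from an application’s point of view, as a minimal Python sketch using only the standard library. The Pod sends its service account token and a Vault role name to the login endpoint (`/v1/auth/kubernetes/login` for the default mount path) and receives a Vault token in `auth.client_token`. The Vault address and role name are placeholders, and TLS configuration and error handling are omitted.

```python
import json
import urllib.request

# Where Kubernetes mounts the Pod's service account token by default.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_login_request(vault_addr: str, role: str, jwt: str,
                        mount: str = "kubernetes") -> urllib.request.Request:
    """Build the POST to Vault's Kubernetes auth login endpoint.

    Vault verifies the JWT with the Kubernetes TokenReview API and, if the
    service account is bound to `role`, answers with a Vault token.
    """
    body = json.dumps({"role": role, "jwt": jwt}).encode()
    return urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(vault_addr: str, role: str) -> str:
    """Read the Pod's own token, log in to Vault and return the client token."""
    with open(SA_TOKEN_PATH) as f:
        jwt = f.read().strip()
    with urllib.request.urlopen(build_login_request(vault_addr, role, jwt)) as resp:
        return json.load(resp)["auth"]["client_token"]
```

No password is involved at any point: the Pod’s identity is its service account, and the Vault role decides what that identity may read.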

Native Kubernetes installation

Since Vault is a dedicated solution for security, proper deployment can be somewhat cumbersome. Fortunately, there is a dedicated installation method for Kubernetes that uses a Helm Chart provided and maintained by Vault’s authors (i.e. HashiCorp).
Although I really like and appreciate that feature, I would use it only for non-production environments to speed up the learning process. For production deployments, I would still use traditional virtual machines and automate them with Terraform modules – they are also provided by HashiCorp in the Terraform Registry (e.g. for GCP).

Why it is now easier than ever

Until recently, using Vault with Kubernetes required some additional, often complicated steps to provide secrets stored in Vault to an application running on Kubernetes. Even Vault Agent simplified only the token-fetching part, leaving you with the rest of the logic, i.e. retrieving credentials and making sure they are up to date. With an additional component, the Agent Sidecar Injector, the whole workflow is now very simple. After you install and configure it (you do it once), any application can be provided with secrets from Vault in a totally transparent way. All you need to do is add a few annotations to your Pod template, such as these:

spec:
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/agent-inject-secret-helloworld: "secrets/helloworld"
        vault.hashicorp.com/role: "myapp"

No more writing custom scripts and placing them in a separate sidecar or init container – everything is managed by Vault components designed to relieve you of these tasks. It has really never been this easy! In fact, combined with the dynamic secrets described earlier, this creates a fully passwordless solution. Access is managed by Vault, and your (or your security team’s) job is to define which application should have access to which services. That’s what I call a seamless and secure integration!
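On the application side, the injected secret is just a file. By default the injector’s agent renders each secret to `/vault/secrets/<name>`, so the annotations above produce `/vault/secrets/helloworld`. The Python sketch below assumes you also set a `vault.hashicorp.com/agent-inject-template-helloworld` annotation whose template writes `KEY=value` lines; without a template the default rendering has a different format, so treat the parser as an assumption about your template.

```python
from pathlib import Path

# Rendered by the injected Vault Agent; "helloworld" comes from the
# agent-inject-secret-helloworld annotation.
SECRET_FILE = Path("/vault/secrets/helloworld")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments.

    Assumes a custom template that emits KEY=value lines; only the first
    '=' separates key from value, so values may contain '='.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_secrets(path: Path = SECRET_FILE) -> dict[str, str]:
    # Read the file when needed instead of caching it at startup: the sidecar
    # agent can re-render it when the secret changes.
    return parse_env_file(path.read_text())
```

The application needs no Vault client library at all; it only reads a local file.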

Conclusion

I’ve always been a fan of HashiCorp products, and at the same time I’ve considered Kubernetes Secrets an imperfect solution for storing credentials securely. With the excellent Kubernetes support in Vault, we finally have the missing link in the form of a dedicated service with proper auditing, modularity and ease of use. If you are serious about securing your Kubernetes workloads, especially in an enterprise environment, then HashiCorp Vault is the best solution there is. Look no further and start implementing – you’ll thank me later.

How to build CI/CD pipelines on Kubernetes

Kubernetes as a standard development platform

We started with single, often powerful, machines that hosted many applications. Soon after came virtualization, which didn’t actually change a lot from a development perspective, but it did for operations. So developers got mad, and that’s when the public cloud emerged to satisfy their needs rather than those of the operations guys. Now the pendulum has swung once again, and we have something that is beneficial for both sides – the Kubernetes platform. I keep saying it and will repeat it here again – I think it’s one of the best projects to have emerged in the last decade. It has completely changed the perspective on how we deliver applications and also on how we manage platforms for them.
This time I want to focus on the delivery process: how it can be built and what the real benefits of using Kubernetes for it are.

How CI/CD is different in Kubernetes

There are a couple of differences that result from Kubernetes’ design and from the use of containers in general.

Everything is distributed

Kubernetes was built to satisfy the growing requirements of applications that run on a larger scale than ever before. Therefore applications can run on any node in a cluster and although you can control it, you shouldn’t unless you really have to. Nodes can also be distributed among different availability zones and you should actually treat them in the same way as containers – they are ephemeral and disposable.

Applications and services are delivered in immutable container images

No more quick changes or hotfixes applied to live (sometimes even production) environments. That’s something many people find hard to accept, and it can be frustrating. In order to change something now, you need to build a new version of an application (the harder part) or change something within its environment (the easier part – e.g. changing firewall rules with NetworkPolicy, changing a configuration with ConfigMap etc.).

Changes in application environments are controlled with yaml files

From the platform management perspective, this is a huge improvement. Since Kubernetes is a declarative system, every change can be described as a set of yaml files. It can be intimidating at first but brings stability, predictability, and security to the whole process.

Everything as code

I guess this just sums up the previous points. When you maintain an environment for your applications, you manage changes through files that you version, test and improve with great care. This can cover every part of your environment, starting from infrastructure (e.g. cloud or on-premise hardware), through Kubernetes cluster(s), and ending with the environments containing your applications.

How can you improve your delivery process with Kubernetes features

Maybe you already have existing environments with nice CI/CD pipelines and you may wonder what the real benefits of using Kubernetes for your applications are. I personally see a lot of them, so let’s go through the most important ones.

Increased transparency and trackability of changes

Has anyone ever left an unpleasant surprise in your test or staging environment? I’m pretty sure it was done with good intentions, but such unexpected changes often lead to hours of unnecessary troubleshooting and frustration. With everything kept in yaml files and maintained as code in versioned git repositories, with all the good practices around it (i.e. code review, tests), these kinds of surprises should not happen. With a well-designed delivery process, all changes are trackable, and no manual action performed outside of the standard process can affect working applications.

Test environments available rapidly, on-demand and in large quantities

Whenever there’s a need for a new test environment, it often takes not only a lot of resources but also a very long time. Thanks to Kubernetes and its ability to create environments from scratch using just yaml files, it becomes trivial. You can create as many of them as you want, since they are composed mostly of logical objects that separate and distinguish them – you are limited only by the resources available on your cluster. And the cluster itself is quite easily expandable; the real limit is something that is not: the money you are able to put into your project.

Easier management of applications developed by multiple teams

Most people are attracted to Kubernetes as a deployment platform for their microservices. Although it’s also a perfect place for other services (including queue systems, databases, etc.), it brings a lot of benefits to teams who develop their applications as microservices. Often a single team is responsible for many applications, and many teams work on a system that comprises dozens of microservices. That’s where it is handy to have a standardized way of setting up and maintaining the whole CI/CD process from one place. I recommend my approach with factories, which I described in my article. Especially when you have dozens of microservices and teams responsible for them, it is crucial to have a common approach that is scalable and maintainable (of course, using code).

Increased security of your applications and environments

With the Cloud Native approach you don’t build applications – you build complete environments. To be precise, you create environments from versioned code and run your service from an immutable container image with all dependencies included. And these immutable images can now be scanned for vulnerabilities not only right after they are built but also continuously, as a background process that marks vulnerable images as insecure to deploy.
Environment configuration can be secured with a proper process built around the maintenance and governance of object definitions kept as code. No unapproved or unknown change should be allowed that would compromise the security of the platform and all services running on it.

Easier testing with databases available on-demand

Unless your applications are totally stateless and don’t need any external service, one of their dependencies is probably some kind of database. That’s always been a pain point when testing new versions that require a new database instance, since it always takes a long time to get one. Now, thanks to containers, we can create a new instance in a minute or so. And I’m not talking only about MySQL or PostgreSQL – did you know that you can run MS SQL Server and Oracle databases too? (You need to talk to the vendors about licensing, though, since you never know whether your setup is covered.) I would definitely recommend using operators such as KubeDB or others available on OperatorHub.

Designing and building CI/CD pipeline

Let’s dig into more technical details now. I’m going to describe the whole process of designing and building CI/CD pipeline for an application running on Kubernetes.

Step 1 – Split CI/CD into CI and CD

CI and CD

It sounds trivial, but many people believe the two should go together. No, they shouldn’t. You should split them into two separate processes, even if you put them into a single pipeline. When you join them, you often tend to use the same or similar tools for both parts, and at times you just make it too complex.
Define clear goals of both parts:

  • CI – build and test application artifacts
  • CD – deploy artifacts created as a part of CI to Kubernetes environment

Tools

You definitely need some orchestrator that would be responsible for the delivery process. Here are some of the best solutions* available now:

  • Jenkins – of course, it’s my favorite one (to find out why, please see my article on Jenkins, where I show why it’s still one of the best choices for a CI/CD orchestrator)
  • GitLab CI – pretty nice and very popular; not as powerful as Jenkins, but easy to use, with a built-in container registry, and it’s also a Git server
  • Jenkins-X – a powerful engine for automated builds; you actually don’t create any pipelines yourself but rather use the ones generated by Jenkins-X
  • Tekton – a cloud-native pipeline orchestrator that is under heavy development (it’s still in alpha); in a couple of months it should be a full-fledged CI/CD solution

* projects available on all platforms – cloud and on-premise

Step 2 – Split CI into three parts

Three stages of CI

Let’s split the whole CI into three parts. We’re going to do it to simplify the process and to have more choice in the tools we use. Continuous Integration for applications running on Kubernetes has one important distinguishing detail – we provide not only artifacts containing the application but also its environment definition. We must take that into consideration as well.

2.1 Build application artifact

GOAL: Create, test and publish (optionally) artifact with application

In this step, we need to build an artifact. For applications that require a compilation step, this produces a binary package, which should be built in a pristine, fresh environment using an ephemeral container.

TOOLS

It depends on your application. The most important thing here is to leverage Kubernetes platform features to speed up and standardize the build process. I recommend using ephemeral containers for building to keep the process consistent across all builds. These containers should already have the necessary tools built in (e.g. maven with JDK, nodejs, python etc.).
I can also recommend using Source-To-Image (source mode) for small projects, especially if you’re running them on OpenShift. It brings the highest automation and standardization using images based on RHEL which is a nice feature appreciated by security teams.

EXAMPLE

Here’s an excerpt from a Jenkinsfile for my sample application written in Go. It uses a Jenkins agent (slave) launched as a container in a pod on a Kubernetes cluster.


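 stage('Build app') {
            agent {
                kubernetes {
                    // Ephemeral Pod with a Go toolchain; 'cat' keeps the
                    // container running so the steps below can execute in it
                    containerTemplate {
                        name 'golang'
                        image 'golang:1.12-buster'
                        ttyEnabled true
                        command 'cat'
                    }
                }
            }
            steps {
                sh 'go get -d'      // fetch dependencies
                sh 'make test'
                sh 'make build'     // produces the "cow" binary
                // pass the binary on to the image-building stage
                stash name: "app", includes: "cow"
            }
        }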


2.2 Build container image

GOAL: Build a container image, assign tag and publish it to a container registry

It’s pretty straightforward – we need to publish a container image with our application built in the previous step.

TOOLS

I love simplicity, so I also like Source-To-Image (binary mode) for creating container images in a standardized way without defining any Dockerfiles.
If you need to define them, please consider using Kaniko – it can use an existing Dockerfile, but it’s simpler and doesn’t require a daemon running on the host as Docker does.

EXAMPLE

In my sample app, I use kaniko to build a container image without any Docker daemon running (Jenkins runs on Kubernetes and should not have a direct connection to Docker, since that’s insecure). I use a custom script so that I can also test the build outside of the Jenkins pipeline, and because Jenkins is meant to be used as an orchestrator, not a development environment (according to its founders).

 stage('Build image') {
            agent {
                kubernetes {
                    yamlFile "ci/kaniko.yaml"
                }
            }
            steps {
                container(name: 'kaniko', shell: '/busybox/sh') {
                    unstash 'app'
                    sh '''
                    /busybox/sh ci/getversion.sh > .ci_version
                    ver=`cat .ci_version`
                    ci/build-kaniko.sh cloudowski/test1:\$ver Dockerfile.embed
                    '''
                }
            }
        }
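The contents of `ci/getversion.sh` aren’t shown here, so the following is only a hypothetical Python sketch of what such a step might compute: a per-commit image tag derived from the branch name and commit SHA, sanitized to the characters Docker allows in tags. The version scheme and the 40-character branch limit are assumptions.

```python
import re

# Characters not allowed in a Docker image tag.
_INVALID_TAG_CHARS = re.compile(r"[^A-Za-z0-9_.-]+")
# Docker tag grammar: first char [A-Za-z0-9_], then [A-Za-z0-9_.-], 128 chars max.
_VALID_TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def image_tag(branch: str, commit_sha: str, base_version: str = "0.1.0") -> str:
    """Return a per-commit image tag, e.g. '0.1.0-feature-login-3f2a9c1'.

    Hypothetical scheme: builds from the release branch get
    '<version>-<short sha>'; other branches also embed a sanitized,
    shortened branch name so their images are easy to tell apart.
    """
    short_sha = commit_sha[:7]
    if branch in ("master", "main"):
        tag = f"{base_version}-{short_sha}"
    else:
        safe_branch = _INVALID_TAG_CHARS.sub("-", branch).strip("-.").lower()[:40]
        tag = f"{base_version}-{safe_branch}-{short_sha}"
    if not _VALID_TAG.fullmatch(tag):
        raise ValueError(f"cannot build a valid image tag from {branch!r}")
    return tag
```

Tagging every build with the commit SHA keeps images immutable: a tag is never reused for different content, so any environment can be traced back to an exact commit.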

2.3 Prepare Kubernetes manifests

GOAL: Prepare Kubernetes manifests describing the application and its environment

This step is Kubernetes-specific. Building an application is not only about providing a binary container image but also about delivering proper manifests describing how it should be deployed (how many instances, affinity rules, what storage should be used etc.) and how its environment should be configured (what credentials to use for connecting to a database, CPU reservations and limits, additional roles to be used by it etc.).

TOOLS

The simplest solution here is to use plain yaml files. For bigger environments, however, plain files quickly become too hard to manage, and I would recommend yaml files combined with Kustomize. Simplicity over complexity. When it comes to more complex tools, the most commonly used one is, of course, Helm with its Charts. I didn’t like it (and neither did many others), mainly because of its Tiller component, which made it vulnerable and insecure. With version 3, which drops Tiller and resolves these problems, it is a better solution with a powerful templating system (if you really, really need it).

Step 3 – Deploy to preview environments

Deployment to preview and permanent environments

GOAL: Enable testing configuration and container image before releasing

It’s a feature highly coveted by developers – a dedicated, separate environment that can be used for testing. This preview environment can be created automatically (see example below) from the Kubernetes manifests describing it.

TOOLS

There are no dedicated tools, but I find Kustomize with its overlay feature perfect for it, although you need a custom solution that integrates with your CI/CD orchestrator (i.e. triggers a deployment under certain conditions).
Of course, other solutions (e.g. Helm Chart) are also viable – you just need to use namespaces and define all necessary objects there.

EXAMPLE

In my example, I’m using a dedicated container image with the Kustomize binary in it and a custom script that handles all the logic. Using a Kustomize overlay, it creates a dedicated namespace, with a name derived from the branch name, along with the other objects kept in the same repository.


stage('Deploy preview') {
            agent {
                kubernetes {
                    serviceAccount 'deployer'
                    containerTemplate {
                        name 'kubectl'
                        image 'cloudowski/drone-kustomize'
                        ttyEnabled true
                        command 'cat'
                    }
                }
            }
            steps {
                sh 'ci/deploy-kustomize.sh -p'
            }
            when {
                // deploy on PR automatically OR on non-master when commit starts with "shipit"
                anyOf {
                    allOf {
                        changelog '^shipit ?.+$'
                        not { branch 'master' }
                    }
                    changeRequest()
                }
            }
        }
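Deriving a namespace from a branch name has one catch: namespace names must be valid DNS-1123 labels. This Python sketch shows one way to derive such a name and to render a preview overlay’s `kustomization.yaml`. The `preview-` prefix, the `../../base` layout and the function names are hypothetical, and this is not the logic of `ci/deploy-kustomize.sh`, which isn’t shown.

```python
import json
import re

MAX_LABEL = 63  # DNS-1123 label limit that applies to namespace names
_DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def preview_namespace(branch: str, prefix: str = "preview") -> str:
    """Turn a branch name into a valid namespace name.

    Namespace names must consist of lowercase alphanumerics or '-', start
    and end with an alphanumeric character and be at most 63 characters.
    """
    slug = re.sub(r"[^a-z0-9-]+", "-", branch.lower()).strip("-")
    name = f"{prefix}-{slug}"[:MAX_LABEL].rstrip("-")
    if not _DNS1123_LABEL.fullmatch(name):
        raise ValueError(f"cannot derive a namespace from {branch!r}")
    return name


def preview_kustomization(branch: str, image: str, tag: str) -> str:
    """Render kustomization.yaml for a preview overlay.

    JSON is valid YAML, so json.dumps avoids a PyYAML dependency.
    """
    doc = {
        "apiVersion": "kustomize.config.k8s.io/v1beta1",
        "kind": "Kustomization",
        "namespace": preview_namespace(branch),
        "resources": ["../../base"],
        # Point the base manifests at the image built for this branch.
        "images": [{"name": image, "newTag": tag}],
    }
    return json.dumps(doc, indent=2)
```

Kustomize’s `namespace` field only sets the namespace on the generated objects; the Namespace itself still has to exist, e.g. created by the deploy script with `kubectl create namespace`.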


Step 4 – Prepare promotion step

Promote artifacts

GOAL: Release artifacts and start the deployment process

After the container image is ready, all the tests have passed and code review has confirmed its quality, you can promote your changes and initiate the deployment to the production environment through all required intermediate environments (e.g. test, stage).

TOOLS

Promotion could be done entirely in Git by merging development or feature branches into a release branch (e.g. master).

Step 5 – Deploy to permanent environments

GOAL: Deliver applications to production through a hierarchy of test environments

It’s an important step and quite simple at the same time. We have all the necessary artifacts available and we just need to deploy them to all of the persistent environments, including production. In the example below there are only stage and prod as persistent environments and there are no tests here. However, this is a perfect place to perform some more complex tests like stress, performance and security testing.
Please note that the production environment often runs on a separate cluster, but that can easily be handled by switching the kubectl context before applying changes to it.

TOOLS

No new tools here, just an API request or a click on the web console is required to push your new version through the rest of the pipeline.


stage('Deploy stage') {
            agent {
                kubernetes {
                    serviceAccount 'deployer'
                    containerTemplate {
                        name 'kubectl'
                        image 'cloudowski/drone-kustomize'
                        ttyEnabled true
                        command 'cat'
                    }
                }
            }
            steps {
                sh 'ci/deploy-kustomize.sh -t kcow-stage'
                // rocketSend channel: 'general', message: "Visit me @ $BUILD_URL"
            }
            when {
                allOf {
                    // changelog '^deploy ?.+$'
                    branch 'master'
                }
            }
        }


Conclusion

I always tell my clients that it has never been easier to improve your environments and speed up the delivery of your applications. With a proper CI/CD pipeline, all of these new tools and the declarative nature of Kubernetes, we are now able to keep it under control and continuously improve it. You should leverage containers to make the process repeatable, portable and more flexible – developers will have more ways to test and experiment, security teams will have full insight into all changes being made to the system, and finally, your end-users will appreciate fewer bugs and more features in your applications.


10 most important differences between OpenShift and Kubernetes

UPDATED on 10.6.2019 (after the release of OpenShift 4.1): Added information on OpenShift 4.

UPDATED on 30.8.2019: Added information on CodeReady Containers for running single OpenShift node.

If you’re interested in OpenShift 4, please also check out my honest review of it.

OpenShift has often been called “Enterprise Kubernetes” by its vendor, Red Hat. In this article, I describe the real differences between OpenShift and Kubernetes.


Maintaining big Kubernetes environments with factories

People are fascinated by containers, Kubernetes and the cloud native approach for different reasons. It could be enhanced security, real portability, greater extensibility or more resilience. For me personally, and for organizations delivering software products to their customers, there is one reason that is far more important – the speed they can gain. That leads straight to a shorter Time To Market, so highly appreciated and coveted by business people, and to more job satisfaction for the people building applications and platforms for them.

It starts with code

So how do you speed it up? By leveraging this new technology and all the goodies that come with it. The real game-changer here is the way you can manage your platform, your environments, and the applications that run there. With Kubernetes-based platforms you do it in a declarative manner, which means you define your desired state and not the particular steps leading to it (as is done in imperative systems). That opens up a way to manage the whole system with code. Your primary job is to define your system state and let Kubernetes do its magic. You probably want to keep that state in files in a versioned git repository (or repositories), and this article shows how you can build your platform by efficiently splitting up the code across multiple repositories.

Code converted by Kubernetes to various resources
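To make the declarative idea a bit more tangible, here is a minimal Python sketch of its core step: comparing the desired state kept in a repository with the actual state of a cluster, and deriving the actions from the difference. It is only an illustration, not how Kubernetes is implemented. Objects are simplified to dictionaries keyed by kind, namespace and name, and `plan` is a hypothetical helper, not a Kubernetes API.

```python
from typing import Dict, List, Tuple

# An object is identified by (kind, namespace, name); its value is its spec.
Key = Tuple[str, str, str]
State = Dict[Key, dict]


def plan(desired: State, actual: State) -> List[Tuple[str, Key]]:
    """Derive the actions that move `actual` towards `desired`.

    We only describe *what* should exist; the steps are computed here,
    which is the essence of the declarative model.
    """
    actions: List[Tuple[str, Key]] = []
    for key in sorted(desired):
        if key not in actual:
            actions.append(("create", key))
        elif actual[key] != desired[key]:
            actions.append(("update", key))
    for key in sorted(actual):
        if key not in desired:
            actions.append(("delete", key))
    return actions
```

Here is how each case plays out. A Deployment whose replica count differs yields an `update`. A Service present only in the repository yields a `create`. A leftover ConfigMap yields a `delete`. Keep in mind that plain `kubectl apply` does not remove objects that disappeared from your files unless you enable pruning, so the sketch shows the principle rather than any specific tool.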

Areas to manage

Since we can manage everything from code, we can distinguish a few areas and delegate control over the code to different teams.
Let’s consider these three areas:

1. Platform

This is the part where all platform and cluster-wide configuration is defined. It affects all environments and their security. It can also include configuration for multiple clusters (e.g. when using OpenShift’s Machine API Operator or Cluster API to install and manage clusters).

Examples of objects kept here:

  • LimitRange, ResourceQuota
  • NetworkPolicy, EgressNetworkPolicy
  • ClusterRole, ClusterRoleBinding
  • PersistentVolume – static pool
  • MachineSet, Machine, MachineHealthCheck, Cluster

2. Environments (namespaces) management

Here we define how particular namespaces should be configured to run applications or services and at the same time keep it secure and under control.

Examples of objects kept here:

  • ConfigMap, Secret
  • Role, RoleBinding

3. CI/CD system

All other configuration that is specific to an application. The pipeline definition is also kept here, with the details of how to build an application artifact from code, put it in a container image and push it to a container registry.

Examples of objects kept here:

  • Jenkinsfile
  • Jenkins shared library
  • Tekton objects: Pipeline, PipelineRun, Task, ClusterTask, TaskRun, PipelineResource
  • BuildConfig
  • Deployment, Ingress, Service
  • Helm Charts
  • Kustomize overlays

Note that environment-specific configuration is kept elsewhere.
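One practical consequence of splitting the code into areas is that each repository should only contain objects belonging to its own area. Below is a sketch of a hypothetical review-time check. The kind-to-area mapping simply mirrors the example lists above and is an assumption you would adapt. Items such as Jenkinsfiles or Helm charts are not Kubernetes objects, so they are left out.

```python
from typing import Dict, List, Tuple

# Illustrative mapping that mirrors the example lists above; not exhaustive.
KINDS_BY_AREA: Dict[str, List[str]] = {
    "platform": ["LimitRange", "ResourceQuota", "NetworkPolicy",
                 "EgressNetworkPolicy", "ClusterRole", "ClusterRoleBinding",
                 "PersistentVolume", "MachineSet", "Machine",
                 "MachineHealthCheck", "Cluster"],
    "environments": ["ConfigMap", "Secret", "Role", "RoleBinding"],
    "cicd": ["Pipeline", "PipelineRun", "Task", "ClusterTask", "TaskRun",
             "PipelineResource", "BuildConfig", "Deployment", "Ingress",
             "Service"],
}
AREA_BY_KIND = {kind: area
                for area, kinds in KINDS_BY_AREA.items() for kind in kinds}


def area_for(manifest: dict) -> str:
    """Return the area (and so the repository/factory) that owns a manifest."""
    kind = manifest.get("kind")
    if kind not in AREA_BY_KIND:
        raise ValueError(f"no owning area defined for kind {kind!r}")
    return AREA_BY_KIND[kind]


def violations(manifests: List[dict], repo_area: str) -> List[Tuple[str, str]]:
    """List (kind, name) of manifests that belong in another area's repository."""
    return [
        (m["kind"], m.get("metadata", {}).get("name", "?"))
        for m in manifests
        if area_for(m) != repo_area
    ]
```

Failing loudly on unknown kinds is intentional. A new kind then has to be explicitly assigned to an owner instead of silently slipping into the wrong repository.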

Factories

Our goal here is simple – leverage containers and Kubernetes features to quickly deliver applications to production environments and keep it all as code. To do so we can delegate the management of particular areas to special entities – let’s call them factories.
We can have two types of factories:

  • Factory Building Environments (FBE) – responsible for maintaining objects from area 1 (platform).
  • Factory Building Applications (FBA) – responsible for maintaining objects from area 2 (environments) and area 3 (CI/CD).

Factory Building Environments

The first one is the Factory Building Environments. In general, a single factory of this type is sufficient, because it can maintain multiple environments and multiple clusters.
It exists for the following main reasons:

  • To delegate control over critical objects (especially security-related) to a team of people responsible for platform stability and security
  • To keep track of changes and protect global configuration that affects all the shared services and applications running on a cluster (or clusters)
  • To ease management of multiple clusters and dozens (or even hundreds) of namespaces

FBE takes care of environments and cluster configuration

Main tasks

So what tasks is this factory responsible for? Here are the most important ones.

Build and maintain shared images

There are a couple of container images that are used by many services inside your cluster and have a critical impact on platform security or stability. In particular, these could be:

  • a modified Fluentd container image
  • a base image for all your Java (or other) applications with your custom CA added to the system-level PKI configuration
  • similarly, a custom s2i (Source-to-Image) builder
  • a customized Jenkins image with a predefined list of plugins and even seed jobs

Apply security and global configuration

This is actually the biggest and most important task of this factory. It should read a dedicated repository where all the files are kept and apply them either at the cluster level or to a particular set of environments (namespaces).

Provide credentials

In some cases, this should also be the place where credentials are configured in environments – for example, database credentials that shouldn’t be visible to developers or stored in an application repository.

Build other factories

Finally, this factory also builds other factories (FBA). This task includes creating new namespaces, maintaining their configuration and deploying required objects forming a new factory.

How to implement

FBE is just a concept that can be implemented in many ways. Here’s a list of possible solutions:

  1. The simplest case – a dedicated repository with restricted access, code review policy, and manual provisioning process.
  2. The previous solution can be extended with a hook triggered when a pull request is merged, which applies all changes automatically.
  3. As part of a git integration, a dedicated job on a CI/CD server (e.g. Jenkins) can track a particular branch of the repository and apply it automatically or on demand.
  4. The last solution is the most advanced and also follows the best practices of cloud native approach. It is a dedicated operator that tracks the repository and applies it from inside a container. There could be different Custom Resources that would be responsible for different parts of configurations (e.g. configuration of namespace, global security settings of a cluster, etc.).
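Options 2 and 3 boil down to running something like the following after a merge. This is a sketch under assumptions. The repository path, the kube contexts and the `on_merge` hook are hypothetical. Only the `kubectl` flags (`--context`, `apply --recursive -f`, `--dry-run=server`) are real, and `--dry-run=server` needs kubectl 1.18 or newer.

```python
import shlex
import subprocess
from typing import List


def apply_command(path: str, context: str, dry_run: bool = False) -> List[str]:
    """Build a kubectl invocation applying every manifest under `path`."""
    cmd = ["kubectl", "--context", context, "apply", "--recursive", "-f", path]
    if dry_run:
        # Server-side dry run: the API server validates but persists nothing.
        cmd.append("--dry-run=server")
    return cmd


def on_merge(path: str, contexts: List[str],
             execute: bool = False) -> List[List[str]]:
    """Hypothetical merge hook: validate on every cluster, then apply everywhere."""
    commands = [apply_command(path, c, dry_run=True) for c in contexts]
    commands += [apply_command(path, c) for c in contexts]
    if execute:
        for cmd in commands:
            print("+", shlex.join(cmd))
            # check=True stops the rollout at the first failing command.
            subprocess.run(cmd, check=True)
    return commands
```

Validating against every cluster before applying anywhere is a deliberate choice. A manifest rejected by one API server then stops the whole rollout, instead of leaving some clusters updated and others not.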

Factory Building Applications

The second type is the Factory Building Applications. It is designed to deliver applications to production environments and, through them, to end users. It addresses the following challenges of delivery and deployment processes:

  • Brings more autonomy for development teams who can use a dedicated set of namespaces for their delivery process
  • Creates a safe place for experiments with preview environments created on-demand
  • Eases the configuration process by reducing duplication of config files and providing default settings shared by applications and environments
  • Enables grouping of applications/microservices under a single, manageable group of environments with shared settings and an aggregated view on deployment pipelines runs
  • Separates configuration of Kubernetes objects from a global configuration (maintained by FBE) and application code to keep track of changes

FBA produces applications and deploys them to multiple environments

Main tasks

Let’s have a look at the main tasks of this factory.

Build and deploy applications

The most important job is to build applications from code, run tests, put them in a container image and publish it. When a new container image is ready, it can be deployed to multiple environments managed by this factory. In essence, this is where the CI/CD tasks for a set of applications are implemented.

Provide common configuration for application and services

This factory should provide an easy way of creating a new environment for an application with a set of config files defining required resources (see examples of objects in area 2 and 3).

Namespace management

FBA manages two types of environments (namespaces):

  • permanent environments – they are a part of CI/CD pipeline for a set of applications (or a single app) and their services
  • preview environments – these are environments that are not a part of CI/CD pipeline but are created on-demand and used for different purposes (e.g. feature branch tests, performance tests, custom scenario tests, etc.)

It creates preview environments on demand and destroys them when they are no longer needed. For permanent environments, it ensures that they have a proper configuration but never deletes them (they are special and protected).
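The permanent-versus-preview rule is easy to enforce in code. Below is a minimal sketch of a cleanup decision. It assumes the FN-cicd/FN-test/FN-stage/FN-prod naming from the implementation tips and a hypothetical `last_deploy` timestamp per namespace. In practice you would rather track these with labels or annotations than with name conventions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Namespaces with these suffixes are never deleted (naming is an assumption).
PERMANENT_SUFFIXES = ("-cicd", "-test", "-stage", "-prod")


@dataclass
class Namespace:
    name: str
    last_deploy: datetime  # hypothetical "last used" marker


def is_permanent(ns: Namespace, factory: str) -> bool:
    return any(ns.name == factory + suffix for suffix in PERMANENT_SUFFIXES)


def to_delete(namespaces: List[Namespace], factory: str, now: datetime,
              ttl: timedelta = timedelta(days=3)) -> List[str]:
    """Names of this factory's preview namespaces idle for longer than `ttl`.

    Permanent environments are never returned, no matter how old they are.
    """
    return [ns.name for ns in namespaces
            if ns.name.startswith(factory + "-")
            and not is_permanent(ns, factory)
            and now - ns.last_deploy > ttl]
```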

How to implement

Here are some implementation tips and more technical details.

  1. A factory can be created to maintain environments and a CI/CD pipeline for a single application. However, many applications are often developed by a single team or belong to a single business domain, so it is convenient to keep all their environments and deployment processes in a single place.
  2. A factory consists of multiple namespaces, for example:
    • FN-cicd – namespace where all build-related and delivery activities take place (FN could be a factory name or some other prefix shared by namespaces managed by it)
    • FN-test, FN-stage, FN-prod – permanent environments
    • a varying number of preview environments
  3. The main tasks can be implemented by Jenkins running inside the FN-cicd namespace and defined either in independent Jenkinsfiles or as jobs in a shared library configured on the Jenkins instance.
    In OpenShift it’s even easier, as you can use BuildConfig objects of the Pipeline type, which create the corresponding jobs inside a Jenkins instance.
  4. Again, a dedicated operator seems to be the best solution. It could be the same operator that implements the FBE, with a set of Custom Resources for managing namespaces, pipelines and so on.
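Preview environments are usually named after something like a git branch. Branch names are rarely valid namespace names, which must be RFC 1123 labels: lowercase alphanumerics and '-', starting and ending with an alphanumeric, at most 63 characters. Here is a small sketch of a naming helper for a factory. The branch-based scheme and both function names are my assumptions.

```python
import re
from typing import Iterable, List

MAX_LEN = 63  # namespace names must be valid RFC 1123 labels
LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def preview_namespace(factory: str, branch: str) -> str:
    """Turn a factory prefix plus a git branch name into a valid namespace name."""
    # Collapse every run of disallowed characters into a single '-'.
    slug = re.sub(r"[^a-z0-9-]+", "-", branch.lower()).strip("-")
    if not slug:
        raise ValueError(f"branch {branch!r} has no usable characters")
    name = f"{factory}-{slug}"[:MAX_LEN].rstrip("-")
    if not LABEL.fullmatch(name):
        raise ValueError(f"cannot derive a valid namespace from {branch!r}")
    return name


def factory_namespaces(factory: str, previews: Iterable[str] = ()) -> List[str]:
    """Permanent namespaces of a factory followed by its preview namespaces."""
    permanent = [f"{factory}-{env}" for env in ("cicd", "test", "stage", "prod")]
    return permanent + [preview_namespace(factory, b) for b in previews]
```

For example, the branch `feature/JIRA-123_new Cart` in the `shop` factory becomes the namespace `shop-feature-jira-123-new-cart`.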

Summary

A couple of years ago, before Docker and containers, I was a huge fan of Infrastructure as Code. With Kubernetes, operators, and thanks to its declarative nature, it is now possible to manage every aspect of the application build process, deployment, the environments applications run in, and even whole clusters deployed across multiple datacenters or clouds. What is becoming increasingly important now is how you manage the code responsible for maintaining all of it. The idea of using multiple factories helps organizations with many teams and applications scale both while keeping everything manageable.

Honest review of OpenShift 4

We waited over 7 months for the OpenShift Container Platform 4 release. We even got version 4.1 directly, because Red Hat decided not to release version 4.0. And when it was finally released, we almost got a new product. It’s a result of Red Hat’s acquisition of CoreOS, announced at the beginning of 2018. I believe that most of the new features in OpenShift 4 come from the hands of a new army of developers from CoreOS and their approach to building innovative platforms.
But is it really that good? Let me go through the most interesting features and also the things that are not as good as we’d expect after over 7 months of development (OpenShift 3.11 was released in October 2018).

If it ain’t broke, don’t fix it

Most parts of OpenShift haven’t changed, or changed very little. In my comparison of OpenShift and Kubernetes I’ve pointed out its most interesting features, along with a few remarks on version 4.
In short, here’s my personal list of the best features of OpenShift that stayed at the same good level compared to version 3:

  • Integrated Jenkins – makes it easy to build, test and deploy your containerized apps
  • BuildConfig objects used to create container images – with Source-To-Image (s2i) it is very simple and easy to maintain
  • ImageStreams as an abstraction level that eases the pain of upgrading or moving images between registries (e.g. automatic updates)
  • Tightened security rules with SCC (Security Context Constraints) that disallow running containers as the root user. Although it’s a painful experience at first, this is definitely a good way of increasing the overall security level.
  • Monitoring handled by the best monitoring software dedicated to container environments – Prometheus
  • Built-in OAuth support for services such as Prometheus, Kibana and others. Unified way of managing your users, roles and permissions is something you’ll appreciate when you start to manage access for dozens of users

Obviously, we can also leverage Kubernetes features remembering that some of them are not supported by Red Hat and you’ll be on your own with any problems they may cause.

The best features

I’ll start with features I consider to be the best and sometimes revolutionary, especially when comparing it to other Kubernetes-based platforms or even previous version 3 of OpenShift.

New flexible and very fast installer

This is huge and probably one of the best features. If you’ve ever worked with the Ansible installer available in version 3, you’ll be pleasantly surprised, or even relieved, that you never need to touch it again. Its code was messy, upgrades were painful, and even small changes often took a long time to apply (sometimes failing at the very end).
Now it’s something far better. Not only does it use Terraform underneath (the best tool available for this purpose) to provision infrastructure, it is also faster, more predictable and easier to operate. Because the whole installation is performed by a dedicated operator, all you need to do is provide a fairly short YAML file with the necessary details.
Here’s the whole file that is sufficient to install a multi-node cluster on AWS:

apiVersion: v1
baseDomain: example.com
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 3
  platform:
    aws:
      rootVolume:
        size: 50
        type: gp2
      type: t3.large
controlPlane:
  hyperthreading: Enabled
  name: master
  platform:
    aws:
      rootVolume:
        size: 50
        type: gp2
      type: t3.xlarge
  replicas: 3
metadata:
  creationTimestamp: null
  name: ocp4demo
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: us-east-1
pullSecret: '{"auths":{"cloud.openshift.com":{"auth":"REDACTED","email":"REDACTED"}}}'
sshKey: |
  ssh-rsa REDACTED

The second, and even more interesting, thing about the installer is that it uses Red Hat Enterprise Linux CoreOS (RHCOS) as the base operating system. The biggest difference from classic Red Hat Enterprise Linux (RHEL) is how it’s configured and maintained. While RHEL is a traditional system you operate manually with ssh and Linux commands (sometimes executed by config management tools such as Ansible), RHCOS is configured with Ignition (a provisioning tool developed by CoreOS) on first boot and shouldn’t be configured in any other way. This makes it possible to build a platform that follows the immutable infrastructure principle – all nodes (except the control plane with master components) can be treated as ephemeral entities and, just like pods, quickly replaced with fresh instances.
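To illustrate what "configured with Ignition on first boot" means, here is a sketch that renders a tiny Ignition config writing files and an SSH key. In OpenShift the installer generates these configs for you, so this only shows the shape of the data. The field layout follows the Ignition 3.x config spec as I understand it, and the `ignition_config` helper is hypothetical.

```python
import json
from typing import Dict, Iterable
from urllib.parse import quote


def ignition_config(files: Dict[str, str], ssh_keys: Iterable[str] = ()) -> str:
    """Render a minimal Ignition config that writes `files` on first boot.

    Assumes the Ignition 3.x layout; OpenShift 4.1 nodes consumed the older
    2.x spec, which differs in places (e.g. a per-file `filesystem` field).
    """
    config = {
        "ignition": {"version": "3.0.0"},
        "storage": {
            "files": [
                {
                    "path": path,
                    "mode": 0o644,  # serialized as decimal 420; JSON has no octal
                    # RFC 2397 data URL carrying the file body inline
                    "contents": {"source": "data:," + quote(content)},
                }
                for path, content in sorted(files.items())
            ]
        },
    }
    keys = list(ssh_keys)
    if keys:
        # "core" is the default user on (RH)CoreOS nodes
        config["passwd"] = {"users": [{"name": "core", "sshAuthorizedKeys": keys}]}
    return json.dumps(config, indent=2)
```

Because this data is consumed once, at first boot, changing a node means generating a new config and replacing the machine rather than editing it over ssh.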

Unified way of managing nodes

Red Hat introduced a new API for node management. It’s called “Machine API” and is mostly based on the Kubernetes Cluster API project. This is a game changer when it comes to provisioning nodes. With MachineSets you can easily distribute your nodes among different availability zones. You can also manage multiple node pools with different settings (e.g. a pool for testing, a pool for machine learning with GPUs attached), just like in GKE, which I reviewed some time ago. Managing nodes has never been that easy!
For me, that’s a game changer, and I predict it’s going to be a game changer for Red Hat too. With this new flexible way of provisioning, alongside RHCOS as the default system, OpenShift becomes very competitive with the Kubernetes services available on major cloud providers (GKE, EKS, AKS).
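Spreading workers across availability zones usually means one MachineSet per zone. Here is a minimal sketch of how replicas could be split evenly. As far as I know, the installer does something similar when it generates the default MachineSets, but the helper below is my own.

```python
from typing import Dict, List


def spread_replicas(total: int, zones: List[str]) -> Dict[str, int]:
    """Split `total` worker replicas across zones, one MachineSet per zone.

    Zones earlier in the list absorb the remainder, so the counts never
    differ by more than one.
    """
    if not zones:
        raise ValueError("at least one zone is required")
    base, extra = divmod(total, len(zones))
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}
```

For example, 7 workers over three zones gives 3, 2 and 2 replicas. Losing one zone then costs you at most three workers.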

Rapid cluster autoscaling

Thanks to the previous feature, we can finally scale our cluster in a very easy and automated fashion. OpenShift delivers a cluster autoscaling operator that can adjust the size of your cluster by provisioning or destroying nodes. With RHCOS this happens very quickly, which is a huge improvement over the manual, error-prone process used with the previous version of OpenShift and RHEL nodes.
It works not only on AWS but also on on-premise installations based on VMware vSphere. Hopefully, it will soon be possible on most major cloud providers, and maybe on non-cloud environments as well (spoiler alert – it will, see below for more details).
We had missed this elasticity, and it finally narrows the gap between those who are lucky enough (or simply prefer) to use the cloud and those who, for various reasons, choose to build on their own hardware.
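How does the autoscaler decide how many nodes to add? The real cluster autoscaler simulates the scheduler against node templates. The sketch below is a deliberately simplified estimate: first-fit decreasing bin packing on CPU requests only, capped by a maximum cluster size. In OpenShift the bounds come from the ClusterAutoscaler and MachineAutoscaler resources. The function and its parameters are my own illustration.

```python
from typing import List


def nodes_to_add(pending_cpu: List[float], node_cpu: float,
                 current: int, max_nodes: int) -> int:
    """Estimate how many new nodes the pending pods need.

    First-fit decreasing bin packing on CPU requests only; the real cluster
    autoscaler simulates scheduling and checks every resource.
    """
    bins: List[float] = []  # free CPU left on each hypothetical new node
    for request in sorted(pending_cpu, reverse=True):
        if request > node_cpu:
            continue  # never fits on a node of this size; more nodes won't help
        for i, free in enumerate(bins):
            if request <= free:
                bins[i] = free - request
                break
        else:
            bins.append(node_cpu - request)
    return max(0, min(len(bins), max_nodes - current))
```

For example, four pending pods requesting 1.5, 1.5, 1.0 and 0.5 CPU need three 2-CPU nodes. If the cluster already has 3 nodes and a maximum of 4, only one node is added.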

Good parts you’ll appreciate

New nice-looking web console that is very practical

This is the most visible change for end users: the console looks like a completely rewritten, better designed, good-looking piece of software. We saw part of it in version 3, where it was responsible for cluster maintenance, but now it’s a single interface for both operations and developers.
Cluster administrators will appreciate the friendly dashboards where they can check cluster health and use the tighter integration with Prometheus to observe the workloads running on it.
Although many buttons open a simple editor with yaml template in it, it is still the best web interface available for managing your containers, their configuration, external access or deploying a new app without any yaml.

Red Hat also prepared a centralized dashboard (https://cloud.redhat.com) for managing all your OpenShift clusters. It’s quite simple at the moment, but I think this is an early version of it.

Oh, they also got rid of one annoying thing – now you finally log in once and use Single Sign-On to access external services, e.g. Prometheus, Grafana and Kibana dashboards, Jenkins and others that you can configure with OAuth.

Operators for cluster maintenance and as first-class citizens for your services

The operator pattern leverages the Kubernetes API and promises “to put operational knowledge into software”, which gives end users an easy way of deploying and maintaining complex services. It’s no big surprise that in OpenShift almost everything is configured and maintained by operators. After all, this concept was born at CoreOS and has brought us a level of automation we could only dream of. In fact, Red Hat deprecated its previous attempt to automate everything with the Ansible Service Broker and Service Catalog. Now operators handle most tasks, such as cluster installation, upgrades, ingress and registry provisioning, and many, many more. No more Ansible – just feed these operators proper YAML files and wait for the results.
At the same time, Red Hat created a website with operators (https://www.operatorhub.io/) and embedded it inside OpenShift. They say it will grow and you’ll be able to find many services there that are very easy to use. In fact, while I was writing this article, the number of operators available on OperatorHub doubled. I expect it to keep growing and the operators to become more stable (some of them didn’t work for me or required additional manual steps).
For anyone interested in providing their software as an operator, there is the operator-framework project, which helps to build it (operator-sdk), run it and maintain it (with the Operator Lifecycle Manager). You can even start without knowing how to write in Go, as it provides a way to create an operator using Ansible (and from Helm charts too). Despite some small shortcomings, it’s the fastest way to try this new way of writing Kubernetes-native applications.

In short – operators can be treated as a way of providing services in your own environment, similar to the ones available from public cloud providers (e.g. managed databases, Kafka clusters, Redis clusters, etc.), with one major difference. You have control over the software that provides those services and you can build them on your own (become a producer), while in the cloud you are just a consumer.

I think that aligns perfectly with the open source spirit that started an earlier revolution – the Linux operating system, the building block of most systems running today.
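What does "operational knowledge in software" look like in practice? Below is a toy reconcile step for a hypothetical database operator. It is written in Python rather than Go, which operator-sdk primarily targets, just to keep it short, and it is not the operator-sdk API. Each call returns at most one action, and the controller calls it again after every change until it returns None.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Member:
    name: str
    version: str
    ready: bool


def next_step(desired_replicas: int, desired_version: str,
              members: List[Member]) -> Optional[Tuple[str, str]]:
    """One reconcile pass of a hypothetical database operator.

    The encoded knowledge: never touch the cluster while a member is not
    ready, fix the member count first, then upgrade one member at a time.
    """
    if any(not m.ready for m in members):
        return ("wait", "member not ready")
    if len(members) < desired_replicas:
        return ("create", f"member-{len(members)}")
    if len(members) > desired_replicas:
        return ("delete", members[-1].name)
    for m in members:
        if m.version != desired_version:
            return ("upgrade", m.name)
    return None  # observed state matches the desired spec
```

Returning a single step and re-evaluating from scratch each time keeps the logic level-triggered. If the operator crashes halfway through an upgrade, the next pass simply picks up from whatever state it finds.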

Cluster configuration kept as API objects that ease its maintenance

Forget about configuration files kept somewhere on the servers. They cause too many maintenance problems and are just too old-school for modern systems. It’s time for an “everything-as-code” approach. In OpenShift 4 every component is configured with Custom Resources (CRs) processed by ubiquitous operators. No more painful upgrades, no more synchronization among multiple nodes and no more configuration drift. You’re going to appreciate how easy maintenance has now become.
Here is a short list of operators that configure cluster components which were previously maintained in a rather cumbersome way (i.e. different files provisioned by Ansible or manually):

  • API server (feature gates and options)
  • Nodes via Machine API (see above for more details)
  • Ingress
  • Internal DNS
  • Logging (EFK) and Monitoring (Prometheus)
  • Sample applications
  • Networking
  • Internal Registry
  • OAuth (and authentication in general)
  • And many more
Global configuration handled by operators and managed with yaml files kept inside a control plane

Now all these things are maintained from code that is (or rather should be) versioned, audited and reviewed for changes. Some people call it GitOps, I myself call it “Everything as Code” or to put it simply – the way it should be managed from the beginning.

Bad parts (or not good enough yet)

Nothing is perfect, even OpenShift. I’ve found a few things that I consider less enjoyable than the features above. I suspect and hope they will improve in future releases, but at the time of writing (OpenShift 4.1) they spoil the overall picture.

Limited support for fully automatic installation

The biggest disappointment is the list of platforms that support fully automatic installation. Here it is:

  • AWS
  • VMware vSphere

Quite short, isn’t it? It means that when you want to install it on your own machines you need to have vSphere. If you don’t then be prepared for a less flexible install process that involves many manual steps and is much, much slower.
It also implies another flaw – without a supported platform, you won’t be able to use cluster autoscaling or even manual scaling of machines. It will be all left for you to manage manually.

This makes OpenShift 4 fully usable only on AWS and vSphere. Although it could work anywhere, elsewhere it is a less flexible option with a limited set of features. Red Hat promises to extend the list of supported platforms in future releases (Azure, GCP and OpenStack are coming in version 4.2). There are also already existing implementations for bare metal installations, so hopefully this will be covered as well.

You cannot perform disconnected installations

Some organizations have very tight security rules that cut off most external traffic. In the previous version, you could use a disconnected installation performed offline, without any access to the internet. Now OpenShift requires access to Red Hat resources during installation – they collect anonymized data (Telemetry) and provide a simple dashboard from which you can control your clusters.
They promise to fix it in upcoming version 4.2 so please be patient.

Istio is still in Tech Preview and you can’t use it in your prod yet

I’m not sure about you, but many organizations (and individuals like me) have been waiting for this particular feature. We’ve had enough of watching demos and listening to how Istio is the best service mesh and how many problems it will solve. Give us a stable (and, in the case of Red Hat, also supported) version of Istio! According to the published roadmap, it was supposed to be available already in version 4.0, but since 4.0 wasn’t released, we obviously expected it to be GA in 4.1. For many, this is one of the main reasons to consider OpenShift as an enterprise container platform for their systems. I sympathize with all of you and hope this year we’re all going to move Istio from test systems to production. Fingers crossed!

CDK/Minishift options missing makes testing harder

I know it’s going to be fixed soon, but at the moment the only way of testing OpenShift 4 is either to use it as a service (OpenShift Online, Azure Red Hat OpenShift) or to install it, which takes roughly 30 minutes. For version 3 we have the Container Development Kit (or its open source equivalent for OKD, Minishift), which launches a single-node VM with OpenShift in a few minutes. It’s perfect for testing, also as part of a CI/CD pipeline.
Certainly, it’s not the most coveted feature but since many crucial parts have changed since version 3 it would be good to have a more convenient way of getting to know it.

UPDATED on 30.8.2019 – there is a working solution for single node OpenShift cluster. It is provided by a new project called CodeReady Containers and it works pretty well.

Very bad and disappointing

Now this is a short “list” but I just have to mention it since it’s been a very frustrating feature of OpenShift 3 that just happened to be a part of version 4 too.

Single SDN option without support for egress policy

I still can’t believe how the networking part has been neglected. Let me start with a simple choice, or rather the lack of it. In version 3 we could choose Calico as an SDN provider alongside the OpenShift “native” SDN based on Open vSwitch (an overlay network spanned over software VXLAN). Now we have only this single native implementation, and I guess we could live with it if it had been improved. However, it hasn’t. When deploying your awesome apps on your freshly installed cluster, you may want to secure your traffic with NetworkPolicy acting as a Kubernetes network firewall. There is even a nice guide for creating ingress rules and, sure, they work as they should. But if you want to limit egress traffic, you can’t use the egress part of NetworkPolicy, because for some reason OpenShift still uses its dedicated “EgressNetworkPolicy” API, which has the following drawbacks:

  • You should create a single object for an entire namespace with all the rules – although many can be created, only one is actually used (in a non-deterministic way, you’ve been warned), and there is no internal merge as there is with standard Kubernetes NetworkPolicy objects
  • You can only limit traffic based on IP CIDR ranges or DNS names, without specifying ports (sic!) – that’s right, it’s like an ’80s firewall appliance operating on L3 only…
The OpenShift web interface for managing NetworkPolicy is currently a simple YAML editor with some built-in tips on how to write policies
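To show why the missing ports and the lack of merging hurt, here is a sketch contrasting the two models as I understand them. EgressNetworkPolicy rules are evaluated in order, the first match wins, and traffic matching no rule is allowed. NetworkPolicy egress rules from all policies selecting a pod are added together and can include ports. DNS-name rules are left out, and the rule tuples are simplified stand-ins for the real objects.

```python
import ipaddress
from typing import List, Optional, Tuple

# EgressNetworkPolicy-style rule: ("Allow" | "Deny", CIDR). No port field exists.
EgressRule = Tuple[str, str]
# NetworkPolicy-style egress rule: (CIDR, port or None meaning all ports).
PolicyRule = Tuple[str, Optional[int]]


def egress_firewall_allows(rules: List[EgressRule], dest_ip: str) -> bool:
    """First matching rule wins; traffic matching no rule is allowed."""
    ip = ipaddress.ip_address(dest_ip)
    for rule_type, cidr in rules:
        if ip in ipaddress.ip_network(cidr):
            return rule_type == "Allow"
    return True


def network_policies_allow(policies: List[List[PolicyRule]],
                           dest_ip: str, port: int) -> bool:
    """Allowed if ANY selecting policy has a matching rule (policies are additive).

    If no policy selects the pod, all egress traffic is allowed.
    """
    if not policies:
        return True
    ip = ipaddress.ip_address(dest_ip)
    return any(ip in ipaddress.ip_network(cidr) and (p is None or p == port)
               for policy in policies for cidr, p in policy)
```

With the firewall model, once 10.0.5.0/24 is allowed, a pod can reach every port on those hosts. A NetworkPolicy can restrict it to, say, the database port only, and a second team can add its own policy without touching the first.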

There, I said it – for me, it’s the worst part of OpenShift, and it makes managing network traffic harder. I hope it will be fixed pretty soon; for now, Istio could potentially fix it on a higher layer. Oh wait, it’s not supported yet…

Summary

Was it worth waiting for OpenShift 4? Yes, I think it was. It has some flaws that are going to be fixed soon, and it’s still the best platform for Kubernetes workloads that comes with support. I consider this version an important milestone for Red Hat and for its customers looking for a solution to build a highly automated platform, especially when they want to do it on their own hardware, with full control and freedom of choice. Now, with the operator pattern so closely integrated and promoted, it starts to look like a really good alternative to the public cloud. OpenStack promised that, and it looks like Kubernetes with OpenShift is going to deliver it.