Note: This applies to Ververica Platform 2.2 or later
Introduction
This article describes the requirements imposed by Ververica Platform and its managed Flink jobs on the underlying Kubernetes cluster. This serves as a guideline for customers to set up and design their Kubernetes cluster.
Kubernetes Version and Cluster
The current Ververica Platform release works with:
- Kubernetes 1.11+
- OpenShift 3.11+
Previous versions' Kubernetes compatibility is listed in our documentation.
We recommend setting up a dedicated Kubernetes cluster for Ververica Platform and Flink jobs. If you install Ververica Platform in a shared cluster with other applications, keep in mind that the Kubernetes control plane will be shared in this case.
High-Level Overview
The following diagram depicts a high-level overview of the resources involved in running a Ververica Platform installation. It depicts a recommended setup, where Flink jobs are deployed into separate namespaces (`vvp-jobs-1`, `vvp-jobs-2`) from the namespace where Ververica Platform is installed (`vvp`). Such separation creates the basis for multi-tenant setups. In the remainder of this article, we will use the term “platform namespace” and “job namespace” to refer to these two types of namespaces. As shown in the diagram, the cross-namespace communication from the platform namespace to the job namespaces must be allowed via `8081/TCP` in your network policy if configured.
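If you enforce network policies, the required cross-namespace access could be expressed roughly as follows. This is a sketch only: the policy name and namespace labels are hypothetical and must match your own labeling scheme.

```yaml
# Hypothetical NetworkPolicy for a job namespace: allow ingress on 8081/TCP
# from pods in the platform namespace "vvp" (selected here via the standard
# kubernetes.io/metadata.name label; adjust to your labeling scheme).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-vvp-to-flink
  namespace: vvp-jobs-1
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: vvp
      ports:
        - protocol: TCP
          port: 8081
```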
API access requirements across the namespaces are discussed in more detail in the later section of this article.
Outside Network Access
Docker Registry
By default, Ververica Platform requires access to registry.ververica.com during installation and later for fetching Flink-related artifacts. Because our registry is provided without an SLA, customers are recommended to set up their own registry with all the required images. Ultimately, Ververica Platform requires access to whichever Docker registry is configured.
Object Store
Flink jobs’ checkpoints and savepoints need to be kept in durable storage, such as an object store (e.g., S3, MinIO) or HDFS. The network connection from the Kubernetes cluster to the object store or HDFS therefore needs to be open. Flink jobs will be configured with access credentials, service accounts, or principals to access checkpoints and savepoints. This is the recommended approach.
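As a sketch, a Flink job writing checkpoints and savepoints to S3 might carry configuration along these lines (the bucket name and paths below are placeholders):

```yaml
# Hypothetical Flink configuration for durable checkpoint/savepoint storage;
# the bucket and paths are placeholders to replace with your own.
state.checkpoints.dir: s3://my-flink-bucket/checkpoints
state.savepoints.dir: s3://my-flink-bucket/savepoints
```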
Mounted NAS/NFS is another option if it can be mounted into pods as volumes. In this case, any expansion (e.g., adding nodes) of the underlying Kubernetes cluster needs to ensure the corresponding NAS/NFS configurations are in place.
Universal Blob Storage
Ververica Platform supports using AWS S3, Microsoft ABS, Apache Hadoop® HDFS, Google GCS, or Alibaba OSS as universal blob storage. In addition to storing job artifacts, it handles checkpoints and savepoints (as mentioned in the previous section) automatically. If you plan to use it, the network connection from the Kubernetes cluster to the blob storage needs to be open.
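If configured, universal blob storage is pointed at a base URI in your Helm values. The sketch below uses a placeholder bucket; check the Ververica Platform documentation for the authoritative key layout.

```yaml
# Sketch of a universal blob storage configuration (placeholder bucket);
# see the Ververica Platform documentation for the exact keys.
vvp:
  blobStorage:
    baseUri: s3://my-vvp-bucket/vvp
```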
Metrics and Logging Systems
Metrics and logging systems are important for debugging Flink job issues. You need to install your own metrics system (e.g., Prometheus/Grafana) and logging system (e.g., Elasticsearch/Kibana) and enable network access from the Flink job pods to those systems.
Other External Systems
If your Flink jobs need to access external systems like Kafka or Elasticsearch, the network connections from the Kubernetes cluster to those systems need to be open as well.
Persistent Volumes
By default, Ververica Platform persists its metadata locally using an SQLite DB on a persistent volume (PV). The PV claim has to be satisfied with:
- 8Gi Capacity (increase if you plan to run lots of Flink deployments)
- ReadWriteOnce access mode
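A persistent volume claim satisfying these requirements could look like the following; the claim name and namespace are hypothetical, and many installations let the Helm chart create the claim instead:

```yaml
# Hypothetical PVC matching the stated requirements (name is a placeholder).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vvp-metadata
  namespace: vvp
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi   # increase if you plan to run lots of Flink deployments
```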
For a production environment, we recommend storing the platform metadata in an external DB such as MySQL, PostgreSQL, or MS SQL Server. So, the network communication from the Kubernetes cluster to the configured DB server needs to be open.
Kubernetes API Access Requirements
By default, Ververica Platform has the following configuration:

```yaml
rbac:
  create: true
  serviceAccountName: default
```
and will attempt to create a service account with the required permissions for managing the Flink clusters running in the job namespaces.
If you want to use a pre-created service account, set `rbac.create` to `false` and `rbac.serviceAccountName` to the name of your service account. The service account must have the following permissions in the job namespaces:
```yaml
rules:
  - apiGroups: ["apps", "extensions"]
    resources: ["deployments"]
    verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
  - apiGroups: [""]
    resources: ["configmaps", "pods", "services", "secrets", "serviceaccounts"]
    verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
  - apiGroups: ["rbac.authorization.k8s.io"]
    resources: ["roles", "rolebindings"]
    verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
```
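With a pre-created service account, the corresponding Helm values would look roughly like this (the account name is a placeholder):

```yaml
rbac:
  create: false
  serviceAccountName: my-vvp-sa   # hypothetical pre-created service account
```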
If Kubernetes HA is used in Flink jobs, the service accounts in the job namespaces must have permission to access Kubernetes ConfigMaps in the same namespace:
```yaml
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
```
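For reference, enabling Kubernetes HA in a Flink job involves configuration along these lines (the storage directory is a placeholder, and the exact setting depends on your Flink version):

```yaml
# Sketch of Flink Kubernetes HA configuration; the storage directory is a
# placeholder and must point at your durable storage.
high-availability: kubernetes
high-availability.storageDir: s3://my-flink-bucket/ha
```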
For more information, see our KB article.
Multi-tenant Deployments
Namespaces where Flink jobs will be deployed need to be first created using native Kubernetes tooling and then added to the `values.yaml` file before installing/upgrading via helm, for example:
```yaml
vvp:
  ...
  rbac:
    additionalNamespaces:
      - vvp-jobs-1
      - vvp-jobs-2
```
Appending those job namespaces adds the permissions listed above to the service account. If you use an existing service account, however, you have to grant the permissions manually, as described in the previous section.
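Creating the job namespaces with native Kubernetes tooling can be as simple as applying plain Namespace manifests; the names must match the `additionalNamespaces` entries:

```yaml
# Namespaces for Flink jobs, created before installing/upgrading the chart.
apiVersion: v1
kind: Namespace
metadata:
  name: vvp-jobs-1
---
apiVersion: v1
kind: Namespace
metadata:
  name: vvp-jobs-2
```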
Kubernetes Resources for Deployments and SessionClusters
The Kubernetes resources that Ververica Platform will create depend on the kind of Flink deployment, i.e. a standalone Deployment or a shared SessionCluster, and the features used.
Job Deployment (default)
Session Cluster
OpenShift-specifics
By default, a specific `fsGroup` is set in the platform’s charts. These values are controlled directly by OpenShift and need to be unset by adding an empty `securityContext` to the `values.yaml` file:
```yaml
vvp:
  ...

securityContext:
```
Note: `securityContext` is at the same root configuration level as `vvp`.
Hardware Resources
Ververica Platform itself (without any Flink jobs running) requests the following resources from Kubernetes. Allocate more resources if you plan to run many jobs.
```yaml
appmanager:
  resources:
    limits:
      cpu: "1"
      memory: 1Gi
    requests:
      cpu: 250m
      memory: 1Gi
gateway:
  resources:
    limits:
      cpu: "1"
      memory: 2Gi
    requests:
      cpu: 250m
      memory: 2Gi
ui:
  resources:
    limits:
      cpu: 100m
      memory: 32Mi
    requests:
      cpu: 100m
      memory: 32Mi
```
Access to Ververica Platform
There are multiple ways to access Ververica Platform. During development or testing you can use Kubernetes port forwarding, but for a production setup it is recommended to use a Kubernetes Ingress or Service. The Ververica Platform installation creates a Kubernetes service called `<release>-ververica-platform`, which can be used to set up a LoadBalancer or an NGINX ingress to provide access to users. For more information, check the Ververica Platform documentation.
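As a sketch, an NGINX ingress in front of the platform service might look like the following; the host, release name, and service port are assumptions to adapt to your installation:

```yaml
# Hypothetical NGINX ingress for the platform service; host, release name
# ("vvp"), and backend port are placeholders to verify against your setup.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vvp-ingress
  namespace: vvp
spec:
  ingressClassName: nginx
  rules:
    - host: vvp.example.com                      # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: vvp-ververica-platform     # assumes release name "vvp"
                port:
                  number: 80                     # check the service's actual port
```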