Question
I have an S3 compatible blob storage system (e.g., Minio, Scality) which sits behind an HTTPS endpoint with a self-signed certificate. How can I use it as a Universal Blob Storage in Ververica Platform?
Answer
Note: This article applies to Ververica Platform 2.0-2.8.
You can use Amazon S3 or other S3-compatible blob storage systems with Ververica Platform. If your S3 endpoint uses HTTP, or HTTPS with a public CA-signed certificate, a simple configuration like the one below is sufficient:
vvp:
  blobStorage:
    baseUri: s3://my-bucket/vvp
    s3:
      endpoint: "http:// or https:// with public CA-signed certificate"  # or
      region: ...
Warning: when configuring `endpoint` or `region`, you must configure only one of them, not both.
If your certificate is a self-signed one, then follow the steps below for applying the additional required configuration.
1. Get your truststore ready
First, make sure your S3 self-signed certificates are valid for the S3 endpoint you want to configure in Ververica Platform. For example, if your S3 endpoint is `https://minio.minio.svc:9000`, then your self-signed certificates must be valid for the FQDN `minio.minio.svc`.
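One way to confirm this before building the truststore is to ask OpenSSL whether the certificate matches the endpoint's host name. The sketch below assumes `openssl` is installed locally and that the certificate is in `cert.pem`. The helper names `endpoint_host` and `cert_matches_endpoint` are made up for illustration, and the check relies on the wording that `openssl x509 -checkhost` prints.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of Ververica Platform.

# Extract the host name from an endpoint URL such as https://minio.minio.svc:9000
endpoint_host() {
  local hostport="${1#*://}"        # drop the scheme
  hostport="${hostport%%/*}"        # drop any path
  printf '%s\n' "${hostport%%:*}"   # drop the port
}

# Succeeds if the PEM certificate in $1 is valid for the host of the endpoint in $2.
# Assumption: openssl prints "Hostname <host> does match certificate" on a match.
cert_matches_endpoint() {
  local cert="$1" host
  host="$(endpoint_host "$2")"
  openssl x509 -in "$cert" -noout -checkhost "$host" | grep -q "does match certificate"
}

# Usage (assumes cert.pem exists in the current directory):
#   cert_matches_endpoint cert.pem https://minio.minio.svc:9000 && echo "certificate OK"
```

If the check fails, the certificate most likely needs to be reissued with the endpoint's FQDN as a subject alternative name.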
The next step is to get the complete truststore into a file, let's say, `vvp.truststore`. This could mean:
- (Recommended) Add your S3 self-signed certificate and all other self-signed certificates needed by your jobs to the existing `cacerts` bundled with the JVM, e.g., `/usr/local/openjdk-11/lib/security/cacerts`, or
- Create a truststore that contains only your self-signed certificates if you are sure that covers all HTTPS endpoints your Flink job will access.
Warning: as you will see in the later steps, we are going to use the environment variable `JAVA_TOOL_OPTIONS` to set the JVM system property `javax.net.ssl.trustStore`. A truststore set this way replaces the JVM default instead of extending it, so leaving out the JVM bundled `cacerts` will prevent your jobs from accessing other HTTPS endpoints that use public CA-signed certificates. Unless you are sure this is the desired outcome, we recommend the first approach above.
For your reference, here are the commands to add your S3 self-signed certificate (`cert.pem`) into the JVM bundled `cacerts`. Skip these commands if you already have your truststore ready.
# retrieve the JVM bundled cacerts from the Ververica Platform's docker image into a local file
docker run registry.ververica.com/v2.5/vvp-appmanager:2.5.0 tar -C /usr/local/openjdk-11/lib/security -cf - cacerts | tar xf -
# add your S3 self-signed certificate 'cert.pem' into the cacerts with an alias 's3cert'
keytool -import -alias s3cert -file cert.pem -storetype JKS -keystore cacerts -storepass changeit -noprompt
# rename as we use 'vvp.truststore' in the rest of this article
mv cacerts vvp.truststore
2. Store the truststore in a Kubernetes Secret
You need to store `vvp.truststore` in a Kubernetes secret in the namespace where Ververica Platform is running and also in each namespace where your Flink jobs will run.
# run this for each namespace
kubectl --namespace=<namespace> \
create secret generic vvp-truststore --from-file=vvp.truststore
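If your jobs run in several namespaces, the command above can be wrapped in a loop. The following is a minimal sketch: the function name and the namespace names `vvp` and `vvp-jobs` are placeholders, and it assumes `vvp.truststore` is in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the vvp-truststore secret in every namespace given.
# Stops at the first namespace for which kubectl reports an error.
create_truststore_secrets() {
  local ns
  for ns in "$@"; do
    kubectl --namespace="$ns" \
      create secret generic vvp-truststore --from-file=vvp.truststore || return 1
  done
}

# Usage (namespace names are examples only):
#   create_truststore_secrets vvp vvp-jobs
```

Note that `kubectl create` fails if a secret with the same name already exists in the namespace, so delete the old secret first when you need to replace the truststore.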
3. Configure Ververica Platform
Install/upgrade your Ververica Platform with the following Helm Values file:
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-Djavax.net.ssl.trustStore=/vvp-truststore/vvp.truststore"
volumeMounts:
  - name: vvp-truststore
    mountPath: /vvp-truststore
    readOnly: true
volumes:
  - name: vvp-truststore
    secret:
      secretName: vvp-truststore
For example:
helm install vvp ververica/ververica-platform --version ... \
--values ... \
--values <the above values file>
Once Ververica Platform is (re-)started, go to 'Deployments | Artifacts'; you should be able to list and upload artifacts. If errors pop up in the UI, check the Ververica Platform logs. The error "unable to find valid certification path to requested target" means the truststore is still not configured correctly, so double-check that you followed the steps above. You can also use the `keytool` CLI to list the certificate stored under the alias `s3cert` in the truststore:
keytool -list -v -keystore vvp.truststore -alias s3cert
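For a scripted check, the same `keytool` lookup can be run non-interactively and evaluated by its exit status. This is a minimal sketch, assuming the store password is `changeit` and the alias is `s3cert` as in step 1; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the alias exists in the truststore.
# Arguments: truststore file, alias, store password (defaults to 'changeit').
truststore_has_alias() {
  local store="$1" alias_name="$2" pass="${3:-changeit}"
  keytool -list -keystore "$store" -storepass "$pass" -alias "$alias_name" >/dev/null 2>&1
}

# Usage:
#   truststore_has_alias vvp.truststore s3cert || echo "s3cert is missing from vvp.truststore"
```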
4. Configure Ververica Platform Deployment
Once the truststore is configured correctly in Ververica Platform, you can now configure your deployment similarly. Each Flink job pod uses an init container called `artifact-fetcher`. It downloads artifacts from the configured universal blob storage before the Flink JobManager/TaskManager container starts and therefore also needs to be configured to use the correct truststore. You can configure the artifact-fetcher by adding the following into your deployment spec:
spec:
  template:
    spec:
      kubernetes:
        jobManagerPodTemplate:
          spec:
            initContainers:
              - env:
                  - name: JAVA_TOOL_OPTIONS
                    value: '-Djavax.net.ssl.trustStore=/vvp-truststore/vvp.truststore'
                name: artifact-fetcher
                volumeMounts:
                  - mountPath: /vvp-truststore
                    name: vvp-truststore
            volumes:
              - name: vvp-truststore
                secret:
                  secretName: vvp-truststore
        taskManagerPodTemplate:
          spec:
            initContainers:
              - env:
                  - name: JAVA_TOOL_OPTIONS
                    value: '-Djavax.net.ssl.trustStore=/vvp-truststore/vvp.truststore'
                name: artifact-fetcher
                volumeMounts:
                  - mountPath: /vvp-truststore
                    name: vvp-truststore
            volumes:
              - name: vvp-truststore
                secret:
                  secretName: vvp-truststore
5. (Optional) Configure Checkpointing & Savepointing to S3 Storage
If your deployment is configured to write checkpoints (or savepoints) to S3 storage which sits behind an HTTPS endpoint with a self-signed certificate, apart from the S3 access credentials, you will also need to add the truststore file to both 'flink-jobmanager' and 'flink-taskmanager' containers in the Flink JM & TM pods. Here's an example (based on Step-4):
spec:
  template:
    spec:
      kubernetes:
        jobManagerPodTemplate:
          spec:
            initContainers:
              - env:
                  - name: JAVA_TOOL_OPTIONS
                    value: '-Djavax.net.ssl.trustStore=/vvp-truststore/vvp.truststore'
                name: artifact-fetcher
                volumeMounts:
                  - mountPath: /vvp-truststore
                    name: vvp-truststore
            containers:
              - name: flink-jobmanager
                env:
                  - name: JAVA_TOOL_OPTIONS
                    value: '-Djavax.net.ssl.trustStore=/vvp-truststore/vvp.truststore'
                volumeMounts:
                  - mountPath: /vvp-truststore
                    name: vvp-truststore
            volumes:
              - name: vvp-truststore
                secret:
                  secretName: vvp-truststore
        taskManagerPodTemplate:
          spec:
            initContainers:
              - env:
                  - name: JAVA_TOOL_OPTIONS
                    value: '-Djavax.net.ssl.trustStore=/vvp-truststore/vvp.truststore'
                name: artifact-fetcher
                volumeMounts:
                  - mountPath: /vvp-truststore
                    name: vvp-truststore
            containers:
              - name: flink-taskmanager
                env:
                  - name: JAVA_TOOL_OPTIONS
                    value: '-Djavax.net.ssl.trustStore=/vvp-truststore/vvp.truststore'
                volumeMounts:
                  - mountPath: /vvp-truststore
                    name: vvp-truststore
            volumes:
              - name: vvp-truststore
                secret:
                  secretName: vvp-truststore
Note: Do not confuse the approach described here with the Ververica Platform documentation section on configuring the artifact-fetcher to fetch from an HTTPS location with a self-signed certificate (see Related Information). That section applies when your job JAR is located directly behind an HTTPS endpoint with a self-signed certificate, not behind an S3 endpoint. In other words, use the Ververica Platform documentation if your `jarUri` is `https://...`, and follow the steps in this Knowledge Base article if your `jarUri` is `s3://...`.
Related Information
Universal Blob Storage in Ververica Platform
Configure artifact-fetcher to fetch from an HTTPS location with a self-signed certificate