Question
We are currently using the default SQLite persistence for storing Ververica Platform metadata. How could we migrate this metadata storage to a remote persistence database such as the Postgres database with minimal downtime?
Answer
Note: Before making this migration in a production environment, always test this migration approach in a test environment first.
Ververica Platform persists its metadata using JDBC, either in a remote RDBMS or locally using SQLite. The default option is storing in an internal SQLite Database. The recommendation for a production environment is to store it in a remote persistence metastore. Currently, the Ververica platform supports MariaDB/MySQL, PostgreSQL, and Microsoft SQL Server Persistence. But for the scope of this article, we will focus on Postgres only.
Below is the step-by-step process to manage this migration smoothly with minimal downtime to running Flink jobs.
1. Create an empty Postgres database named `vvp` to store the persistence data.
2. Create the necessary tables in the target database created in step 1 by running the following command.
Important: Put your Postgres database credentials and Ververica Image repository values into the below command.
docker run --rm <vvp-image-registry>/vvp-persistence:<vvp-version> \
--url="jdbc:postgresql://<host>:<port>/<database-bame>" \
--username="<database-username>" --password="<database-password>" \
update
This will allow you to initialize all necessary tables and configurations in the remote persistence Postgres database.
Tip: The docker image used in the command is x86_64 based. If you run this on Mac M1 or M2, you might face issues due to docker using an ARM image. Run docker in colima with x86_64 arch to avoid this issue. Use this command `colima start --cpu 4 --memory 8 --arch x86_64`
3. To download the existing SQLite DB from the Ververica platform, we need to locate the pod where the Ververica platform is running. To find out the pod, run this command.
export VVP_POD=$(kubectl get pods -n vvp -o=name | grep "vvp-ververica-platform" | sed "s/^.\{4\}//")
echo $VVP_POD
This will give you your Kubernetes pod where the Ververica platform is running.
4. To download the SQLite database, run the following command.
kubectl cp -n vvp $VVP_POD:/vvp/data/vvp.db vvp.db
kubectl cp -n vvp $VVP_POD:/vvp/data/vvp.db-shm vvp.db-shm
kubectl cp -n vvp $VVP_POD:/vvp/data/vvp.db-wal vvp.db-wal
This should download all SQLite relevant files in your current directory.
5. Now that you have extracted the SQLite database to be migrated, confirm the existing tables. Open the database by the following command.
sqlite3 vvp.db
Check the existing tables in the SQLite database by the following command in the SQLite3 shell.
.tables
It should look like this.
6. Delete the two tables ( `DATABASECHANGELOG` and `DATABASECHANGELOGLOCK` ) by the following command in the sqlite3 shell.
drop table DATABASECHANGELOG; drop table DATABASECHANGELOGLOCK;
Confirm that the tables have been deleted. It should look like this.
Exit the in the sqlite3 shell by using this command
.quit
7. Now, you can import the SQLite database to the new Postgres database created in Step 1. For this, you can use pgloader and run the following command:
Important: Put the absolute path to your SQLite database file /xxx/xxx/xx/vvp.db and your Postgres database credentials into the below command.
pgloader sqlite:///<absolute/path/to/file-vvp.db> \
postgresql://<database-username>:<database-password>@<postgres-host-endpoint>:<port>/<database-name>
Tip: Detailed steps on SQLite -> Postgres migration using pgloader can be found in this link. You can also use other methods to move the data from SQLite to Postgres. Some alternatives are mentioned here.
8. Now, you must update connection details in the original Ververica helm chart through the `values.yaml` file. Initially, you should see a configuration with the local type.
vvp:
persistence:
type: local
Change the above variable from local to type jdbc and the connection details below. This is an example of how the Postgres persistence could look like.
Important: Put your Postgres database credentials into the below command.
vvp:
persistence:
type: jdbc
datasource:
url: jdbc:postgresql://<database-host>:<database-port>/<database-name>
username: <database-username>
password: <database-password>
Use the values created in step 1.
9. Re-deploy and run the Ververica Platform using the `helm upgrade` command. You can use the values file you updated in step 7 in the below command.
helm upgrade --install vvp ververica/ververica-platform \
--namespace vvp \
--values <values-vvp.yaml>
Once the Ververica platform runs, it will use the new Postgres database as its persistence layer. This should automatically find the existing Flink jobs running. This way, you change the persistence without impacting the running of Flink jobs.