Issue
When a deployment on Ververica Platform is stopped, whether through Suspend, Cancel, or a modification to the deployment, the previous job does not shut down properly. This may manifest as:
- Unusually long shutdowns
- Task managers that appear to restart before being killed
For instance, the TaskManager logs may show a transition from Running to Canceled and then back to Running.
Environment
- Ververica Platform 2.0 or later.
Cause
Your user-code artifact bundles classes from the Flink runtime. This causes classpath conflicts in which the incorrect class is loaded into the job's classloader.
To check whether your packaged code contains runtime classes, run the following command. Any output means that runtime classes are bundled in the jar.
jar tf user-code.jar | grep org/apache/flink/runtime/
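If you check several artifacts, or want to run the check in a build script, a count is easier to act on than a raw listing. The following is a minimal sketch, not part of Ververica Platform: the function name count_flink_runtime_classes is hypothetical, and it assumes a POSIX shell with grep available.

```shell
# Hypothetical helper: reads a jar listing (one entry per line) on stdin and
# prints how many entries are class files under org/apache/flink/runtime/.
count_flink_runtime_classes() {
  # grep -c prints the match count; it exits non-zero when the count is 0,
  # so "|| true" keeps the function usable under "set -e".
  grep -c '^org/apache/flink/runtime/.*\.class$' || true
}

# Usage, with the artifact name from the command above:
#   jar tf user-code.jar | count_flink_runtime_classes
```

A result of 0 suggests that no Flink runtime classes are packaged. Any other value means the artifact should be rebuilt as described under Resolution.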
Resolution
Do not package any Flink runtime classes in your artifact. Mark all core Flink dependencies, such as flink-java and flink-streaming-java, as `provided` in your build so that they are supplied by the cluster at runtime instead of being bundled into the jar. See How to package my jar for deployment for more information.
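If your project is built with Maven (an assumption; this article does not prescribe a build tool), you can check the effective scope of these dependencies from the output of mvn dependency:tree. The sketch below is illustrative only: the function name list_non_provided_flink_core is hypothetical, it covers only the two artifacts named above (with an optional Scala version suffix), and it assumes the usual groupId:artifactId:type:version:scope line format.

```shell
# Hypothetical helper: reads "mvn dependency:tree" output on stdin and prints
# the flink-java / flink-streaming-java entries whose scope is not "provided".
list_non_provided_flink_core() {
  # First grep keeps only the two core artifacts (optionally suffixed with a
  # Scala version such as _2.12); second grep drops entries already "provided".
  grep -E 'org\.apache\.flink:flink-(streaming-)?java(_[0-9.]+)?:' \
    | grep -Ev ':provided( |$)' || true
}

# Usage, run from the project directory:
#   mvn dependency:tree -Dincludes=org.apache.flink | list_non_provided_flink_core
```

Empty output suggests that both dependencies are already marked as `provided`. Any line that is printed names a dependency that would still be bundled into the jar.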