Question
Should I call uid()
after addSource()
or addSink()
? Even when I add a Kafka source or sink?
Answer
Note: This applies to Flink 1.8 and later.
Sources and sinks are also operators, although they are—as such—not listed in the Flink documentation. Sources and sinks may also be stateful operators. In this case, a Kafka source (consumer) is storing its partition offsets and an at-least-once or exactly-once Kafka sink (producer) is storing information on Kafka transactions in state.
To make sure your job can recover from a savepoint smoothly, even after job topology changes, we thus recommend to always set a uid()
after addSource()
or addSink()
, for example:
env.addSource(...).uid("my-operator-id");
Tip: We recommend to set a uid()
for every operator in your job, even for potentially stateless ones which are not part of a savepoint. This is because it may not be obvious whether an operator is stateful or not. It may just use state internally, like a window operator or the Kafka source and sink.