Should I call
addSink()? Even when I add a Kafka source or sink?
Note: This applies to Flink 1.8+
Sources and sinks are also operators, although they are—as such—not listed in the Flink documentation. Sources and sinks may also be stateful operators. In this case, a Kafka source (consumer) is storing its partition offsets and an at-least-once or exactly-once Kafka sink (producer) is storing information on Kafka transactions in state.
To make sure your job can recover from a savepoint smoothly, even after job topology changes, we thus recommend to always set a
addSink(), for example:
Tip: We recommend to set a
uid() for every operator in your job, even for potentially stateless ones which are not part of a savepoint. This is because it may not be obvious whether an operator is stateful or not. It may just use state internally, like a window operator or the Kafka source and sink.