Each sink connector in Kafka Connect has its own consumer group, with offsets persisted in Kafka itself (pretty clever, right?). This is also why, if you delete a connector and recreate it with the same name, it resumes from where the previous instance got to.
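The consumer group name follows a simple convention: the connector name with a connect- prefix. A minimal sketch of that naming, assuming a connector called sink_postgres_foo_00 as in the example below:

```shell
# Kafka Connect names a sink connector's consumer group "connect-<connector name>"
CONNECTOR=sink_postgres_foo_00
GROUP="connect-${CONNECTOR}"
echo "$GROUP"   # prints connect-sink_postgres_foo_00
```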

You can view consumer groups using the kafka-consumer-groups command:

$ kafka-consumer-groups \
    --bootstrap-server kafka:29092 \
    --list
connect-sink_postgres_00
_confluent-ksql-confluent_rmoff_01query_CSAS_JDBC_POSTGRES_TRANSACTIONS_GBP_2
_confluent-ksql-confluent_rmoff_01query_CSAS_JDBC_POSTGRES_TRANSACTIONS_NO_CUSTOMERID_1
connect-sink_postgres_foo_00
connect-SINK_ES_04
_confluent-ksql-confluent_rmoff_01transient_2925897355317205962_1571058964212
_confluent-controlcenter-5-4-0-1
connect-SINK_ES_03
_confluent-controlcenter-5-4-0-1-command
connect-SINK_ES_02
connect-SINK_ES_01

There are various groups listed there, but we’re interested in the one with a connect- prefix that matches our connector name: connect-sink_postgres_foo_00.

$ kafka-consumer-groups \
    --bootstrap-server kafka:29092 \
    --describe \
    --group connect-sink_postgres_foo_00

Consumer group 'connect-sink_postgres_foo_00' has no active members.

GROUP                         TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
connect-sink_postgres_foo_00  foo    0          1               3               2    -            -     -

You can see from this that the current offset is 1, and there are two more messages to be read (one of which is the 'poison-pill').
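The LAG column is just the difference between the two offset columns; a quick check of the arithmetic, using the values from the --describe output above:

```shell
# LAG = LOG-END-OFFSET - CURRENT-OFFSET, per the --describe output above
CURRENT_OFFSET=1
LOG_END_OFFSET=3
LAG=$(( LOG_END_OFFSET - CURRENT_OFFSET ))
echo "$LAG"   # prints 2: the two unread messages, one of which is the poison pill
```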

kafkacat is a fantastic tool for this kind of debugging, because we can directly relate offsets with the messages themselves:

$ kafkacat -b localhost:9092 -t foo -C -f 'Offset: %o
Payload: %s
--
'
Offset: 0
Payload: {"schema":{"type":"struct","fields":[{"type":"int32","optional":false,"field":"c1"},{"type":"string","optional":false,"field":"c2"},{"type":"int64","optional":false,"name":"org.apache.kafka.connect.data.Timestamp","version":1,"field":"create_ts"},{"type":"int64","optional":false,"name":"org.apache.kafka.connect.data.Timestamp","version":1,"field":"update_ts"}],"optional":false,"name":"foobar"},"payload":{"c1":10000,"c2":"bar","create_ts":1501834166000,"update_ts":1501834166000}}
--
Offset: 1
Payload: {"schema":{"type":"struct","fields":[{"type":"int32","optional":false,"field":"c1"},{"type":"int64","optional":false,"name":"org.apache.kafka.connect.data.Timestamp","version":1,"field":"create_ts"},{"type":"int64","optional":false,"name":"org.apache.kafka.connect.data.Timestamp","version":1,"field":"update_ts"}],"optional":false,"name":"foobar"},"payload":{"c1":10000,"create_ts":1501834166000,"update_ts":1501834166000}}
--
Offset: 2
Payload: {"schema":{"type":"struct","fields":[{"type":"int32","optional":false,"field":"c1"},{"type":"string","optional":false,"field":"c2"},{"type":"int64","optional":false,"name":"org.apache.kafka.connect.data.Timestamp","version":1,"field":"create_ts"},{"type":"int64","optional":false,"name":"org.apache.kafka.connect.data.Timestamp","version":1,"field":"update_ts"}],"optional":false,"name":"foobar"},"payload":{"c1":10001,"c2":"bar2","create_ts":1501834166000,"update_ts":1501834166000}}
--
% Reached end of topic foo [0] at offset 3

The message at offset 0 is the good one that Connect successfully read, which is why the current offset is 1. When the failed connector restarts it will resume at offset 1, which is the 'bad' message. The end of the topic is currently offset 3, i.e. the position after the third message, which sits at offset 2 (offsets are zero-based).

What we want to do is tell Kafka Connect to resume from the next good message, which we can see from the kafkacat output above is at offset 2.
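Since --to-offset sets the position from which the consumer group will next read, the reset target is simply one past the bad message; a trivial sketch of that arithmetic:

```shell
# The bad message sits at offset 1; resetting to the next offset skips it
BAD_OFFSET=1
RESET_TO=$(( BAD_OFFSET + 1 ))
echo "$RESET_TO"   # prints 2, the value we pass to --to-offset
```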

kafka-consumer-groups \
    --bootstrap-server kafka:29092 \
    --group connect-sink_postgres_foo_00 \
    --reset-offsets \
    --topic foo \
    --to-offset 2 \
    --execute

GROUP                         TOPIC  PARTITION  NEW-OFFSET
connect-sink_postgres_foo_00  foo    0          2
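To double-check the reset, you can re-run the --describe command from earlier and confirm that CURRENT-OFFSET is now 2. A sketch of pulling that column out with awk, using a hard-coded sample row in place of a live cluster (the values shown follow from the reset above, so this is illustrative rather than captured output):

```shell
# Sample row as --describe would print it after the reset (illustrative, not live output):
# GROUP                        TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG ...
ROW='connect-sink_postgres_foo_00 foo 0 2 3 1 - - -'
echo "$ROW" | awk '{print $4}'   # prints 2, the CURRENT-OFFSET column
```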

Now we can restart the failed task:

curl -X POST http://localhost:8083/connectors/sink_postgres_foo_00/tasks/0/restart

and this time the connector stays running:

$ curl -s "http://localhost:8083/connectors?expand=info&expand=status" | \
    jq '. | to_entries[] | [ .value.info.type, .key, .value.status.connector.state,.value.status.tasks[].state,.value.info.config."connector.class"]|join(":|:")' | \
    column -s : -t | sed 's/\"//g' | sort
sink  |  sink_postgres_foo_00  |  RUNNING  |  RUNNING  |  io.confluent.connect.jdbc.JdbcSinkConnector

and in Postgres we get the new rows of data (except for the bad one, which is lost to us):