---
id: kafka-extraction-namespace
title: "Apache Kafka Lookups"
---

Lookups are an experimental feature.

To use this Apache Druid extension, include `druid-lookups-cached-global` and `druid-kafka-extraction-namespace` in the extensions load list.

If you need updates to populate as promptly as possible, you can configure a LookupExtractorFactory that reads from a Kafka topic in which each message's key is the old value and the message body is the desired new value (both encoded in UTF-8).

```json
{
  "type":"kafka",
  "kafkaTopic":"testTopic",
  "kafkaProperties":{"zookeeper.connect":"somehost:2181/kafka"}
}
```
| Parameter | Description | Required | Default |
|-----------|-------------|----------|---------|
| `kafkaTopic` | The Kafka topic to read the data from | Yes | |
| `kafkaProperties` | Kafka consumer properties. At least `zookeeper.connect` must be specified. Only the ZooKeeper connector is supported. | Yes | |
| `connectTimeout` | How long to wait for an initial connection | No | `0` (do not wait) |
| `isOneToOne` | Whether the map is one-to-one (see Lookup DimensionSpecs) | No | `false` |
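If you generate lookup specs from a script, the following minimal sketch assembles a `kafka` lookup spec from the parameters in the table above and checks the two required fields. The helper name `build_kafka_lookup_spec` is hypothetical and not part of Druid; Druid performs its own validation when it loads the spec.

```python
import json


def build_kafka_lookup_spec(kafka_topic, kafka_properties,
                            connect_timeout=0, is_one_to_one=False):
    """Hypothetical helper: build a "kafka" lookup spec per the parameter table."""
    if not kafka_topic:
        raise ValueError("kafkaTopic is required")
    # The table requires at least zookeeper.connect in kafkaProperties.
    if "zookeeper.connect" not in kafka_properties:
        raise ValueError("kafkaProperties must include zookeeper.connect")
    spec = {
        "type": "kafka",
        "kafkaTopic": kafka_topic,
        "kafkaProperties": dict(kafka_properties),
    }
    # Only emit optional fields when they differ from the documented defaults.
    if connect_timeout != 0:
        spec["connectTimeout"] = connect_timeout
    if is_one_to_one:
        spec["isOneToOne"] = True
    return spec


if __name__ == "__main__":
    spec = build_kafka_lookup_spec(
        "testTopic", {"zookeeper.connect": "somehost:2181/kafka"})
    print(json.dumps(spec, indent=2))
```

With only the required fields, the result has the same content as the JSON example at the top of this page.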

The `kafka-extraction-namespace` extension reads from a Kafka feed of key/value pairs, which lets you rename dimension values. For example, you could rename an ID to a human-readable name.
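The renaming idea can be modeled as a plain map lookup. This is a simplified sketch, not Druid's query-time code. The sample IDs and names are hypothetical, and the `retain_missing` flag is an assumption of the sketch rather than a statement of Druid's default for unmatched values. The `is_one_to_one` check shows what the `isOneToOne` parameter asserts: no two keys map to the same value.

```python
def rename_values(values, lookup, retain_missing=True):
    """Map each dimension value through `lookup`.

    Values with no entry are kept unchanged when retain_missing is True,
    otherwise replaced with None (a choice made for this sketch only).
    """
    return [lookup.get(v, v if retain_missing else None) for v in values]


def is_one_to_one(lookup):
    """True if no two keys share the same value (the map is injective)."""
    return len(set(lookup.values())) == len(lookup)


if __name__ == "__main__":
    lookup = {"42": "Acme Corp", "43": "Globex"}  # hypothetical ID -> name pairs
    print(rename_values(["42", "43", "99"], lookup))  # ['Acme Corp', 'Globex', '99']
    print(is_one_to_one(lookup))                      # True
```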

The consumer properties `group.id` and `auto.offset.reset` CANNOT be set in `kafkaProperties`, because the extension sets them to `UUID.randomUUID().toString()` and `smallest`, respectively.
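To make the paragraph above concrete, here is a hypothetical model of the consumer configuration the extension ends up with. Python's `uuid.uuid4()` stands in for Java's `UUID.randomUUID()`, since both produce a random version-4 UUID. Rejecting the reserved keys with an error is an assumption of this sketch; it only mirrors the rule that they cannot be set.

```python
import uuid

# Keys the extension sets itself, per the paragraph above.
RESERVED_KEYS = ("group.id", "auto.offset.reset")


def effective_consumer_properties(kafka_properties):
    """Hypothetical model of the consumer config derived from kafkaProperties."""
    clash = [k for k in RESERVED_KEYS if k in kafka_properties]
    if clash:
        # This sketch rejects user-supplied reserved keys outright.
        raise ValueError(f"cannot set {', '.join(clash)} in kafkaProperties")
    props = dict(kafka_properties)
    # A fresh random group id per consumer, like UUID.randomUUID().toString().
    props["group.id"] = str(uuid.uuid4())
    # "smallest" starts from the earliest available offset (old consumer naming).
    props["auto.offset.reset"] = "smallest"
    return props
```

Because the group id is random, each consumer is in its own group rather than sharing partitions with other consumers, and `smallest` makes it start reading from the earliest available offset.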

See lookups for how to configure and use lookups.

Limitations

Currently, the Kafka lookup extractor feeds the entire Kafka stream into a local cache. With on-heap caching, this can easily exhaust the Java heap if the stream contains many unique keys. Off-heap caching should alleviate this, but there is still a limit to how much data can be stored. There is currently no eviction policy.
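The toy model below illustrates why the cache size tracks the number of unique keys. Every message is applied to a local dict, a later value for the same key overwrites the earlier one, and nothing is ever evicted. It is an illustration under those stated assumptions, not Druid's cache implementation.

```python
def apply_stream(messages, cache=None):
    """Apply (key, value) messages to a cache that never evicts entries."""
    cache = {} if cache is None else cache
    for key, value in messages:
        cache[key] = value  # latest value wins for a repeated key
    return cache


if __name__ == "__main__":
    print(apply_stream([("a", "1"), ("b", "2"), ("a", "3")]))  # {'a': '3', 'b': '2'}
    # 100,000 unique keys produce 100,000 cached entries.
    many = ((f"id-{i}", "x") for i in range(100_000))
    print(len(apply_stream(many)))  # 100000
```

Repeated keys do not grow the cache, but every new unique key does, which is why a stream with many unique keys can exhaust the available memory.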

Testing the Kafka rename functionality

To test this setup, you can send key/value pairs to the Kafka topic using the Kafka console producer:

```shell
./bin/kafka-console-producer.sh --property parse.key=true --property key.separator="->" --broker-list localhost:9092 --topic testTopic
```

Renames can then be published by typing `OLD_VAL->NEW_VAL` followed by a newline (Enter or Return).
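For more than a few renames, you can generate the `OLD_VAL->NEW_VAL` lines with a script instead of typing them. The sketch below is hypothetical and assumes the console producer splits each line at the first occurrence of the `->` separator. Under that assumption, a key containing `->` would be cut short, so the sketch refuses such keys. `parse_line` mirrors the assumed split so you can check how a line would be interpreted.

```python
SEPARATOR = "->"  # must match --property key.separator="->"


def to_producer_lines(renames):
    """Format a dict of old -> new values as producer input, one pair per line."""
    lines = []
    for old, new in renames.items():
        if SEPARATOR in old:
            raise ValueError(f"key {old!r} contains the separator {SEPARATOR!r}")
        lines.append(f"{old}{SEPARATOR}{new}")
    return "\n".join(lines) + "\n"


def parse_line(line):
    """Split one line at the first separator into (key, value)."""
    key, sep, value = line.rstrip("\n").partition(SEPARATOR)
    if not sep:
        raise ValueError("no key separator on line")
    return key, value


if __name__ == "__main__":
    # Hypothetical renames; print without an extra trailing newline.
    print(to_producer_lines({"42": "Acme Corp", "43": "Globex"}), end="")
```

If the script is saved as, for example, `make_renames.py`, its output can be piped into the producer command shown above instead of typing each line.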