After many hours of frustration, I was finally able to push messages into Apache Kafka, running on my VirtualBox guest machine, from my Windows host. Since I did not find complete steps on the web, I wanted to document them quickly, hoping to save someone's time. Big thanks to my teammate Greg Haskell, who helped me finally make it happen.
I wanted to connect to Apache Kafka, installed on my VirtualBox guest machine, so that I could publish messages from my Windows host machine. The same setup can be used to connect multiple guest VMs to emulate a multi-node cluster.
Kafka VM (Linux)
- Find the VM's hostname by typing hostname in a shell:
$ hostname
bigdatalite.localdomain
- Edit the Kafka broker properties (kafka.properties). If you use the Cloudera Kafka distribution via parcels, make sure to add these properties to the Kafka Broker Advanced Configuration Snippet (Safety Valve)
- Restart Kafka
This page describes the difference between listeners and advertised.listeners. The key for a VM is to bind the broker to 0.0.0.0, that is, to all interfaces, but tell consumers and producers to use a proper hostname.
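To make the binding idea concrete, here is a minimal sketch that prints the two broker properties described above. The exact values are my assumptions, not something shown in this post: a single PLAINTEXT listener, the default port 9092, and the hostname from the first step. The helper name broker_listener_properties is made up for illustration.

```python
def broker_listener_properties(advertised_host, port=9092, protocol="PLAINTEXT"):
    """Return the two listener lines for the broker configuration.

    Assumptions (not from the original post): one PLAINTEXT listener
    on the default port 9092.
    """
    return [
        # Bind the broker socket on every network interface of the VM.
        "listeners={0}://0.0.0.0:{1}".format(protocol, port),
        # Address the broker hands back to producers and consumers.
        "advertised.listeners={0}://{1}:{2}".format(protocol, advertised_host, port),
    ]


if __name__ == "__main__":
    # Prints:
    # listeners=PLAINTEXT://0.0.0.0:9092
    # advertised.listeners=PLAINTEXT://bigdatalite.localdomain:9092
    print("\n".join(broker_listener_properties("bigdatalite.localdomain")))
```

The advertised hostname matters because clients reconnect to whatever address the broker reports, so it has to be a name the Windows host can resolve.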
Windows OS (host)
Edit the Windows hosts file (C:\Windows\System32\drivers\etc\hosts) to add the VM hostname from the first step and assign IP 127.0.0.1 to that hostname:
127.0.0.1 bigdatalite.localdomain
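If you want to double-check the hosts file edit, a small sketch like the one below can help. It only parses hosts-file text that you pass in; the function name hosts_maps is hypothetical, and reading the actual file is left to you.

```python
def hosts_maps(hosts_text, hostname, ip="127.0.0.1"):
    """Return True if the hosts-file text maps `hostname` to `ip`."""
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split into address and names.
        entry = line.split("#", 1)[0].split()
        # A valid entry is an address followed by one or more hostnames.
        if len(entry) >= 2 and entry[0] == ip:
            names = [name.lower() for name in entry[1:]]
            if hostname.lower() in names:
                return True
    return False
```

You would call it with the contents of the hosts file and the VM hostname, for example hosts_maps(text, "bigdatalite.localdomain").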
In VirtualBox, open your VM's Network settings and add a new Port Forwarding rule for the Kafka broker port 9092 (the default one, if you have not changed it), so that host port 9092 is forwarded to guest port 9092.
If the VM is already running, there is no need to restart it.
Test that you can indeed access that port from Windows. For example, you can use the telnet command like so:
telnet bigdatalite.localdomain 9092
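If telnet is not available on your Windows machine, the same check can be sketched in Python with the standard socket module. The function name port_is_open and the 3-second timeout are my own choices for illustration.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the hostname and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling port_is_open("bigdatalite.localdomain", 9092) from the Windows host should return True once the hosts entry and the port forwarding rule are in place. Note that this only proves the port is reachable, not that Kafka is healthy.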
I created a simple Python program to push some test messages into Kafka from Windows.
# Wanted to use the confluent-kafka client, but it does not support Windows currently,
# so I ended up using the second most popular client: https://github.com/dpkp/kafka-python
# pip install kafka-python
import datetime
import logging

from kafka import KafkaProducer

logging.basicConfig(level=logging.DEBUG)

# The hostname resolves to 127.0.0.1 via the hosts file; VirtualBox forwards port 9092 to the VM
producer = KafkaProducer(bootstrap_servers=['bigdatalite.localdomain:9092'])
for _ in range(10):
    # kafka-python expects bytes unless a value_serializer is configured
    producer.send('testtopic', str(datetime.datetime.now()).encode('utf-8'))
# Block until all buffered messages have actually been sent
producer.flush()

# In VM, we can see the messages now with console consumer:
# kafka-console-consumer --bootstrap-server localhost:9092 --topic testtopic --from-beginning
#
# Console producer:
# kafka-console-producer --broker-list localhost:9092 --topic testtopic
Once I ran it without errors (yay!!), I headed over to my VM and fired up the console consumer to see my messages there:
$ kafka-console-consumer --bootstrap-server localhost:9092 --topic testtopic --from-beginning
...
2018-10-22 14:26:28.597000
2018-10-22 14:26:28.601000
2018-10-22 14:26:28.601000
2018-10-22 14:26:28.602000
2018-10-22 14:26:28.603000
2018-10-22 14:26:28.604000
2018-10-22 14:26:28.604000
2018-10-22 14:26:28.605000
2018-10-22 14:26:28.606000
2018-10-22 14:26:28.606000
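The same check could also be done from Windows with kafka-python instead of the console consumer. This is only a sketch that I have not run as part of this post: the function names, the 5-second timeout, and the explicit :9092 port are my assumptions, and it expects the message values to be the timestamp strings sent by the producer script above.

```python
import datetime


def parse_message(raw):
    """Decode a message value (bytes) from the producer script into a datetime."""
    text = raw.decode("utf-8")
    # str(datetime) omits the fractional part when microseconds are zero.
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in text else "%Y-%m-%d %H:%M:%S"
    return datetime.datetime.strptime(text, fmt)


def read_test_messages(servers=("bigdatalite.localdomain:9092",), topic="testtopic"):
    """Read all currently available messages from the topic and parse them."""
    # Imported here so parse_message works without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=list(servers),
        # Start from the oldest message when there is no committed offset.
        auto_offset_reset="earliest",
        # Stop iterating after 5 seconds without new messages.
        consumer_timeout_ms=5000,
    )
    try:
        return [parse_message(message.value) for message in consumer]
    finally:
        consumer.close()
```

Calling read_test_messages() on the Windows host should return the ten timestamps if the broker, hosts entry, and port forwarding are all set up as described.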