Connecting to Kafka on VirtualBox from Windows
After many hours of frustration, I was finally able to push messages into Apache Kafka, running on my VirtualBox guest machine, from my Windows host. Since I did not find complete steps on the web, I wanted to document them quickly, hoping to save someone some time. Big thanks to my teammate Greg Haskell, who helped me finally make it happen.
Problem
I wanted to connect to Apache Kafka, installed on my VirtualBox guest machine, so I could publish messages from my Windows host machine. The same setup can be used to connect multiple guest VMs to emulate a multi-node cluster.
Solution
Kafka VM (Linux)
- Find the VM's hostname by running hostname in a shell:
$ hostname
bigdatalite.localdomain
- Edit the Kafka broker properties (kafka.properties). If you use the Cloudera Kafka distribution via parcels, make sure to add these properties to the Kafka Broker Advanced Configuration Snippet (Safety Valve):
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://bigdatalite.localdomain:9092
- Restart Kafka
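This page describes the difference between listeners and advertised.listeners. Clients use the bootstrap server only to fetch cluster metadata; after that, they connect to whatever address the broker advertises. The key for a VM is therefore to bind the listener to 0.0.0.0 (all interfaces) but tell consumers and producers to connect using a proper hostname, bigdatalite.localdomain, that the host machine can resolve.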
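To make the distinction concrete, here is a minimal sketch that parses the two settings from the broker snippet and flags the classic mistake of advertising 0.0.0.0, which is a bind-all address rather than one clients can connect to. The functions are hypothetical helpers for illustration only; Kafka does its own parsing and validation, and this sketch only checks the host part of each advertised endpoint.

```python
# Hypothetical helpers for illustration only; Kafka validates these settings itself.
from urllib.parse import urlsplit


def parse_listeners(value):
    """Split 'NAME://host:port[,NAME://host:port...]' into (name, host, port) tuples."""
    endpoints = []
    for item in value.split(","):
        item = item.strip()
        if item:
            parts = urlsplit(item)
            endpoints.append((parts.scheme.upper(), parts.hostname, parts.port))
    return endpoints


def load_props(text):
    """Read key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def advertised_endpoints(text):
    """Endpoints handed to clients; Kafka falls back to listeners if advertised.listeners is unset."""
    props = load_props(text)
    return parse_listeners(props.get("advertised.listeners", props.get("listeners", "")))


def unusable_endpoints(text):
    """Advertised endpoints whose host is missing or the bind-all address 0.0.0.0."""
    return [ep for ep in advertised_endpoints(text) if ep[1] in (None, "0.0.0.0")]


if __name__ == "__main__":
    config = """
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://bigdatalite.localdomain:9092
"""
    print(advertised_endpoints(config))  # [('PLAINTEXT', 'bigdatalite.localdomain', 9092)]
    print(unusable_endpoints(config))  # []
    print(unusable_endpoints("listeners=PLAINTEXT://0.0.0.0:9092"))  # [('PLAINTEXT', '0.0.0.0', 9092)]
```

The last call shows why both settings are needed: with only listeners=PLAINTEXT://0.0.0.0:9092, the broker would fall back to advertising 0.0.0.0, and the Windows client would have nowhere valid to connect after bootstrapping.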
Windows OS (host)
- Edit the Windows hosts file (C:\Windows\System32\drivers\etc\hosts, which must be edited as Administrator) to map the VM hostname from the first step to IP 127.0.0.1:
127.0.0.1 bigdatalite.localdomain
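- In VirtualBox, open your VM's Network settings and add a new Port Forwarding rule (available on a NAT adapter, under Advanced) for the Kafka broker port 9092 (the default, if you have not changed it). Forward host port 9092 to guest port 9092: the client resolves bigdatalite.localdomain to 127.0.0.1 and connects to the advertised port 9092, so that exact port must be forwarded on the host side.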
- If the VM is already running, there is no need to restart it.
- Test that you can reach that port from Windows, for example with telnet (the Telnet Client is an optional Windows feature and may need to be enabled first):
telnet bigdatalite.localdomain 9092
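If telnet is not available, the same two checks (that the hosts entry resolves, and that the forwarded port accepts TCP connections) can be sketched with Python's standard library. The function name check_broker_port is hypothetical, and a successful result only proves the port is open; it says nothing about whether the Kafka protocol handshake or the advertised listener is correct.

```python
import socket


def check_broker_port(host="bigdatalite.localdomain", port=9092, timeout=3.0):
    """Resolve host and try a plain TCP connection; return (ip, reachable)."""
    try:
        # With the hosts-file entry in place this should resolve to 127.0.0.1
        ip = socket.gethostbyname(host)
    except socket.gaierror:
        return None, False  # name did not resolve: check the hosts file
    try:
        # Opening and immediately closing a TCP connection, like telnet would
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        return ip, False  # refused or timed out: check the port forwarding rule
```

Once the hosts entry and forwarding rule are in place, calling check_broker_port() with its defaults should return ('127.0.0.1', True); a None IP points at the hosts file, while a False with an IP points at the VirtualBox rule or the broker itself.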
Test everything
I created a simple Python program to push some test messages into Kafka from Windows.
# Wanted to use the confluent-kafka client, but it does not support Windows currently;
# ended up using the second most popular client: https://github.com/dpkp/kafka-python
# pip install kafka-python
import datetime
import logging

from kafka import KafkaProducer

logging.basicConfig(level=logging.DEBUG)

# Connect through the hosts-file entry and VirtualBox port forwarding set up above
producer = KafkaProducer(bootstrap_servers=['bigdatalite.localdomain:9092'])
for _ in range(10):
    # kafka-python sends raw bytes unless a value_serializer is configured,
    # so encode the timestamp string explicitly (required on Python 3)
    producer.send('testtopic', str(datetime.datetime.now()).encode('utf-8'))
# Block until all buffered messages have actually been sent
producer.flush()
# In the VM, we can now see the messages with the console consumer:
# kafka-console-consumer --bootstrap-server localhost:9092 --topic testtopic --from-beginning
#
# Console producer:
# kafka-console-producer --broker-list localhost:9092 --topic testtopic
Once it ran without errors (yay!!), I headed over to my VM and fired up the console consumer to see my messages there:
$ kafka-console-consumer --bootstrap-server localhost:9092 --topic testtopic --from-beginning
...
2018-10-22 14:26:28.597000
2018-10-22 14:26:28.601000
2018-10-22 14:26:28.601000
2018-10-22 14:26:28.602000
2018-10-22 14:26:28.603000
2018-10-22 14:26:28.604000
2018-10-22 14:26:28.604000
2018-10-22 14:26:28.605000
2018-10-22 14:26:28.606000
2018-10-22 14:26:28.606000
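If you would rather verify this from a script than by eye, the consumed values can be parsed back into datetime objects. This sketch uses the ten values printed above and checks that all ten arrived and that they are in non-decreasing order. Note the assumption: Kafka only guarantees ordering within a single partition, so the ordering check is only meaningful if testtopic has one partition.

```python
from datetime import datetime

# The ten values printed by kafka-console-consumer above; the producer sent
# str(datetime.datetime.now()), so each line uses the format parsed below.
CONSUMED = """\
2018-10-22 14:26:28.597000
2018-10-22 14:26:28.601000
2018-10-22 14:26:28.601000
2018-10-22 14:26:28.602000
2018-10-22 14:26:28.603000
2018-10-22 14:26:28.604000
2018-10-22 14:26:28.604000
2018-10-22 14:26:28.605000
2018-10-22 14:26:28.606000
2018-10-22 14:26:28.606000
"""


def parse_timestamps(text):
    """Parse one '%Y-%m-%d %H:%M:%S.%f' timestamp per non-empty line."""
    return [datetime.strptime(line.strip(), "%Y-%m-%d %H:%M:%S.%f")
            for line in text.splitlines() if line.strip()]


stamps = parse_timestamps(CONSUMED)
# Non-decreasing: equal neighbours are fine (two sends within the same clock tick)
in_order = all(a <= b for a, b in zip(stamps, stamps[1:]))
span_ms = round((stamps[-1] - stamps[0]).total_seconds() * 1000, 3)
print(len(stamps), in_order, span_ms)  # 10 True 9.0
```

So all ten messages were produced within about 9 ms of each other, which matches a tight send loop followed by a single flush().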
That's all!