Connecting to Kafka on VirtualBox from Windows

After many hours of frustration, I was finally able to push messages into Apache Kafka, running on my VirtualBox guest machine, from my Windows host. Since I did not find complete steps on the web, I wanted to document them quickly, hoping to save someone some time. Big thanks to my teammate Greg Haskell, who helped me finally make it happen.

Problem

I wanted to connect to Apache Kafka, installed on my VirtualBox guest machine, so that I could publish messages from my Windows host machine. The same setup can be used to connect multiple guest VMs to emulate a multi-node cluster.

Solution

Kafka VM (Linux)

1) Find the VM's hostname by running hostname in a shell:

$ hostname
bigdatalite.localdomain  

2) Edit the Kafka broker properties (kafka.properties). If you use the Cloudera Kafka distribution via parcels, add these properties to the Kafka Broker Advanced Configuration Snippet (Safety Valve):

listeners=PLAINTEXT://0.0.0.0:9092  
advertised.listeners=PLAINTEXT://bigdatalite.localdomain:9092  

3) Restart Kafka

The Kafka documentation describes the difference between listeners and advertised.listeners. The key for a VM is to have the broker listen on 0.0.0.0, that is, on all network interfaces, but tell consumers and producers to use the proper hostname, bigdatalite.localdomain.
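To make the distinction concrete, here is a small illustrative Python sketch that splits the two values from step 2 into protocol, host and port. It is not part of Kafka or of any client library; the helper name parse_listener is made up for this post, and it assumes a single listener entry rather than a comma-separated list.

```python
from urllib.parse import urlsplit


def parse_listener(value):
    """Split one listener entry such as 'PLAINTEXT://host:9092'.

    Hypothetical helper for illustration only; it handles a single
    entry, not a comma-separated list of listeners.
    """
    parts = urlsplit(value.strip())
    # urlsplit lowercases the scheme, so restore Kafka's upper-case spelling
    return parts.scheme.upper(), parts.hostname, parts.port


# Values copied from step 2
bind = parse_listener("PLAINTEXT://0.0.0.0:9092")
advertised = parse_listener("PLAINTEXT://bigdatalite.localdomain:9092")

print("broker listens on:      ", bind)
print("clients are told to use:", advertised)
```

The first value only controls which interfaces the broker listens on. The second is the address the broker hands back to clients in its metadata, which is why the Windows host has to be able to resolve bigdatalite.localdomain.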

Windows OS (host)

  1. Edit the Windows hosts file (C:\Windows\System32\drivers\etc\hosts) to map the VM hostname from the first step to IP 127.0.0.1:

    127.0.0.1 bigdatalite.localdomain

  2. In VirtualBox, open your VM's Network settings and add a new Port Forwarding rule for the Kafka broker port 9092 (the default, if you have not changed it), using 9092 as both the host port and the guest port.

  3. If the VM is already running, there is no need to restart it.

  4. Test that you can indeed access that port from Windows. For example, you can use the telnet command:

    telnet bigdatalite.localdomain 9092
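If telnet is not available on your Windows machine (it is an optional feature on recent versions), roughly the same check can be sketched in Python with the standard library only. The function name port_is_open and the 3-second timeout are my own choices for illustration.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name resolution failures, refused connections and timeouts
        return False


# Example (run on the Windows host):
# print(port_is_open("bigdatalite.localdomain", 9092))
```

A True result only shows that the hostname resolves and that the port accepts TCP connections. It does not prove that a Kafka broker is answering, so the producer test below is still worth running.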

Test everything

I created a simple Python program to push some test messages into Kafka from Windows.

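# I wanted to use the confluent-kafka client, but it did not support Windows at the time of writing,
# so I ended up using the second most popular client: https://github.com/dpkp/kafka-python
# pip install kafka-python

import datetime
import logging

from kafka import KafkaProducer

# DEBUG logging shows the connection and metadata exchange, which helps when troubleshooting
logging.basicConfig(level=logging.DEBUG)

# The port defaults to 9092 when omitted; it is spelled out here for clarity
producer = KafkaProducer(bootstrap_servers=['bigdatalite.localdomain:9092'])
for _ in range(10):
    # kafka-python expects bytes unless a value_serializer is configured,
    # so encode the timestamp string (required on Python 3)
    producer.send('testtopic', str(datetime.datetime.now()).encode('utf-8'))
# Block until all buffered messages have actually been sent
producer.flush()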

# In VM, we can see the messages now with console consumer:
# kafka-console-consumer --bootstrap-server localhost:9092 --topic testtopic --from-beginning
#
# Console producer:
# kafka-console-producer --broker-list localhost:9092 --topic testtopic

Once I ran it without errors (yay!!), I headed over to my VM and fired up the console consumer to see my messages there:

$ kafka-console-consumer --bootstrap-server localhost:9092 --topic testtopic --from-beginning
...
2018-10-22 14:26:28.597000  
2018-10-22 14:26:28.601000  
2018-10-22 14:26:28.601000  
2018-10-22 14:26:28.602000  
2018-10-22 14:26:28.603000  
2018-10-22 14:26:28.604000  
2018-10-22 14:26:28.604000  
2018-10-22 14:26:28.605000  
2018-10-22 14:26:28.606000  
2018-10-22 14:26:28.606000
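The messages can also be read back from Windows, without logging in to the VM. Below is a minimal sketch using kafka-python's KafkaConsumer. I did not run this as part of the original setup, so treat it as a starting point; the function names and the 5-second timeout are assumptions made for illustration.

```python
def decode_value(raw):
    """Turn the raw bytes of a Kafka message value into text."""
    return raw.decode("utf-8")


def read_test_messages(server="bigdatalite.localdomain:9092", topic="testtopic"):
    """Read everything currently in the topic and return it as a list of strings."""
    # Imported here so decode_value can be used without kafka-python installed
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=[server],
        auto_offset_reset="earliest",  # start at the oldest message, like --from-beginning
        consumer_timeout_ms=5000,      # stop iterating after 5 s without new messages
    )
    try:
        return [decode_value(message.value) for message in consumer]
    finally:
        consumer.close()


# Example (run on the Windows host, with the broker up):
# for line in read_test_messages():
#     print(line)
```

If the broker and port forwarding are set up as described, the call should return the same timestamps that the console consumer printed in the VM.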

That's all!

Boris Tyukin

Big Data, BI and Data Warehousing

Orlando, Florida