After many hours of frustration, I was finally able to push messages into Apache Kafka, running on my VirtualBox guest machine, from my Windows host. Since I did not find complete steps on the web, I wanted to document them quickly, hoping to save someone some time. Big thanks to my teammate Greg Haskell, who helped me finally make it happen.
I wanted to connect to Apache Kafka, installed on my VirtualBox guest machine, so I could publish messages from my Windows host machine. The same setup can be used to connect multiple guest VMs to emulate a multi-node cluster.
Kafka VM (Linux)
1) Find the VM's hostname by typing hostname in the shell:
$ hostname
bigdatalite.localdomain
2) Edit the Kafka broker properties (kafka.properties) to set listeners and advertised.listeners. If you use the Cloudera Kafka distribution via parcels, make sure to add these properties to the Kafka Broker Advanced Configuration Snippet (Safety Valve).
3) Restart Kafka
The Kafka broker configuration documentation describes the difference between listeners and advertised.listeners. The key for a VM is to bind listeners to 0.0.0.0 (all interfaces) but tell consumers and producers to use a proper hostname via advertised.listeners.
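For step 2, the two settings might look like the sketch below. It assumes the default PLAINTEXT listener on port 9092 and the hostname from step 1. Both are assumptions, so adjust them if your broker uses a different protocol, port, or hostname.

```properties
# Assumed values for illustration, not necessarily the exact settings used in this post.
# Bind the broker to all network interfaces inside the VM
listeners=PLAINTEXT://0.0.0.0:9092
# Hostname and port that the broker hands back to producers and consumers
advertised.listeners=PLAINTEXT://bigdatalite.localdomain:9092
```

With settings like these, a client on the Windows host is told to connect to bigdatalite.localdomain:9092, which is why the hostname has to resolve on the host side as well.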
Windows OS (host)
Edit the Windows hosts file (typically C:\Windows\System32\drivers\etc\hosts) and map the VM hostname from the first step to 127.0.0.1:
127.0.0.1 bigdatalite.localdomain
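To double-check that the entry was saved correctly, a small Python sketch like the one below can parse the hosts file and look up the hostname. The helper name lookup_in_hosts and the HOSTS_PATH constant are made up for this illustration, and the path assumes a standard Windows installation.

```python
import os

# Usual location of the hosts file on Windows (assumed; adjust if yours differs)
HOSTS_PATH = os.path.join(os.environ.get("SystemRoot", r"C:\Windows"),
                          "System32", "drivers", "etc", "hosts")


def lookup_in_hosts(hosts_text, hostname):
    """Return the IP mapped to hostname in hosts-file text, or None if absent."""
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        ip, *names = entry.split()
        if hostname.lower() in (name.lower() for name in names):
            return ip
    return None


# Only runs where the Windows hosts file actually exists
if __name__ == "__main__" and os.path.exists(HOSTS_PATH):
    with open(HOSTS_PATH) as f:
        print(lookup_in_hosts(f.read(), "bigdatalite.localdomain"))
```

If the lookup returns None, the entry is missing or misspelled, and the Kafka client will fail to resolve the advertised hostname.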
In VirtualBox, open your VM's Network settings and add a new Port Forwarding rule for the Kafka broker port 9092 (the default one, if you have not changed it), forwarding TCP host port 9092 to guest port 9092.
If the VM is already running, there is no need to restart it.
Test that you can indeed access that port from Windows OS. For example, you can use telnet command like so:
telnet bigdatalite.localdomain 9092
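If the telnet client is not enabled on your Windows installation, the same check can be sketched in Python with the standard socket module. The function name port_is_open is just an illustrative helper, not part of any Kafka tooling.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False


# Example: port_is_open("bigdatalite.localdomain", 9092)
```

A True result only shows that the hosts entry and port forwarding work. It does not prove that advertised.listeners is set correctly, which only shows up once a real client connects.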
I created a simple Python program to push some test messages into Kafka from Windows.
# Wanted to use confluent-kafka client but it does not support Windows currently
# ended up using the second most popular client https://github.com/dpkp/kafka-python
# pip install kafka-python
import datetime
import logging

from kafka import KafkaProducer

logging.basicConfig(level=logging.DEBUG)

# The hostname must match the one the broker advertises; 9092 is the default port
producer = KafkaProducer(bootstrap_servers=['bigdatalite.localdomain:9092'])
for _ in range(10):
    # kafka-python expects bytes unless a value_serializer is configured
    producer.send('testtopic', str(datetime.datetime.now()).encode('utf-8'))
producer.flush()

# In VM, we can see the messages now with console consumer:
# kafka-console-consumer --bootstrap-server localhost:9092 --topic testtopic --from-beginning
#
# Console producer:
# kafka-console-producer --broker-list localhost:9092 --topic testtopic
Once I ran it without errors (yay!!), I headed over to my VM and fired up console consumer to see my messages there:
$ kafka-console-consumer --bootstrap-server localhost:9092 --topic testtopic --from-beginning
...
2018-10-22 14:26:28.597000
2018-10-22 14:26:28.601000
2018-10-22 14:26:28.601000
2018-10-22 14:26:28.602000
2018-10-22 14:26:28.603000
2018-10-22 14:26:28.604000
2018-10-22 14:26:28.604000
2018-10-22 14:26:28.605000
2018-10-22 14:26:28.606000
2018-10-22 14:26:28.606000
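The same round trip can also be checked from the Windows side. Below is a minimal sketch using KafkaConsumer from kafka-python, assuming the same hostname, port, and topic as above. The function names decode_values and read_test_messages are hypothetical helpers for this example, and the 5-second timeout is an arbitrary choice.

```python
def decode_values(records):
    """Decode the raw bytes payload of each record into text."""
    return [record.value.decode("utf-8") for record in records]


def read_test_messages(bootstrap="bigdatalite.localdomain:9092", topic="testtopic"):
    """Read the topic from the beginning and return the messages as strings."""
    # Imported here so decode_values can be used without kafka-python installed
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=[bootstrap],
        auto_offset_reset="earliest",  # start from the oldest message, like --from-beginning
        consumer_timeout_ms=5000,      # stop iterating after 5 s without new messages
    )
    try:
        return decode_values(consumer)
    finally:
        consumer.close()


# Example (needs a reachable broker): print("\n".join(read_test_messages()))
```

If this hangs or times out while the producer worked, the advertised hostname or the port forwarding rule is the first thing to re-check.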