Moises Trelles Blog - Sample Post

I will create a Storm topology that reads tweets through the Twitter API for Java and searches their content for keywords related to the disciplines of the current Olympic Games (Rio 2016). For this example I worked with the HDP (Hortonworks Data Platform) 2.4 Sandbox.

Step 1: Start the Storm Service via Ambari

Go to the Ambari page and start the Storm service in the cluster.
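Ambari also exposes a REST API, so the same start action can be scripted. The following is a minimal sketch, assuming Ambari listens on port 8080 at sandbox.hortonworks.com, the cluster is named Sandbox and the credentials are admin/admin (all assumptions to adjust for your sandbox). It sends a PUT to the STORM service asking Ambari to move it to the STARTED state. The class name AmbariStartStorm is hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class AmbariStartStorm {

    // Builds the Ambari REST endpoint for the STORM service of a cluster (port 8080 assumed).
    static String serviceUrl(String ambariHost, String clusterName) {
        return "http://" + ambariHost + ":8080/api/v1/clusters/" + clusterName + "/services/STORM";
    }

    // JSON body asking Ambari to put the service in the STARTED state.
    static String startBody() {
        return "{\"RequestInfo\":{\"context\":\"Start Storm\"},"
             + "\"Body\":{\"ServiceInfo\":{\"state\":\"STARTED\"}}}";
    }

    // Sends the PUT request with basic authentication and returns the HTTP status code.
    static int startStorm(String host, String cluster, String user, String password) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(serviceUrl(host, cluster)).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        // Ambari expects this header on requests that modify state.
        conn.setRequestProperty("X-Requested-By", "ambari");
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + token);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(startBody().getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical host, cluster name and credentials: replace with your own.
        System.out.println(startStorm("sandbox.hortonworks.com", "Sandbox", "admin", "admin"));
    }
}
```

The URL and body are built by separate methods so they can be checked without contacting a live Ambari server.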

Step 2: Create a Twitter Application

The Twitter API uses OAuth to authorize requests. To use OAuth, the first step is to create a new application on the Twitter Developer site. To create the keys and secrets of a Twitter application:

Sign in to Twitter Applications with your Twitter account.

Click Create New App.

Enter a Name, Description and Website. The name of the Twitter application must be unique. The Website field is not actually used, so it does not need to be a valid URL.

Select the Yes, I agree check box, and click Create your Twitter application.

Click the Permissions tab. The default permission is Read only, which is sufficient for this exercise.

Click the Keys and Access Tokens tab.

Click Create my access token.

Click OAuth Test in the upper right corner of the page.

Note the values of the Consumer Key, Consumer Secret, Access Token and Access Token Secret; the topology needs all four to authenticate with Twitter.
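twitter4j can read these four values from a twitter4j.properties file using the keys oauth.consumerKey, oauth.consumerSecret, oauth.accessToken and oauth.accessTokenSecret. The following is a minimal, standard-library-only sketch that loads such a file and fails fast if a value is missing, so a bad configuration is caught before the topology is submitted. The class name TwitterCredentials and the placeholder values are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

public class TwitterCredentials {

    // Property names twitter4j uses for the OAuth credentials.
    static final String[] REQUIRED_KEYS = {
        "oauth.consumerKey",
        "oauth.consumerSecret",
        "oauth.accessToken",
        "oauth.accessTokenSecret"
    };

    // Loads the properties and throws if any OAuth value is missing or blank.
    static Properties load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalStateException("Missing Twitter credential: " + key);
            }
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values: paste the real ones from your Twitter application page.
        String file = "oauth.consumerKey=XXXX\n"
                    + "oauth.consumerSecret=XXXX\n"
                    + "oauth.accessToken=XXXX\n"
                    + "oauth.accessTokenSecret=XXXX\n";
        Properties creds = load(new StringReader(file));
        System.out.println("Loaded " + creds.size() + " credentials"); // prints "Loaded 4 credentials"
    }
}
```

In practice the Reader would come from a file on the classpath rather than an inline string.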

Step 3: Topology Creation

I decided to create a simple topology. It reads tweets through the Twitter API for Java and then searches their content for keywords related to the disciplines of the current Olympic Games (Rio 2016). Once it has found these keywords (the Olympic disciplines), it counts the occurrences of each word to build a ranking of the most popular sports. The topology uses one spout and four bolts. Here is a simplified overview of the topology:

Spout: "Twitter Spout" Reads Twitter's sample feed using the twitter4j library.

Bolt: "WordSplitterBolt" Receives tweets and emits their words that exceed a certain length.

Bolt: "FilterWords Bolt" Keeps only the words from a predefined set related to the disciplines of the Rio 2016 Olympic Games.

Bolt: "OlympicWordsCounterBolt" Keeps statistics on word counts, logging the top words about the Olympic disciplines to stdout every X seconds and the top list every Y seconds.

Bolt: "HdfsBoltWriter" Writes the ranking of disciplines to HDFS at /path/in/hdfs.
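The real bolts extend Storm classes, but the logic they chain together (split, filter, count, rank) can be illustrated with plain Java collections. This is a Storm-free sketch, not the author's code: the minimum word length of 3, the discipline subset, and the class and method names are all hypothetical choices for illustration.

```java
import java.util.*;
import java.util.stream.Collectors;

public class OlympicWordsPipeline {

    // Hypothetical threshold for the splitter stage: keep words longer than this.
    static final int MIN_WORD_LENGTH = 3;

    // Hypothetical subset of Rio 2016 disciplines for the filter stage.
    static final Set<String> DISCIPLINES = new HashSet<>(Arrays.asList(
        "athletics", "swimming", "judo", "football", "rugby", "tennis", "cycling", "golf"));

    // Counter stage state: discipline -> number of occurrences.
    private final Map<String, Long> counts = new HashMap<>();

    // Splitter stage: lower-cases the tweet, splits on non-letters, keeps long enough words.
    static List<String> split(String tweet) {
        List<String> words = new ArrayList<>();
        for (String w : tweet.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (w.length() > MIN_WORD_LENGTH) {
                words.add(w);
            }
        }
        return words;
    }

    // Filter stage: true only for words naming an Olympic discipline.
    static boolean isDiscipline(String word) {
        return DISCIPLINES.contains(word);
    }

    // Runs one tweet through split, filter and count.
    void accept(String tweet) {
        for (String word : split(tweet)) {
            if (isDiscipline(word)) {
                counts.merge(word, 1L, Long::sum);
            }
        }
    }

    // Ranking: the n most frequent disciplines, ties broken alphabetically.
    List<Map.Entry<String, Long>> top(int n) {
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                    .thenComparing(Map.Entry.<String, Long>comparingByKey()))
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        OlympicWordsPipeline pipeline = new OlympicWordsPipeline();
        pipeline.accept("Watching swimming and judo tonight #Rio2016");
        pipeline.accept("Swimming final was amazing!");
        pipeline.accept("Golf is back at the Olympics");
        System.out.println(pipeline.top(2)); // prints [swimming=2, golf=1]
    }
}
```

In the actual counter bolt, the ranking would be computed periodically rather than per tweet; one common approach in Storm is tick tuples, configured through Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS.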

Here is an excerpt of the Java code.