Thursday, March 7, 2013

Pulling Twitter Updates Using Talend Open Studio - Part I

Twitter is the most popular microblogging site, and people like to get details about users, events, elements, and followers. We will see how Talend Open Studio helps us automate scraping Twitter user details.

I have split this post into 4 parts so readers can jump to a specific topic.




Requirements for the demo:
Talend Open Studio. 
Twitter API Access. 
JDK installed. 

Access to the APIs below:

followers/ids
users/show
statuses/friends
statuses/followers

The 4 APIs above will be used to get followers, friends, and user details. Our sample Twitter user name is "pubscode".

First we will call the API to get all the follower IDs. Because no other information is associated with each ID, we then use these IDs to fetch the detailed information for each one, and store each user's details in a .CSV file.
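Outside Talend, the same two-step flow (IDs first, then one lookup per ID, then a CSV row per user) can be sketched in a few lines of Python. This is only an illustration, not the job itself: `export_follower_details`, the `fetch_detail` callback, and the three column names are hypothetical, and the callback stands in for whatever actually calls users/show.

```python
import csv
from typing import Callable, Dict, Iterable

# Column names are assumptions for illustration; a real job would use
# whichever fields the users/show response exposes.
FIELDS = ["id", "screen_name", "name"]


def export_follower_details(
    follower_ids: Iterable[str],
    fetch_detail: Callable[[str], Dict[str, str]],
    csv_path: str,
) -> int:
    """Look up each follower ID and write one CSV row per user."""
    rows_written = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        # Keys returned by the lookup that are not in FIELDS are ignored.
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for follower_id in follower_ids:
            detail = fetch_detail(follower_id)  # stand-in for a users/show call
            writer.writerow(detail)
            rows_written += 1
    return rows_written
```

Passing the lookup in as a function keeps the sketch independent of any particular API URL or authentication scheme.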


The IDs API returns all the follower IDs in the XML format below, so before moving ahead we will create a mapping/schema.



<id_list>


<next_cursor>0</next_cursor>
<previous_cursor>0</previous_cursor>
</id_list>



It's a simple XML file, so I am skipping the part about creating the XML schema in Talend and jumping directly to how we can call the API through Talend and store the details in an XML file.

There are various ways to make an API call in Talend, so I am explaining the simplest way, which is the one I use.

Create a job named "Twitter_API" and drop a tFileFetch component onto it. Select tFileFetch and click on the "Component" tab to see all of its properties. The screen below will help you configure them.



To make sure everything is configured properly, run the job and check that the file "pubscode.xml" appears in the directory you specified in the "Destination directory" text box.
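If it helps to picture what this step does, tFileFetch here is essentially downloading a URL and saving the response body to a file. A minimal Python sketch of that idea follows; `fetch_to_file` and its parameter names are my own for illustration and are not Talend settings, and the API URL is deliberately left as an argument.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_to_file(url: str, destination_dir: str, filename: str) -> Path:
    """Download `url` and save the response body as destination_dir/filename."""
    target = Path(destination_dir) / filename
    # Create the destination directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling it with the followers/ids URL, your destination directory, and "pubscode.xml" would mirror the job above, assuming the endpoint is reachable without extra authentication.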

Now we have our sample file ready to process using tFileInputXML.
Drop a tFileInputXML component from the Palette, select it, and click on the "Component" tab to configure all the properties required to read the XML file, as shown in the screen below.
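Conceptually, tFileInputXML loops over a repeating element and emits one row per match. The sketch below shows the same extraction in Python. It assumes the IDs sit in `<ids><id>` elements inside `<id_list>`, which the sample above does not show, so treat that path (and the helper name `read_id_list`) as an assumption to check against your own pubscode.xml.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def read_id_list(xml_text: str) -> Tuple[List[str], str]:
    """Return (follower IDs, next_cursor) from an <id_list> document."""
    root = ET.fromstring(xml_text)
    # Assumed layout: <id_list><ids><id>123</id></ids> ... </id_list>
    ids = [node.text.strip() for node in root.findall("./ids/id") if node.text]
    # next_cursor is "0" in the sample, meaning there are no further pages.
    next_cursor = root.findtext("next_cursor", default="0")
    return ids, next_cursor
```

A document with no `<id>` elements simply yields an empty list, so the sketch is safe to run against the sample as printed.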



Once you have configured the tFileInputXML component, connect tFileFetch to tFileInputXML using the "OnComponentOK" trigger. The final job will look like the screen below.





We have downloaded the user IDs in XML format. In the next post we will see how to get the details of each user and store them in a .CSV file.

3 comments:

  1. Waiting for more posts like this! It was really helpful.

  2. How can I pass the content of a CSV file, which is a single URL, to the tFileFetch component to download the data from that URL? Any help is appreciated, thanks.

    Replies
    1. You need to design the job like this:
      tFileInputDelimited---tFlowToIterate---Iterate----tFileFetch.

      The input URLs will then be available in a global variable created by the tFlowToIterate component, which you can use directly in the tFileFetch component.

