Friday, September 27, 2013

How to redirect standard output to a file

Sometimes you need log details in an external file that Talend's standard log management cannot redirect or catch. Errors are often displayed on the Talend run console but are not caught by the error connector. Here is sample code you can use in your job to redirect error and run-console messages to a file.

At the very beginning of your job, place a tJava component and use it to redirect the standard output globally. Insert this code (add buffering if you need it):

-------------------------
java.io.File file = new java.io.File("C:/data/mylogfile_test.txt");
java.io.PrintStream ps = new java.io.PrintStream(new java.io.FileOutputStream(file));
System.setErr(ps);
System.setOut(ps);
-------------------------
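If you want the buffering mentioned above, a minimal sketch could look like the following. It is a standalone class so it can be run outside Talend; inside a tJava you would paste only the body of `redirectTo`. The class name, method name, and the relative log path are assumptions made for illustration, and append mode and autoflush are choices of this sketch, not something the original snippet does.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;

class RedirectOutput {

    // Opens the log file in append mode behind a buffer and routes both
    // System.out and System.err into it. Returns the stream so the caller
    // can flush or close it at the end of the job.
    static PrintStream redirectTo(String path) throws IOException {
        FileOutputStream fileOut = new FileOutputStream(path, true); // true = append
        // autoflush = true flushes the buffer on every println
        PrintStream ps = new PrintStream(new BufferedOutputStream(fileOut), true, "UTF-8");
        System.setOut(ps);
        System.setErr(ps);
        return ps;
    }

    public static void main(String[] args) throws IOException {
        PrintStream console = System.out;                     // keep the real console
        PrintStream log = redirectTo("mylogfile_test.txt");   // hypothetical path
        System.out.println("written to the file, not the console");
        System.err.println("errors go to the same file");
        log.flush();
        System.setOut(console);
        console.println("console output restored");
    }
}
```

Appending keeps the messages of earlier runs, which the original snippet would overwrite on every run.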

Thursday, September 26, 2013

JobScript Step by Step 1 - What is Job Script

A JobScript is a simple text file with the .jobscript extension. The Talend API uses this file to generate a Talend Job. In a JobScript you can define components, schemas, transformations, connections between components, and everything else that can be done using the Talend Job designer.

Note: The JobScript feature is not available in Talend Open Studio.

A JobScript looks like plain text with a JSON-like structure. If you are familiar with JSON, JobScript is very easy to understand. The Talend help center has a good explanation of JobScript; read it once so you understand JobScript and the terminology we will use.

We will write a job script that creates a job for loading CSV data into SQL Server with transformations.
Our job will have the following components.

  • tFileInputDelimited

  • tMap

  • tMSSQLOutput


In the screen below you can see a sample JobScript. It follows a fixed hierarchy: it starts with the basic settings of the job, continues with the job parameters and components, and ends with the component connections. I have marked these as numbered blocks, so there are 3 blocks in total, which I am going to explain in detail.

JobScript

Tuesday, September 17, 2013

Create Talend JobScript Step by Step

Talend is a great code generator with 200+ connectors that lets you transform data from one system to another. Talend is good for mid-size organisations that have to process a few MB of data, not GBs and TBs, because it lacks parallel processing, a generic schema load model, and batch processing features. There are some components and features that Talend claims will give you parallel and batch processing, but they fail at a certain level. Anyway, we are not going to discuss Talend itself. Instead we will discuss how to automate Talend Job creation, rather than creating hundreds of jobs for hundreds of metadata definitions.

I am an ETL developer, and I was assigned the task of creating one such job to be used as a generic data loader. The metadata is stored in SQL database tables, and my job uses these tables to create the schema, apply transformations, and then load the data into SQL. In short: dynamic schema using Talend.

I thought it was a great idea, and since Talend has a Dynamic Schema feature, I expected it could be done in a few days. But when I started working on it, it became a nightmare, so I finally dropped the idea of Dynamic Schema for the following reasons.

  • The Reject connector will not work.

  • You cannot apply custom transformations during the load.

  • You have to apply transformations using SQL.

  • Your file must have a header row.

  • All fields are loaded with the string data type.

  • You have to change the data types on the SQL side.

  • There is no escape character support.

  • The SQL table must exist before the load starts.

  • Log management will not work.


I communicated with Talend through the help center and Talend Forge, and after a long time found a solution, which is not Dynamic Schema but dynamic job creation using JobScript. Yes, JobScript: it is a JSON-like structure with nodes and child nodes, properties with values, components and settings, connections, contexts, routines, and much more. Everything can be defined using JobScript.

In the next post I will explain what exactly JobScript is, its basic structure, and the basic things needed to create a JobScript.

Thursday, August 8, 2013

Syntax error on tokens, delete these tokens

You may face this error when you start developing with Talend. Once you get hands-on experience with Talend, this type of error disappears, because it is caused by typos or by configuration mistakes when propagating a schema from one component to another. If you follow a few simple rules in your working environment, you will not face it any more.

  • All strings set for any component must be enclosed in double quotes

    • File name and paths

    • line feed string

    • column separator

    • Host names

    • URL

    • Table name

    • Database name and many more



  • If you use a schema for any component, make sure the same schema is assigned to the next component; otherwise it throws an error. Typical cases:

    • tJavaRow is the most popular component, and it often throws this error because the schema differs from the column list in the generated code.

    • Variables are used instead of column names.

    • An empty row (with no column name) is created in the schema.




If you follow these rules you will not see this error again.
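The quoting rule follows from the fact that Talend copies component field values into the generated Java code as expressions. A small sketch of what correctly quoted values look like as plain Java is shown here; the class name, paths, and table name are hypothetical and only illustrate the idea.

```java
class QuotedSettings {
    // Hypothetical values, written the way they would be typed into component fields.
    static final String BASE_DIR = "C:/data";
    static final String FILE_NAME = BASE_DIR + "/input.csv"; // expression: variable + literal
    static final String ROW_SEPARATOR = "\n";
    static final String FIELD_SEPARATOR = ";";
    static final String TABLE_NAME = "customers";

    public static void main(String[] args) {
        // Typing C:/data/input.csv or ; without the quotes would leave text
        // that is not a valid Java expression, so the job would not compile.
        System.out.println("File name:       " + FILE_NAME);
        System.out.println("Field separator: " + FIELD_SEPARATOR);
        System.out.println("Table name:      " + TABLE_NAME);
    }
}
```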

Sunday, August 4, 2013

How to solve "GC overhead limit exceeded" error

As the name suggests, Java tries to remove unused objects but fails, because it is not able to handle so many objects created by the Talend code generator. There are simply too many objects being created too fast, and the standard Java GC mechanism (on 1.6 at least) is not able to handle it.

This error may occur while compiling a job or while running a job, so there are two ways to fix it.

For run time, the solution is:

  • Open the Run tab

  • Click on the Advanced settings tab

  • Double click on -Xms and increase the size, up to gigabytes if needed, e.g. -Xms2G or -Xms100M

  • Double click on -Xmx and increase the size, up to gigabytes if needed, e.g. -Xmx4G or -Xmx200M
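To confirm that a job really runs with the new settings, you can print the heap limits the JVM reports. This is a minimal sketch using the standard `Runtime` API; the class and method names are made up for the example, and in a Talend job the two print lines could go into a tJava at the start of the job.

```java
class HeapCheck {

    // Maximum heap the JVM will try to use, in MB (roughly the -Xmx value).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Heap currently reserved by the JVM, in MB (starts near the -Xms value).
    static long currentHeapMb() {
        return Runtime.getRuntime().totalMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB):     " + maxHeapMb());
        System.out.println("Current heap (MB): " + currentHeapMb());
    }
}
```

The reported maximum is usually a little below the configured -Xmx value, so compare the order of magnitude rather than the exact number.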


For the compile-time error, note that the JVM parameters of a job are configured separately from the Studio's JVM startup parameters.

In this case you have to add or customize the JVM parameters in the .ini file of your Studio binary in the <TIS Install> directory. The .ini files affect the Studio (including compilation of jobs) but not the running of jobs.
For the Studio memory, if you run TOS_DI-win-x86_64.exe then you need to modify TOS_DI-win-x86_64.ini.

These two fixes will save your life from the "GC overhead limit exceeded" error.

Saturday, July 6, 2013

NoClassDefFoundError: org/apache/commons/codec/binary/Base64

Sometimes you may face this error and not find the right answer, so I thought it better to post the steps to overcome it.

This problem occurs at run time, not at compile time. First search for the commons-codec.jar file in your Talend installation directory, for example TalendInstall\TOS_DI-Win32-r78327-V5.0.2\plugins\org.talend.designer.components.localprovider_5.0.2.r78327\components\tMDMBulkLoad

If the commons-codec.jar file is not at the location mentioned above, download it. Now that we have our jar file ready, follow these steps to add it to the build path.

  • Click on the Window menu.

  • Then click on Preferences.

  • Search for "ClassPath variables" in the search box at the top left.

  • When the result appears, click on ClassPath variables. A new popup window will appear.

  • Click on the New button, give your ClassPath variable a name, then browse for commons-codec.jar and add it.

  • Once done, click on the OK button and close all the windows.


See the screen below for more details.

Talend Classpath Setting
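To check whether the class can now be loaded, a quick test is to ask the class loader for it. The following is a minimal sketch with a made-up class and method name; in a job, the body of `main` could be placed in a tJava.

```java
class ClasspathCheck {

    // Returns true if the named class can be loaded by the current class loader.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised while loading
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.commons.codec.binary.Base64";
        System.out.println(name + " available: " + isOnClasspath(name));
    }
}
```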

Friday, July 5, 2013

Failed to generate error in Talend Open Studio

Here is one of the solutions for the "failed to generate" error.

Just after having the error:

  1. Open the log file ..../[Talend INSTALL Folder]/workspace/.metadata/.log in Notepad.

  2. Go to the end of the file.

  3. You will see a stack trace like this:


!STACK 0

org.eclipse.emf.codegen.jet.JETException: InvocationTargetException
at org.eclipse.emf.codegen.jet.JETEmitter.generate(JETEmitter.java:475)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:549)
[...]
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:504)
at org.eclipse.equinox.launcher.Main.run(Main.java:1236)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[...]
at org.eclipse.emf.codegen.jet.JETEmitter.generate(JETEmitter.java:467)
... 53 more
Caused by: java.lang.NullPointerException
at org.talend.designer.codegen.translators.processing.TFilterRowMainJava.generate(TFilterRowMainJava.java:134)
... 58 more

  4. Get the last error in the stack trace and find the component raising the error. In my case, the component is TFilterRowMainJava.

  5. Deactivate all components of the family that raises the error, tFilterRow_1 in my case.

  6. Click on the "Code" tab; the Java source code is now displayed.

  7. Correct the errors, if any exist.

  8. Click on Run; the job now runs.
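Finding the last error in a long log by eye is tedious, so the lookup can be sketched in a few lines of Java. The helper below is an illustration only (class and method names are made up): it returns the first stack frame after the last "Caused by:" line. The sample lines are taken from the trace above.

```java
import java.util.Arrays;
import java.util.List;

class RootCauseFinder {

    // Returns the first stack frame that follows the last "Caused by:" line,
    // or null if the trace contains no such line.
    static String rootCauseFrame(List<String> lines) {
        int lastCause = -1;
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).trim().startsWith("Caused by:")) {
                lastCause = i;
            }
        }
        if (lastCause < 0) {
            return null;
        }
        for (int i = lastCause + 1; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.startsWith("at ")) {
                return line.substring(3); // drop the leading "at "
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> trace = Arrays.asList(
            "org.eclipse.emf.codegen.jet.JETException: InvocationTargetException",
            "at org.eclipse.emf.codegen.jet.JETEmitter.generate(JETEmitter.java:475)",
            "Caused by: java.lang.reflect.InvocationTargetException",
            "at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)",
            "Caused by: java.lang.NullPointerException",
            "at org.talend.designer.codegen.translators.processing.TFilterRowMainJava.generate(TFilterRowMainJava.java:134)",
            "... 58 more");
        // prints the TFilterRowMainJava.generate(...) frame
        System.out.println(rootCauseFrame(trace));
    }
}
```

For a real log, the lines could be read with `java.nio.file.Files.readAllLines` and passed to the same method.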

Thursday, June 27, 2013

Set Encoding for tMysqlOutput

As we know, Talend has an option to set the encoding for the MySQL input component, but there is no direct option to set the encoding for the output component. So I thought it would be good to share how to set the encoding for the tMysqlOutput component.

Select the tMysqlOutput component and then, in the advanced DB component settings panel, set the Additional JDBC Parameters field to:

"useUnicode=true&characterEncoding=utf8"


See the example screen below for a better view.

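To see what these parameters do, here is a rough sketch of how they end up in a MySQL JDBC URL. It assumes the component appends the field to the connection URL after a "?"; the class name, method name, host, port, and database are hypothetical and not taken from Talend's generated code.

```java
class JdbcUrlBuilder {

    // Builds a MySQL JDBC URL and appends the additional parameters, if any.
    static String buildUrl(String host, int port, String database, String additionalParams) {
        String url = "jdbc:mysql://" + host + ":" + port + "/" + database;
        if (additionalParams != null && !additionalParams.isEmpty()) {
            url += "?" + additionalParams;
        }
        return url;
    }

    public static void main(String[] args) {
        // Hypothetical connection details, for illustration only.
        String url = buildUrl("localhost", 3306, "demo",
                "useUnicode=true&characterEncoding=utf8");
        // prints jdbc:mysql://localhost:3306/demo?useUnicode=true&characterEncoding=utf8
        System.out.println(url);
    }
}
```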


Monday, March 11, 2013

Pulling Twitter Updates Using Talend Open Studio -Part II

In the previous post you saw how to get user IDs from the Twitter API. In this post we will see how to get the users' details.

Requirement for Demo. 
Talend Open Studio. 
Twitter API Access. 
JDK installed. 

Access to below API.

followers/ids
users/show
statuses/friends
statuses/followers


To get a user's details we have to use the users/show API, so make sure you are able to access it. When you call this API you will get an XML file with many details. To keep things simple and understandable, we will take only the details listed below from the XML; if you want, you can take all of them.

Create a job in Talend Open Studio and follow the steps below to create the mapping/schema of the users/show XML.


Click on the Metadata node, right-click on the File XML node, then click on the Create File XML option in the pop-up menu.

Provide a mapping name and the users/show XML file.

Once done, go to the next tab and configure all the properties as in the screen below, and select the fields listed below for display.




Our XPath loop expression is: /user/status
Select the fields listed below using Ctrl+click, drag and drop them to "Fields to extract", and click the "Refresh Preview" button to make sure the XML has been parsed properly.


created_at
description
statuses_count
favourites_count
followers_count
following
friends_count
id
location
name
screen_name 
time_zone
url
verified

Now our sample file with the Twitter user details is ready. We have to store this information in a CSV file, so drag and drop a tFileOutputDelimited component. Then drag and drop the schema mapping we just created for the XML onto the designer and select tFileInputXML. Connect tFileInputXML to tFileOutputDelimited using a Main connector, and sync the source schema to the tFileOutputDelimited component.

Give the output file path and name and the other configuration. Once done, execute the job to make sure everything is working fine. Your final job will look like the one below, with its output.





Output




Here we have completed two parts of the Twitter API series: one to get the user IDs and the other to get the user details.

In the next part of this series we will integrate both jobs into a single one to retrieve each user ID and its details into a CSV file.




Thursday, March 7, 2013

Pulling Twitter Updates Using Talend Open Studio -Part I

Twitter is the most popular microblogging site, and people like to get the details of users, events, elements, and followers. We will see how Talend Open Studio helps us automate Twitter user detail scraping.

I have split this post into 4 parts, so readers can go to the specific topic.




Requirement for Demo. 
Talend Open Studio. 
Twitter API Access. 
JDK installed. 

Access to below API.

followers/ids
users/show
statuses/friends
statuses/followers

The above 4 APIs will be used to get followers, friends, and user details. Our sample Twitter user name is "pubscode".

First we will call the API to get all the follower IDs. Because we don't have any other information associated with each ID, we then use these IDs to get the detailed information for each one, and store that information in a .CSV file.


The IDs API returns all the follower IDs in the XML format below, so before moving ahead we will create a mapping/schema.



<id_list>


<next_cursor>0</next_cursor>
<previous_cursor>0</previous_cursor>
</id_list>



It's a simple XML file, so I am skipping the part on creating the XML schema in Talend and jumping directly to how we can call the API through Talend and store the details in an XML file.

There are various ways to make an API call in Talend; I will explain the simplest way, which is the one I use.

Create a job named "Twitter_API" and drop in a tFileFetch component. Select tFileFetch and click on the "Component" tab to see all its properties; the screen below will help you configure them.



To make sure everything is configured properly, run the job and check that the file "pubscode.xml" appears in the directory you specified in the "Destination directory" text box.
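For readers who want to see what this download step amounts to, here is a rough plain-Java equivalent: it reads a URI and saves the content to a file. This is only a sketch of the idea, not tFileFetch's actual implementation; the class and method names are made up, and the URI and destination are passed as arguments because the real API URL depends on your Twitter access.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class FileFetchSketch {

    // Downloads the content behind the URI into the destination file,
    // replacing the file if it already exists. Returns the number of bytes written.
    static long fetch(String uri, Path destination) throws IOException {
        try (InputStream in = URI.create(uri).toURL().openStream()) {
            return Files.copy(in, destination, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: FileFetchSketch <uri> <destination file>");
            return;
        }
        long bytes = fetch(args[0], Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes to " + args[1]);
    }
}
```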

Now our sample file is ready to be processed using tFileInputXML.
Drop a tFileInputXML component from the Palette, select it, and click on the "Component" tab to configure all the properties required to read the XML file, as shown in the screen below.



Once you have configured the tFileInputXML component, connect tFileFetch to tFileInputXML using the "OnComponentOK" trigger. The final job will look like the screen below.





We have downloaded the user IDs in XML format. In the next post we will see how to get the details of each user and store them in a .CSV file.

Wednesday, March 6, 2013

Parse XML from Google Drive Using Talend Open Studio

In this part we will see how to load XML from Google Drive. I have the XML below stored in my own Google Drive account, where it is available to download and view. So first of all, please check that you are able to download it.

Required Things For Demo.
Sample XML (Download)
Talend Open Studio Installed

Here is our sample XML; you can download it from the above link.



<?xml version="1.0" encoding="UTF-8"?>
<Itmes>
    <item id="111" clientName="SB">
        <details>
            <detail child_id="1">
                <name>Pen Drive</name>
                <amount>2</amount>
            </detail>
            <detail child_id="2">
                <name>Flash Drive</name>
                <amount>20</amount>
            </detail>
        </details>
        <tags>
            <tag tag_id="1">
                <name>CD</name>
            </tag>
        </tags>
    </item>
    <item id="112" clientName="GJ">
        <details>
            <detail child_id="1">
                <name>Flopy</name>
                <amount>1</amount>
            </detail>
        </details>
        <tags>
            <tag tag_id="1">
                <name>USB Drive</name>
            </tag>
            <tag tag_id="2">
                <name>USB 2.0</name>
            </tag>
        </tags>
    </item>
</Itmes>
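Before building the job, it can help to see what flattening this XML into CSV rows means. The sketch below does it in plain Java with the JDK's XPath API. The loop expression /Itmes/item/details/detail, the chosen columns, and the ";" separator are assumptions of this example (the job's real mapping is only visible in the screenshots), and the class name is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ItemsToCsv {

    // Loops over every <detail> element and emits one CSV row per detail,
    // pulling the item id and client name from the enclosing <item>.
    static List<String> toCsvRows(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList details = (NodeList) xpath.evaluate(
                "/Itmes/item/details/detail",            // assumed loop expression
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);

        List<String> rows = new ArrayList<>();
        rows.add("item_id;clientName;name;amount");      // header row
        for (int i = 0; i < details.getLength(); i++) {
            Node detail = details.item(i);
            String itemId = xpath.evaluate("../../@id", detail);
            String client = xpath.evaluate("../../@clientName", detail);
            String name = xpath.evaluate("name", detail);
            String amount = xpath.evaluate("amount", detail);
            rows.add(itemId + ";" + client + ";" + name + ";" + amount);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the sample XML above.
        String xml = "<Itmes><item id=\"111\" clientName=\"SB\"><details>"
                + "<detail child_id=\"1\"><name>Pen Drive</name><amount>2</amount></detail>"
                + "<detail child_id=\"2\"><name>Flash Drive</name><amount>20</amount></detail>"
                + "</details></item></Itmes>";
        for (String row : toCsvRows(xml)) {
            System.out.println(row);
        }
    }
}
```

The relative paths such as ../../@id play the same role as the relative XPath queries you would enter per column in the tFileInputXML mapping.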


  • Create a new job named "XML_From_GDrive_2_CSV".
  • Create metadata for the above sample XML file using the Metadata repository wizard; see the screen below.
XML Metadata

  • Once you have created the schema/mapping, drop the tFileInputXML from the metadata you created into the job and make it Built-in.
  • Drop tFileFetch and tFileOutputDelimited.
  • Configure the tFileFetch properties as in the screen below.
  • Set the above XML download path in the URI property.



  • Now configure tFileInputXML with our download directory path and file name as below.


  • Configure tFileOutputDelimited as shown in the picture below.



  • Your final job will look like this. Run it and check the result; it should match the output shown here.




  • Output 



