Showing posts with label Talend Open Studio. Show all posts
Showing posts with label Talend Open Studio. Show all posts

Friday, January 9, 2015

Read XML with Optional Elements

This post I will describe how to parse XML with optional element.

We will use below source xml file which has three customer details, along with awards details, and <CUSTOMERAWARDS> is a optional xml element.

Sample XML file
Sample XML File

We will parse this file using tXMLMap component. so fist of all add tFileInputXML and configure as below.

  • Assign source file path

  • Create single column in schema named as

  • Create CUSTOMERS column with "Document" data type in schema.

  • Put loop Xpath query = "/CUSTOMERS"

  • In Mapping section add XPath Query ="."

  • Select Get Nodes check box.

Add tXMLMap component and connect with tFileInputXML component using Main link and create source tree structure as shown in image.
Note: You can create create sub elements manually or  it can be  populated from XSD file or from repository.

Add two Outputs and drag and drop relevant source columns to output (Refer image).



tXMLMap Configuration
tXMLMap Configuration

Click on first output`s "set loop function" short menu and add one sequence then select xpath = customerid xpath, see the image for more details.


tXml Map First Output
tXml Map First Output

Our first Output is ready now you have to configure second output so follow the steps we did for first output and select xpath= customerawards, see the image for more details.



tXml Map Second Output
tXml Map Second Output

Add tlogrow for each output and then execute the job you will see output like below. If you observe, customer id 1236 it has no awards extracted but customer id 1234 and 1235 awards extracted completely.



OutPut
Out Put

Difference between tMap and tJoin

tMap is frequently used component for joins and lookup purpose, it is also use for verity of operations and transformations, whereas tJoin is used for join and look-ups only.


tMap


tJoin


It accepts more than one input one is main and rests of the lookups.


It accepts only two inputs and only one is main and other one is lookup.


We can create more than one output


It has two default outputs one is "Main" and another one is " Inner join reject"


tMap has "inner join " and " left outer join" joining model


tJoin offer`s only "inner join"


tMap offers three match model


  1. Unique Match
  2. First Match
  3. All Matches


tJoin defaulted with Unique match


tMap allows to store data on file option for lookup data processing


tJoin doesn`t offer this feature


In tMap you can filter data using filter expression


tJoin doesn`t offer this feature


You can write transformation using expression builder at each column level


tJoin doesn`t offer this feature

Split Rows to Columns

This post I will describe you how to split rows into columns, we will use below sample as input records.

Input Rows.

Input Rows
Input Rows

Expected Output.

out put
Out Put

Create a Job and add tFixedFlowInput component and  put above input as "Use inline content" and create schema as shown in image.

Input Schema
Input Schema

Add tPivotToColumnsDelimited  component and connect with tFixedFlowInput component as main connection then configured this component shown in below image.

tPivot component Configuration
tPivot component Configuration

Configurations :

Pivot Column ="Type"

Aggregation column="Value"

Aggregation Function ="last"

Group by "ID" and "Name" column.

Rest of the configuration is for output file, where our output will be transferred. to read output file we can use either delimited component but for quick review I`ll use tFileInputFullRow.

Add tFileInputFullRow below the tFixedFlowInput component and connect with "On Sub Job Ok" trigger. and provide previously created file path and rest of the details.

add tLogRow and connect to tFileInputFullRow component and execute the job you will get above out put on console.

Final Job Design.

Job with OutPut
Job with OutPut

This component will create N number of columns based on your input, if you are dealing with fix schema then it will create complexity for further processing.

Thursday, January 1, 2015

Split large XML into multiple XML

In this post, I will describe you how to split large XML into several xml.

Here is our Sample XML file. ( which is not huge but just a sample)

Split Xml Talend

We are expecting three XML files from sample xml hence lets start with metadata creation for this sample file.

Once you created metadata then you can drag and drop schema to job designer. for the scenario we will choose tFileInputXML component.

Now add another component tXMLMap and link tFileInputXML to tAdvancedFileOutputXml then configure tAdvancedFileOutputXml as shown in image.

tAdvanceOutputXMl Mapping

Now we have mapped our source column to output columns, but it will output all the rows in single file, to create a file for each row we have to configured tAdvancedFileOutputXML component using Advance property of component tab. use "Spit output in Several files" option with value as "1". by doing this it will create new file for each row.

tAdvancedOutputXML Setting

After run, this job will create three files on mention path like below.

Output Xml Files

And here is the final output.

Output Xml Files splitxml6 splitxml7

Friday, December 19, 2014

Creating XML metadata

This post i will show you how to create XML metadata using Talend and simple use of tFileinputXML component.

List of topics covering in this post.

  1. Simple XMl Parsing using tFileInputXML.

    1. MetatData Creation.

    2. XPATH finding.

Prerequisite

  1. Talend Open Studio

  2. XML File.

Source XML file for demonstration.

<?xml version="1.0"?>
<PurchaseOrder PurchaseOrderNumber="99503" OrderDate="1999-10-20">
<Address Type="Shipping">
<Name>Ellen Adams</Name>
<Street>123 Maple Street</Street>
<City>Mill Valley</City>
<State>CA</State>
<Zip>10999</Zip>
<Country>USA</Country>
</Address>
<Address Type="Billing">
<Name>Tai Yee</Name>
<Street>8 Oak Avenue</Street>
<City>Old Town</City>
<State>PA</State>
<Zip>95819</Zip>
<Country>USA</Country>
</Address>
<DeliveryNotes>Please leave packages in shed by driveway.</DeliveryNotes>
<Items>
<Item PartNumber="872-AA">
<ProductName>Lawnmower</ProductName>
<Quantity>1</Quantity>
<USPrice>148.95</USPrice>
<Comment>Confirm this is electric</Comment>
</Item>
<Item PartNumber="926-AA">
<ProductName>Baby Monitor</ProductName>
<Quantity>2</Quantity>
<USPrice>39.98</USPrice>
<ShipDate>1999-05-21</ShipDate>
</Item>
</Items>
</PurchaseOrder>

Step 1: Create XML Metadata using Talend.

  1.  Go to the Talend repository Metadata node and right click on File XML .
    Create XML Metdata Talend 

  2. Provide valid name for metadata.

  3. In third Step we have to find out which node should be our XPATH LOOP Expression? for that you can find out the leaf node or last node of XML, in above file <Item> is the leaf node, so we will take that one. see the screen for mapping.



XML Metatdata 

  1. Just Drag and drop all required columns from XML to see above

  2. Once column selection completes then Click on preview button to review the extracted result.

  3. If every thing looks fine then proceed with finish button which will take you to the next screen with generated schema, here you can change data type, column name, length if required.

Generated XML Metadata

Now our XML metadata ready to use, then just drag and drop xml schema to job design it will ask two component

  1. tFileinputXML.

  2. tExtractXMLField.

For now we will use tFileinputXML to extract XML data, lets drop that.

Connect tlogRow to tFileinputXML component and execute the job you see the result, if result not displayed then repeat above steps.

This post we have created

  • XML Metadata

  • Simple XML parsing.

  • Simple use of tFileinputXML component.

Monday, August 25, 2014

Set/change Workspace default location

To set this workspace location, edit "config.ini" file in "configuration" folder at root of Talend installation folder.
In this file you can use variable "osgi.instance.area" or "osgi.instance.area.default" to specify your workspace location.
Follow below steps to complete this activity.

Step 1. Go to the Talend Installed directory then configuration folder like C:\Talend\TOS_DI-Win32-........\configuration.


Step 2. Open config.ini file.

Step 3. Write down below code at first line of file.

osgi.instance.area=provide your workspace path.

e.g.

osgi.instance.area=C:/Talend/TOS_DI-Win32...../workspace

Step 4. Save config.ini file keep open.

Step 5. Restart Talend, once you see first window then remove above line of code, save and close the config.ini file.

 

Friday, September 27, 2013

How to redirect standard output to a file

Many times you need to get log details on external files which Talend is not able to redirect or catch using standard Log management, but you know many times error displays on Talend run console but is not caught in error connector. so here is sample code which you can use in your job to redirect error or run console messages to file.

just at the beginning of you job place tJava, and inside a tjava, redirect globally the standard output... insert this code (add buffering if you need):

-------------------------
java.io.File file = new java.io.File("C:/data/mylogfile_test.txt");
java.io.PrintStream ps = new java.io.PrintStream(new java.io.FileOutputStream(file));
System.setErr(ps);
System.setOut(ps);
-------------------------

Thursday, September 26, 2013

JobScript Step by Step 1 - What is Job Script

JobScript is simple text file with .jobscript extension, this file is used by Talend API  to generate Talend Job, in JobScript you can define components, schema, transformations, connections between component. and all the things which can be done using Talend Job designer.

Note: JobScript feature not available in Talend Open Studio. 

JobScript looks like a plain Text having JSON like structure. if you aware of Json then it is very easy to understand JobScript. Talend help center has good explanation on JobScript check once so you have good understanding of JobScript and the terminology we will use.

We will create job script to create a job which will be used for loading CSV data to SQL server with transformations.
so our job will have following components.

  • tFileInputDelimited

  • tMap

  • tMSSQLOutput


below screen you can see a sample JobScript which has exact hierarchy which start with basic setting of job then job parameters, components , and ends with component connections. I have marked those in numbers with block so there are total 3 blocks which i am going explain in detail.

JobScript

Tuesday, September 17, 2013

Create Talend JobScript Step by Step

Talend is great code generator having 200+ connectors,  which gives you ability to transform data from one system to another. Talend is good for mid size organisation where you have to process few MB of data not GB`s and TB`s of data. because having lack of parallel processing, generic schema load model and batch processing features. there are some component and feature available which Talend claims it will give you parallel and batch processing but it fails at certain level. any way we are not going to discuss Talend perhaps we will discuss how can we automate Talend Job creation? instead creating hundreds of jobs for hundreds of metadata?

I am ETL developer and i have been assigned task to create one such job which will be used like generic data loader where metadata will be stored in SQL database tables, and these tables will be used by my job to create schema, apply transformations and then load the SQL, is short Dynamic schema using Talend.

I thought it`s great idea and Talend has Dynamic schema feature, then it can be done in few days. but when i started working on it became nightmare, so finally i dropped the idea of Dynamic schema. because of following reason.

  • Reject Connectore will not work.

  • You can not apply custom transformations during load

  • You have to apply transformation using SQL.

  • Your file must have header row.

  • All the fields loaded with string data type.

  • You have to change data type at the SQL side.

  • No escape character support.

  • SQL Table must present before start the load.

  • Log management will not work.


I communicated with Talend using help center, Talend Forged and after so long found solution which is not Dynamic Schema but Dynamic Job creation using JobScript.  Yes JobScript , it is Json like structure  with nodes and child nodes, properties with values, components and settings, connections, context, routines and many more. every can be define using JobScript.

In next post i will explain what  JobScript exactly, it basic structure and basic things need to create JobScript.

Thursday, August 8, 2013

Syntax error on tokens, delete these tokens

You may face this error when start development using Talend, once you get hands on with Talend  this type of errors will be disappeared because, this error comes whenever you do typo mistakes or any configuration during populating schema from one component to another. if you follow simple steps in working environment you will not face this any more.

  • All string which are set for any component must be closed with double quotes

    • File name and paths

    • line feed string

    • column separator

    • Host names

    • URL

    • Table name

    • Database name and many more



  • if you are using schema for any component then make sure the same schema is assign to next components schema other wise it throw error.

    • tJavaRow is most popular component and many times it throws this error because schema is different than code generated column list.

    • variables are used instead of column names

    • if you create empty row in schema (with no column name)




if you follow these rules you will not this error again.

Sunday, August 4, 2013

How to solve "GC overhead limit exceeded" error

As the names suggested Java try to remove unused object but fail, because it not able to handle so many object created by Talend code generator.  There's simply too much objects being created too fast, and the standard Java GC mechanism (on 1.6 at least) is not able to handle it.

This error may occur during compiling job or at running job, so we have two way to fix.

For Run-Time solution is

  • Opne Run TAB

  • click on Advance Tab

  • dobule click on -Xmas and incrise the size upto GB eg. -Xmas2G or -Xmas-100M

  • double click on -Xmax and incrise the size upto GB eg. -Xmas4G or -Xmas-200M


For runtime Error Configuration of jobs JVM parameters is different from studio jvm startup parameters

In run-time case, you have to add/customize the JVM parameters to your binary.ini file in <TIS Install> directory if you are using The .ini files affect the studio (including compilation of jobs) but not the running of jobs.
For the studio memory, if you run TOS_DI-win-x86_64.exe then you need to modify TOS_DI-win-x86_64.ini.

These two save your life from "GC overhead limit exceeded" error

Saturday, July 6, 2013

NoClassDefFoundError: org/apache/commons/codec/binary/Base64

Some times you may face this error, but may not get the right answer so i though it is better to post with steps to over come this error.

This problem occurs at run-time not at compile time, first search commons-codec.jar file in your talend installed directory like TalendInstall\TOS_DI-Win32-r78327-V5.0.2\plugins\org.talend.designer.components.localprovider_5.0.2.r78327\components\tMDMBulkLoad

if commons-codec.jar file is not on above mention location then download if from  commons-codec.jar file and. now we have our jar file ready so follow the steps to add it to build path.

  • click on Windows menu

  • then click on Preferences

  • search "ClassPath variables" at top left search text box.

  • result will be come then click on ClassPath variables. new popup window will be appeared.

  • Click on new button give name for your ClassPath variable, and then browse for commons-codec.jar and add it. 

  • here on done click on OK button and closed all the windows.


see the below screen for more details.

[caption id="attachment_148" align="alignnone" width="300"]Talend Classpath Setting Talend Classpath Setting[/caption]

Friday, July 5, 2013

Failed to generate error in Talend Open Studio

Here is on of the solution on failed to generate error.

Just after having the error:

  1. - Open the log file ..../[Talend INSTALL Folder]/workspace/.metadata/.log in notepad.

  2. - Go to the end of the file.

  3. - You will have a stack trace like that:


!STACK 0

org.eclipse.emf.codegen.jet.JETException: InvocationTargetException
at org.eclipse.emf.codegen.jet.JETEmitter.generate(JETEmitter.java:475)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:549)
[...]
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:504)
at org.eclipse.equinox.launcher.Main.run(Main.java:1236)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[...]
at org.eclipse.emf.codegen.jet.JETEmitter.generate(JETEmitter.java:467)
... 53 more
Caused by: java.lang.NullPointerException
at org.talend.designer.codegen.translators.processing.TFilterRowMainJava.generate(TFilterRowMainJava.java:134)
... 58 more

  1. - Get the last error in stack trace and find the component raising the error. In my case, the component is TFilterRowMainJava.

  2. - Deactivate all components of the family which raises the error. tFilterRow_1 in my case.

  3. - When I click on "Code" tab, Java source code is displayed.

  4. - Correct errors if exists.

  5. - If you click on run, the job is running...

Thursday, June 27, 2013

Set Encoding to tMySQLOutPut

As we know we have option set encoding for Input Component of MySQL in Talend but there is direct option present set Encoding for Output component. so i thought it is good to share experience with all on how to set Encoding for tMySQLOuput component.

Select tMySQLOuPut component and then in the advanced DB components setting panel, set the Additional JDBC Parameters field to:










"useUnicode=true&characterEncoding=utf8"


see the below example for better view.

http://umeshrakhe.wordpress.com/wp-admin/post.php?post=123&action=edit&message=10


Contact Us

Name

Email *

Message *