I recently joined Talend as a Customer Success Architect. In that role, I help clients follow architectural guidelines and best practices so that they can carry out their data strategies with Talend. Before joining Talend, I worked on several data warehouse implementations, primarily using Informatica PowerCenter as the ETL tool. Any transition from one technology to another can be a challenge. But instead of trying to “replicate” in Talend the way PowerCenter works, let’s take a step back to understand how Talend works, what it does, and how it differs from PowerCenter. In this blog, I’m going to share with you the experience I’ve learned from
Talend and Informatica PowerCenter: what’s the difference?
These two tools do pretty much the same thing: they move data from a source to a target, but in different ways. Each approach has its advantages and disadvantages, and it is important to understand them before designing your ETL job.
The first thing to understand is that although both tools have a graphical user interface and both extract data from sources, transform it, and load it into a target, their implementations are very different. Talend generates native Java code, which can run anywhere Java is supported. PowerCenter, on the other hand, generates metadata that is stored in an RDBMS repository and executed by its proprietary engine.
Because Talend is a code generator, its jobs can run in ETL mode (on a stand-alone server) or in ELT mode (natively on the target server). Talend-generated Java code can run on any platform that supports Java: on a server in your data center, in the cloud, or even on your laptop. While both platforms offer components that handle most of the tasks required for data integration, some situations call for a custom element. With PowerCenter, this often means custom coding, which in my opinion is a painful and inefficient process. In Talend, you can develop your own components in Java and integrate them into the Studio without difficulty. These are important points to keep in mind when designing your data integration job.
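To make the ETL versus ELT distinction concrete, here is a minimal Java sketch. It is not the code Talend generates; the class, method, table, and column names are all invented for illustration. The first method transforms rows inside the Java process (ETL style), while the second only builds a SQL statement that the target database would execute itself (ELT style).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: this is NOT the code Talend generates.
class EtlVersusElt {

    // ETL style: rows are pulled into the Java process and transformed there.
    static List<String> transformInEngine(List<String> sourceNames) {
        List<String> out = new ArrayList<>();
        for (String name : sourceNames) {
            out.add(name.trim().toUpperCase(Locale.ROOT));
        }
        return out;
    }

    // ELT style: the job only builds a SQL statement; the target database does the work.
    // Table and column names are made up for the example.
    static String pushDownSql(String sourceTable, String targetTable) {
        return "INSERT INTO " + targetTable + " (name) "
             + "SELECT UPPER(TRIM(name)) FROM " + sourceTable;
    }

    public static void main(String[] args) {
        System.out.println(transformInEngine(List.of(" alice ", "bob")));
        System.out.println(pushDownSql("stg_customer", "dim_customer"));
    }
}
```

In both cases the same cleanup rule is applied; what changes is where the work happens, which is the trade-off to weigh when choosing between the two modes.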
How are my jobs designed?
Another big difference is the way a job is designed. Let’s start with PowerCenter. The first step is to develop a mapping (i.e., essentially a “data flow”). This is where the mapping between source and target and the transformation logic are defined. Once the mapping is validated and its metadata saved in the repository, the sessions and workflows (the “process flow”) are created. Physical connections to source and target objects are then assigned, tasks are sequenced in execution order, and error handling and reporting procedures can be implemented.
In Talend, data and process flows are implemented together seamlessly. We design a job that defines the “process flow” using a wide range of components, each offering specific features that implement the “data flow”. The “process flow” is implemented using “triggers”, and the “data flow” between components using “row” connections based on a particular schema.
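The split between data flow and process flow can be sketched in plain Java. This is a simplified illustration under my own assumptions, not Talend's generated code: `CustomerRow`, `runDataFlow`, and `runJob` are invented names. Rows sharing one schema pass through a transformation (the data flow), and the outcome of that step decides what happens next, much like an "on OK" or "on error" trigger (the process flow).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration: names are invented and this is NOT Talend's generated code.
class ProcessAndDataFlow {

    // The shared schema that every row connection in this step carries.
    static final class CustomerRow {
        final int id;
        final String name;
        CustomerRow(int id, String name) { this.id = id; this.name = name; }
    }

    // Data flow: rows pass from an input, through a transformation, to an output.
    static List<CustomerRow> runDataFlow(List<CustomerRow> input,
                                         Function<CustomerRow, CustomerRow> transform) {
        List<CustomerRow> output = new ArrayList<>();
        for (CustomerRow row : input) {
            output.add(transform.apply(row));
        }
        return output;
    }

    // Process flow: the outcome of the previous step decides which branch runs next.
    static String runJob(List<CustomerRow> input) {
        try {
            List<CustomerRow> loaded = runDataFlow(input,
                    r -> new CustomerRow(r.id, r.name.trim()));
            return "OK: " + loaded.size() + " rows";              // plays the role of an "on OK" trigger
        } catch (RuntimeException e) {
            return "ERROR: " + e.getClass().getSimpleName();      // plays the role of an "on error" trigger
        }
    }
}
```

The point of the sketch is that both concerns live in one place, whereas a PowerCenter developer would expect the first method in a mapping and the second in a workflow.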
To better understand, compare the concepts of PowerCenter to those of Talend:
The PowerCenter repository and the Talend project repository both contain reusable metadata objects (jobs, database connections, schema definitions, and so on). In Talend, these objects are versioned with standard source control systems (SVN or Git) rather than a proprietary one.
Folders organize objects according to their functionality. PowerCenter does not allow subfolders, unlike Talend.
The workflow (PowerCenter) or job (Talend) implements the ETL process flow, with all connections and dependencies defined. In Talend, a job represents both the process flow and the data flow.
A reusable set of tasks can be packaged once and shared across workflows or jobs (a worklet in PowerCenter, a joblet in Talend). You can use it for reusable logic such as error handling or notifications, or for repeated processes.
PowerCenter defines connections, file locations, and error handling separately, in a session. In Talend, on the other hand, the functions of the mapping and the session are combined and implemented within a component or a set of components linked by a process or data flow.
Talend has a large library of components that support a variety of transformations. For example, one of the most commonly used components, tMap, combines the Informatica Expression, Lookup, Router, and Joiner transformations.
In Talend, schema definitions and connections can be hard-coded in each component. As a best practice, however, it is highly recommended to define them in the repository metadata and reuse them across components.
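For readers who think in terms of separate PowerCenter transformations, the tMap comparison above can be sketched as a single Java loop. This is only an illustration under my own assumptions, not what tMap generates: the method name, the row layout, and the tax rate are invented. One pass over the main input performs a lookup and join, applies an expression, and routes each row to one of two outputs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of what a tMap-style step combines; NOT Talend's generated code.
class MapStepSketch {

    // orders: each row is {customerId, amount}; customers: lookup table id -> name.
    // Returns two outputs, "matched" and "rejected".
    static Map<String, List<String>> mapOrders(List<String[]> orders,
                                               Map<String, String> customers) {
        List<String> matched = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        for (String[] order : orders) {
            String name = customers.get(order[0]);        // lookup / join on customerId
            if (name == null) {                           // router: unmatched rows go to the reject output
                rejected.add(order[0]);
                continue;
            }
            // expression: derive a new column (the 1.2 tax factor is made up for the example)
            double amountWithTax = Double.parseDouble(order[1]) * 1.2;
            matched.add(name + ":" + String.format(Locale.ROOT, "%.2f", amountWithTax));
        }
        Map<String, List<String>> outputs = new HashMap<>();
        outputs.put("matched", matched);
        outputs.put("rejected", rejected);
        return outputs;
    }
}
```

In PowerCenter, the same logic would typically be spread across several transformation objects in a mapping; here it sits in one step, which is why tMap tends to be the first component a PowerCenter developer needs to get comfortable with.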
A closer look at the interface
Finally, let’s take a look at the Eclipse-based Talend Studio interface and try to understand it from the point of view of a PowerCenter developer.
The repository (in PowerCenter: Navigator) is where all resources (folders, jobs, schema definitions and connections, parameters and variables) are defined.
The design area (in PowerCenter: Workspace) is where jobs are assembled.
The contextual tabs at the bottom let you configure and document the components and run the job. They combine several functions found in the PowerCenter Designer and Workflow Manager tools.
The palette (in PowerCenter: Transformation toolbar) is a library of all available components.
The perspective defines the overall structure of the Studio and the organization of its different areas. Each major Talend product offers a different perspective. This brings a significant advantage: a developer does not have to use several tools. The single user interface shared by all products improves productivity.
After working for several years as an Informatica architect, the most important lesson I have learned is that a technology is only as good as the best practices it is used with. Talend is no exception to this rule. If you want to take full advantage of your investment in Talend, you need to implement best practices and follow them as part of your software development lifecycle. Here are some links that shed light on Talend’s job design patterns and best practices: Part 1, Part 2, Part 3 and Part 4.