Wednesday, 29 June 2011

Informatica B2B Data Transformation: Basic concepts

Well, I am returning to this topic after a long time. Looking at this blog's stats, I realized that many people visit mainly because of the Informatica B2B post here, so I have decided to continue the series.
In this post, I plan to cover the next level of basic components in a Data Transformation (DT) project (the project file has a .cmw extension). From a bird's-eye view, a DT project can comprise three components: Parser, Mapper and Serializer.

A Parser is a component that converts raw, unstructured (non-relational) data into structured XML data. For example, it can take an Excel file as input and put the required cell data into the target XML.
A Mapper is a component that maps data from one XML structure to another. It is generally used after the raw, unstructured data has been brought into a staging XML structure: once the data is in XML, we can use a Mapper if we need to refine the structure further.
A Serializer is a component that writes XML data out to a file in another format (.txt, for example; I don't know about other formats yet, as I have not tried them). It does the opposite of a Parser and can be used anywhere in the project, for instance to generate text files that show the status of a few variables, or a final text file containing some information (e.g. an invoice).
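To make the Parser → Mapper → Serializer flow concrete, here is a minimal conceptual sketch in Python. It is not Informatica DT code and does not use any DT API; the input text, the element names (`Staging`, `Invoice`, `BillTo`, `Total`) and the functions `parse`, `map_xml` and `serialize` are all hypothetical, chosen only to show the direction of each step: raw text into staging XML, staging XML into a refined XML structure, and XML back out to plain text.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw input: "key: value" text lines, not an Informatica format.
RAW = """Invoice: 1001
Customer: Acme
Amount: 250.00"""

def parse(raw: str) -> ET.Element:
    """Parser step: raw text -> staging XML (one child element per line)."""
    staging = ET.Element("Staging")
    for line in raw.splitlines():
        key, _, value = line.partition(":")
        ET.SubElement(staging, key.strip()).text = value.strip()
    return staging

def map_xml(staging: ET.Element) -> ET.Element:
    """Mapper step: staging XML -> a differently structured target XML."""
    target = ET.Element("Invoice", id=staging.findtext("Invoice"))
    ET.SubElement(target, "BillTo").text = staging.findtext("Customer")
    ET.SubElement(target, "Total").text = staging.findtext("Amount")
    return target

def serialize(target: ET.Element) -> str:
    """Serializer step: XML -> plain text (the opposite direction of a Parser)."""
    return (f"Invoice {target.get('id')} for {target.findtext('BillTo')}: "
            f"{target.findtext('Total')}")

if __name__ == "__main__":
    xml_target = map_xml(parse(RAW))
    print(ET.tostring(xml_target, encoding="unicode"))
    print(serialize(xml_target))
```

Keeping the three steps as separate functions mirrors the idea that a Mapper only ever sees XML, while the Parser and Serializer are the two places where non-XML data enters or leaves.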

Now, to implement all three of the above components, we need a few internal and global components. In any programming language, we use variables, data types, data structures and built-in functions to build bigger, customized objects. In the same way, here we make use of "Anchors". As the name suggests, an Anchor is an object that helps keep track of the objects we are dealing with: in this case, our input files and XML schema. There are many types of Anchors, but the two most basic and most important ones are Marker and Content.
A Marker is an anchor that searches for a pattern within the input file and moves the cursor to that position. Say we are looking for the word "apple" in the input file. If we build a Marker that searches for "apple", it will find the first "apple" that appears in the file and place the cursor just after the letter 'e', i.e. after the word "apple". This is the default behaviour of a Marker, but we can customize it to place the cursor at the beginning of the match instead. This is done with the Marking property, which can have four values: None (no marking; the cursor remains at the beginning of the file), Begin (the cursor is placed at the beginning of the match), End (the cursor is placed at the end of the match) and Full (the whole word is marked and the cursor is placed at the end).
A Content is an anchor that captures the relevant data between Marker anchors. You can imagine the Marker anchors as borders, and Content anchors as whatever lies within those borders. The Content anchor captures the relevant data in the input file and puts it into the target XML structure. This anchor also has the Marking property, with the same options as above. This lets you pick up the same data again and put it into different target data elements, since the cursor position can be left unchanged. A Content anchor also has its own Opening and Closing Markers, which can act as the borders instead of separate Marker anchors. However, we need to be careful here: we should use either the Content's own marker properties or separately defined Markers, not both together, since the cursor moves with each marker.
The last thing I am going to touch on today is the Phase property, one of the important concepts in DT. Phase defines the order of execution. There are three phases: Initial, Main and Final, and an anchor can be in any one of them. All the anchors in a particular phase are executed together, in the order in which they appear in the code, i.e. from top to bottom. As expected, the anchors in the Initial phase are detected first and executed in order of appearance; the Main and Final phases follow, in that order. By default, all Marker anchors are in the Initial phase and all Content anchors are in the Main phase. So the back-end engine first recognizes the Markers and, once it has the borders, searches for any Content anchors we have defined between those borders.
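The cursor behaviour of a Marker can be modelled roughly as a search that starts at the current cursor position. The Python sketch below is only a conceptual model of the Marking values described above, not the DT engine; the function name `marker` and its parameters are hypothetical. It also does not model what End and Full actually mark: in this sketch both simply move the cursor past the match.

```python
import re
from typing import Optional, Tuple

def marker(text: str, pattern: str, cursor: int = 0,
           marking: str = "End") -> Tuple[int, Optional[Tuple[int, int]]]:
    """Search text from `cursor` for `pattern`; return (new_cursor, match_span).

    Conceptual model only (hypothetical, not the DT engine):
      None  -> pattern is located, but the cursor does not move
      Begin -> cursor placed at the start of the match
      End   -> cursor placed just after the match (the default)
      Full  -> treated like End here; this sketch does not model the marked region
    """
    m = re.compile(re.escape(pattern)).search(text, cursor)
    if m is None:
        return cursor, None  # pattern not found: cursor stays where it was
    span = m.span()
    if marking == "None":
        return cursor, span
    if marking == "Begin":
        return m.start(), span
    if marking in ("End", "Full"):
        return m.end(), span
    raise ValueError(f"unknown marking: {marking}")

text = "I like apple pie and apple juice"
print(marker(text, "apple"))                  # default End: cursor after first "apple"
print(marker(text, "apple", marking="Begin"))  # cursor at the start of the match
print(marker(text, "apple", cursor=12))       # searching on finds the second "apple"
```

Because each search starts from the current cursor, running the same Marker again from the new position finds the next occurrence, which is why the order of Markers matters.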
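Here is a similar conceptual sketch of a Content anchor that uses its own opening and closing markers as borders and writes what it captures into a target XML element. Again, this is plain Python under stated assumptions, not DT syntax: `content_between`, the sample text and the element names `Order`, `Customer`, `ShipTo` and `Amount` are hypothetical. Calling it twice from the same cursor picks up the same value for two different target elements, which is roughly the effect described above when the cursor is left unchanged.

```python
import xml.etree.ElementTree as ET

def content_between(text: str, opening: str, closing: str, cursor: int = 0):
    """Capture the text between an opening and a closing marker.

    Hypothetical model (not the DT API). Returns (captured, cursor_after_closing),
    or (None, cursor) if either marker is missing.
    """
    start = text.find(opening, cursor)
    if start == -1:
        return None, cursor
    start += len(opening)              # content begins right after the opening marker
    end = text.find(closing, start)
    if end == -1:
        return None, cursor
    return text[start:end].strip(), end + len(closing)

text = "Name: Acme; Amount: 250.00;"
root = ET.Element("Order")

# First capture: the cursor would advance past the closing ";".
value, _ = content_between(text, "Name:", ";")
ET.SubElement(root, "Customer").text = value

# Re-running from the unchanged starting cursor captures the same data again.
value_again, cursor = content_between(text, "Name:", ";")
ET.SubElement(root, "ShipTo").text = value_again

# Continue from where the previous capture left the cursor.
amount, cursor = content_between(text, "Amount:", ";", cursor)
ET.SubElement(root, "Amount").text = amount

print(ET.tostring(root, encoding="unicode"))
```

Since every capture moves the cursor past its closing marker, mixing a Content anchor's own markers with separately defined Marker anchors would move the cursor twice, which is the pitfall mentioned above.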
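The phase ordering rule can be sketched as a stable sort: group anchors by phase (Initial, then Main, then Final) while keeping their top-to-bottom order inside each phase. This Python model is a hypothetical illustration of the ordering described above, not how the DT engine is implemented; the `Anchor` class and `execution_order` function are assumptions, and the defaults (Markers in Initial, Contents in Main) follow the post.

```python
from dataclasses import dataclass

PHASE_ORDER = {"initial": 0, "main": 1, "final": 2}

@dataclass
class Anchor:
    name: str
    kind: str         # "Marker" or "Content"
    phase: str = ""   # empty string -> use the default for this kind

    def effective_phase(self) -> str:
        # Defaults described in the post: Markers -> initial, Contents -> main.
        if self.phase:
            return self.phase
        return "initial" if self.kind == "Marker" else "main"

def execution_order(anchors):
    """Return anchor names grouped by phase, top-to-bottom within each phase.

    sorted() is stable, so anchors in the same phase keep their original order.
    """
    ordered = sorted(anchors, key=lambda a: PHASE_ORDER[a.effective_phase()])
    return [a.name for a in ordered]

script = [
    Anchor("customer_content", "Content"),
    Anchor("name_marker", "Marker"),
    Anchor("summary_content", "Content", phase="final"),
    Anchor("amount_marker", "Marker"),
    Anchor("amount_content", "Content"),
]
print(execution_order(script))
```

Even though `customer_content` is written first, both Markers run before it, which matches the idea that the engine finds the borders first and only then looks for Content between them.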
The next post will contain diagrams and some guidelines for getting started with Informatica B2B. Happy learning!

9 comments:

  1. This was a nice post. I have a question.
    Are B2B Data Exchange and B2B Data Transformation different products?
    I mean, does each of them need a separate license from Informatica?
    My requirement is to read only HIPAA and other industry-format files. Which Informatica product do I need for that?
    If I get B2B Data Exchange, will it include everything?

  2. No, B2B Data Exchange is different. It is a kind of file management system that lets us move files from one flow or part of an organization to another. If you want to handle HIPAA files, you need at least a B2B Data Transformation license. If you have a Unix server, you can run B2B DT from there using Unix shell scripts.

  3. This is a great article with very useful information. I have been researching data transformation and I could not find the basic concepts about the whole process. I know it can be confusing but I was told there are systems that can make the process a lot quicker.

  4. Hi,
    I feel there is a bit of confusion between the different variations of the technology: B2B Data Transformation, B2B Data Exchange, Advanced XML option, and UDO (Unstructured Data Option).
    To clarify:

    B2B Data Transformation (B2B DT) - A technical add-on to PowerCenter that provides the capability to extract and create complex data formats, such as office documents, complex XMLs, industry standards such as EDI, and more. Technically, it is used within your PowerCenter mapping just like any other PowerCenter transformation.

    Advanced XML option (AXO) – A PowerCenter option to handle complex XMLs and flat files (not office documents, not PDFs). Technically, it is the same product as B2B Data Transformation; the difference is only from the licensing perspective. Important: PowerCenter RealTime customers get AXO for free as part of the RealTime bundle!

    Unstructured Data Option - A PowerCenter option to handle office documents and complex flat files. Technically, it is the same product as B2B Data Transformation; the difference is only from the licensing perspective.

    B2B Data Exchange - Data Exchange provides a complete end-to-end solution for exchanging information with external trading partners. This product includes the following components:
    PowerCenter RT edition
    B2B Data Transformation
    Trading Partner Portal – a web-based monitoring and management user interface for the exchange. Its capabilities include creating a list of trading partners, viewing the transactions executed for every trading partner and their status, and more.
    Managed File Transfer option (MFT) – Handles the technical connectivity and establishes secure channels beyond the organization's firewall. MFT supports different protocols, such as AS2, SFTP, HTTP, HTTPS and more.


    The writer is the Founder of A.B Link consulting, a consultancy firm specializing in Informatica Data Transformation and Informatica Data Exchange consulting.

  5. Hi,

    We are trying to implement PDF file processing using PowerCenter.
    We see that we could possibly use the UDO option, but we have also heard that a 'Data Transformation' package is freely available with PowerCenter.
    Can you please help with the queries below?
    1. What is the difference between UDO and Data Transformation?
    2. Are both of these options supported in versions 9.1 and 9.6?
    3. Which of the two is the better option for PDF file processing?

    We would really appreciate your help on this topic.

    Thanks And Regards,
    Ashish A Potnis

  6. Hey

    Can you please continue covering the next B2B concepts as well? This content helped me, and I am eagerly awaiting the next post in the series.

  7. Hi guys, I have an issue with a date and time format. My source is an Excel file that I am processing through B2B.
    1. I used the Unstructured Data Transformation in PowerCenter, exported the XSD file, and created a new project in Informatica Studio.
    2. From the Excel file I am getting the date and time format '6/29/15 10:07 AM'. How can I handle this in B2B Data Transformation?

    Thanks in advance,
    Suresh.

    I am not able to process the date and time stamp in B2B. We are processing an Excel file.

    Please provide your contact number; it would be very helpful for me.

    Thanks,

  8. Nice post! This blog gives very important info about BI tools
    Thanks for sharing Informatica Online Training

  9. This comment has been removed by the author.
