Etched in Black: 06/29/11

Wednesday, 29 June 2011

Diary 62: And Just Another Day Again

Yesterday I went to sleep at around 4:30 am. I recorded a new version of Kisine and named it Kisine Upgrade. I also uploaded it to muziboo, and here is the link. Anyways, I watched a lot of YouTube videos, and meanwhile completed Day 10 of the Java book. Today I woke up at around 10:30 am. As usual, I brushed my teeth and went online to check my mail and all. I was depressed because FeastHunt was going nowhere, and it seemed that I was the only one interested. Akshay and Dallas were too busy to do anything. So, in my mind, I decided that whatever money it takes, I will give the job to the webprint guy. Anyways, I started on the Day 11 chapter of the Java book. I was racing through the chapter till I reached a part where things got tough, so I left it for the time being, as I was already too full of Java. Since I had been dealing with the same type of things, I was a bit bored too. After that, I realized that it was June 29th. Two years ago, this day marked the beginning of my job life. Everyone from that batch was updating his/her status message. I dedicated the song "Kisine Upgrade" to them. A few of them heard it and liked it. The rest didn't care to listen. Anyways, the next part of the day was dedicated to writing posts on Informatica B2B Data Transformation. It took a lot of time, as I had to take snapshots, upload them etc. But I was happy after that. Meanwhile, I got a call from Abhishek Agarwala, and we decided to meet up in Andheri tomorrow. Then, I decided to look back at the "Induction Day Song" that Janani and I made, and Ojas wrote. I decided to re-record it, but soon realized that it doesn't suit my voice. Anyways, I tried convincing my sister to sing it, but she refused. I was also helping her with her project work and homework. She is not that good at maths. In fact, not good at all. Her basics are very shaky. In the evening, we ordered McDonald's. I used one of the coupons for that. We ate that as our dinner.
After that, I started watching the India WI test match. India were all out for 201, and WI were chasing with 82/5. Then, I went back to helping my sister with the maths problems. I moved back to my room and browsed through random things. Then, I started writing this post. Next I plan to practice some music, and then sleep by 1:30 am.

Informatica B2B Data Transformation: Basic concepts (working with Parser)

Below is a diagram that shows what a basic Parser looks like, along with the Marker and Content Anchors. It also shows the various ways of looking through a file and searching for a pattern in it.

To open a new blank project in DT, go to File -> New -> Project -> Data Transformation Project -> Blank project. Give the project a name, and you are ready to go. Initially you will have nothing but a blank code file (.tgp file) under the Scripts section on the left-hand side (Data Transformation Explorer). If you cannot see the Data Transformation Explorer, go to Window -> Show View -> Data Transformation Explorer.

Before you begin anything, just prepare a basic XSD file (preferably using XML Spy) which contains a Root element and a Data element as the child element of that Root.

Initial Stage (Only a blank code file)

Open the image files in a new tab to view them better.

A simple XSD file

It is better to use software such as XML Spy to make a schema definition file. But if you are well versed in XML and XSD, you can code it directly here.
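For readers who want to hand-code the schema, here is a rough sketch of what such an XSD might look like, wrapped in a small Python check that it is well-formed XML. The element names Root and Data come from the description above; the xs:string type for Data and the helper name element_names are my assumptions for illustration, not something taken from the screenshot.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"

# Assumed minimal schema: a Root element with a single Data child,
# no target namespace. The xs:string type is an assumption.
XSD_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Data" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def element_names(xsd_text):
    """Parse the schema text and list the declared element names in document order."""
    schema = ET.fromstring(xsd_text.encode("utf-8"))
    return [el.get("name") for el in schema.iter(f"{{{XS_NS}}}element")]


print(element_names(XSD_TEXT))
```

This only checks that the text parses as XML and lists the element declarations; it does not validate anything against the schema.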

The basic components of a Parser

This is an empty Parser. If we look at the image, we will see the various properties of a Parser. We will go deeper into each of them in the coming posts. Example source is a required element: it is the input to the parser.

I used a simple Text as an input

Other input options are also available, but for simplicity I used a Text input. After doing this, press Ctrl-F10, and you will see the input on the right-hand side, as shown in the next image.

Marker in Action

You can see that once I introduce a Marker that searches for "apple" in the body of the code, the word "apple" in the input file gets marked. You can use the property expander/minimizer to view the other properties of a Marker (Phase, Marking etc).

Note: You can enter a Marker either by selecting the word on the right-hand side and choosing "insert Marker" from the right-click menu, or by typing Marker in the code area and pressing Enter.

Content Anchor

Next, if we add a Content anchor by typing Content in the code area below the Marker, we will see that the text "very much" gets selected in a different color. This denotes the part to be captured and put into the XSD element. For now, the data holder of the Content is blank, but we will fill it in eventually, otherwise the project will be invalid. Also note the disable and optional features, which help you skip parts of the code during execution.

Marker Content Marker

Here we can see how the Markers act as borders for the content. We define one more Marker, which searches for "much", just below the Content anchor, and hence the scope of the Content anchor is limited to the word "very" (along with the surrounding spaces). Next, we will place this word into a data element in the target XSD.
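To make the Marker, Content, Marker idea concrete outside the tool, here is a minimal Python sketch that imitates it with plain string searching. This is not DT code: the function name capture_between and the sample sentence are my own assumptions, chosen only to match the "apple" and "much" Markers used above.

```python
def capture_between(text, opening, closing):
    """Imitate Marker -> Content -> Marker: return the text between the two patterns."""
    start = text.find(opening)
    if start == -1:
        return None  # opening Marker not found
    cursor = start + len(opening)  # cursor sits just after the opening Marker
    end = text.find(closing, cursor)
    if end == -1:
        return None  # closing Marker not found after the cursor
    return text[cursor:end]


# Assumed sample input; the real input is only visible in the screenshots.
SAMPLE = "I like apple very much"

print(repr(capture_between(SAMPLE, "apple", "much")))  # ' very '
```

Note that the captured value keeps the spaces on both sides of "very", which is the same thing the highlighted region shows in the tool.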

XSD element in data holder

When we click on the data holder of the Content Anchor, a new pop-up box opens. It shows the schemas and variables that are available in the project. It also shows the local variables, which include the system variables as well as any custom variables that you might have made. For now, my XSD is placed under no target namespace, so I selected the Data element from there.

Once you are done with the above steps, save the code and press Ctrl-F10 to ensure that the markings on the input file are right. If you want to experiment with the Marking property and others, you can do that too; you will understand the concepts better by experimenting with the available options. Anyways, to run the project, go to Run in the menu bar or press F5 or Ctrl-F5. Once you do that, you will see the Events tab below the code window fill with the execution steps, and an output file will appear under the Results section on the left-hand side. Open the output file to view the desired XML file.

The final output file

You can browse through the execution in the Events tab and understand how the back-end engine works. You will observe that the Markers are executed first, as they are in the initial phase, followed by the Content, which is in the main phase. You will also see that the Data element contains the word "very", verifying that the code works fine.
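If you cannot open the image, the shape of the result can be approximated with a few lines of Python. This is only a sketch of the expected structure, assuming the Root and Data element names from the XSD step; the helper name build_output is hypothetical, and the real output file produced by DT may differ in details such as the XML declaration and whitespace.

```python
import xml.etree.ElementTree as ET


def build_output(value):
    """Build a Root element with one Data child holding the captured value."""
    root = ET.Element("Root")
    data = ET.SubElement(root, "Data")
    data.text = value
    return ET.tostring(root, encoding="unicode")


print(build_output("very"))  # <Root><Data>very</Data></Root>
```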

This post showed the working of a Parser. The example here is very basic, but it is the foundation for the more complex things that a parser can handle in DT. Please note the important concepts here: Marking property, Phase property, Data holder, Opening and Closing Marker.
If you happen to pass by and have any doubts, please drop a comment on this post, and I will get back to you.

Informatica B2B Data Transformation: Basic concepts

Well, I am returning to this topic after a long time. Looking at the stats of this blog, I realized that many people are visiting mainly because of the Informatica B2B post here. So, I have decided to continue with it.

In this post, I plan to cover the next level of basic components in a Data Transformation (DT) project (the project file has a .cmw extension). From a bird's-eye view, a DT project can comprise three components: Parser, Mapper, Serializer.

A Parser is a component which converts raw unstructured (non-relational) data into structured XML data. For example, it can take an Excel file as input and put the required cell data into the target XML.

A Mapper is a component which helps in mapping the data from one XML structure to another XML structure. This is generally used after the raw unstructured data has been brought into a staging XML structure. Once the data is in XML, if we need to refine the structure further, we can use a Mapper.

A Serializer is a component which writes the XML data into a file of another format (.txt; I don't know about other formats, as I have not tried them yet). It does the opposite of a Parser, and can be used anywhere in the project. Sometimes it is used to generate text files for checking the status of a few variables, or a final text file which contains some information (e.g. an invoice).

Now, to implement the above three components, we need a few internal and global components. In any programming language, we use variables, data types, data structures and built-in functions to build bigger, customized objects. In the same way, here we make use of "Anchors". As the name suggests, an Anchor is an object which helps keep track of the objects we are dealing with, in this case our input files and XML schema. There are many types of Anchors, but the two most basic and most important ones are Marker and Content.

A Marker is an anchor which searches for a pattern within the input file and moves the cursor to that position. Say we are looking for the word "apple" in the input file. If we build a Marker which searches for "apple", it will find the first "apple" that appears in the file and place the cursor just after the letter 'e', i.e. after the word apple. This is the default operation of a Marker, but we can customize it to place the cursor at the beginning instead. This is done with the Marking property, which can have four values: None (no marking, the cursor remains at the beginning of the file), Begin (begin position), End (end position) and Full (which marks the whole word and places the cursor at the end).
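The four Marking values can be imitated with a small Python function that returns where the cursor ends up. This is only my model of the behaviour described above, not DT code: the function name marker_cursor, the sample sentence, and treating None as "leave the cursor where it was" are all assumptions.

```python
def marker_cursor(text, pattern, marking, cursor=0):
    """Return (new_cursor, marked_span) for one Marker, following the description above."""
    start = text.find(pattern, cursor)
    if start == -1:
        raise ValueError(f"pattern {pattern!r} not found")
    end = start + len(pattern)
    if marking == "None":
        return cursor, None  # assumption: the cursor does not move at all
    if marking == "Begin":
        return start, None  # cursor at the first letter of the match
    if marking == "End":
        return end, None  # cursor just after the last letter of the match
    if marking == "Full":
        return end, (start, end)  # whole word marked, cursor after it
    raise ValueError(f"unknown marking value {marking!r}")


SAMPLE = "I like apple very much"  # assumed sample input

for value in ("None", "Begin", "End", "Full"):
    print(value, marker_cursor(SAMPLE, "apple", value))
```

With this sample, Begin puts the cursor at index 7 (the 'a' of apple) and End puts it at index 12 (just after the 'e').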

A Content is an anchor which is used to capture the relevant data between various Marker Anchors. You can imagine the Marker Anchors as borders, and Content Anchors as anything within those borders. The Content Anchor captures the relevant data in the input file and puts it into a target XML structure. This anchor also has the Marking property, with the same options as above. This enables you to pick up the same thing again and put it into different target data elements, as the cursor position remains unchanged. There are also Opening and Closing Markers within a Content anchor, and they can act as the borders instead of a separate Marker anchor. But we need to be careful here: we should use either the Content's own Opening and Closing Markers or separately defined Marker anchors, not both together, since the cursor moves with each marker.

The last thing that I am going to touch on today is the Phase property. It is one of the important concepts in DT. Phase defines the order of execution. There are three phases: Initial, Main and Final, and an anchor can be in any one of them. All the anchors in a particular phase are executed in one go, in the order of their appearance in the code, i.e. from top to bottom. As expected, the objects in the initial phase are detected first and then executed in the order of appearance. The main and the final phases are then handled next in the same way. By default, all Marker Anchors are in the initial phase, and all Content Anchors are in the main phase. So the back-end engine first recognizes the Markers, and once it has the borders, it searches for the Content in between those borders, only if we want to search for them.
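The ordering rule can be sketched in Python as a stable sort: group by phase, and keep the top-to-bottom order within each phase. Again, this is only an illustration of the rule as described above, not how the DT engine is actually implemented; the names PHASE_ORDER and execution_order and the anchor labels are hypothetical.

```python
# Lower number means the phase runs earlier.
PHASE_ORDER = {"initial": 0, "main": 1, "final": 2}


def execution_order(anchors):
    """anchors: list of (name, phase) pairs in top-to-bottom code order.

    sorted() is stable, so anchors in the same phase keep their code order.
    """
    ordered = sorted(anchors, key=lambda anchor: PHASE_ORDER[anchor[1]])
    return [name for name, _phase in ordered]


# The script layout from the parser example: Marker, Content, Marker.
script = [
    ("Marker apple", "initial"),
    ("Content", "main"),
    ("Marker much", "initial"),
]

print(execution_order(script))  # ['Marker apple', 'Marker much', 'Content']
```

Both Markers run before the Content even though the Content sits between them in the code, which matches what the Events tab shows.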

The next post will contain diagrams and guidelines to start working with Informatica B2B. Happy Learning!