Yesterday I went to sleep at around 4:30 am. I recorded a new version of Kisine and named it Kisine Upgrade. I also uploaded it to muziboo and here is the link. Anyway, I watched a lot of YouTube videos, and meanwhile completed Day 10 of the Java book.

Today I woke up at around 10:30 am. As usual, I brushed my teeth and went online to check my mail. I was depressed because FeastHunt was going nowhere, and it seemed that I was the only one interested. Akshay and Dallas were too busy to do anything. So I decided that, whatever money it takes, I will give the job to the webprint guy.

Anyway, I started on the Day 11 chapter of the Java book. I was moving quickly through the chapter until I reached a part where things got tough, so I left it for the time being as I was already too full of Java. I was also a bit bored, since I had been dealing with the same type of things.

After that, I realized that it was June 29th. Two years ago, this day marked the beginning of my working life. Everyone from that batch was updating their status message. I dedicated the song "Kisine Upgrade" to them. A few of them heard it and liked it. The rest didn't care to listen.

The next part of the day was dedicated to writing posts on Informatica B2B Data Transformation. It took a lot of time, as I had to take screenshots, upload them and so on. But I was happy after that. Meanwhile, I got a call from Abhishek Agarwala, and we decided to meet up in Andheri tomorrow.

Then I decided to look back at the "Induction Day Song" that Janani and I made, and Ojas wrote. I decided to re-record it, but soon realized that it doesn't suit my voice. I tried convincing my sister to sing it, but she refused. I was also helping her with her project work and homework. She is not that good at maths. In fact, not good at all. Her basics are very shaky.

In the evening, we ordered McDonald's. I used one of the coupons for that. We ate that as our dinner.

After that, I started watching the India vs WI Test match. India were all out for 201, and WI were 82/5 in reply. Then I went back to helping my sister with the maths problems. I moved back to my room and browsed through random things. Then I started writing this post. Next I plan to practice some music, and then sleep by 1:30 am.
Wednesday, 29 June 2011
Informatica B2B Data Transformation: Basic concepts (working with Parser)
Below is a diagram showing what a basic Parser looks like, along with the Marker and Content Anchors. It also shows the various ways of looking through a file and searching for a pattern in it.
To open a new blank project in DT, go to File -> New -> Project -> Data Transformation Project -> Blank project. Give the project a name, and you are ready to go. Initially you will have nothing but a blank code file (.tgp file) under the Scripts section on the left hand side (Data Transformation Explorer). If you cannot see the Data Transformation Explorer, go to Window -> Show View -> Data Transformation Explorer.
Before you begin, prepare a basic XSD file (preferably using XML Spy) which contains a Root element with a Data element as its child.
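As a rough idea of what such a schema could look like, here is a minimal sketch written as a Python snippet that holds the XSD text and checks that it is well-formed XML. The element names Root and Data come from the description above; the xs:string type for Data and the variable names are my assumptions, and the actual schema in the screenshot may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal schema: a Root element with one Data child.
# The xs:string type for Data is an assumption, not taken from the post.
XSD_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Data" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

XS = "{http://www.w3.org/2001/XMLSchema}"

# Parsing only proves the text is well-formed XML; it does not validate
# it against the XSD specification itself.
schema = ET.fromstring(XSD_TEXT.encode("utf-8"))

# The top-level element declaration and the elements nested inside it.
root_element = schema.find(f"{XS}element")
child_names = [
    e.get("name")
    for e in root_element.iter(f"{XS}element")
    if e is not root_element
]

print(root_element.get("name"), child_names)
```

You can paste the text held in XSD_TEXT into a .xsd file and adjust the type of Data to suit your input.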
Initial Stage (Only a blank code file)
Open the images in a new tab to view them at full size.
A simple XSD file
It is better to use a tool such as XML Spy to create a Schema definition file. But if you are well versed in XML and XSD, you can code it directly here.
The basic components of a Parser
This is an empty Parser. If we look at the image, we can see the various properties of a Parser. We will go deeper into each of them in the coming posts. Example source is a required element. It is the input to the parser.
I used a simple Text as an input
Other input options are also available, but for simplicity I used a Text input. After doing this, press Ctrl-F10, and you will see the input on the right hand side as shown in the next image.
Marker in Action
Note: You can enter a Marker either by selecting the word on the right hand side and choosing "insert Marker" from the right-click menu, or by typing Marker in the code area and pressing Enter.
Content Anchor
Marker Content Maker
XSD element in data holder
Once you are done with the above steps, save the code and press Ctrl-F10 to ensure that the markings on the input file are right. If you want to experiment with properties such as Marking, you can do that too. You will understand the concept better by experimenting with the options available. Now, to run the project, go to Run in the menu bar or press F5 or Ctrl-F5. Once you do that, you will see the Events tab below the code window filled with the execution steps, and an output file under the Results section on the left hand side. Open the output file to view the desired XML file.
The final output file
This post showed the working of a Parser. The example here is very basic, but it is the foundation of the more complex things that can be handled by a parser in DT. Please note the important concepts here: Marking property, Phase property, Data holder, Opening and Closing Marker.
If you happen to pass by and have any doubts, please drop a comment on this post, and I will get back to you.
Informatica B2B Data Transformation: Basic concepts
Well, I am returning to this topic after a long time. Looking at the stats of this blog, I realized that many people are visiting mainly because of the Informatica B2B post here. So, I have decided to continue with it.
In this post, I plan to cover the next level of basic components in a Data Transformation (DT) project (the project file has the .cmw extension). From a bird's eye view, a DT project can comprise three components: Parser, Mapper, Serializer.
A Parser is a component which converts raw unstructured (non-relational) data into structured XML data. For example, it can take an Excel file as input, and put the required Excel cell data into the target XML.
A Mapper is a component which maps data from one XML structure to another XML structure. This is generally used after the raw unstructured data has been brought into a staging XML structure. Once the data is in XML, if we need to refine the structure further, we can use a mapper.
A Serializer is a component which writes the XML data into a file of another format (I have used .txt; I have not tried other formats yet). It does the opposite of a Parser, and can be used anywhere in the project. For example, it can generate text files to check the status of a few variables, or a final text file which contains some information (Ex: Invoice).
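To make the three roles concrete, here is a small conceptual sketch in plain Python. It is not DT code and does not use any Informatica API. The function names, the Invoice and Field element names and the sample text are all made up for illustration; only the Root and Data element names follow the parser post.

```python
import xml.etree.ElementTree as ET


def parse(raw_text):
    """Parser role: unstructured text -> staging XML (<Root><Data>..</Data></Root>)."""
    root = ET.Element("Root")
    for line in raw_text.splitlines():
        if line.strip():
            ET.SubElement(root, "Data").text = line.strip()
    return root


def map_xml(staging):
    """Mapper role: staging XML structure -> a different XML structure."""
    target = ET.Element("Invoice")  # hypothetical target structure
    for data in staging.findall("Data"):
        key, _, value = data.text.partition(":")
        ET.SubElement(target, "Field", name=key.strip()).text = value.strip()
    return target


def serialize(xml_root):
    """Serializer role: XML -> plain text, the opposite direction of the parser."""
    return "\n".join(
        f"{field.get('name')}={field.text}" for field in xml_root.findall("Field")
    )


raw = "customer: Asha\namount: 120"  # made-up sample input
staging = parse(raw)
target = map_xml(staging)
report = serialize(target)
print(report)
```

The point of the sketch is only the direction of each step: text to XML, XML to XML, and XML back to text.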
Now, to implement the above three components, we need a few internal and global components. In any programming language, we use variables, data types, data structures and built-in functions to build bigger, customized objects. In the same way, here we make use of "Anchors". As the name suggests, an Anchor is an object which helps keep track of the objects we are dealing with, in this case our input files and XML schema. There are many types of Anchors, but the two most basic and most important ones are Marker and Content.
A Marker is an anchor which searches for a pattern within the input file, and moves the cursor to that position. Say we are looking for the word "apple" in the input file. If we build a marker which searches for "apple", it will find the first "apple" that appears in the file, and place the cursor just after the letter 'e', i.e. after the word apple. This is the default operation of a Marker, but we can customize it to place the cursor at the beginning instead. This is done with the Marking property. It can have four values: None (no marking, the cursor remains at the beginning of the file), Begin (begin position), End (end position) and Full (which marks the whole word and places the cursor at the end).
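The cursor behaviour described above can be imitated in a few lines of Python. This is only my mental model of the Marking property, not how the DT engine is implemented. The function name run_marker and the sample sentence are made up, and the handling of None (search, but leave the cursor where it was) is an assumption.

```python
def run_marker(text, pattern, marking="End", cursor=0):
    """Imitate a Marker: return (new_cursor, marked_span).

    marked_span is a (start, end) pair only for marking="Full".
    """
    start = text.find(pattern, cursor)  # first occurrence at or after the cursor
    if start == -1:
        raise ValueError(f"pattern {pattern!r} not found")
    end = start + len(pattern)

    if marking == "None":
        return cursor, None          # assumption: cursor stays where it was
    if marking == "Begin":
        return start, None           # cursor just before the match
    if marking == "End":
        return end, None             # default: cursor just after the match
    if marking == "Full":
        return end, (start, end)     # whole word marked, cursor at the end
    raise ValueError(f"unknown marking {marking!r}")


sample = "I ate an apple and another apple"  # made-up input
for marking in ("None", "Begin", "End", "Full"):
    print(marking, run_marker(sample, "apple", marking))
```

Note that every call finds the first "apple" only, which matches the description of a Marker stopping at the first occurrence.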
A Content is an anchor which is used to capture the relevant data between various Marker Anchors. You can imagine the Marker Anchors as borders, and Content Anchors as anything within those borders. The Content Anchor captures the relevant data in the input file and puts it into a target XML structure. This anchor also has the Marking property with the same options as above. This enables you to pick up the same thing again and put it into different target data elements, as the cursor position remains unchanged. There are also Opening and Closing Markers within a Content anchor, and they can act as the borders instead of a separate Marker anchor. But we need to be careful here: we should use either the Content's own Opening and Closing Markers or a separately defined Marker, not both together, since the cursor moves with each marker.
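Here is a rough Python imitation of a Content anchor that uses its own opening and closing markers as borders and places the captured text into a Data element under Root. Again, this is a conceptual sketch and not DT code. The function name capture_content and the sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET


def capture_content(text, opening, closing, cursor=0):
    """Return (captured_text, new_cursor) for the text between two borders."""
    open_at = text.find(opening, cursor)
    if open_at == -1:
        raise ValueError(f"opening marker {opening!r} not found")
    value_start = open_at + len(opening)

    close_at = text.find(closing, value_start)
    if close_at == -1:
        raise ValueError(f"closing marker {closing!r} not found")

    # The cursor ends up just after the closing border.
    return text[value_start:close_at], close_at + len(closing)


sample = "Name: Asha; City: Pune;"  # made-up input
value, cursor = capture_content(sample, "Name: ", ";")

# Put the captured value into the target XML structure (the data holder).
root = ET.Element("Root")
ET.SubElement(root, "Data").text = value
output_xml = ET.tostring(root, encoding="unicode")
print(output_xml)
```

Calling capture_content again from the returned cursor position would pick up the next value, which is how I picture one anchor handing over to the next.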
The last thing that I am going to touch on today is the Phase property. It is one of the important concepts in DT. Phase defines the order of execution. There are three phases: Initial, Main, Final. Anchors can have any of these three phases. All the anchors in a particular phase are executed in one go, in the order of their appearance in the code, i.e. from top to bottom. As expected, the objects in the initial phase are detected first and then executed in the order of appearance. The main and the final phases are handled next, in the same way. By default, all Marker Anchors are in the initial phase, and all Content Anchors are in the main phase. So the back end engine first recognizes the Markers and, once it has the borders, searches for the Content in between those borders, if we have asked it to search for them.
The next post will contain diagrams and guidelines to start working with Informatica B2B. Happy Learning!
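The ordering rule (phase first, then top to bottom within a phase) can be sketched with a stable sort in Python. This only illustrates the rule as I have described it, not the engine itself. The Anchor class, the function name execution_order and the sample anchors are hypothetical.

```python
from dataclasses import dataclass

PHASE_ORDER = {"initial": 0, "main": 1, "final": 2}


@dataclass
class Anchor:
    kind: str    # "Marker" or "Content"
    label: str   # made-up label for the example
    phase: str   # "initial", "main" or "final"


def execution_order(anchors):
    """Order anchors by phase; sorted() is stable, so anchors in the same
    phase keep their top-to-bottom order from the code."""
    return sorted(anchors, key=lambda anchor: PHASE_ORDER[anchor.phase])


# Anchors listed in the order they appear in the code, with default phases:
# Markers in the initial phase, Contents in the main phase.
script = [
    Anchor("Marker", "Name:", "initial"),
    Anchor("Content", "name value", "main"),
    Anchor("Marker", "City:", "initial"),
    Anchor("Content", "city value", "main"),
]

ordered = execution_order(script)
for anchor in ordered:
    print(anchor.phase, anchor.kind, anchor.label)
```

With the default phases, both Markers come out before either Content, which is the "borders first, content next" behaviour described above.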