ODI 12c: TROUG BI/DW SIG 2014

Hello all,

On 3rd April 2014, TROUG is organizing a BI/DW SIG Day at İTÜ Arı3 Teknokent. You can find detailed information at http://www.troug.org/?p=684 and register for the event on Eventbrite: https://www.eventbrite.com/e/troug-bidw-sig-meeting-tickets-10986690487

I will be there as a speaker; my session will cover the requirements, installation and new features of ODI 12c. Hope to see you there.

My presentation will be added to this post on the evening of the 3rd.
You can reach the presentation via this link.

Mark your calendars, register for event and have a nice day.

Hadoop: After the big data hackathon

So yesterday we completed the challenge. After around 19 hours of uninterrupted coding, I finally got some sleep.

First things first, I have to admit that I did not win any prize; I couldn't make it into the top three in this hackathon. I completed only 7 of the 9 challenges. To give you some statistics:

149 developers signed up for the code challenge on Eventbrite
87 of them came to the hackathon
After midnight, around 20 of us were left
Only 16 developers submitted at least one solution (I submitted 7, so I consider myself a good developer)

Now I want to talk about challenges:

There were 9 challenges, each requiring you to process a little more than 100GB of data, about 1TB in total. Except for the second challenge, every challenge worked on Twitter JSON data; the second one had a fixed-length file. Every challenge also came with a 150MB sample data set to try your solution on.

Challenges were:

1- Counting the tweets grouped by country code; the output should be ordered by country code (see the sketch after this list).
2- Every record was a 16-byte array, 8 bytes for the key and 8 bytes for the value. The challenge was to sum the values grouped by key, with the output ordered by key. The real challenge, though, was reading this type of file: developers had to implement their own InputFormat and RecordReader.
3- Counting the unique users.
4- Counting tweets in a given language; the language should be passed as an input parameter.
5- Finding the MAX and MIN user id.
6- Finding the person with the maximum popularity score, calculated as follows: if A mentions B, and A has X tweets and Y followers, then B gets X*Y points (for example, a mention from someone with 100 tweets and 50 followers is worth 5,000 points).
7- Finding the users with the maximum and minimum score, calculated as follows: if A tweets, A gets -1 point, and if A gets mentioned, A gets +1 point.
8- Counting all the tweets.
9- Finding the person who has the maximum number of first- and second-degree connections, based on mentions.
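To give a flavour of the work, here is a minimal sketch of how the first challenge could be attacked as a classic MapReduce job. The crude string search for the country code is an assumption about the tweet JSON layout (one compact JSON object per line); a real solution would use a proper JSON library, and a single reducer keeps the output ordered by country code thanks to the shuffle sort.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class CountryCount {

    // Mapper: emit (country_code, 1) for every tweet that has a country code.
    public static class TweetMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text country = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String code = extractCountryCode(line.toString());
            if (code != null) {
                country.set(code);
                context.write(country, ONE);
            }
        }

        // Crude extraction of the "country_code" field; a real solution would parse the JSON.
        private static String extractCountryCode(String json) {
            String marker = "\"country_code\":\"";
            int idx = json.indexOf(marker);
            if (idx < 0) {
                return null;
            }
            int start = idx + marker.length();
            int end = json.indexOf('"', start);
            return end > start ? json.substring(start, end) : null;
        }
    }

    // Reducer: sum the counts; keys (country codes) arrive sorted, so with a
    // single reducer the output is already ordered by country code.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}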

I could not submit the second and ninth challenges. And as I look at the challenges now, I see that I misunderstood the third one (blame the sleepless night). For the second one, a sketch of how it could be approached is below.
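For the curious, the second challenge does not necessarily need a hand-written InputFormat if you are on a recent Hadoop 2.x release that ships FixedLengthInputFormat. This is only a sketch of the idea under that assumption, not the solution I submitted (I didn't submit one).

import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FixedRecordSum {

    // Mapper: each 16-byte record arrives as a BytesWritable; split it into
    // an 8-byte key and an 8-byte value.
    public static class RecordMapper
            extends Mapper<LongWritable, BytesWritable, LongWritable, LongWritable> {
        @Override
        protected void map(LongWritable offset, BytesWritable record, Context context)
                throws IOException, InterruptedException {
            ByteBuffer buffer = ByteBuffer.wrap(record.getBytes(), 0, 16);
            long key = buffer.getLong();
            long value = buffer.getLong();
            context.write(new LongWritable(key), new LongWritable(value));
        }
    }

    // Reducer: sum the values per key; a single reducer keeps the output ordered by key.
    public static class SumReducer
            extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {
        @Override
        protected void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable value : values) {
                sum += value.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FixedLengthInputFormat.setRecordLength(conf, 16); // 8-byte key + 8-byte value
        Job job = Job.getInstance(conf, "fixed record sum");
        job.setJarByClass(FixedRecordSum.class);
        job.setInputFormatClass(FixedLengthInputFormat.class);
        job.setMapperClass(RecordMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(LongWritable.class);
        job.setNumReduceTasks(1);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}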

Android : Tellal Project

So I finally finished a usable open source library project; I can say a milestone has been achieved. Contributing to the open source world was one of my dreams, and tonight I completed the Tellal library.

But what is it?
– It is a library for Android to show in-app messages to users.

Why do I need that?
– I needed this when I was not able to update one of my apps and had to publish the new version as a separate app, but I had no way to tell my existing users about the change. So I decided to build a notification library that I can import into all my mobile apps.

How does it work?
– Pretty simple. While coding, you hardcode a URL into the TellalConfig class. The library checks this URL for a notification file, which is a simple text file containing JSON-formatted information, and shows it on screen. In the notification file you need to specify a title, a message and a button text.

Where can I get it?
– It is on GitHub. Use this link to reach the code and the library jar file: https://github.com/Canburakt/tellal

How can I use it?
– Just add the jar file to your project, set TellalConfig.sourceURL and execute the Tellal object. You can find an example project on GitHub, and a minimal usage sketch below.
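Here is a minimal usage sketch based on the description above; the exact constructor and call style are my assumptions, so check the example project on GitHub for the real API.

import android.app.Activity;
import android.os.Bundle;

public class MainActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        // Hardcoded URL of the notification file: a plain text file containing JSON
        // with a title, a message and a button text, for example (illustrative field names):
        // {"title": "Hello", "message": "We have moved to a new app!", "button": "OK"}
        TellalConfig.sourceURL = "http://www.example.com/tellal/notification.txt";

        // Execute the Tellal object so it fetches the file and shows the in-app message.
        new Tellal(this).execute();
    }
}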

It is not complete yet, but it is working, and I will keep developing it. New versions will come as tellal_vYYYYMMDD.jar, so you can always find the latest and previous versions on GitHub. While working on it, I hit a problem in my demo project: Java couldn't find the class definitions. I solved it by editing Project -> Properties -> Java Build Path -> Order and Export; you can see its state below:

Project Properties


I also put a screenshot of my demo project below so you can see the result:

Screenshot

Please do not hesitate to ask your questions in the comments section below, and to contribute to this project on GitHub.

Thanks for reading.


Hadoop: Introduction to Big Data, Hadoop and Hadoop Ecosystem

The day has come: after playing with Hadoop distributions for around a year and attending two trainings, I feel ready to write an introduction post about Big Data, Hadoop and the ecosystem projects.

1. What is Big Data?

Big Data is not Hadoop; Hadoop is just one implementation of the Big Data concept. Big Data is a young concept in the data analysis, ETL, data warehousing and data discovery fields, or data science for short. Every year, every day, every minute we create data, and each time we create more than we have created before. And we, data workers, process this data to extract the valuable pieces that bring our company, client or industry more income.

This is where Big Data comes into our lives. Big Data has 5 V's (in some sources you may find only 3), which are: Volume, Velocity, Variety, Verification (or Validity) and, last but the most important to me, Value.

5V for Big Data

Volume : The data we need to process and analyze is getting bigger day by day, so we need a new approach to data processing; this is where Big Data comes in.

Velocity : With the use of mobile devices, social media and the internet, we now create data much faster than we could before. For example, if before social media we generated just X MB of data on the internet in a day, now we generate more than X GB in a day, maybe even in a couple of hours. So we need to capture and process data really fast to keep up with its speed. (There are also the CEP and Fast Data concepts, if you are interested in this specific topic.)

Variety : I will blame social media again, but with the increase in social media and internet usage we now generate unstructured data of various types: shares, likes, status updates, retweets, vines, videos, texts, gifs and other images. To create value for our company, we have to process this data too. It also comes from a variety of sources: network systems, the internet, forms on the corporate website, etc.

Verification : Of course, we need the right data to get the right results.

Value : This is it! It is the reason why we process so much data in so little time. We need to find the valuable pieces; we try to extract VALUE from the data. It is like searching for a diamond in a mine.

2. What is Hadoop?

Hadoop is an open source project which implements the Big Data concept. It is a distributed system that stores and processes data on commodity hardware. It does not require big, powerful servers; instead, you can create a cluster from desktop computers with quad-core processors and 2GB of RAM (less than most modern laptops; almost all smartphones have 1 or 2 GB of RAM nowadays).

Apache Hadoop Elephant

Hadoop started after Google published its white papers on its Big Table and MapReduce architecture. Some cool guys tried to implement and develop these features in an open source manner; then Yahoo, Facebook, Google and the Apache Foundation supported them. Now Hadoop is an open source Apache project.

3. Hadoop Distributions

You can install and use Hadoop from the Linux distributions' repositories. But there are also start-ups who bundle Hadoop with other open-source ecosystem projects and with their own tools as well.

Cloudera is one of these start-ups; it has its own bundle and also provides a VM for a quick start.

Hortonworks is another such start-up, with its own bundle and its own VM for getting started.

There are also bigger Hadoop solutions, like Oracle's Big Data Appliance and Teradata's offering; IBM and HP have their own enterprise solutions as well.

4. Hadoop Ecosystem Projects

Pig : Pig is one of the data analytics tools we can use with Hadoop. It has its own scripting language, called Pig Latin, which reads almost like English, so you can code as if you were writing an English essay.

Hive : Hive is another way to run data analytics. It has a SQL-like language, so it is usually preferred by developers who already have SQL knowledge.

Impala : Impala is Hive's rival; it also has a SQL-like language, and it is much faster than Hive because it does not convert the code into MapReduce jobs but runs against HDFS directly. (I will tell you more about MapReduce and HDFS next time.)

Oozie : Oozie is a scheduling and job management tool. You define workflows as XML files, and it runs the jobs as defined in that XML.

Sqoop : Sqoop is a tool to load data into Hadoop from an RDBMS, or vice versa. It creates MapReduce jobs to load the data and runs them automatically.

Flume : Flume is basically a listener. The user defines an input channel and Flume polls it repeatedly; for example, the user defines a log file as the input and Flume polls it every five minutes to get the latest log entries from the file.

Ambari : Ambari is an administration (provisioning, managing and monitoring) console for the Hadoop cluster.


That's all for the introduction post; soon I will be writing about Hadoop internals: HDFS and MapReduce. Please do not hesitate to leave comments or ask questions in the comments section. And hopefully in February I will be building a mini Hadoop cluster at home, which will be the topic of another blog post.

Thanks for reading.

ODI 11g: Implementing Loops

While using ODI to implement your ETLs, you may need loops. Let's look at some examples, where I will implement loops that iterate n times (for loops) and loops that iterate as long as a condition holds (while loops).

For Loop

In programming we implement a for loop as follows:

for (i = 0; i < 10; i++) {
    // statements
}

This is a simple loop which iterates ten times. If we parse the part in the parentheses, we can see that in the first part we assign a value to a variable, in the second part we define the condition, and the last part changes the variable's value on each iteration.

In ODI 11g we can implement this as follows:

1- Create a variable
I created a variable called V_FOR_LOOP, which is numeric and has no refresh code.

2- Create a package
I create a package and name it P_FOR_LOOP; I will put a screenshot of the package's final state once we complete all the steps.

3- Set variable
Give our variable V_FOR_LOOP an initialization value. I will set it to 0 and name the step Set Initial.

4- Evaluate variable
Evaluate V_FOR_LOOP against the iteration condition. I will use "less than 5" as the condition; you can choose among the options according to your requirement. Name the step Evaluate Value.

5- Place your statements
Now it is time to place your statements which will iterate. I will only put one interface.

6- Increment your variable
Increment your variable using the SET VARIABLE object's Increment option. I will increment by one and name this step Increment.

7- Connect your Increment step to Evaluate Value step
Up to this point, every object was connected to the following object with an OK line; now connect Increment back to Evaluate Value with an OK line. The flow will go back to the evaluation and iterate until the evaluation is false.

Here is how our package looks in final form:

For Loop Package

And the operator screen when we run the package:


For Loop Operator View

As seen above, the steps numbered 1, 2 and 3 repeat 5 times; then Evaluate Value runs one more time, decides that V_FOR_LOOP < 5 is no longer true, and the package finishes its run.

While Loop

In programming we can implement a while loop as follows:

while (flag == true) {
    // statements
}

So this will iterate an unknown number of times until its condition becomes false. Confession time: I have to admit that I have never felt the need for a while loop in ETL/ODI, but you might.

Before implementing this step by step, I created a table with two columns, c1 and flag, where I will use flag as my condition. My data is as follows:

C1  F
--  -
 1  T
 2  T
 3  T
 4  T
 5  F
 6  T
 7  T
 8  T
 9  T
10  F

Now let's implement the while loop:

1- Create a variable to hold the flag value
I create a variable called V_WHILE_LOOP, which is alphanumeric and is refreshed by: select flag from variable.test where c1 = #V_FOR_LOOP
I will use my V_FOR_LOOP variable to select the flag values in this sample case. Your case will surely contain different logic.

2- Create a package
I create a package named P_WHILE_LOOP.

3- Set Variable (in my case)
Since I am refreshing my flag based on V_FOR_LOOP, setting that variable is the first step.

4- Refresh Flag
Refresh your flag variable.

5- Check Flag
Evaluate flag variable.

6- Statements
Place your statements. I will put my sample interface here and also increment V_FOR_LOOP, as I need that to eventually reach a false flag.

7- Set your connections
Every step up to the end of your statements is connected with an OK line; when you reach the end, connect it back to the Refresh Flag step, so the package will refresh, check and run your statements again and again until the flag is false.

Here is a view of the package:

While Loop Package

And the view from operator:


While Loop Operator

You can see it stops when we refresh the flag for the 5th time, since that returns F as the flag value, which does not satisfy our condition.

So here we are at the end of the post, now armed with the knowledge of how to implement loops in ODI 11g.

Thank you for your patience in reading, and if you have any questions or comments, please drop a comment; I will read it (and reply, if it's a question) for sure.

ODI 12c: First Look and Repository Creation

Hello,

After a long pause on the blog, here I am again. Oracle Data Integrator 12c is finally available for everyone to download. So in this post I will discuss my first impressions and explain how to create the repositories, both master and work. Actually it is pretty simple and almost the same as in 11g, which I covered in this post.

So, first impressions: when you download ODI 12c from this page, you get odi_121200.jar (the numbers may differ over time, since they are the version number) with some OPatches bundled with it. Frankly, it is a bit disturbing for me to have a jar file that is 1.8GB; I'd prefer an exe for Windows.

Anyway, I had some problems running this jar as well. First I tried it on my VM, which has 32-bit Windows 7, and got an error stating that it could not reach the jar file. So I moved to my physical machine, which is 64-bit Windows 7, where OUI could not recognize the platform and exited every time until I downloaded and installed Java 1.7. After solving the Java problem, I went back to my VM to tackle the other issue: it turned out my path was problematic, because my user name is Canburak Tümer and the space caused a problem reaching the file. I created another user without a space that could run the installer.

Finally I could see the installer UI. It was a pretty straightforward installation; I just selected Enterprise and went on. After the installation I ran ODI Studio; it has a really clean and elegant splash screen, and it asks whether to migrate any user settings from old installations. After the splash screen, the ODI workbench loads:

ODI 12c Start Screen

Creating Master Repository

As I mentioned before, repository creation is almost the same as in ODI 11g. We start by clicking File > New, and we will see the screen below:

Master - 1

Select "Master Repository Creation Wizard" on the ODI tab and click "OK".

Master - 2

You will see the screen above, where we enter the database information, the schema in which we will create the repository, and the DBA user that will run some of the creation scripts.

Master - 3

Define and confirm the password for the SUPERVISOR user. DO NOT FORGET THIS PASSWORD UNLESS YOU HAVE ANOTHER USER WITH SUPERVISOR PRIVILEGES. For this reason many ODI developers/admins set this password to "SUNOPSIS" out of old habit. I prefer to set it to "SUPERVISOR" in my VM and personal development environment.

Master - 4

Select the password storage option as you wish, then click Finish. It will now run the scripts to create the master repository; it took around 4 minutes in my VM, and will probably take around 2-3 minutes on a physical machine. Now it's time to create a connection to the master repository.

Connection

Click on "Connect to Repository", then click the green plus in the pop-up window and fill in the required information in the form. Use SUPERVISOR as the ODI user, and the DB user with which you created the master repository. Make sure you have selected the "Master Repository Only" radio button, then click "OK".

Wallet

ODI 12c will ask you if you want to keep your passwords in a secure wallet protected by a master password. I do not have enough information about this wallet yet, but I will learn and write another post about it; for now I prefer the less secure way, which does not use the wallet. Now we have a master repository and a connection to it, so it's time to create the work repository.

Creating Work Repository

To create the work repository, connect to your master repository, then go to the Topology tab and expand the Repositories section.

Work - 1

Right-click "Work Repositories" and select "New Work Repository" from the menu.

Work - 2

Enter the connection information for the schema in which you want to create the work repository. (I had a problem with this step: I wanted to use the odiw_c user, but ODI 12c kept filling the form in upper case, so it gave an invalid credentials error.)

Work - 3

In the final step, enter the repository name and select the repository type.

Work - 4

You can also define a password for the repository, which is different from the ODI user password and the DB user password; this password just secures the repository connection. When you click "Finish" it will run the scripts to create the work repository and will ask whether you want to create a connection to it. It creates the connection without the ODI user information, so you will need to edit the connection to add the ODI user information.

After all these steps we have installed ODI 12c and set up both master and work repositories for our environment. And we have a final view as below:

We are ready to develop.

Now it's time to create our topology connections, models and projects; import or reverse-engineer data sources; and develop mappings (the new name for interfaces), packages and more.

Welcome to ODI 12c. Keep following my blog for further posts, and please do not hesitate to contact me through the comment form below.


[Quote] Confessions of A Job Hopper

“A job, if you’re lucky enough to have one, is not a prison. If you’re bored, feeling underpaid, underappreciated, want to live in another part of the country or world or you’re just too ambitious for your own good it’s okay to change jobs*. (*Just make sure you have the new one before you leave the old one! And never, ever burn bridges.)”

Source : Here

After a series of workshops

Introduction to SQL Session

We had a series of workshops at Istanbul Hackerspace about SQL and PL/SQL. The sessions were held by me; there were three of them, each focusing on a different topic.


Our road map was:

  •  Introduction to SQL
  •  Introduction to PL/SQL
  •  Introduction to PL/SQL tuning & Oracle catalog tables.

You can find the material I've prepared for these workshops at http://www.canburaktumer.com/istanbulhs For me, these sessions were useful and successful. I did not break the Istanbul Hackerspace tradition and had a decreasing number of participants: the first day we had six participants, the second day we had three, and finally on the last day it was only me.

By the way, I would also like to introduce the hackerspace concept and Istanbul HS. Hackerspaces are worldwide "free project ateliers"; they basically produce projects with electronics and software. You can see a full list of hackerspaces on hackerspaces.org There are two hackerspaces in Istanbul, one on the Anatolian side and one on the European side. I am a volunteer on the Anatolian side; we are currently running an Android application project and holding workshops. You can find more info, in Turkish, on istanbulhs.org

That's all for today. Keep following, because more ODI posts are on the way.