Hadoop: After the big data hackathon

Yesterday we completed the challenge. After around 19 hours of uninterrupted coding, I finally got some sleep.

First things first: I have to admit that I did not win a prize. I couldn’t make it into the top three in this hackathon; I completed only 7 of the 9 challenges. If I may share some statistics:

149 developers signed up for the code challenge on Eventbrite
87 of them came to the hackathon
After midnight, around 20 of us were left
Only 16 developers submitted at least one solution (I submitted 7, so I consider myself a good developer)

Now I want to talk about the challenges:

There were 9 challenges, each requiring us to process a little more than 100GB of data, 1TB in total. Every challenge except the second used Twitter JSON data; the second used a fixed-length record file. Each challenge came with a 150MB data sample to try things on.

The challenges were:

1- Counting the tweets grouped by country code, with the output ordered by country code.
2- Every record was a 16-byte array: 8 bytes for the key and 8 bytes for the value. The task was to sum the values grouped by key, with the output ordered by key. But the real challenge was reading this type of file: developers had to implement their own InputFormat and RecordReader.
3- Counting the unique users.
4- Counting the tweets in a given language, with the language passed in as a parameter.
5- Finding the MAX and MIN user ID.
6- Finding the person with the highest popularity score, calculated as follows: if A mentions B, and A has X tweets and Y followers, B gets X*Y points.
7- Finding the users with the MAX and MIN scores, where a user gets -1 for each tweet they post and +1 for each time they are mentioned.
8- Counting all the tweets.
9- Finding the person with the maximum number of first- and second-degree connections, based on mentions.
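To make challenge 2 more concrete, here is a minimal local sketch of its logic in Python. It is not a Hadoop InputFormat/RecordReader. It assumes each key and value is a big-endian signed 64-bit integer (what Java’s DataInput.readLong would read), and the function names are hypothetical. `first_record_offset` and the `start`/`end` parameters illustrate the detail a custom RecordReader has to get right: an input split can begin in the middle of a record, so reading has to start at the next 16-byte boundary, and a record that crosses the end of a split is read whole by the split it starts in.

```python
import struct
from collections import defaultdict

RECORD_SIZE = 16       # 8-byte key + 8-byte value
RECORD_FORMAT = ">qq"  # assumption: two big-endian signed 64-bit integers


def first_record_offset(split_start):
    """Round a split's start offset up to the next record boundary."""
    return -(-split_start // RECORD_SIZE) * RECORD_SIZE


def read_records(data, start=0, end=None):
    """Yield (key, value) for every record that starts inside [start, end)."""
    end = len(data) if end is None else end
    offset = first_record_offset(start)
    # A record starting before `end` is read in full, even if it crosses it.
    while offset < end and offset + RECORD_SIZE <= len(data):
        yield struct.unpack_from(RECORD_FORMAT, data, offset)
        offset += RECORD_SIZE


def sum_by_key(data):
    """Map: emit (key, value); reduce: sum per key; output sorted by key."""
    totals = defaultdict(int)
    for key, value in read_records(data):
        totals[key] += value
    return sorted(totals.items())
```

For example, `sum_by_key(struct.pack(">qqqq", 1, 10, 1, 5))` returns `[(1, 15)]`.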
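Challenges 6 and 7 can both be expressed with the same pattern: the mapper emits (user, points) pairs and the reducer sums them per user. The local sketch below assumes each tweet carries its author under `user` and its mentions under `entities.user_mentions`, as in Twitter’s v1.1 JSON, and takes X and Y from the author’s `statuses_count` and `followers_count`, which is only one reading of challenge 6. `run_job` is a hypothetical stand-in for Hadoop’s shuffle plus a summing reducer.

```python
from collections import defaultdict


def map_popularity(tweet):
    """Challenge 6: every mention of B by A is worth X * Y points to B."""
    author = tweet["user"]
    # Assumption: X = author's tweet count, Y = author's follower count.
    points = author["statuses_count"] * author["followers_count"]
    for mention in tweet["entities"]["user_mentions"]:
        yield mention["id"], points


def map_score(tweet):
    """Challenge 7: -1 to the author, +1 to every mentioned user."""
    yield tweet["user"]["id"], -1
    for mention in tweet["entities"]["user_mentions"]:
        yield mention["id"], 1


def run_job(tweets, mapper):
    """Hypothetical local stand-in for the shuffle plus a summing reducer."""
    totals = defaultdict(int)
    for tweet in tweets:
        for user_id, points in mapper(tweet):
            totals[user_id] += points
    return dict(totals)


def max_and_min(totals):
    """Return the (user, score) pairs with the highest and lowest score."""
    return (max(totals.items(), key=lambda kv: kv[1]),
            min(totals.items(), key=lambda kv: kv[1]))
```

On the real data, the only part that has to scale is the summing reducer; finding the max and min afterwards runs over one line per user.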
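Challenge 9 needs more than a simple sum. One possible reading, assumed here: treat every mention as an undirected edge and count the distinct users reachable in one or two hops. The in-memory sketch below (all function names hypothetical) shows only that logic; on 100GB of tweets it would likely take more than one MapReduce job, for example one to build adjacency lists and another to expand them to the second degree.

```python
def build_graph(mention_pairs):
    """Turn (author, mentioned) pairs into an undirected adjacency map."""
    graph = {}
    for a, b in mention_pairs:
        if a == b:
            continue  # ignore self-mentions
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connections_within_two(graph, user):
    """Distinct users one or two hops away from `user`."""
    first = graph.get(user, set())
    second = set()
    for friend in first:
        second |= graph.get(friend, set())
    return (first | second) - {user}


def most_connected(mention_pairs):
    """User with the most first- and second-degree connections."""
    graph = build_graph(mention_pairs)
    return max(graph, key=lambda u: len(connections_within_two(graph, u)))
```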

I could not submit the second and ninth challenges. And looking at the challenges again now, I see that I misunderstood the third one (blame the sleepless night).