Apache HIVE

Discussion in 'OT Technology' started by Peyomp, Jul 2, 2009.

  1. Peyomp

    You can chuck any CSV into this thing (or any format you want, if you write a SerDe - which isn't that hard, as it is just Java), and throw a bunch of EC2 boxes at it to run SQL on it. For scan-heavy queries it scales pretty much linearly as you add boxes.

    Very fun. I'll bring up a cluster of 20 c1.mediums at $4 an hour, and it's pretty fast to chug through a good volume of logs.

    What we've done with it so far: gigabytes of Tivoli logs copied into HDFS. Easily join across them and query them any whichaway - no indexes, nothing, across 40 cores, including the ability to very easily bring in any dataset for joining. You only pay while you use it.

    It's nice not to worry about indexes. "Fuck it, we'll just do a sequential scan across lots of machines." As long as you can RENT the machines to do that on... beautiful. There's very little setup time, you just chuck data in and you're off. No OLAP cubes, no index tuning... just throwing lots of hardware in a horizontally scalable system at the problem. In terms of the value of an analyst's time... how many problems are there with a lot of data that AREN'T worth throwing $40 an hour at (the cost of 50 XL EC2 instances) to get 200 cores and lots of IO on the problem?
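    To make the rental math concrete, here is a rough sketch in plain shell. The per-instance rates and core counts are assumptions back-derived from the totals in this post ($4/hr and 40 cores over 20 c1.mediums, $40/hr and 200 cores over 50 XLs), not quoted from a price list, and the function names are made up for illustration.

```shell
#!/bin/sh
# Hourly cost and core count for a rented cluster.
# Rates below are assumptions back-derived from the post's own totals:
#   c1.medium: $4/hr over 20 boxes  -> 20 cents/hr each, 40 cores / 20  -> 2 cores
#   XL:        $40/hr over 50 boxes -> 80 cents/hr each, 200 cores / 50 -> 4 cores

# cluster_cost COUNT CENTS_PER_INSTANCE_HOUR HOURS -> dollars with two decimals
cluster_cost() {
    total_cents=$(( $1 * $2 * $3 ))
    printf '%d.%02d\n' $(( total_cents / 100 )) $(( total_cents % 100 ))
}

# cluster_cores COUNT CORES_PER_INSTANCE -> total cores
cluster_cores() {
    echo $(( $1 * $2 ))
}

cluster_cost 20 20 1     # 20 c1.mediums for one hour
cluster_cores 20 2
cluster_cost 50 80 1     # 50 XL instances for one hour
cluster_cores 50 4
```

    Change the third argument to see what a longer run costs; the point is that the bill scales with hours used, not with hardware owned.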

    With Apache Pig... you don't even have to make tables. It will eat any delimited tabular data and you can use it immediately.

    But don't take my word for it. You can start playing with this stuff immediately using the Cloudera AMIs and scripts. How easy is it to run a 20 host HIVE cluster?

    hadoop-ec2 launch-cluster my-cluster 19

    One master, 19 slaves. To log in: hadoop-ec2 login my-cluster

    Now type: hive

    You're in. Make a simple schema for your data. Then something like:

    hadoop fs -mkdir input
    hadoop fs -put mydata.csv.gz input/
    hadoop fs -put mydata2.csv.gz input/

    Your data is in hadoop FS. Your schema loads it, and you have massive parallelism in your analysis. Fun like whoah.
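    For the "simple schema" step, something along these lines should work. This is only a sketch: the table name, the three columns, and the /user/root/input path are placeholders I made up, so swap in your own fields and wherever your input directory actually lives in HDFS. It defines an external table over the directory, so Hive reads the gzipped CSVs in place.

```shell
#!/bin/sh
# Emit HiveQL for an external table over a directory of CSV files in HDFS.
# Table name, column names, and the HDFS path are hypothetical placeholders.
write_schema() {
    hdfs_dir=$1
    cat <<EOF
CREATE EXTERNAL TABLE logs (
  ts      STRING,
  host    STRING,
  message STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '${hdfs_dir}';

SELECT host, COUNT(1) FROM logs GROUP BY host;
EOF
}

# Print the statements; the path is an assumed location for the input directory.
write_schema /user/root/input
```

    Redirect the output to a file (say schema.q) and run it on the master with hive -f schema.q, or just paste the statements at the hive prompt.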

    Last edited: Jul 2, 2009
