Map Reduce

Discussion in 'OT Technology' started by Peyomp, Jan 9, 2009.

  1. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    Has anyone played with Map Reduce?
     
  2. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
  3. EvilSS

    EvilSS New Member

    Joined:
    Jun 11, 2003
    Messages:
    5,104
    Likes Received:
    0
    Location:
    STL
    I'm sure someone at Google has.
     
  4. piratepenguin

    piratepenguin New Member

    Joined:
    Jun 18, 2006
    Messages:
    1,067
    Likes Received:
    0
    Location:
    Ireland
    wotchu using THAT for lol
     
  5. dissonance

    dissonance reset OT Supporter

    Joined:
    May 23, 2006
    Messages:
    5,652
    Likes Received:
    1
    Location:
    KS
    you forgot the voice crack in there...
     
  6. Limp_Brisket

    Limp_Brisket New Member

    Joined:
    Jan 2, 2006
    Messages:
    48,422
    Likes Received:
    0
    Location:
    Utah
    I did a little presentation about it in my algorithms class.

    Here's a little Perl script I made to show how map reduce works. It's only a concept demo and doesn't use the actual MapReduce framework that Google made. It does a word count on a file using multiple (well, only 2) processes.

    Code:
    
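    use strict;
    use warnings;

    # Read the whole input file into memory, one line per element.
    open my $fh, '<', 'text.txt' or die "can't open text.txt: $!";
    my @lines = <$fh>;
    close $fh;

    my $start = time();

    # Split the lines into two halves, one per process.
    my $mid    = int($#lines / 2);
    my @first  = @lines[0 .. $mid];
    my @second = @lines[$mid + 1 .. $#lines];

    my %hash;
    my $stuff = '';

    pipe(my $h_in, my $h_out) or die "can't create pipe: $!";

    my $pid = fork();
    die "can't fork: $!" unless defined $pid;

    if ($pid == 0) {    # child: "map" step on the second half
        close $h_in;

        for my $line (@second) {
            $hash{lc $_}++ for $line =~ /[a-z]+/ig;
        }

        # Send the partial counts to the parent as "word,count " pairs.
        while (my ($k, $v) = each %hash) {
            print $h_out "$k,$v ";
        }
        close $h_out;

        exit(0);
    }
    else {              # parent: "map" step on the first half
        close $h_out;

        for my $line (@first) {
            $hash{lc $_}++ for $line =~ /[a-z]+/ig;
        }

        # The child writes no newline, so this reads until it closes the pipe.
        $stuff = <$h_in> // '';
        close $h_in;
        waitpid($pid, 0);
    }

    # "Reduce" step: merge the child's counts into the parent's.
    for my $pair (split / /, $stuff) {
        my ($k, $v) = split /,/, $pair;
        $hash{$k} += $v;
    }

    my $found = $hash{lorem} // 0;
    print "'lorem' found: $found time(s)\n";
    print "time: " . (time() - $start) . " sec(s)\n";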
    
     
    Last edited: Jan 9, 2009
  7. CodeX

    CodeX Guest

    I helped design it, what do you want to know?
     
  8. FartLighter

    FartLighter Resident Fart Expert OT Supporter

    Joined:
    Jul 5, 2005
    Messages:
    2,853
    Likes Received:
    9
    Location:
    Mammoth Lakes, CA
    I am playing with Hadoop right now. Pretty cool. I'm using it for some analysis on web scraping.
     
  9. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    Awesome, a friend is giving a talk on Hadoop at Cloudcamp next week, and I'm really looking forward to getting my feet wet.
     
  10. zanyspy_dude

    zanyspy_dude King of teh n00bz

    Joined:
    Aug 29, 2002
    Messages:
    4,473
    Likes Received:
    0
    Location:
    Indianapolis, IN
    I've worked a lot with Hadoop. Pretty amazing little project. Sped our shit WAY up. It doesn't work for every situation, but when it works, it works very nicely :D
     
  11. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    What kind of problems have you used it for?
     
  12. zanyspy_dude

    zanyspy_dude King of teh n00bz

    Joined:
    Aug 29, 2002
    Messages:
    4,473
    Likes Received:
    0
    Location:
    Indianapolis, IN
    An infoviz-related app.

    What is it you're going to use it for?
     
  13. FartLighter

    FartLighter Resident Fart Expert OT Supporter

    Joined:
    Jul 5, 2005
    Messages:
    2,853
    Likes Received:
    9
    Location:
    Mammoth Lakes, CA
    If the problem can be solved using "divide and conquer" across multiple machines, map reduce should be of at least some use. I have used it for web scraping and analysis of the scraped data. In general, it is most appropriate for massive data sets. I can't imagine a situation with big data where it can't be used. I'd be interested to hear of one :)
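
    For anyone who wants to see the "divide and conquer" shape spelled out, here is a rough sketch in Python rather than anything Hadoop-specific. The function names (split_chunks, map_chunk, reduce_counts, word_count) are made up for illustration, and the chunks are processed one after another in a single process here, where a real framework would hand each chunk to a separate machine.

```python
from collections import Counter
from functools import reduce
import re


def split_chunks(lines, n):
    """Divide: split the input lines into at most n roughly equal chunks."""
    size = max(1, -(-len(lines) // n))  # ceiling division, never below 1
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_chunk(chunk):
    """Map: count the words in one chunk, independently of all the others."""
    counts = Counter()
    for line in chunk:
        counts.update(w.lower() for w in re.findall(r"[a-z]+", line, re.I))
    return counts


def reduce_counts(left, right):
    """Reduce: merge two partial results into one."""
    return left + right


def word_count(lines, workers=2):
    """Run the map step on each chunk, then fold the partial counts together."""
    partials = [map_chunk(chunk) for chunk in split_chunks(lines, workers)]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    sample = ["Lorem ipsum dolor", "sit amet lorem", "LOREM ipsum"]
    print(word_count(sample)["lorem"])
```

    Because map_chunk never looks outside its own chunk, the chunks could in principle be counted anywhere and in any order; only the reduce step needs to see all the partial results.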
     
  14. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    Fast analytics on big piles of data.
     
  15. zanyspy_dude

    zanyspy_dude King of teh n00bz

    Joined:
    Aug 29, 2002
    Messages:
    4,473
    Likes Received:
    0
    Location:
    Indianapolis, IN


    Sounds perfect! Check out the listserv; there are all sorts of cool tricks to pick up.
     
