Java/Web programmers in here....got a question

Discussion in 'OT Technology' started by scrotomus, Nov 29, 2007.

Thread Status:
Not open for further replies.
  1. scrotomus

    scrotomus you're a scumbag

    Joined:
    Aug 5, 2002
    Messages:
    70,262
    Likes Received:
    0
    Say you wanted to create a database of products, descriptions, and prices from a variety of websites. The companies aren't cooperative and don't want to give their databases to you...

    bestbuy, sears, etc.


    How would you go about building a site-trawling application? Weekly updates would also be helpful...

    Someone I know wants to accomplish this
     
  2. TheDarkHorizon

    TheDarkHorizon

    Joined:
    Sep 26, 2002
    Messages:
    2,396
    Likes Received:
    0
    Location:
    San Francisco, CA
    Screen scrape -- it's not fun.
     
  3. scrotomus

    scrotomus you're a scumbag

    Joined:
    Aug 5, 2002
    Messages:
    70,262
    Likes Received:
    0
    I wonder how pricegrabber does it
     
  4. whup

    whup I wish you had children and.. so that I could step

    Joined:
    Feb 12, 2007
    Messages:
    1,603
    Likes Received:
    0
    I can't say I've used it (I've used Lucene, which it is built on), but Nutch could achieve what you want. http://lucene.apache.org/nutch/about.html

    I'm not sure how easy it is to configure it to search, scrape and index those pages. Otherwise you could write a light app yourself to download the documents and scrape what you need out of them using regexes.
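
    A minimal sketch of the regex approach in Java, assuming the page has already been downloaded into a String. The class name RegexScraper and the markup it matches (a "name" span followed by a "price" span) are made up for illustration; every real store would need its own pattern, and patterns like this break whenever the site changes its HTML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls (name, price) pairs out of already-downloaded HTML.
final class RegexScraper {
    // Assumed markup, not taken from any real store:
    //   <span class="name">Widget</span> <span class="price">$19.99</span>
    private static final Pattern PRODUCT = Pattern.compile(
        "<span class=\"name\">([^<]+)</span>\\s*"
        + "<span class=\"price\">\\$([0-9]+(?:\\.[0-9]{2})?)</span>");

    // Returns one {name, price} pair per match, in document order.
    static List<String[]> extract(String html) {
        List<String[]> products = new ArrayList<>();
        Matcher m = PRODUCT.matcher(html);
        while (m.find()) {
            products.add(new String[] { m.group(1).trim(), m.group(2) });
        }
        return products;
    }

    public static void main(String[] args) {
        String html = "<li><span class=\"name\">Widget</span> "
            + "<span class=\"price\">$19.99</span></li>";
        for (String[] p : extract(html)) {
            System.out.println(p[0] + " costs " + p[1]);
        }
    }
}
```

    The price is kept as a string here; converting it to a BigDecimal before storing it would avoid floating-point rounding.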
     
  5. TheDarkHorizon

    TheDarkHorizon

    Joined:
    Sep 26, 2002
    Messages:
    2,396
    Likes Received:
    0
    Location:
    San Francisco, CA
    AFAIK, they have partnerships with merchants.
     
  6. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    You wouldn't use Java. You would use something like Perl/LWP that is optimized for parsing complex text.
     
  7. br0wer

    br0wer New Member

    Joined:
    Nov 29, 2007
    Messages:
    66
    Likes Received:
    0
    PHP ftw. The preg functions (preg_match(), preg_replace()) use perl compatible regular expression syntax, and PHP is much easier to pick up than Perl (opinion).
     
  8. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    Perl, PHP, Ruby, Python... all valid choices.
     
  9. scrotomus

    scrotomus you're a scumbag

    Joined:
    Aug 5, 2002
    Messages:
    70,262
    Likes Received:
    0
    all I know pretty much is java :hs:



    so that leaves me with JScrape and other variations of XQuery, Lucene, and some other Java APIs. I wonder what the learning curve on this shit is
     
  10. br0wer

    br0wer New Member

    Joined:
    Nov 29, 2007
    Messages:
    66
    Likes Received:
    0
    If I were in your position, I'd just use this as an opportunity to pick up a scripting language. PHP's syntax is modeled after Java/C++, so you'd probably be able to pick it up quickly, especially if you're already good at Java. Zend has a good set of beginner tutorials that should have you up and running in no time.

    http://devzone.zend.com/node/view/id/627

    Programmers should always be willing to learn new languages. :p
     
  11. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    He doesn't need Zend. He needs a web CLIENT to do his stuff. Something like LWP.
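
    For someone staying in Java, a rough equivalent of a web client can be sketched with the JDK's own java.net.http.HttpClient (Java 11 and later, so newer than this thread). The class name PageFetcher, the User-Agent string, and the timeout values are placeholders, not recommendations.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical wrapper around the JDK HTTP client (requires Java 11+).
final class PageFetcher {
    private static final HttpClient CLIENT = HttpClient.newBuilder()
        .followRedirects(HttpClient.Redirect.NORMAL)
        .connectTimeout(Duration.ofSeconds(10))
        .build();

    // Builds a GET request that identifies the client and gives up after 20 seconds.
    static HttpRequest request(String url) {
        return HttpRequest.newBuilder(URI.create(url))
            .header("User-Agent", "price-trawler-sketch/0.1") // placeholder value
            .timeout(Duration.ofSeconds(20))
            .GET()
            .build();
    }

    // Downloads the page body as text; any status other than 200 is treated as a failure.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpResponse<String> response =
            CLIENT.send(request(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```

    Calling PageFetcher.fetch(url) returns the HTML as a String, which can then be handed to whatever parsing step comes next.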
     
  12. br0wer

    br0wer New Member

    Joined:
    Nov 29, 2007
    Messages:
    66
    Likes Received:
    0
    The link was to Zend's PHP tutorials. I wasn't suggesting that he use the Zend framework, if that's what you're talking about.
     
  13. whup

    whup I wish you had children and.. so that I could step

    Joined:
    Feb 12, 2007
    Messages:
    1,603
    Likes Received:
    0
    Now is not the time for him to try and switch languages when he just needs to get something done! Crazy.

    Stick with Java. There should be at least a handful of libraries that can help you do what you want, or you should be able to write it yourself easily piecing together libraries.

    Did you even take a look at Nutch? I don't know what the point is, really; why someone with a problem that's been addressed many times already is posting in here instead of using Google beats me.
     
  14. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    Actually, getting this done in a dynamic language will probably not take as long as doing it in Java, even if he has to learn a new language.
     
  15. scrotomus

    scrotomus you're a scumbag

    Joined:
    Aug 5, 2002
    Messages:
    70,262
    Likes Received:
    0
    Google isn't the best for suggestions, but it is good for information... and it's not like I haven't used Google at all for researching this :cjerk:
     
  16. Joe_Cool

    Joe_Cool Never trust a woman or a government. Moderator

    Joined:
    Jun 30, 2003
    Messages:
    299,223
    Likes Received:
    525
    :werd: to all of this. PHP is easy to learn if you know Java, and like Perl, it's perfectly suited to this kind of task.
     
  17. Dnepr

    Dnepr Guest

    You can easily do this in Java :dunno:

    Dunno what all the fuss is about.
     
  18. Dnepr

    Dnepr Guest

    You will need to create a generic database schema with enough fields to fit all the major online stores. I'd probably use something like Hibernate to simplify working with it, or you can use plain SQL queries if you like. You would then need to read up on Java's URL classes and protocol handling, then build an application that does a full tree traversal based on your criteria. A web page is a tree of sorts if you think about it.
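
    The traversal part could be sketched roughly like this. It is a breadth-first walk over links, with a visited set because pages on a real site link back to each other, so the "tree" has cycles. The class name SiteWalker is made up, the page fetcher is passed in as a function so the walk can be tried without a network, and links are assumed to already be in a form the fetcher understands (resolving relative URLs is left out).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical crawler core: walks links breadth-first from a start page.
final class SiteWalker {
    // Naive link pattern: double-quoted href values, skipping "#fragment" links.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"#]+)\"");

    static List<String> links(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    // Returns the pages visited, in visit order, stopping after maxPages.
    // fetcher maps a URL to its HTML, or to null if the page is unavailable.
    static Set<String> crawl(String start, Function<String, String> fetcher, int maxPages) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue; // already seen via another link
            }
            String html = fetcher.apply(url);
            if (html == null) {
                continue;
            }
            for (String link : links(html)) {
                if (!visited.contains(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }
}
```

    The place where each page's HTML is fetched is also where product fields would be extracted and written to the database. A real crawler would additionally want to stay on one host, respect robots.txt, and pause between requests.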
     
  19. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    You can, but it's much more code and less elegant than in a dynamic language. They are good at this kind of thing.
     
  20. Dnepr

    Dnepr Guest

    :hsugh:
     
  21. Dnepr

    Dnepr Guest

    I kinda like this project, gonna remember it and then build it for fun when I get time. It's a good exercise.
     
  22. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    I use the Java HTTP stuff frequently, and LWP is a much better way to go at it. It's also very easy to learn. The comparable Java is voluminous and abstract... it's Java.
     
  23. Dnepr

    Dnepr Guest

    Abstraction is a good thing :hsugh:
     
  24. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    Not for every problem. The Perl to do this is about 1/10 as long. Your continued sarcasm is a strong indication of ignorance. Do you even know a dynamic language?
     
  25. Joe_Cool

    Joe_Cool Never trust a woman or a government. Moderator

    Joined:
    Jun 30, 2003
    Messages:
    299,223
    Likes Received:
    525
    Never heard of using the tool best suited to the job, eh? :hsugh:

    A backhoe is way better at digging than a trowel. That doesn't mean that a backhoe would be the best tool for taking care of your wife's flower garden.
     
    Last edited: Dec 2, 2007