WEB mining web data?

Discussion in 'OT Technology' started by airbball23, Mar 2, 2010.

  1. airbball23

    airbball23 Rent this Space only $5/mnth

    Joined:
    Jan 13, 2007
    Messages:
    1,489
    Likes Received:
    0
    cliffs: i'm trying to mine data from a particular site but i'm having some troubles and I wanted to know if there was a better way to do it.

    So basically i'm trying to get redbox locations in a certain area. If you go to their site, they don't display ALL the locations - you have to enter a city and they show you a list of redboxes in that particular area. I'm trying to do an entire state or at least half a state. Let's take NY for example - I want to mine the addresses for all the redboxes in Manhattan (as an example, not the city i want to mine) but I don't want to manually copy and paste all the results into excel.

    I tried to grab the info inside excel, but the way they have their site set up, when excel grabs that particular box the only thing that shows up is the word "go" and not the address.

    Is there a better way I can mine this data?
     
  2. kingtoad

    kingtoad OT Supporter

    Joined:
    Sep 2, 2003
    Messages:
    55,918
    Likes Received:
    10
    Location:
    Los Angeles
    Do they have an API for you to get this data from? If not, then no, unless you want to go through the pain of scraping data. If they have an API, then I'm sure you can create something simple enough to interface with it.
     
  3. airbball23

    airbball23 Rent this Space only $5/mnth

    Joined:
    Jan 13, 2007
    Messages:
    1,489
    Likes Received:
    0
    no API but they do use Google Maps API
     
  4. kingtoad

    kingtoad OT Supporter

    Joined:
    Sep 2, 2003
    Messages:
    55,918
    Likes Received:
    10
    Location:
    Los Angeles
    It is likely you're going to have to go through the pain of writing a script that scrapes data then. Either that, or you manually extract the data from their site and put it into a document or flat file of your own to manage.

    Do they have a directory of some sort listing all the redbox locations in a certain area? If so, that could probably help you tremendously.
     
  5. airbball23

    airbball23 Rent this Space only $5/mnth

    Joined:
    Jan 13, 2007
    Messages:
    1,489
    Likes Received:
    0
    i hope not :( man i tried to do another thing. You can "add to fav" a location. There were about 50 so i added them, but when i try to copy the text it won't let me. It's in an "aspnetForm" so i'm guessing this is the restriction? any way around this? I mean it's in text format - it's not an img - but i can't see the addresses in the "view source".
     
  6. airbball23

    airbball23 Rent this Space only $5/mnth

    Joined:
    Jan 13, 2007
    Messages:
    1,489
    Likes Received:
    0
  7. airbball23

    airbball23 Rent this Space only $5/mnth

    Joined:
    Jan 13, 2007
    Messages:
    1,489
    Likes Received:
    0
    i found a way around for anyone interested.

    i had to save the page to my desktop and then open it in dreamweaver. the text was still hidden, BUT after double-clicking on the name anchor it showed the addresses in plaintext.
     
  8. Ricky

    Ricky █▄ █▄█ █▄ ▀█▄

    Joined:
    Jun 17, 2005
    Messages:
    38,767
    Likes Received:
    6
    you can do this in php

    i have a script somewhere that i've created (with the help of ot)
     
  9. Kevin

    Kevin New Member

    Joined:
    Aug 27, 2002
    Messages:
    87,634
    Likes Received:
    0
    Location:
    Michigan
    Any web framework can handle it. It's a matter of whether you have to parse HTML or just call a web service.
     
  10. Josh

    Josh Guest

    I briefly looked over their site.

    I do a lot of data mining and here's how I would approach it...

    First get a list of all the zip codes in your state since you're going to need to do a search for each. You may also be able to just get a single (1) zip code from each city since the search is a radius search.
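
    A minimal sketch of the "one zip per city" idea, assuming the zip list is available as (zip, city) pairs. The function name and the sample rows are illustrative only; the thread does not say where the zip list comes from.

```python
def one_zip_per_city(rows):
    """Keep the first zip code seen for each city.

    `rows` is an iterable of (zip_code, city) pairs. That layout is an
    assumption made for this sketch.
    """
    chosen = {}
    for zip_code, city in rows:
        # Normalise the city name so "Albany" and "albany " collapse together.
        key = city.strip().lower()
        # setdefault only stores the zip if the city has not been seen yet.
        chosen.setdefault(key, zip_code)
    return list(chosen.values())


# Illustrative sample rows, not real search input.
sample = [("12201", "Albany"), ("12202", "Albany"), ("14201", "Buffalo")]
print(one_zip_per_city(sample))
```

    Whether one zip per city is enough depends on the search radius covering the whole city, so a full zip list is the safer starting point.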

    Either way, you need to start with the list of zips, then perform a query to their search page with each zip in your list. I'd ignore the map altogether and focus on the text side of things. Each search result has an id on its result div, shown as k#####, so you can easily use the ID they assign to each kiosk as the key in your database to avoid duplicates.
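
    The de-duplication step could be sketched like this, using only the Python standard library. The sample markup is made up for illustration; the only detail taken from the site is that result divs carry an id of the form k followed by digits.

```python
import re
from html.parser import HTMLParser

# Result divs are assumed to have ids like "k12345".
KIOSK_ID = re.compile(r"^k\d+$")


class KioskIdParser(HTMLParser):
    """Collects the id of every <div> whose id looks like a kiosk id."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        div_id = dict(attrs).get("id") or ""
        if KIOSK_ID.match(div_id):
            self.ids.append(div_id)


def kiosk_ids(html):
    """Return kiosk ids found in one page of search results, in page order."""
    parser = KioskIdParser()
    parser.feed(html)
    return parser.ids


def merge_new(seen, html):
    """Add ids from `html` to the `seen` set and return only the new ones."""
    new = []
    for kiosk_id in kiosk_ids(html):
        if kiosk_id not in seen:
            seen.add(kiosk_id)
            new.append(kiosk_id)
    return new


# Hypothetical result pages from two overlapping searches.
page_a = '<div id="k10001">A</div><div id="k10002">B</div><div id="map"></div>'
page_b = '<div id="k10002">B</div><div id="k10003">C</div>'

seen = set()
first = merge_new(seen, page_a)
second = merge_new(seen, page_b)
print(first, second)
```

    Overlapping radius searches will return the same kiosk more than once, which is why the second call only reports the id it has not seen before.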

    What they are posting to the page looks like:

    {"latitude":44.7894335,"longitude":-122.7004253,"radius":50,"maxKiosks":50,"mcdOnly":false,"getInv":false,"__K":"UNKNOWN"}
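
    A small sketch of building that request body in Python. The field names and values are copied from the payload above; the function name and its defaults are illustrative, the meaning of "__K" is not known, and the endpoint URL is not given here, so nothing is actually sent.

```python
import json


def build_search_payload(latitude, longitude, radius=50, max_kiosks=50):
    """Return a JSON body shaped like the one the search page posts."""
    payload = {
        "latitude": latitude,
        "longitude": longitude,
        "radius": radius,
        "maxKiosks": max_kiosks,
        "mcdOnly": False,
        "getInv": False,
        # Copied verbatim from the observed request; its purpose is unknown.
        "__K": "UNKNOWN",
    }
    # Compact separators match the whitespace-free form seen in the request.
    return json.dumps(payload, separators=(",", ":"))


body = build_search_payload(44.7894335, -122.7004253)
print(body)
```

    Note that the body carries a latitude and longitude rather than a zip, so each zip would first have to be turned into coordinates before it can be searched this way.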

    Use Firebug; it will help you see what's going on in there.

    This is not the easiest of sites to mine, I will tell you that.
     
