Another coding challenge? Or is it too challenging?

Discussion in 'OT Technology' started by Astro, Oct 19, 2003.


Your interest in this project

  1. I'm interested! When can I start?

    1 vote(s)
    33.3%
  2. I'm interested, but I don't know where to begin.

    0 vote(s)
    0.0%
  3. Nah. This has been done before. It's too complex for me. I have no time.

    1 vote(s)
    33.3%
  4. Dude! I thought the next challenge wasn't going to be a web app!

    1 vote(s)
    33.3%
  1. Astro

    Astro Code Monkey

    Joined:
    Mar 18, 2000
    Messages:
    2,047
    Likes Received:
    0
    Location:
    Cleveland Ohio
    Here's one I was thinking of tackling the other day, and I thought maybe there would be some folks here interested in taking a stab at it too. It's kind of specific, and some folks may not have access to all the gear to put this together. So, if you're interested, please leave feedback so I know how many might get into this... I'll also note this isn't really a new problem, so I don't know how excited folks would be about solving it.

    The problem: using ONLY email, handle web URL requests and return the pages to the original requester's email address. Ideally, the response should be raw text - no HTML - but it would also be acceptable to prompt the user ahead of time for their preference of raw text or HTML. The turnaround from request to response should be reasonable (2-10 minutes would be socially acceptable).

    This problem has multiple issues.

    - How do you handle the requests?
    - How do you collect the requested page, parse it (if necessary), and then return it?
    - If the page has a link, will the user be able to follow it? (think of this as a bonus, not a requirement)
    - How do you handle forms and sending form data back? (think of this as a bonus, not a requirement)

    The angle I'm looking at this problem from: I have a cell phone. It sends and receives text messages via email, but I'm too cheap to pay for web access. Text messages also cost money, so keeping them short and simple would be very valuable. It would be great to pass a request for weather, sports scores, or other misc. content. If you don't have a phone that can do this, that's OK - you can pretend by using another email account (so you'd need two email accounts: one for sending and one for receiving).

    I already have a collection of ideas on how to tackle this. I was curious if anyone out there would be interested in trying it as well. Of course, it's open to any language, any server technology, and any creative ideas you might have. Before actually starting, let's see how many of you would be game to give it a whirl...
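    The request-handling piece of the challenge could be sketched like this (Python here, but any language works per the rules; the helper names and the 1,000-character limit are placeholders, not part of the spec):

```python
import re

# A minimal sketch of the email side of the challenge. Polling the
# inbox (e.g. with poplib) and sending the reply (e.g. with smtplib)
# would wrap these two helpers; the names and the message-size limit
# are assumptions for illustration only.

URL_RE = re.compile(r"https?://\S+")

def extract_url(message_body):
    """Pull the first URL out of an incoming request message."""
    match = URL_RE.search(message_body)
    return match.group(0) if match else None

def build_reply(sender, page_text, limit=1000):
    """Address the reply and trim the page so it stays SMS-sized."""
    return {"to": sender, "body": page_text[:limit]}
```

    Fetching the page and stripping the HTML down to raw text are the other halves of the problem.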
     
  2. Penguin Man

    Penguin Man Protect Your Digital Liberties

    Joined:
    Apr 27, 2002
    Messages:
    21,696
    Likes Received:
    0
    Location:
    Edmonton, AB
    I'd be interested (especially since it kinda relates to the stuff I'm doing right now), but I don't have time :hs:
     
  3. Astro

    Astro Code Monkey

    Joined:
    Mar 18, 2000
    Messages:
    2,047
    Likes Received:
    0
    Location:
    Cleveland Ohio
    Hmm... I'd predict about 1-2 hours would be needed to do this, but it wouldn't include the bonus items.
     
  4. drewski_amk

    drewski_amk Guest

    theoretically, all you would need is a linux box w/ perl. when a request comes in to a certain user, extract the url, get the content of that page with perl, parse it and strip the html, and send it back to the same address. total time is less than 1 min.

    whether it would work or not is a different matter; the hardest part would probably be configuring sendmail to auto-respond when a message is received.

    btw: there are already e-mail addresses that do this, hosted by others, if you can find out what they are
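    The sendmail wiring drewski mentions is classically done with an alias that pipes incoming mail into a program instead of a mailbox. A hedged sketch (the handler path and name are placeholders):

```text
# /etc/aliases -- deliver mail for "webfetch" to a script, not a mailbox
webfetch: "|/usr/local/bin/webfetch-handler"
```

    After editing the file, run newaliases to rebuild the alias database; the handler script then receives each full message on standard input.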
     
  5. drewski_amk

    drewski_amk Guest

  6. Astro

    Astro Code Monkey

    Joined:
    Mar 18, 2000
    Messages:
    2,047
    Likes Received:
    0
    Location:
    Cleveland Ohio
    I haven't seen that before. Thanks. Anyone still want to give this challenge a whirl? I think there's still value in problem solving some of the issues I mentioned above...
     
  7. Leb_CRX

    Leb_CRX OT's resident terrorist

    Joined:
    Apr 22, 2001
    Messages:
    39,994
    Likes Received:
    0
    Location:
    Ottawa, Canada
    " Dude! I thought the next challenge wasn't going to be a web app! "

    :wtc: :wtc: :wtc: :wtc: :wtc: :wtc: :wtc:
     
  8. Astro

    Astro Code Monkey

    Joined:
    Mar 18, 2000
    Messages:
    2,047
    Likes Received:
    0
    Location:
    Cleveland Ohio
    I know... I know... I haven't been able to think of any non-web centric project ideas.
     
  9. Scoob_13

    Scoob_13 Anything is possible, but the odds are astronomica

    Joined:
    Oct 5, 2001
    Messages:
    73,792
    Likes Received:
    38
    Location:
    Fort Worth. Hooray cowgirls.

    "Create A.I."

    "Using VS.Net."

    "Go."
     
  10. Azn_Bok_Choy

    Azn_Bok_Choy Guest

    Show me the challenge!
     
  11. Astro

    Astro Code Monkey

    Joined:
    Mar 18, 2000
    Messages:
    2,047
    Likes Received:
    0
    Location:
    Cleveland Ohio
    Create AI? To do what? A Turing test would be a great example, but building one from the ground up could be a project worthy of a PhD in computer science and/or psychology. Creating ANY form of AI would be a rather complex challenge. I'd be curious to hear more about what you're thinking. Can you come up with some projects that could be done in 16 hours or less?

    Using VS.Net - isn't Visual Studio just a GUI for the .NET coding suite? Um. The idea behind these challenges is to allow ANY tool and ANY language to be used. Example: I tell you to go from Cleveland, Ohio to Miami, Florida. I don't care how you do it, just do it. How you get from A to B doesn't matter; the exercise of getting there is the idea. Make sense? If you meant something completely different, let me know. I might have missed the idea behind your message.
     
  12. Astro

    Astro Code Monkey

    Joined:
    Mar 18, 2000
    Messages:
    2,047
    Likes Received:
    0
    Location:
    Cleveland Ohio
    For those who think this project is too difficult, I'd be happy to provide guidance. I doubt you'll find me spelling the entire project out, but think of me as a guide who will point you in the direction you need to go (unless you really get stuck and frustration is kicking in).
     
  13. Scoob_13

    Scoob_13 Anything is possible, but the odds are astronomica

    Joined:
    Oct 5, 2001
    Messages:
    73,792
    Likes Received:
    38
    Location:
    Fort Worth. Hooray cowgirls.
    :rofl: Sorry, forgot to include the :rofl: smiley in that post :o

    It's an old joke from a professor of mine that I had for a couple of semesters, he simply updated "VB6" to "VS.Net" when it came out :o

    Whenever he had a kid that was 3|_33+ he would assign the kid the project of creating A.I. in VB (or .Net) so that the kid would shut up and study :o He'd ask for weekly updates as well :bigthumb:


    As for actually coding the A.I., a simple A.I. would probably be easiest (well, relatively) in C#, Java, or even C++. Anything that would require learning and whatnot would probably best be done in C or C++.
     
  14. Scoob_13

    Scoob_13 Anything is possible, but the odds are astronomica

    Joined:
    Oct 5, 2001
    Messages:
    73,792
    Likes Received:
    38
    Location:
    Fort Worth. Hooray cowgirls.

    It seems like an interesting project, and we actually created one like this so that we could get around web filtering at work. However, I think you're asking for a stripped-down version of the webpages you're looking for - that right there is pretty difficult, since the backend would have to know what extraneous crap to filter out so that you can stay within your text message limit.

    I'll see if I can dig up the code for what we did (we had the pages simply emailed back to us in raw text while having them simultaneously posted to an in-house web server that we could connect to since it wasn't blocked), but I doubt it's going to fit what you're looking for.
     
  15. Leb_CRX

    Leb_CRX OT's resident terrorist

    Joined:
    Apr 22, 2001
    Messages:
    39,994
    Likes Received:
    0
    Location:
    Ottawa, Canada
    I might join you guys by writing an application that emails webpages based on requests... I know it's not the same thing, but I don't want to be left out :wtc:
     
  16. Astro

    Astro Code Monkey

    Joined:
    Mar 18, 2000
    Messages:
    2,047
    Likes Received:
    0
    Location:
    Cleveland Ohio
    It sounds close enough - would it be a desktop app?
     
  17. Astro

    Astro Code Monkey

    Joined:
    Mar 18, 2000
    Messages:
    2,047
    Likes Received:
    0
    Location:
    Cleveland Ohio
    404 - Old Joke not found.


    :)
     
  18. Scoob_13

    Scoob_13 Anything is possible, but the odds are astronomica

    Joined:
    Oct 5, 2001
    Messages:
    73,792
    Likes Received:
    38
    Location:
    Fort Worth. Hooray cowgirls.

    I thought it was funny :wtc:


    Then again, I have a joke about a piece of string that walks into a bar that I find hilarious :o
     
  19. Astro

    Astro Code Monkey

    Joined:
    Mar 18, 2000
    Messages:
    2,047
    Likes Received:
    0
    Location:
    Cleveland Ohio
    Not to go too far off topic, but I'm not sure C/C++ would be the tools of choice for AI. In fact, I'd be willing to say a custom language would be needed (and I believe there are several if not many out there - all of which I've forgotten the names of). It really depends on your design of how it would work - these custom languages have tools and syntax already set up to handle the AI and learning work. If the design is simple, then yes, C/C++ as well as any other language (dare I even say VB) would work just as well.
     
  20. 5Gen_Prelude

    5Gen_Prelude There might not be an "I" in the word "Team", but

    Joined:
    Mar 14, 2000
    Messages:
    14,519
    Likes Received:
    1
    Location:
    Vancouver, BC, CANADA
    A piece of string walks into a bar and asks for a drink. The bartender replies, "Sorry, we don't serve your kind here." The string walks outside, ties himself up, walks back into the bar, and asks for a drink. The bartender replies, "Didn't you just walk in here before?"

    The string replies, "'Fraid knot."
     
  21. Astro

    Astro Code Monkey

    Joined:
    Mar 18, 2000
    Messages:
    2,047
    Likes Received:
    0
    Location:
    Cleveland Ohio
    See, this is where the fun part of this project kicks in. How do you parse out the extra crap? And what IS the crap you want to filter out? I'm thinking of stripping out the HTML stuff, but I also know of an algorithm to strip out non-essential text - and it's not even considered an AI algorithm (cool, eh?). I know of two - well, three - approaches to do this. Picking the right tool will directly affect which of the three approaches you decide on.

    Here are the three approaches (for those thinking about tackling this project, I suggest skipping over this to prevent tainting your ideas). For those who are looking for ideas or don't know where to start, here are some for you:

    Approaches to filter HTML tags:

    1. Lynx. With the right command line syntax, you unlock the answer to why Lynx is still around.
    Hint: lynx http://www.rsbauer.com -dump -nolist
    Hint 2: lynx http://www.rsbauer.com -dump -nolist > myfiledump.html (this dumps to a file called myfiledump.html)
    A variation on Lynx would be to use wget. I hear it's got some power to it as well. I haven't used it, but I would think it's capable of stripping out the HTML. Careful here if you are making this web based - it's a super huge security risk. Failing to lock your server down or to properly check the user's input will lead to a compromised server. Ask me how to secure this and I'll let you know.

    2. Treat the HTML as XML. This isn't bulletproof, since the HTML standard and HTML coders both suck and are not usually XML friendly (unless you're working with a valid XHTML site). But it's possible to persuade the XML parser to yank out a good collection of the HTML, and a regular expression could be used to clean out the rest. Some might think a good regular expression alone will get the job done, but HTML does some quirky things that could easily throw off the best regular expression - still, feel free to take a crack at it. I know this approach works well for ASP ("been there, done that, got the t-shirt").

    3. (My personal favorite) PHP's strip_tags() function will do very nicely (and has been available since the PHP3 days). With one function call, PHP will slice and dice your HTML content from a single string into one without HTML or PHP tags. It doesn't do anything about the extra carriage returns, so a regular expression may be needed to reduce the number of blank lines. Your mileage may vary.

    These are just some ideas. Of course, the ultimate would be to build your own parser.
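    For folks working outside PHP, approach 3 can be approximated with a small parser. A sketch in Python (skipping script/style contents and collapsing blank lines are my own additions, not features of strip_tags itself):

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only text nodes, skipping <script> and <style> bodies."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def strip_tags(html):
    """Rough analog of PHP's strip_tags(), plus blank-line cleanup."""
    parser = TextExtractor()
    parser.feed(html)
    text = "".join(parser.chunks)
    # collapse runs of blank lines, as the post suggests
    return re.sub(r"\n{2,}", "\n", text).strip()
```

    As with the regex approach, real-world tag soup can still trip this up, so treat it as a starting point.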
     
  22. Astro

    Astro Code Monkey

    Joined:
    Mar 18, 2000
    Messages:
    2,047
    Likes Received:
    0
    Location:
    Cleveland Ohio
    Ok - here's the algorithm (in English) to parse out the extra text that isn't considered important - for example: ads, menus, copyright info, etc. (Again, if you are doing this on your own and don't want to be tainted, skip over this.)

    1. For the target site you want to grab content off of, pull down at least 3 pages.

    2. Compare pages. Find matching/like text blocks.

    3. Keep track of matching/like text blocks and remove them from the target page. This (when working) will remove nav menus (even if they're images - unless your HTML parser got to them first). It should also remove items that are static or semi-static. But this depends on your search algorithm, which I have some ideas on, but nothing too exciting.

    This sounds like a lot of work, but the result would be a really trimmed-down page. A good example of a site needing this done is cnn.com. They have a TON of junk up there, but you probably only want the main headline (or the first couple). So stripping out the nav menus and the junk below the main headlines would be pretty useful. This algorithm was posted out on the net at some point. Not sure I could find it, but if I do, I'll post a link...
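    The three steps above can be sketched with a line-level comparison (a real version would likely match multi-line blocks, but line sets are enough to show the idea; all names here are placeholders):

```python
def strip_common_blocks(target, samples):
    """Drop lines from `target` that appear in every sampled page.

    `samples` are the other pages pulled from the same site (step 1);
    lines present in all of them are treated as nav/footer furniture
    (step 2) and removed from the page we actually want (step 3).
    """
    common = set(samples[0].splitlines())
    for page in samples[1:]:
        common &= set(page.splitlines())
    kept = [line for line in target.splitlines()
            if line.strip() and line not in common]
    return "\n".join(kept)
```

    Run against three pages from the same site, the shared menus and copyright boilerplate drop out and only the per-page content survives.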
     
  23. Leb_CRX

    Leb_CRX OT's resident terrorist

    Joined:
    Apr 22, 2001
    Messages:
    39,994
    Likes Received:
    0
    Location:
    Ottawa, Canada
    yep C#..

    nothing too simple, nor too fancy, just something I will do to get my feet wet again
     
  24. Leb_CRX

    Leb_CRX OT's resident terrorist

    Joined:
    Apr 22, 2001
    Messages:
    39,994
    Likes Received:
    0
    Location:
    Ottawa, Canada
    oh yeah, and I've got code at home that rips the HTML stuff out of a webpage... but I will attempt to write some myself, since reusing someone else's code is bad for learning
     
  25. Astro

    Astro Code Monkey

    Joined:
    Mar 18, 2000
    Messages:
    2,047
    Likes Received:
    0
    Location:
    Cleveland Ohio
    It depends. If it's something simple, then rewriting isn't too bad, since you can learn from it. But if it's a library, then I'd probably just run with the library. Basically, if you were being paid to build this, what would you do? - keeping in mind time, quality, and usability. Either way, the goal is to make it work. :)
     

Share This Page