parsing text file with regex (or other)

Discussion in 'OT Technology' started by babygodzilla, Jul 18, 2003.

  1. babygodzilla

    babygodzilla I love rice

    Joined:
    Nov 5, 2001
    Messages:
    3,108
    Likes Received:
    0
    i have a text file that has a pattern of something like ID#, Code, Name, Time, Room#. but once in a while not all of these variables come up in a line. so something like:

    please ignore underscores as spaces

    12345___ABC_____Daiba, Yu_____0710-1000_____145
    53154___BEC__________________TBA__________315
    ________TCR_____Kerr, Hum_____0200-0400_______
    21678___________Mama, Papa________________SAT1

    something like that. im sure u get the idea. what i'm doing is extracting the variables using regex and putting them in an array, and later in a database.

    the question is, is it possible to have just one expression to accommodate for all these possibilities? if not, what is the minimum number of regex i can have?

    THANKS
     
  2. samm

    samm Next in Line

    Joined:
    Dec 22, 2000
    Messages:
    2,630
    Likes Received:
    0
    Location:
    San Jose, CA
    why not just split each line on multiple spaces and pass the values into an array? Easy to do in perl:

    @line = split(/\s+/,$line);

    Not sure if that will work, I think the + qualifier means 1 or more spaces. None the less, if it's just simple spaces separating the data, you shouldn't need a regular expression.
     
  3. skinjob

    skinjob Active Member

    Joined:
    Jan 6, 2001
    Messages:
    2,337
    Likes Received:
    0
    Location:
    Aztlán
    This really isn't a problem that regexs were designed to solve. Since your data is in fixed width columns, it's really simple to parse using basic string manipulation. But, if you have to use regexs, something that matches column widths would be the easiest. If a column is 5 characters wide, then create a regex that matches any 5 characters including whitespace. Then, you can add logic to check if all characters were whitespace or not.
     
  4. babygodzilla

    babygodzilla I love rice

    Joined:
    Nov 5, 2001
    Messages:
    3,108
    Likes Received:
    0
    yes i tried to do that with regex, that is specifying the width. for some reason it doesnt work...


    did u say its simpler using basic string manips? how's that? my friend and i tried to do that but it gets confusing...


    im using PHP by the way
     
  5. skinjob

    skinjob Active Member

    Joined:
    Jan 6, 2001
    Messages:
    2,337
    Likes Received:
    0
    Location:
    Aztlán
    look up the substr() function.
     
  6. babygodzilla

    babygodzilla I love rice

    Joined:
    Nov 5, 2001
    Messages:
    3,108
    Likes Received:
    0
    thanks so much! i think this is very helpful. im gonna try it!
     
  7. 5Gen_Prelude

    5Gen_Prelude There might not be an "I" in the word "Team", but

    Joined:
    Mar 14, 2000
    Messages:
    14,519
    Likes Received:
    1
    Location:
    Vancouver, BC, CANADA
    Sounds like a lot of extra step to go from a text file to a database!
     
  8. babygodzilla

    babygodzilla I love rice

    Joined:
    Nov 5, 2001
    Messages:
    3,108
    Likes Received:
    0
    heheh :)

    is there something u can suggest to make the process easier? at least the parsing part.
     
  9. Astro

    Astro Code Monkey

    Joined:
    Mar 18, 2000
    Messages:
    2,047
    Likes Received:
    0
    Location:
    Cleveland Ohio
    mmm... file parsing. My favorite!

    Important question: Is this a one time shot or will you have to be converting this stuff more then once (or twice)?

    If its a one time shot (and MySQL is your DB):

    - Load into Excel (or whatever spreadsheet program) and export as a comma delimited file (make sure you only have the fields you want in there and nothing else). Also make sure the fields are in the same order as seen in your database
    - Issue a LOAD DATA INFILE "myfilename" INTO mytablename (you can issue this from SQL or from the mysql command prompt)

    If you're using something else then MySQL, there's probably a tool that will do something pretty similar (MS SQL has isql which MIGHT work).

    -- -- -- --

    If you need to do this more then once or your DB engine doesn't support the LOAD DATA INFILE trick, then you'll want to use the substr() trick. If you're smooth, you build a utility function that parses any field as long as you give it the data to parse, the start position, and the length of the field. This will make updating your code a little easier if the file layout changes.
     
  10. 5Gen_Prelude

    5Gen_Prelude There might not be an "I" in the word "Team", but

    Joined:
    Mar 14, 2000
    Messages:
    14,519
    Likes Received:
    1
    Location:
    Vancouver, BC, CANADA
    Okay well, one time shot, simply open up access, File->Get External Data->Import Data, choose the file, follow the wizard as a fixed field width. It will trim everything for you automatically.

    For batch runs, you can automate it using VBA.
     
  11. Astro

    Astro Code Monkey

    Joined:
    Mar 18, 2000
    Messages:
    2,047
    Likes Received:
    0
    Location:
    Cleveland Ohio
    Thats one way, but if it was me, I'd cut out the middle man (Excel + VBA) and just spend a few minutes to create an upload script to do all the work instead (with the right script, go to admin page -> browse for file -> upload -> done).
     
  12. 5Gen_Prelude

    5Gen_Prelude There might not be an "I" in the word "Team", but

    Joined:
    Mar 14, 2000
    Messages:
    14,519
    Likes Received:
    1
    Location:
    Vancouver, BC, CANADA
    Who said anything about Excel? It's one command line, you could even do it as a macro if the filename doesn't change. I just don't know what he's ultimately trying to accomplish here.
     
  13. Astro

    Astro Code Monkey

    Joined:
    Mar 18, 2000
    Messages:
    2,047
    Likes Received:
    0
    Location:
    Cleveland Ohio
    I keep forgetting VBA is command line driven and a more balls-y replacement of batch file coding. Yuck in my opinion (Perl or PHP would make for interesting command-line alternatives and both do well with string manipulations). Although all is pointless if the user doesn't know the language...
     
  14. babygodzilla

    babygodzilla I love rice

    Joined:
    Nov 5, 2001
    Messages:
    3,108
    Likes Received:
    0
    :slap: i never even thought of using Excel and such... :(

    im gonna try it rite now!

    btw yes it is a one time thing and im using mySQL
     

Share This Page