Lucene anyone?

Discussion in 'OT Technology' started by Gonrad, Dec 3, 2007.

  1. Gonrad

    http://lucene.apache.org/java/docs/

    I'm using a .NET port and am finding it a bit slow (or maybe I'm doing something wrong?).

    I'm indexing 300K data rows and it's only getting through about 2,000 of them per minute.
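
    For scale, taking the "about 2,000 a minute" figure at face value, the full table would take roughly two and a half hours:

    $$\frac{300{,}000\ \text{rows}}{2{,}000\ \text{rows/min}} = 150\ \text{min} = 2.5\ \text{h}$$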

    Any alternatives?



    Gist of my code:

    Code:
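    ' reader is an open IDataReader (e.g. SqlDataReader);
    ' WriteXMLfile and AddToIndex are my own helper methods
    While reader.Read()
        Dim indextext As String = ""
        Dim xmltext As String = ""

        ' Walk every column of the current row
        For i As Integer = 0 To reader.FieldCount - 1
            Dim columnName As String = reader.GetName(i)
            Dim value As String = reader.GetValue(i).ToString()

            ' Space-separated text for the Lucene index
            indextext &= value & " "

            ' One <column>value</column> element per column
            xmltext &= "<" & columnName & ">" & value & "</" & columnName & ">"
        Next

        WriteXMLfile(xmltext)
        AddToIndex(indextext)
    End While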
    
     
  2. whup

    I don't think you'll find a better alternative than Lucene.

    You need to identify what's really slowing the operation down; I recommend running it under a profiler, for example JetBrains dotTrace.
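
    If a profiler isn't handy, a cruder first check is to time each stage of the loop on its own. This sketch is in Java (Lucene's original language; in .NET, System.Diagnostics.Stopwatch plays the role of System.nanoTime). The buildText and addToIndex methods are hypothetical placeholders for the poster's own code, not Lucene calls; only the timing pattern is the point.

```java
import java.util.ArrayList;
import java.util.List;

class StageTimer {
    // Accumulated nanoseconds spent in each stage across all rows
    private long buildNanos = 0;
    private long indexNanos = 0;

    // Hypothetical stand-in for building the per-row index text
    static String buildText(String[] row) {
        StringBuilder sb = new StringBuilder();
        for (String v : row) {
            sb.append(v).append(' ');
        }
        return sb.toString();
    }

    // Hypothetical stand-in for AddToIndex: here it just stores the text
    static void addToIndex(List<String> index, String text) {
        index.add(text);
    }

    public void run(List<String[]> rows, List<String> index) {
        for (String[] row : rows) {
            long t0 = System.nanoTime();
            String text = buildText(row);
            long t1 = System.nanoTime();
            addToIndex(index, text);
            long t2 = System.nanoTime();
            buildNanos += t1 - t0;
            indexNanos += t2 - t1;
        }
    }

    public long buildNanos() { return buildNanos; }
    public long indexNanos() { return indexNanos; }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(new String[] {"id" + i, "name" + i});
        }
        List<String> index = new ArrayList<>();
        StageTimer timer = new StageTimer();
        timer.run(rows, index);
        System.out.printf("build: %d ms, index: %d ms%n",
                timer.buildNanos() / 1_000_000, timer.indexNanos() / 1_000_000);
    }
}
```

    Whichever total dominates is where to look first; a real profiler will then tell you which calls inside that stage are expensive.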

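    For one, you should use a StringBuilder (initialized to a large capacity so that it doesn't have to keep resizing) to build up that XML text, rather than repeated string concatenation. .NET strings are immutable, so every &= allocates a brand-new string and copies everything built so far into it; over many columns and 300K rows that adds up, and I suspect it's taking a significant amount of time.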
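
    Roughly what that looks like, sketched in Java (java.lang.StringBuilder behaves like .NET's System.Text.StringBuilder for this purpose). The RowXml class, the rowToXml name and the 4096-character starting capacity are arbitrary choices for illustration. The escaping of &, < and > is an addition the original loop doesn't do; without it, a value such as "Tom & Jerry" would produce malformed XML.

```java
class RowXml {
    // One builder reused for every row; 4096 is an arbitrary starting capacity
    private final StringBuilder sb = new StringBuilder(4096);

    // Escape the characters that would break XML element content
    private void appendEscaped(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
    }

    // Build <column>value</column> for each column of one row
    String rowToXml(String[] columnNames, String[] values) {
        sb.setLength(0); // clears the contents but keeps the allocated capacity
        for (int i = 0; i < columnNames.length; i++) {
            sb.append('<').append(columnNames[i]).append('>');
            appendEscaped(values[i]);
            sb.append("</").append(columnNames[i]).append('>');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        RowXml xml = new RowXml();
        String[] cols = {"id", "name"};
        // prints <id>1</id><name>Tom &amp; Jerry</name>
        System.out.println(xml.rowToXml(cols, new String[] {"1", "Tom & Jerry"}));
    }
}
```

    Reusing one builder means its buffer is allocated once for the whole run instead of once per row; the same trick applies to the space-separated indextext string.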

    But yeah, please run a profiler or something to see where most of the time is actually going. If the AddToIndex method turns out to be the slow part, post that code too.
     
  3. whup

    Also, why are you reading the data from the database, iterating over the column set for every row, writing that out as XML, and then indexing the XML? You should be able to cut the XML out altogether and add Documents straight into the index. I'm not sure what you're doing in AddToIndex, though.
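
    A rough sketch of that restructuring in Java: each row becomes one field-name-to-value map and goes straight to the indexer, with no XML in between. In real Lucene the map would be a Document with one Field per column, passed to IndexWriter.addDocument; here a hypothetical DocumentSink interface stands in for the writer so the example runs without the library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Lucene's IndexWriter.addDocument(Document)
interface DocumentSink {
    void add(Map<String, String> fields);
}

class DirectIndexer {
    // Turn each row into one "document" (one field per column), no XML step
    static int indexRows(String[] columnNames, List<String[]> rows, DocumentSink sink) {
        int count = 0;
        for (String[] row : rows) {
            Map<String, String> doc = new LinkedHashMap<>();
            for (int i = 0; i < columnNames.length; i++) {
                doc.put(columnNames[i], row[i]);
            }
            sink.add(doc);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<Map<String, String>> collected = new ArrayList<>();
        DocumentSink sink = collected::add; // one sink for the whole run
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "Alice"});
        rows.add(new String[] {"2", "Bob"});
        int n = indexRows(new String[] {"id", "name"}, rows, sink);
        System.out.println(n + " documents: " + collected);
    }
}
```

    The sink, like a real IndexWriter, is created once and reused for every row; opening and closing a writer per row would add that overhead to every single document.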

    How fast does this operation need to run and why? How often?
     
