Sunday, April 3, 2011

How do I sort Lucene results by field value using a HitCollector?

I'm using the following code to execute a query in Lucene.Net:

var collector = new GroupingHitCollector(searcher.GetIndexReader());
searcher.Search(myQuery, collector);
resultsCount = collector.Hits.Count;

How do I sort these search results based on a field?


Thanks for your answer. I had tried using TopFieldDocCollector, but I got an error saying "value is too small or too large" when I passed 5000 as the numHits argument. Please suggest a valid value to pass.

From stackoverflow
  • The search method will accept a search.Sort parameter, which can be constructed as simply as:

    new Sort("my_sort_field")

    However, there are some limitations on which fields can be sorted on: they need to be indexed but not tokenized, and their values must be convertible to Strings, Floats or Integers.

    Lucene in Action covers all of the details, as well as sorting by multiple fields and so on.

    erickson : One key point: the field to be sorted must be indexed.
    Alabaster Codify : ...and not tokenized - beat me to it :)
  • Thanks for your input, but I need to use a collector object (instead of using Hits), and I don't see any overloaded method that returns a collector object as well as sorting it on a field.

    erickson : That's easy to do as well. Maybe if you accept answers for some of the other questions you've already asked, an answer will appear.
    itsadok : @erickson: Well, he has only asked 2 questions so far, and the other one is really general. @unknown: Consider choosing a better user name. Also, you should edit your question with clarifications instead of posting an "Answer".
    erickson : He/she has asked at least fifteen questions so far, with no accepted answers. Look at the gravatars, which are a hash of the identity.
    itsadok : OK, now you got me all curious. How do you track someone's questions through their gravatar?
    erickson : Delete this "answer". Edit your question to provide additional information as needed.
  • What you're looking for is probably TopFieldDocCollector. Use it instead of the GroupingHitCollector (what is that?), or inside it.

    Comment on this if you need more info. I'll be happy to help.

    Ed : Thanks for your answer. I had tried using TopFieldDocCollector, but I got an error saying "value is too small or too large" when I passed 5000 as the numHits argument. Please suggest a valid value to pass.
  • Thanks for your answer. I had tried using TopFieldDocCollector, but I got an error saying "value is too small or too large" when I passed 5000 as the numHits argument. Please suggest a valid value to pass.


    erickson : Delete this "answer". Updates to your question should be provided by editing (as I've done for you).
  • In the original (Java) version of Lucene, there is no hard restriction on the size of the TopFieldDocCollector results. Any number greater than zero is accepted. Although memory constraints and performance degradation create a practical limit that depends on your environment, 5000 hits is trivial and shouldn't pose a problem outside of a mobile device.

    Perhaps in porting Lucene, TopFieldDocCollector was modified to use something other than Lucene's "heap" implementation (called PriorityQueue, extended by FieldSortedHitQueue)—something that imposes an unreasonably small limit on the results size. If so, you might want to look at the source code for TopFieldDocCollector, and implement your own similar hit collector using a better heap implementation.

    I have to ask, however, why are you trying to collect 5000 results? No user in an interactive application is going to want to see that many. I figure that users willing to look at 200 results are rare, but double it to 400 just as a factor of safety. Depending on the application, limiting the result size can hamper malicious screen scrapers and mitigate denial-of-service attacks too.
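
The suggestion above of writing your own hit collector around a bounded heap could look roughly like the following Java sketch. It is only an illustration under stated assumptions: it uses java.util.PriorityQueue from the standard library rather than Lucene's own PriorityQueue or FieldSortedHitQueue, it sorts on a single String field value in ascending order, and the names FieldHit, BoundedFieldCollector, collect and topHits are hypothetical, not part of Lucene or Lucene.Net.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a hit: a document id plus the value of its sort field. */
final class FieldHit {
    final int docId;
    final String sortValue;

    FieldHit(int docId, String sortValue) {
        this.docId = docId;
        this.sortValue = sortValue;
    }
}

/** Hypothetical collector that retains only the first numHits hits in sort order. */
final class BoundedFieldCollector {
    // Ascending by field value; ties are broken by document id.
    private static final Comparator<FieldHit> ORDER =
            Comparator.comparing((FieldHit h) -> h.sortValue)
                      .thenComparingInt(h -> h.docId);

    private final int numHits;
    private final PriorityQueue<FieldHit> heap;
    private int totalHits = 0;

    BoundedFieldCollector(int numHits) {
        // Mirrors the rule stated in the answer: any number greater than zero is accepted.
        if (numHits <= 0) {
            throw new IllegalArgumentException("numHits must be greater than zero");
        }
        this.numHits = numHits;
        // Reversed order, so the head of the heap is the worst hit currently retained.
        this.heap = new PriorityQueue<>(numHits, ORDER.reversed());
    }

    /** Called once per matching document. */
    void collect(int docId, String sortValue) {
        totalHits++;
        FieldHit hit = new FieldHit(docId, sortValue);
        if (heap.size() < numHits) {
            heap.add(hit);
        } else if (ORDER.compare(hit, heap.peek()) < 0) {
            // The new hit sorts before the worst retained hit, so it replaces it.
            heap.poll();
            heap.add(hit);
        }
    }

    /** Number of documents seen, including those that were not retained. */
    int getTotalHits() {
        return totalHits;
    }

    /** Retained hits in ascending sort order. */
    List<FieldHit> topHits() {
        List<FieldHit> sorted = new ArrayList<>(heap);
        sorted.sort(ORDER);
        return sorted;
    }
}
```

A caller would create one collector per search, for example new BoundedFieldCollector(400), feed it each matching document's id and field value through collect, and then read topHits. Memory use stays proportional to numHits rather than to the total number of matches, which is why a limit such as 5000 should not be a problem for a heap of this kind.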

