Lucene's StandardAnalyzer removes dots from string/acronyms when indexing it. I want Lucene to retain dots and hence I'm using WhitespaceAnalyzer class.
I can give my list of stop words to StandardAnalyzer...but how do i give it to WhitespaceAnalyzer?
Thanks for reading.
From stackoverflow
-
Create your own analyzer by extending WhiteSpaceAnalyzer and override tokenStream method as follows.
public TokenStream tokenStream(String fieldName, Reader reader) { TokenStream result = super.tokenStream(fieldName, reader); result = new StopFilter(result, stopSet); return result; }
Here the stopSet is the Set of stop words, which you could get by adding a constructor to your analyzer which accepts a list of stop words.
You may also wish to override reusableTokenStream() method in similar fashion if you plan to reuse the TokenStream.
Steve Chapman : could you please have a loot at my answer and comment: http://stackoverflow.com/questions/899542/problem-using-same-instance-of-indexsearcher-for-multiple-requests/1014501#1014501
0 comments:
Post a Comment