Item description “couldn’t” (curly), search for “couldn’t” (curly): works.
Item description “couldn't” (plain), search for “couldn’t” (curly): fails.
Item description “couldn’t” (curly), search for “couldn't” (plain): works.
Item description “couldn't” (plain), search for “couldn't” (plain): works.

Looks like the stemming code in Lucene isn't aware of curly apostrophes.
It isn't in the stemming; it's in the tokenization, StandardTokenizer specifically. An apostrophe splits a word with a contraction into two words ("couldn't" -> "couldn" and "t"), while a smart apostrophe seems to leave it as one word. (This still needs to be verified; researching the problem on Google is what led to this conclusion.) There is little to no information available on what to do about this, which is kind of stunning. It may really come down to just replacing the smart apostrophe with a regular one before the text is handed to Lucene for any purpose.
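If pre-normalizing the text is the way to go, a minimal sketch in plain Java (class and method names are illustrative, not from any existing codebase) could map the common smart-apostrophe code points to the ASCII one before the text reaches Lucene:

```java
public final class ApostropheNormalizer {

    private ApostropheNormalizer() {}

    /**
     * Replaces "smart" apostrophe code points with the plain ASCII
     * apostrophe so that "couldn't" (curly) and "couldn't" (plain)
     * tokenize identically. Must be applied to BOTH the indexed text
     * and the query string, or the mismatch just moves around.
     */
    public static String normalize(String text) {
        return text
            .replace('\u2019', '\'')   // RIGHT SINGLE QUOTATION MARK
            .replace('\u02BC', '\''); // MODIFIER LETTER APOSTROPHE
    }
}
```

The key point is symmetry: normalizing only at index time would make the curly-description/curly-search case fail instead.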
Subclass EnglishAnalyzer. Override initReader(…) to add a CharFilter. Use something like MappingCharFilter, or a variant that can replace the contents of words, maybe based on regular-expression matches? Match and replace contractions.
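A sketch of that idea, assuming Lucene 4.10 on the classpath. One wrinkle: EnglishAnalyzer is declared final, so it can't be subclassed directly; AnalyzerWrapper's wrapReader(…) plays the same role as initReader(…) and lets the MappingCharFilter run before StandardTokenizer ever sees the text. The class name here is made up for illustration.

```java
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.AnalyzerWrapper;
import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.util.Version;

/** Delegates to EnglishAnalyzer but folds smart apostrophes first. */
public class ApostropheNormalizingAnalyzer extends AnalyzerWrapper {

    static final NormalizeCharMap APOSTROPHE_MAP;
    static {
        NormalizeCharMap.Builder b = new NormalizeCharMap.Builder();
        b.add("\u2019", "'"); // RIGHT SINGLE QUOTATION MARK -> '
        b.add("\u02BC", "'"); // MODIFIER LETTER APOSTROPHE  -> '
        APOSTROPHE_MAP = b.build();
    }

    private final Analyzer delegate = new EnglishAnalyzer(Version.LUCENE_4_10_0);

    public ApostropheNormalizingAnalyzer() {
        super(PER_FIELD_REUSE_STRATEGY);
    }

    @Override
    protected Analyzer getWrappedAnalyzer(String fieldName) {
        return delegate;
    }

    @Override
    protected Reader wrapReader(String fieldName, Reader reader) {
        // Equivalent of overriding initReader(...): the CharFilter
        // rewrites characters before tokenization, with offset
        // correction so highlighting still lines up.
        return new MappingCharFilter(APOSTROPHE_MAP, reader);
    }
}
```

Because the same analyzer is used at index and query time, both sides see normalized apostrophes, which should make all four cases in the table above behave the same.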
There may be something useful here, too:
https://issues.apache.org/jira/browse/LUCENE-3884
http://lucene.apache.org/core/4_10_0/analyzers-common/org/apache/lucene/analysis/util/ElisionFilter.html