Post Processors
A post processor service is a type that, once configured in a SearchContext, processes a collection of ResultInfo and produces a new one.
It's a class derived from PostProcessor that must override the following abstract method:
IEnumerable<ResultInfo> Process(IEnumerable<ResultInfo> results)
It's also mandatory to define a constructor that accepts a single object
parameter, used for the settings of the specific post processor.
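As a rough sketch of the contract described above, a custom post processor might look like the following. To keep the snippet self-contained, ResultInfo and PostProcessor are declared here as simplified stand-ins rather than PickAll's actual definitions, and MinimumLengthSettings, MinimumLength and MinimumLengthFilter are hypothetical names invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins (assumptions): the real types live in PickAll and have more members.
public class ResultInfo
{
    public string Url { get; set; }
    public string Description { get; set; }
}

public abstract class PostProcessor
{
    public abstract IEnumerable<ResultInfo> Process(IEnumerable<ResultInfo> results);
}

// Hypothetical settings type, used only by this example.
public class MinimumLengthSettings
{
    public int MinimumLength { get; set; }
}

// Hypothetical post processor: keeps results whose description is long enough.
public class MinimumLengthFilter : PostProcessor
{
    private readonly MinimumLengthSettings _settings;

    // Constructor accepting a single object parameter that carries the settings.
    public MinimumLengthFilter(object settings)
    {
        _settings = settings as MinimumLengthSettings
            ?? throw new ArgumentException("Expected MinimumLengthSettings.", nameof(settings));
    }

    // Produces a new collection from the incoming one, leaving the input untouched.
    public override IEnumerable<ResultInfo> Process(IEnumerable<ResultInfo> results)
    {
        return results.Where(
            r => (r.Description ?? string.Empty).Length >= _settings.MinimumLength);
    }
}
```

With the real PickAll types in place of the stand-ins, such a class would presumably be registered through With<T>(settings), in the same way as the built-in post processors.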
PickAll comes with the following built-in post processors, which can be found in the PickAll.PostProcessors
namespace:
- Uniqueness: removes duplicate results by URL
- Order: orders results by index, placing results with the same index next to each other
- FuzzyMatch: compares a string against results descriptions
- Improve: improves results by computing word frequency to perform a subsequent search
- Wordify: reduces documents of results URLs to a collection of words
- Textify: extracts all text from documents of results URLs
The FuzzyMatch post processor computes the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) between a given string and each result's description. If the distance falls outside the specified range, the result is excluded. It is configured as follows:
var context = SearchContext.Default
    .With<FuzzyMatch>(
        new FuzzyMatchSettings {
            Text = options.FuzzyMatch,
            MaximumDistance = 10 }); // MinimumDistance default is 0
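To illustrate the idea behind this filter, here is a minimal, self-contained sketch of a Levenshtein distance computation followed by a range check. It is not PickAll's implementation: FuzzySketch, Distance and Filter are hypothetical names, and treating both bounds as inclusive is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of PickAll.
public static class FuzzySketch
{
    // Classic dynamic-programming Levenshtein distance, keeping only two rows in memory.
    public static int Distance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (var j = 0; j <= b.Length; j++) previous[j] = j;

        for (var i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (var j = 1; j <= b.Length; j++)
            {
                var cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1), // insertion, deletion
                    previous[j - 1] + cost);                       // substitution or match
            }
            // The row just computed becomes the previous row of the next iteration.
            var swap = previous; previous = current; current = swap;
        }
        return previous[b.Length];
    }

    // Keeps descriptions whose distance from text lies within [minimum, maximum]
    // (inclusive bounds are an assumption of this sketch).
    public static IEnumerable<string> Filter(
        IEnumerable<string> descriptions, string text, int minimum, int maximum)
    {
        return descriptions.Where(d =>
        {
            var distance = Distance(text, d);
            return distance >= minimum && distance <= maximum;
        });
    }
}
```

For instance, FuzzySketch.Distance("kitten", "sitting") evaluates to 3, so a description of "kitten" would pass a filter on "sitting" with a maximum distance of 10.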
The Improve post processor reduces results descriptions to words, then computes the most frequent ones and uses them in a subsequent search. It is configured as follows:
var context = SearchContext.Default
    .With<Improve>(
        new ImproveSettings {
            WordCount = 2,
            NoiseLength = 3 });
In this case it will consider only the two most frequent words. All words with a length of 3 characters or less will be excluded from the computation.
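The word selection described above can be sketched roughly as follows. This is an illustration under assumptions, not PickAll's code: ImproveSketch and TopWords are hypothetical names, and splitting on spaces and lowercasing words before counting are choices made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of PickAll.
public static class ImproveSketch
{
    // Returns the wordCount most frequent words, ignoring any word
    // whose length is noiseLength characters or less.
    public static IEnumerable<string> TopWords(
        IEnumerable<string> descriptions, int wordCount, int noiseLength)
    {
        return descriptions
            .SelectMany(d => d.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .Select(w => w.ToLowerInvariant())   // assumption: counting is case-insensitive
            .Where(w => w.Length > noiseLength)  // drop noise words
            .GroupBy(w => w)
            .OrderByDescending(g => g.Count())   // most frequent first
            .Take(wordCount)
            .Select(g => g.Key);
    }
}
```

Called with WordCount = 2 and NoiseLength = 3 on the descriptions "the quick brown fox", "quick brown dogs" and "quick fox", this yields "quick" and "brown": "the" and "fox" are discarded as noise, and "dogs" occurs only once.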
Wordify and Textify both extract all text from results URLs and are configured in a very similar way. A sample configuration follows:
var context = SearchContext.Default
    .With<Wordify>(
        new WordifySettings {
            IncludeTitle = true,
            NoiseLength = 3 }); // Textify doesn't support NoiseLength
Data is presented in different ways:
var result = results.First();
// Wordify
IEnumerable<string> words = ((WordifyData)result.Data).Words;
// Textify
string text = ((TextifyData)result.Data).Text;
Prefer Textify when you need to process the text directly, since by default it doesn't sanitize text (see the TextifySettings.SanitizeText property).
Opt for Wordify if you're only interested in sanitized (alphanumeric-only) words.
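To make the difference concrete, the following sketch shows one way raw text could be reduced to alphanumeric words. It is only an approximation of the idea, not Wordify's actual logic: WordifySketch and ToWords are hypothetical names, restricting matches to ASCII letters and digits is an assumption, and the noise length is assumed to behave as it does for Improve.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of PickAll.
public static class WordifySketch
{
    // Extracts runs of ASCII letters and digits, dropping words
    // whose length is noiseLength characters or less.
    public static IEnumerable<string> ToWords(string text, int noiseLength)
    {
        return Regex.Matches(text, "[A-Za-z0-9]+")
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(w => w.Length > noiseLength);
    }
}
```

Given the text "Hello, world! It's 2024." and a noise length of 3, this produces "Hello", "world" and "2024", whereas unsanitized text would keep punctuation and short fragments as they appear in the document.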