
Post Processors

Giacomo Stelluti Scala edited this page Jan 3, 2020 · 20 revisions

Reason

A post processor is a service that, once configured in a SearchContext, processes a collection of ResultInfo and produces a new one. It's a class derived from PostProcessor that must override the following abstract method:

IEnumerable<ResultInfo> Process(IEnumerable<ResultInfo> results)

It's also mandatory to define a constructor that accepts a single object parameter, used for the settings of the specific post processor.

Built-In

PickAll comes with the following built-in post processors, which can be found in the PickAll.PostProcessors namespace:

  • Uniqueness: removes duplicate results by URL
  • Order: orders results so that those with the same index are placed next to each other
  • FuzzyMatch: compares a string against result descriptions
  • Improve: improves results by computing word frequency and using it to perform a subsequent search
  • Wordify: reduces the documents at result URLs to a collection of words
  • Textify: extracts all text from the documents at result URLs

FuzzyMatch

The FuzzyMatch post processor computes the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) between a given string and each result's description. If the distance falls outside the specified range, the result is excluded. It is configured as follows:

var context = SearchContext.Default
                    .With<FuzzyMatch>(new FuzzyMatchSettings {
                        Text = options.FuzzyMatch,
                        MaximumDistance = 10 }); // MinimumDistance default is 0
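
To make the distance check concrete, here is a minimal, self-contained sketch of the Levenshtein distance and a range test. It is not PickAll's implementation: the LevenshteinSketch class, its method names, and the inclusive bounds are assumptions made for illustration only.

```csharp
using System;

// Hypothetical helper, not part of PickAll.
public static class LevenshteinSketch
{
    // Classic dynamic-programming edit distance, keeping only two rows of the table.
    public static int Distance(string source, string target)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (target == null) throw new ArgumentNullException(nameof(target));

        var previous = new int[target.Length + 1];
        var current = new int[target.Length + 1];

        // Building the first j characters of target from an empty string takes j insertions.
        for (var j = 0; j <= target.Length; j++) previous[j] = j;

        for (var i = 1; i <= source.Length; i++)
        {
            current[0] = i; // Reducing the first i characters of source to nothing takes i deletions.
            for (var j = 1; j <= target.Length; j++)
            {
                var cost = source[i - 1] == target[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1, current[j - 1] + 1), // deletion, insertion
                    previous[j - 1] + cost);                        // substitution or match
            }
            // The row just computed becomes the previous row of the next iteration.
            var swap = previous;
            previous = current;
            current = swap;
        }
        return previous[target.Length];
    }

    // Assumption: both bounds are inclusive; the library may treat them differently.
    public static bool IsWithinRange(string text, string description,
        int minimumDistance, int maximumDistance)
    {
        var distance = Distance(text, description);
        return distance >= minimumDistance && distance <= maximumDistance;
    }
}
```

With these definitions, LevenshteinSketch.Distance("kitten", "sitting") is 3, so a description of "sitting" would pass a check against "kitten" with a maximum distance of 10 and fail with a maximum of 2.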

Improve

The Improve post processor reduces result descriptions to words, then computes the most frequent ones and uses them in a subsequent search. It is configured as follows:

var context = SearchContext.Default
                    .With<Improve>(
                        new ImproveSettings {
                            WordCount = 2,
                            NoiseLength = 3});

In this case it will consider only the two most frequent words. All words with a length of 3 characters or less will be excluded from the computation.
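
The selection described above can be sketched in a few lines of LINQ. This is only an illustration of the idea, not the library's code: the WordFrequencySketch class, the lower-casing of words, the word-splitting pattern, and the alphabetical tie-break are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of PickAll.
public static class WordFrequencySketch
{
    public static IReadOnlyList<string> MostFrequentWords(
        IEnumerable<string> descriptions, int wordCount, int noiseLength)
    {
        return descriptions
            // Split each description into runs of letters and digits.
            .SelectMany(d => Regex.Matches(d, @"[\p{L}\p{Nd}]+").Cast<Match>())
            // Assumption: words are compared case-insensitively.
            .Select(m => m.Value.ToLowerInvariant())
            // Drop noise: words of noiseLength characters or fewer.
            .Where(w => w.Length > noiseLength)
            .GroupBy(w => w)
            // Most frequent first; ties are broken alphabetically for determinism.
            .OrderByDescending(g => g.Count())
            .ThenBy(g => g.Key, StringComparer.Ordinal)
            .Take(wordCount)
            .Select(g => g.Key)
            .ToList();
    }
}
```

For the descriptions "Fast search engine for code", "Code search made fast and simple" and "A search tool", a call with wordCount 2 and noiseLength 3 yields "search" followed by "code": "search" occurs three times, while "code" and "fast" tie at two and the tie-break picks "code".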

Wordify and Textify

Wordify and Textify both extract all text from result URLs and are configured in a very similar way. A sample configuration follows:

var context = SearchContext.Default
                    .With<Wordify>(
                        new WordifySettings {
                            IncludeTitle = true,
                            NoiseLength = 3}); // Textify doesn't support NoiseLength

Each post processor exposes the extracted data in a different form:

var result = results.First();
// Wordify
IEnumerable<string> words = ((WordifyData)result.Data).Words;
// Textify
string text = ((TextifyData)result.Data).Text;

Prefer Textify when you need to process the text directly, since by default it doesn't sanitize text (see the TextifySettings.SanitizeText property). Opt for Wordify if you only need sanitized (alphanumeric only) words.
