Post Processors

Giacomo Stelluti Scala edited this page Jan 3, 2020 · 20 revisions

Reason

A post processor service is a type that, once configured in a SearchContext, processes a collection of ResultInfo and produces a new one. It is a class derived from PostProcessor that must override the following abstract method:

IEnumerable<ResultInfo> Process(IEnumerable<ResultInfo> results)

It's also mandatory to define a constructor that accepts a single object parameter, used for the settings of the specific post processor.
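
As an illustration of the two requirements above, here is a minimal sketch of a custom post processor. The `ResultInfo` and `PostProcessor` types below are simplified stand-ins, declared only so that the sample compiles on its own; their shape is assumed from the description on this page. `HostFilter` and `HostFilterSettings` are hypothetical names and are not part of PickAll.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-ins for the PickAll types (assumed shape, not copied from the library).
public class ResultInfo
{
    public string Url { get; set; }
    public string Description { get; set; }
}

public abstract class PostProcessor
{
    // Single object parameter that carries the settings.
    protected PostProcessor(object settings) { }

    public abstract IEnumerable<ResultInfo> Process(IEnumerable<ResultInfo> results);
}

// Hypothetical settings type for the sample post processor.
public class HostFilterSettings
{
    public string Host { get; set; }
}

// Hypothetical post processor: keeps only the results whose URL host matches the settings.
public class HostFilter : PostProcessor
{
    private readonly HostFilterSettings _settings;

    public HostFilter(object settings) : base(settings)
    {
        _settings = settings as HostFilterSettings
            ?? throw new ArgumentException("Expected HostFilterSettings.", nameof(settings));
    }

    public override IEnumerable<ResultInfo> Process(IEnumerable<ResultInfo> results)
    {
        // Results with a malformed or relative URL are dropped as well.
        return results.Where(r =>
            Uri.TryCreate(r.Url, UriKind.Absolute, out var uri) &&
            string.Equals(uri.Host, _settings.Host, StringComparison.OrdinalIgnoreCase));
    }
}
```

With the real library, the stand-in types would be replaced by the ones shipped with PickAll, and the post processor would presumably be registered through `With<HostFilter>(new HostFilterSettings { ... })` like the built-in ones shown below.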

Built-In

PickAll comes with the following built-in post processors, which can be found in the PickAll.PostProcessors namespace:

  • Uniqueness: removes duplicate results by URL
  • Order: orders results so that those with the same index number are placed next to each other
  • FuzzyMatch: compares a string against results descriptions
  • Improve: improves results computing word frequency to perform a subsequent search
  • Wordify: reduces documents of results URLs to a collection of words
  • Textify: extracts all text from documents of results URLs

FuzzyMatch

The FuzzyMatch post processor computes the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) between a given string and each result description. If the distance falls outside the specified range, the result is excluded. It is configured as follows:

var context = SearchContext.Default
                    .With<FuzzyMatch>(new FuzzyMatchSettings {
                        Text = options.FuzzyMatch,
                        MaximumDistance = 10 }); // MinimumDistance default is 0
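
To make the distance check more concrete, the following self-contained sketch computes the Levenshtein distance with the classic two-row dynamic programming approach and then applies a range test. It is not PickAll's implementation: `LevenshteinSketch`, `Distance` and `IsInRange` are hypothetical names, and treating both bounds as inclusive is an assumption.

```csharp
using System;

public static class LevenshteinSketch
{
    // Number of single-character insertions, deletions and substitutions
    // needed to turn source into target.
    public static int Distance(string source, string target)
    {
        // previous[j]: distance between the first i-1 chars of source and the first j chars of target.
        var previous = new int[target.Length + 1];
        var current = new int[target.Length + 1];

        // Turning an empty prefix into j characters takes j insertions.
        for (var j = 0; j <= target.Length; j++) previous[j] = j;

        for (var i = 1; i <= source.Length; i++)
        {
            // Turning i characters into an empty prefix takes i deletions.
            current[0] = i;
            for (var j = 1; j <= target.Length; j++)
            {
                var cost = source[i - 1] == target[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1,      // deletion
                             current[j - 1] + 1),  // insertion
                    previous[j - 1] + cost);       // substitution or match
            }
            // The row just computed becomes the previous row of the next iteration.
            (previous, current) = (current, previous);
        }
        return previous[target.Length];
    }

    // Assumption: both bounds are inclusive.
    public static bool IsInRange(string text, string description, int minimumDistance, int maximumDistance)
    {
        var distance = Distance(text, description);
        return distance >= minimumDistance && distance <= maximumDistance;
    }
}
```

For instance, `LevenshteinSketch.Distance("kitten", "sitting")` evaluates to 3, so under these assumptions a description at that distance would be kept with `MaximumDistance = 10` and dropped with a maximum of 2.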

Improve

The Improve post processor reduces results descriptions to words, then computes the most frequent ones and uses them in a subsequent search. It is configured as follows:

var context = SearchContext.Default
                    .With<Improve>(
                        new ImproveSettings {
                            WordCount = 2,
                            NoiseLength = 3});

In this case it will consider only the two most frequent words. All words with a length of 3 characters or fewer will be excluded from the computation.
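
The word selection described above can be sketched as follows. This is only an illustration of the idea, not the code used by Improve: `WordFrequencySketch` and `MostFrequentWords` are hypothetical names, and the tokenization (splitting on whitespace and common punctuation, lowercasing) and the alphabetical tie-breaking are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFrequencySketch
{
    private static readonly char[] Separators =
        { ' ', '\t', '\n', '\r', ',', '.', ';', ':', '!', '?' };

    // Returns the wordCount most frequent words, ignoring words of noiseLength characters or fewer.
    public static IReadOnlyList<string> MostFrequentWords(
        IEnumerable<string> descriptions, int wordCount, int noiseLength)
    {
        return descriptions
            .SelectMany(d => d.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            .Select(w => w.ToLowerInvariant())
            .Where(w => w.Length > noiseLength)           // drop short "noise" words
            .GroupBy(w => w)
            .OrderByDescending(g => g.Count())            // most frequent first
            .ThenBy(g => g.Key, StringComparer.Ordinal)   // deterministic order on ties
            .Take(wordCount)
            .Select(g => g.Key)
            .ToList();
    }
}
```

Given the descriptions "Fast search engine", "Search engine for the web" and "A fast web search", with `wordCount` 2 and `noiseLength` 3, this sketch selects "search" (3 occurrences) and "engine" (2 occurrences, ahead of "fast" only because of the alphabetical tie-break).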

Wordify and Textify

Wordify and Textify both extract all text from results URLs and are configured in a very similar way. A configuration sample follows:

var context = SearchContext.Default
                    .With<Wordify>(
                        new WordifySettings {
                            IncludeTitle = true,
                            NoiseLength = 3}); // Textify doesn't support NoiseLength