OpenTREP 0.07.18
C++ Open Travel Request Parsing Library
OPENTREP::Filter Struct Reference

Class filtering out the words not suitable for indexing and/or searching, when part of greater strings. Hence, most of the methods take as parameter the "initial"/greater string.

#include <opentrep/bom/Filter.hpp>

Static Public Member Functions

static void trim (std::string &ioPhrase, const NbOfLetters_T &iMinWordLength=4)
 
static bool shouldKeep (const std::string &iPhrase, const std::string &iWord)
 

Detailed Description

Class filtering out the words not suitable for indexing and/or searching, when part of greater strings. Hence, most of the methods take as parameter the "initial"/greater string.

For instance, words of no more than 3 letters (e.g., "de", "a", "san") should not be indexed or searched for when they are part of greater strings (e.g., "rio de janeiro", "san francisco").

Definition at line 21 of file Filter.hpp.

Member Function Documentation

◆ trim()

void OPENTREP::Filter::trim ( std::string & ioPhrase,
const NbOfLetters_T & iMinWordLength = 4 )
static

Trim all the non-relevant words from the given phrase.

The following rules are applied to the right and left outer words, iteratively until no more outer word can be stripped out:

  • If the left or right outer word has fewer than <iMinWordLength> letters (e.g., 'de', 'san' with the default of 4), it should be stripped out
  • If the left or right outer word is part of the "black-list" (e.g., 'airport', 'intl', 'international'), it should be filtered out
Parameters
std::string& The phrase to be amended (e.g., 'de san francisco', part of the 'aeroport de san francisco' global phrase).
const NbOfLetters_T& The minimum length of the words (default is 4 letters).
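
The outer-word rules above can be illustrated with a minimal, self-contained sketch. It is not the OpenTREP implementation: the names `sketch_trim::trimOuterWords`, `isOuterWordRelevant` and `isBlackListed` are hypothetical, the black-list holds only the three examples quoted above, and the sketch assumes that a word is too short when it has fewer letters than the minimum length.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <sstream>
#include <string>

namespace sketch_trim {

// Hypothetical black-list, limited to the examples quoted in the documentation.
inline bool isBlackListed(const std::string& word) {
  static const std::set<std::string> kBlackList = {
    "airport", "intl", "international"};
  return kBlackList.count(word) > 0;
}

// Assumption: a word is relevant when it has at least minLength letters
// and is not black-listed.
inline bool isOuterWordRelevant(const std::string& word,
                                std::size_t minLength) {
  return word.size() >= minLength && !isBlackListed(word);
}

// Strip non-relevant words from both ends of the phrase, in place.
// Inner words are never examined, only the current outer ones.
inline void trimOuterWords(std::string& phrase, std::size_t minLength = 4) {
  // Split the phrase on whitespace.
  std::deque<std::string> words;
  std::istringstream in(phrase);
  for (std::string w; in >> w;) {
    words.push_back(w);
  }

  // Drop left outer words until a relevant one is found.
  while (!words.empty() && !isOuterWordRelevant(words.front(), minLength)) {
    words.pop_front();
  }
  // Drop right outer words until a relevant one is found.
  while (!words.empty() && !isOuterWordRelevant(words.back(), minLength)) {
    words.pop_back();
  }

  // Re-join the remaining words with single spaces.
  std::string out;
  for (const auto& w : words) {
    if (!out.empty()) out += ' ';
    out += w;
  }
  phrase = out;
}

}  // namespace sketch_trim
```

With these assumptions, 'de san francisco' is reduced to 'francisco', whereas 'paris de gaulle' is left untouched because 'de' is an inner word there.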

Definition at line 131 of file Filter.cpp.

References OPENTREP::createStringFromWordList(), OPENTREP::tokeniseStringIntoWordList(), and OPENTREP::trim().

Referenced by OPENTREP::Result::calculateCodeMatches().

◆ shouldKeep()

bool OPENTREP::Filter::shouldKeep ( const std::string & iPhrase,
const std::string & iWord )
static

State whether the given word should be kept, as opposed to being filtered out as non-indexable/non-searchable.

The following rules are applied in sequence (if a rule applies, then the method returns, and the other rules are not processed/checked):

  • If the word is equal to the phrase (e.g., 'san'), it should be kept (not filtered out), as it is obviously there intentionally
  • If the word has no more than 3 letters (e.g., 'de', 'san'), it should be filtered out
  • If the word is part of the "black-list" (e.g., 'airport', 'intl', 'international'), it should be filtered out
Parameters
const std::string& The initial phrase (e.g., 'san francisco airport').
const std::string& The word on which a decision has to be made.
Returns
bool Whether the word should be kept (true) or filtered out (false)
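
The rule sequence above can be sketched as follows. This is an illustration only, not the library code: `sketch_keep::shouldKeepWord` and `isBlackListedWord` are hypothetical names, the black-list is limited to the three examples quoted above, and the 3-letter threshold is taken directly from the second rule.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

namespace sketch_keep {

// Hypothetical black-list, limited to the examples quoted in the documentation.
inline bool isBlackListedWord(const std::string& word) {
  static const std::set<std::string> kBlackList = {
    "airport", "intl", "international"};
  return kBlackList.count(word) > 0;
}

// Apply the three documented rules in order; the first matching rule decides.
inline bool shouldKeepWord(const std::string& phrase,
                           const std::string& word) {
  // Rule 1: the word is the whole phrase, so it is there intentionally.
  if (word == phrase) {
    return true;
  }
  // Rule 2: words of no more than 3 letters are filtered out.
  const std::size_t kMaxShortLength = 3;
  if (word.size() <= kMaxShortLength) {
    return false;
  }
  // Rule 3: black-listed words are filtered out.
  if (isBlackListedWord(word)) {
    return false;
  }
  // No rule applied: keep the word.
  return true;
}

}  // namespace sketch_keep
```

Because the first rule is checked before the others, a phrase made of the single word 'san' or 'airport' is kept, while the same words are filtered out of 'san francisco airport'.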

Definition at line 144 of file Filter.cpp.

References OPENTREP::hasGoodSize(), and OPENTREP::isBlackListed().

Referenced by OPENTREP::addUnmatchedWord(), OPENTREP::Result::calculateCombinedWeights(), and OPENTREP::Result::fullTextMatch().


The documentation for this struct was generated from the following files: