tdm-teeft

tdm-teeft is a tdm module for term extraction from unstructured text. It can be used to extract the keywords of a document.

Installation

Using npm (global install, or as a project dependency):

$ npm i -g tdm-teeft
$ npm i --save tdm-teeft

Using Node:

/* Require the Teeft module */
const Teeft = require('tdm-teeft');

/* Build a new instance of Tagger */
let tagger = new Teeft.Tagger();

/* Build a new instance of Filter */
let filter = new Teeft.Filter();

/* Build a new instance of Indexator */
let indexator = new Teeft.Indexator();

/* Build a new instance of TermExtraction */
let termextraction = new Teeft.TermExtraction();

Launch tests

$ npm run test

Build documentation

$ npm run docs

API Documentation

Classes

Filter
Indexator
Tagger
TermExtractor

Filter

Kind: global class

Filter
- new Filter([options])
- .call(occur, strength) ⇒ Boolean
- .configure(length) ⇒ Number

new Filter([options])

Returns: Filter - An instance of Filter

Param	Type	Description
[options]	`Object`	Options of constructor
[options.minOccur]	`Number`	Minimal number of occurrences
[options.noLimitStrength]	`Number`	Strength limit
[options.lengthSteps]	`Object`	Steps mapping the text length (number of tokens) to a minOccur value

Example (Example usage of 'constructor' (with parameters))

let options = {
    // Assigns a 'value' depending on the length of the indexed text (number of tokens)
    'lengthSteps': {
      'values': [ // intermediate steps
        { // value '4' will be used for text length > 1000 tokens && text length <= 3000 tokens
          'lim': 3000, // must be > 'lengthSteps.min.lim' && < 'lengthSteps.max.lim'
          'value': 4
        },
        { // value '5' will be used for text length > 3000 tokens && text length <= 4000 tokens
          'lim': 4000, // must be > 'lengthSteps.min.lim' && < 'lengthSteps.max.lim'
          'value': 5
        }
      ],
      'min': { // 'value' used up to the minimum 'lim' length of text (here: value '1' will be used for text length <= 1000 tokens)
        'lim': 1000,
        'value': 1
      },
      'max': { // 'value' used beyond the maximum 'lim' length of text (here: value '7' will be used for text length > 6000 tokens)
        'lim': 6000,
        'value': 7
      }
    },
    'minOccur': 3, // Minimal number of occurrences (of tokens) used by default: here 3. This value is updated depending on the length of the indexed text when 'configure' is called
    'noLimitStrength': 2 // Strength limit
  },
  customFilter = new Filter(options);
// returns an instance of Filter with properties :
// - minOccur : 3
// - noLimitStrength : 2
// - lengthSteps : {'values': [{'lim': 3000, 'value': 4}, {'lim': 4000, 'value': 5}], 'min': {'lim': 1000, 'value': 1}, 'max': {'lim': 6000, 'value': 7}}

Example (Example usage of 'constructor' (with default values))

let defaultFilter = new Filter();
// returns an instance of Filter with properties :
// - minOccur : 7
// - noLimitStrength : 2
// - lengthSteps : {'values': [{'lim': 3000, 'value': 4}], 'min': {'lim': 1000, 'value': 1}, 'max': {'lim': 6000, 'value': 7}}

filter.call(occur, strength) ⇒ `Boolean`

Check whether the given values satisfy the filter conditions

Kind: instance method of Filter
Returns: Boolean - true if the conditions are met, else false

Param	Type	Description
occur	`Number`	Occurrence value
strength	`Number`	Strength value

Example (Example usage of 'call' function)

let defaultFilter = new Filter();
defaultFilter.configure(500);
defaultFilter.call(1, 1); // returns true
defaultFilter.configure(5000);
defaultFilter.call(1, 1); // returns false
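The exact rule applied by call is not spelled out here. The sketch below is a hypothetical reimplementation, assuming that single-word terms (strength 1) must occur at least minOccur times while terms whose strength reaches noLimitStrength are always kept. It reproduces the two calls above, but the function name filterCall and the rule itself are assumptions, not the module's source.

```javascript
// Hypothetical sketch of the check performed by Filter#call (not the module's source).
// Assumption: single-word terms (strength 1) need at least `minOccur` occurrences,
// while terms with strength >= `noLimitStrength` are accepted whatever their occurrence.
function filterCall(occur, strength, minOccur, noLimitStrength) {
  return (strength === 1 && occur >= minOccur) || strength >= noLimitStrength;
}

// minOccur = 1 (as after configure(500) with the default settings)
console.log(filterCall(1, 1, 1, 2)); // true
// minOccur = 7 (as after configure(5000) with the default settings)
console.log(filterCall(1, 1, 7, 2)); // false
// Under this assumption, a two-word term passes regardless of its occurrence count
console.log(filterCall(1, 2, 7, 2)); // true
```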

filter.configure(length) ⇒ `Number`

Configure the filter (its minOccur value) depending on lengthSteps and the given text length

Kind: instance method of Filter
Returns: Number - The configured minOccur value

Param	Type	Description
length	`Number`	Text length (number of tokens)

Example (Example usage of 'configure' function)

let defaultFilter = new Filter();
defaultFilter.configure(500); // returns 1
defaultFilter.configure(5000); // returns 7
defaultFilter.configure('test'); // returns null
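The lookup performed by configure can be pictured as follows. This is a hypothetical sketch (configureMinOccur is an illustrative name), assuming that lengths up to min.lim get min.value, that the first intermediate step whose lim is not exceeded gives its value, and that anything beyond the last step falls back to max.value. That reading matches configure(5000) returning 7 with the default steps; non-numeric lengths yield null.

```javascript
// Hypothetical sketch of how Filter#configure could pick minOccur from lengthSteps
// (an assumption consistent with the documented examples, not the module's source).
function configureMinOccur(length, lengthSteps) {
  if (typeof length !== 'number' || Number.isNaN(length)) return null; // e.g. 'test' -> null
  if (length <= lengthSteps.min.lim) return lengthSteps.min.value;
  // Intermediate steps are assumed to be sorted by ascending 'lim'
  for (const step of lengthSteps.values) {
    if (length <= step.lim) return step.value;
  }
  return lengthSteps.max.value; // beyond the last intermediate step
}

// Default lengthSteps, as listed for `new Filter()` above
const defaultSteps = {
  values: [{ lim: 3000, value: 4 }],
  min: { lim: 1000, value: 1 },
  max: { lim: 6000, value: 7 }
};

console.log(configureMinOccur(500, defaultSteps)); // 1
console.log(configureMinOccur(2000, defaultSteps)); // 4
console.log(configureMinOccur(5000, defaultSteps)); // 7
console.log(configureMinOccur('test', defaultSteps)); // null
```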

Indexator

Kind: global class

Indexator
- new Indexator([options])
- instance
  - .tokenize(text) ⇒ Array
  - .translateTag(tag) ⇒ String
  - .sanitize(terms) ⇒ Array
  - .lemmatize(terms) ⇒ Array
  - .index(data) ⇒ Object
- static
  - .compare(a, b) ⇒ Number

new Indexator([options])

Returns: Indexator - An instance of Indexator

Param	Type	Description
[options]	`Object`	Options of constructor
[options.filter]	`Filter`	Filter given to the extractor of this instance of Indexator
[options.lexicon]	`Object`	Lexicon used by the tagger of this instance of Indexator
[options.stopwords]	`Object`	Stopwords used by this instance of Indexator
[options.lemmatizer]	`Object`	Lemmatizer used by the tagger of this instance of Indexator
[options.stemmer]	`Object`	Stemmer used by this instance of Indexator
[options.dictionary]	`Object`	Dictionary used by this instance of Indexator

Example (Example usage of 'constructor' (with parameters))

let options = {
    'filter': customFilter // Assuming customFilter contains your custom settings
  },
  indexator = new Indexator(options);
// returns an instance of Indexator with custom Filter

Example (Example usage of 'constructor' (with default values))

let indexator = new Indexator();
// returns an instance of Indexator with default options

indexator.tokenize(text) ⇒ `Array`

Extract tokens from a text

Kind: instance method of Indexator
Returns: Array - Array of tokens

Param	Type	Description
text	`String`	Fulltext

Example (Example usage of 'tokenize' function)

let indexator = new Indexator();
indexator.tokenize('my sample sentence'); // return ['my', 'sample', 'sentence']
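The tokenization rules themselves are not documented. A minimal sketch, assuming a token is a run of Unicode letters and digits and that whitespace and punctuation only act as separators (the module's own rules may differ):

```javascript
// Hypothetical tokenizer sketch (the module's actual rules may differ):
// a token is a run of Unicode letters/digits; everything else separates tokens.
function tokenize(text) {
  return text.match(/[\p{L}\p{N}]+/gu) || [];
}

console.log(tokenize('my sample sentence')); // [ 'my', 'sample', 'sentence' ]
console.log(tokenize('It is a sample test.')); // [ 'It', 'is', 'a', 'sample', 'test' ]
```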

indexator.translateTag(tag) ⇒ `String`

Translate a tag given by the Tagger into the matching Lemmatizer tag

Kind: instance method of Indexator
Returns: String - Matching Lemmatizer tag (or false if there is none)

Param	Type	Description
tag	`String`	Tag given by Tagger

Example (Example usage of 'translateTag' function)

let indexator = new Indexator();
indexator.translateTag('RB'); // return 'adv'
indexator.translateTag('JJ'); // return 'adj'
indexator.translateTag('NN'); // return 'noun'
indexator.translateTag('NNP'); // return 'noun'
indexator.translateTag('VBG'); // return 'verb'
indexator.translateTag('VBN'); // return 'verb'
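The tags in these examples are Penn Treebank part-of-speech tags, and the results suggest they are grouped by family. A minimal sketch of such a mapping, assuming prefix matching (RB* to adv, JJ* to adj, NN* to noun, VB* to verb) and false for anything else; the table TAG_PREFIXES is hypothetical:

```javascript
// Hypothetical sketch of Indexator#translateTag, assuming a prefix-based mapping
// from Penn Treebank tags to lemmatizer categories.
const TAG_PREFIXES = [
  ['RB', 'adv'],  // RB, RBR, RBS
  ['JJ', 'adj'],  // JJ, JJR, JJS
  ['NN', 'noun'], // NN, NNS, NNP, NNPS
  ['VB', 'verb']  // VB, VBD, VBG, VBN, VBP, VBZ
];

function translateTag(tag) {
  for (const [prefix, category] of TAG_PREFIXES) {
    if (tag.startsWith(prefix)) return category;
  }
  return false; // no lemmatizer category for this tag (e.g. 'DT')
}

console.log(translateTag('RB'));  // adv
console.log(translateTag('NNP')); // noun
console.log(translateTag('VBG')); // verb
console.log(translateTag('DT'));  // false
```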

indexator.sanitize(terms) ⇒ `Array`

Sanitize a list of terms (with some filters; rejected terms are replaced by a '#' placeholder)

Kind: instance method of Indexator
Returns: Array - List of sanitized terms

Param	Type	Description
terms	`Array`	List of terms

Example (Example usage of 'sanitize' function)

let indexator = new Indexator();
indexator.sanitize([ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
  { term: 'is', tag: 'VBZ' },
  { term: 'a', tag: 'DT' },
  { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
  { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]);
// return [ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
//   { term: '#', tag: '#' },
//   { term: '#', tag: '#' },
//   { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
//   { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]
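Replacing a rejected term with a '#' placeholder, instead of removing it, keeps the list aligned with the original text; presumably this prevents words that were not adjacent from being joined into one term later on. The filtering criteria are not documented, so the sketch below takes a caller-supplied predicate; the IGNORED set is purely illustrative and merely reproduces the output above.

```javascript
// Hypothetical sketch of the placeholder strategy shown above (not the module's source):
// each term rejected by `isValid` becomes { term: '#', tag: '#' }, so the list
// keeps its length and order.
function sanitize(terms, isValid) {
  return terms.map((t) => (isValid(t) ? t : { term: '#', tag: '#' }));
}

// Illustrative predicate only: the module's real filtering criteria are not documented here
const IGNORED = new Set(['is', 'a']);
const notIgnored = (t) => !IGNORED.has(t.term.toLowerCase());

const sanitized = sanitize(
  [
    { term: 'this', tag: 'DT' },
    { term: 'is', tag: 'VBZ' },
    { term: 'a', tag: 'DT' },
    { term: 'sample', tag: 'NN' },
    { term: 'test', tag: 'NN' }
  ],
  notIgnored
);
console.log(sanitized.map((t) => t.term).join(' ')); // this # # sample test
```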

indexator.lemmatize(terms) ⇒ `Array`

Lemmatize a list of tagged terms (adds the properties lemma and stem)

Kind: instance method of Indexator
Returns: Array - List of tagged terms with a lemma

Param	Type	Description
terms	`Array`	List of tagged terms

Example (Example usage of 'lemmatize' function)

let indexator = new Indexator();
indexator.lemmatize([ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
  { term: 'is', tag: 'VBZ' },
  { term: 'a', tag: 'DT' },
  { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
  { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]);
// return [ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
//   { term: '#', tag: '#' },
//   { term: '#', tag: '#' },
//   { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
//   { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]

indexator.index(data) ⇒ `Object`

Index a fulltext

Kind: instance method of Indexator
Returns: Object - A representation of the fulltext (index, plus additional information and statistics about tokens and terms)

Param	Type	Description
data	`String`	Fulltext to be indexed

Example (Example usage of 'index' function)

let indexator = new Indexator();
indexator.index('This is a sample sentence'); // return an object representation of indexation

Indexator.compare(a, b) ⇒ `Number`

Compare two objects by their specificity (the object with the higher specificity comes first)

Kind: static method of Indexator
Returns: Number - -1, 1, or 0

Param	Type	Description
a	`Object`	First object
b	`Object`	Second object

Example (Example usage of 'compare' function)

Indexator.compare({ 'term': 'a', 'specificity': 1 }, { 'term': 'b', 'specificity': 2 }); // return 1
Indexator.compare({ 'term': 'a', 'specificity': 1 }, { 'term': 'b', 'specificity': 1 }); // return 0
Indexator.compare({ 'term': 'a', 'specificity': 2 }, { 'term': 'b', 'specificity': 1 }); // return -1
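The documented results correspond to a comparator that sorts by descending specificity, so Indexator.compare can presumably be passed straight to Array.prototype.sort. A sketch reproducing those results (compareSpecificity is a hypothetical name):

```javascript
// Hypothetical sketch of a comparator matching the documented results of
// Indexator.compare: higher specificity sorts first (descending order).
function compareSpecificity(a, b) {
  if (a.specificity > b.specificity) return -1;
  if (a.specificity < b.specificity) return 1;
  return 0;
}

const terms = [
  { term: 'a', specificity: 1 },
  { term: 'b', specificity: 3 },
  { term: 'c', specificity: 2 }
];
terms.sort(compareSpecificity);
console.log(terms.map((t) => t.term)); // [ 'b', 'c', 'a' ]
```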

Tagger

Kind: global class

Tagger
- new Tagger([options])
- .tag(terms) ⇒ Array

new Tagger([options])

Returns: Tagger - An instance of Tagger

Param	Type	Description
[options]	`Object`	Options of constructor

Example (Example usage of 'constructor' (with parameters))

let options = { ... }, // Assuming options contains your custom lexicon
  tagger = new Tagger(options);
// returns an instance of Tagger with custom lexicon

Example (Example usage of 'constructor' (with default values))

let tagger = new Tagger();
// returns an instance of Tagger with default lexicon

tagger.tag(terms) ⇒ `Array`

Tag a list of terms with part-of-speech tags

Kind: instance method of Tagger
Returns: Array - List of tagged terms

Param	Type	Description
terms	`Array`	List of terms

Example (Example usage of 'tag' function)

let tagger = new Tagger();
tagger.tag(['this', 'is', 'a', 'test']); // return [{ 'term': 'this', 'tag': 'DT' }, { 'term': 'is', 'tag': 'VBZ' }, { 'term': 'a', 'tag': 'DT' }, { 'term': 'test', 'tag': 'NN' }]
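The Tagger relies on a lexicon (see the constructor above). The simplest form of lexicon-based tagging can be sketched as follows, assuming each lowercased word maps to a list of candidate tags, the first candidate is used, and unknown words default to 'NN'. The small LEXICON is hypothetical, and the module's tagger may apply further rules.

```javascript
// Hypothetical lexicon: lowercased word -> candidate Penn Treebank tags (most likely first)
const LEXICON = new Map([
  ['this', ['DT']],
  ['is', ['VBZ']],
  ['a', ['DT']],
  ['test', ['NN', 'VB']]
]);

// Minimal lexicon-lookup tagger sketch (not the module's implementation):
// take the first candidate tag, defaulting to 'NN' for unknown words.
function tag(terms, lexicon = LEXICON) {
  return terms.map((term) => {
    const candidates = lexicon.get(term.toLowerCase());
    return { term: term, tag: candidates ? candidates[0] : 'NN' };
  });
}

const tagged = tag(['this', 'is', 'a', 'test']);
console.log(tagged.map((t) => t.tag).join(' ')); // DT VBZ DT NN
```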

TermExtractor

Kind: global class

TermExtractor
- new TermExtractor([options])
- .extract(taggedTerms) ⇒ Object
- ._startsWith(str, prefix) ⇒ Boolean

new TermExtractor([options])

Returns: TermExtractor - An instance of TermExtractor

Param	Type	Description
[options]	`Object`	Options of constructor
[options.tagger]	`Tagger`	An instance of Tagger
[options.filter]	`Filter`	An instance of Filter

Example (Example usage of 'constructor' (with parameters))

let myTagger = new Tagger(), // Assuming myTagger contains your custom settings
  myFilter = new Filter(), // Assuming myFilter contains your custom settings
  termExtractor = new TermExtractor({ 'tagger': myTagger, 'filter': myFilter });
// returns an instance of TermExtractor with custom options

Example (Example usage of 'constructor' (with default values))

let termExtractor = new TermExtractor();
// returns an instance of TermExtractor with default options

termExtractor.extract(taggedTerms) ⇒ `Object`

Extract terms

Kind: instance method of TermExtractor
Returns: Object - All extracted terms, with their frequency and strength

Param	Type	Description
taggedTerms	`Array`	List of tagged terms

Example (Example usage of 'extract' function)

let termExtractor = new TermExtractor(),
  myDefaultTagger = new Tagger(),
  indexator = new Indexator(),
  // tag() expects a list of terms, so the text is tokenized first
  taggedTerms = myDefaultTagger.tag(indexator.tokenize('This is a sample test for this module. It index any fulltext. It is a sample test.'));
termExtractor.extract(taggedTerms);
// return
// { 'sample': { 'frequency': 2, 'strength': 1 }, 'test': { 'frequency': 2, 'strength': 1 },
// 'sample test': { 'frequency': 2, 'strength': 2 },
// 'module': { 'frequency': 1, 'strength': 1 },
// 'index': { 'frequency': 1, 'strength': 1 },
// 'fulltext': { 'frequency': 1, 'strength': 1 }
// };
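The output above (single nouns plus the two-word term 'sample test' with strength 2) is consistent with a noun-sequence approach. The following simplified sketch makes several assumptions: tags starting with 'NN' mark nouns, consecutive nouns form a multi-word term whose strength is its number of words, frequency counts occurrences, and an optional filter callback receives (occur, strength) as Filter#call does. The names extractTerms and tagged are hypothetical; this is not the module's implementation.

```javascript
// Simplified, hypothetical sketch of noun-sequence term extraction
// (not tdm-teeft's implementation). Input: [{ term, tag }, ...].
function extractTerms(taggedTerms, filter = () => true) {
  const counts = new Map(); // word -> { frequency, strength }
  const add = (word, strength) => {
    const entry = counts.get(word) || { frequency: 0, strength: strength };
    entry.frequency += 1;
    counts.set(word, entry);
  };

  let sequence = []; // current run of consecutive nouns
  const flush = () => {
    if (sequence.length > 1) add(sequence.join(' '), sequence.length); // multi-word term
    sequence = [];
  };

  for (const { term, tag } of taggedTerms) {
    if (tag.startsWith('NN')) {
      const word = term.toLowerCase();
      add(word, 1); // every noun also counts as a single-word term
      sequence.push(word);
    } else {
      flush(); // any other tag ends the noun sequence
    }
  }
  flush();

  // Keep only the terms accepted by the filter
  const result = {};
  for (const [word, info] of counts) {
    if (filter(info.frequency, info.strength)) result[word] = info;
  }
  return result;
}

const tagged = [
  { term: 'It', tag: 'PRP' }, { term: 'is', tag: 'VBZ' }, { term: 'a', tag: 'DT' },
  { term: 'sample', tag: 'NN' }, { term: 'test', tag: 'NN' }, { term: 'for', tag: 'IN' },
  { term: 'this', tag: 'DT' }, { term: 'module', tag: 'NN' }
];
console.log(extractTerms(tagged));
// sample, test and module (strength 1) plus 'sample test' (strength 2), each with frequency 1
console.log(Object.keys(extractTerms(tagged, (occur, strength) => strength >= 2))); // [ 'sample test' ]
```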

termExtractor._startsWith(str, prefix) ⇒ `Boolean`

Check whether the given string starts with the given prefix

Kind: instance method of TermExtractor
Returns: Boolean - true if the string starts with the prefix, else false

Param	Type	Description
str	`String`	String in which the prefix is searched
prefix	`String`	Prefix to search for
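The private helper _startsWith performs a plain prefix test, presumably used to group tag families such as NN, NNS and NNP. A minimal equivalent sketch (the standalone function name is hypothetical); in ES2015 and later, the built-in String.prototype.startsWith performs the same check:

```javascript
// Minimal sketch of a prefix check equivalent to _startsWith(str, prefix)
// (the module's own implementation is not shown here).
function startsWith(str, prefix) {
  return str.slice(0, prefix.length) === prefix;
}

console.log(startsWith('NNP', 'NN')); // true
console.log(startsWith('VBZ', 'NN')); // false
console.log('NNP'.startsWith('NN'));  // true (built-in equivalent)
```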

NicolasKieffer/tdm-teeft

Folders and files

Latest commit

History

Repository files navigation

tdm-teeft

Installation

Launch tests

Build documentation

API Documentation

Classes

Filter

new Filter([options])

filter.call(occur, strength) ⇒ Boolean

filter.configure(length) ⇒ Number

Indexator

new Indexator([options])

indexator.tokenize(text) ⇒ Array

indexator.translateTag(tag) ⇒ String

indexator.sanitize(terms) ⇒ Array

indexator.lemmatize(terms) ⇒ Array

indexator.index(data) ⇒ Object

Indexator.compare(a, b) ⇒ Number

Tagger

new Tagger([options])

tagger.tag(terms) ⇒ Array

TermExtractor

new TermExtractor([options])

termExtractor.extract(taggedTerms) ⇒ Object

termExtractor._startsWith(str, prefix) ⇒ Boolean

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

filter.call(occur, strength) ⇒ `Boolean`

filter.configure(length) ⇒ `Number`

indexator.tokenize(text) ⇒ `Array`

indexator.translateTag(tag) ⇒ `String`

indexator.sanitize(terms) ⇒ `Array`

indexator.lemmatize(terms) ⇒ `Array`

indexator.index(data) ⇒ `Object`

Indexator.compare(a, b) ⇒ `Number`

tagger.tag(terms) ⇒ `Array`

termExtractor.extract(taggedTerms) ⇒ `Object`

termExtractor._startsWith(str, prefix) ⇒ `Boolean`

Packages