tdm-teeft is a tdm module for term extraction from unstructured text. It can be used to extract the keywords of a document.
Using npm:
$ npm i -g tdm-teeft
$ npm i --save tdm-teeft
Using Node:
/* Require the Teeft module */
const Teeft = require('tdm-teeft');
/* Build a new instance of Tagger */
let tagger = new Teeft.Tagger();
/* Build a new instance of Filter */
let filter = new Teeft.Filter();
/* Build a new instance of Indexator */
let indexator = new Teeft.Indexator();
/* Build a new instance of TermExtraction */
let termextraction = new Teeft.TermExtraction();
Running the tests:
$ npm run test
Building the documentation:
$ npm run docs
Filter
Kind: global class
- Filter
  - new Filter([options])
  - .call(occur, strength) ⇒ Boolean
  - .configure(length) ⇒ Number
new Filter([options])
Returns: Filter - An instance of Filter
| Param | Type | Description |
|---|---|---|
| [options] | Object | Options of the constructor |
| [options.minOccur] | Number | Minimum number of occurrences |
| [options.noLimitStrength] | Number | Strength limit |
| [options.lengthSteps] | Object | Length steps |
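The example comments below state that each intermediate `lim` must be greater than `lengthSteps.min.lim` and less than `lengthSteps.max.lim`. The helper below is a hypothetical sketch (not part of tdm-teeft) that checks this constraint before a Filter is built; it additionally assumes that intermediate steps are listed in increasing `lim` order.

```javascript
// Hypothetical helper (not part of tdm-teeft): checks the lengthSteps
// constraints described in the Filter documentation.
function isValidLengthSteps(lengthSteps) {
  const { min, max, values = [] } = lengthSteps;
  // min.lim must be below max.lim
  if (!(min.lim < max.lim)) return false;
  let previous = min.lim;
  for (const step of values) {
    // each intermediate lim must lie strictly between min.lim and max.lim;
    // assumption: steps are sorted by increasing lim
    if (!(step.lim > previous && step.lim < max.lim)) return false;
    previous = step.lim;
  }
  return true;
}

const lengthSteps = {
  values: [{ lim: 3000, value: 4 }, { lim: 4000, value: 5 }],
  min: { lim: 1000, value: 1 },
  max: { lim: 6000, value: 7 }
};
console.log(isValidLengthSteps(lengthSteps)); // true
```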
Example (Example usage of 'constructor' (with parameters))
let options = {
// Assigns a 'value' depending on the length of the indexed text (number of tokens)
'lengthSteps': {
'values': [ // store intermediate steps here
{ // here: value '4' will be used for text length > 1000 tokens && text length <= 3000 tokens
'lim': 3000, // this property must be > 'lengthSteps.min.lim' && < 'lengthSteps.max.lim'
'value': 4
},
{ // here: value '5' will be used for text length > 3000 tokens && text length <= 4000 tokens
'lim': 4000, // this property must be > 'lengthSteps.min.lim' && < 'lengthSteps.max.lim'
'value': 5
}
],
'min': { // 'value' used up to the minimum 'lim' text length (here: value '1' will be used for text length <= 1000 tokens)
'lim': 1000,
'value': 1
},
'max': { // 'value' used beyond the maximum 'lim' text length (here: value '7' will be used for text length > 6000 tokens)
'lim': 6000,
'value': 7
}
},
'minOccur': 3, // Minimum number of occurrences (of tokens) used by default: here 3. This value is updated depending on the length of the indexed text when the 'configure' function is called
'noLimitStrength': 2 // Strength limit
},
customFilter = new Filter(options);
// returns an instance of Filter with properties:
// - minOccur : 3
// - noLimitStrength : 2
// - lengthSteps : {'values': [{'lim': 3000, 'value': 4}, {'lim': 4000, 'value': 5}], 'min': {'lim': 1000, 'value': 1}, 'max': {'lim': 6000, 'value': 7}}
Example (Example usage of 'constructor' (with default values))
let defaultFilter = new Filter();
// returns an instance of Filter with properties :
// - minOccur : 7
// - noLimitStrength : 2
// - lengthSteps : {'values': [{'lim': 3000, 'value': 4}], 'min': {'lim': 1000, 'value': 1}, 'max': {'lim': 6000, 'value': 7}}
filter.call(occur, strength) ⇒ Boolean
Check whether the given values satisfy the filter conditions
Kind: instance method of Filter
Returns: Boolean - true if the filter conditions are met
| Param | Type | Description |
|---|---|---|
| occur | Number | Occurrence value |
| strength | Number | Strength value |
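The documented `call` results appear consistent with the rule of the `DefaultFilter` in the Python package topia.termextract, whose Tagger/Filter/TermExtractor design and `noLimitStrength` parameter tdm-teeft resembles: single-word terms (strength 1) are kept when they occur at least `minOccur` times, and terms whose strength reaches `noLimitStrength` are always kept. The standalone function below sketches that rule. Both the rule and the reading of strength as the number of words in a term are assumptions, not tdm-teeft's confirmed implementation.

```javascript
// Hypothetical re-implementation of the filter rule (modeled on
// topia.termextract's DefaultFilter); not tdm-teeft's actual source.
function filterCall(occur, strength, minOccur, noLimitStrength) {
  // single-word terms must occur at least minOccur times
  const frequentSingleWord = strength === 1 && occur >= minOccur;
  // terms at or above noLimitStrength are kept whatever their occurrence
  const strongTerm = strength >= noLimitStrength;
  return frequentSingleWord || strongTerm;
}

// Mirrors the documented example: minOccur is 1 after configure(500)
// and 7 after configure(5000); noLimitStrength defaults to 2.
console.log(filterCall(1, 1, 1, 2)); // true
console.log(filterCall(1, 1, 7, 2)); // false
console.log(filterCall(1, 2, 7, 2)); // true
```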
Example (Example usage of 'call' function)
let defaultFilter = new Filter();
defaultFilter.configure(500);
defaultFilter.call(1, 1); // returns true
defaultFilter.configure(5000);
defaultFilter.call(1, 1); // returns false
filter.configure(length) ⇒ Number
Configure the filter depending on lengthSteps (updates and returns minOccur)
Kind: instance method of Filter
Returns: Number - The configured minOccur value
| Param | Type | Description |
|---|---|---|
| length | Number | Text length (number of tokens) |
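One reading of `lengthSteps` that reproduces the documented `configure` results with the default steps: texts up to `min.lim` tokens use `min.value`, each intermediate step applies up to its `lim`, and anything longer falls back to `max.value`. That last fallback (which is what makes `configure(5000)` return 7 with the default steps) is an assumption, as is the function name `configureMinOccur`; this is a sketch, not the module's code. Per the constructor example, the real method also stores the result as `minOccur`.

```javascript
// Hypothetical sketch of Filter.configure(length): picks the minOccur value
// for a text of `length` tokens from a lengthSteps object.
function configureMinOccur(length, lengthSteps) {
  // non-numeric lengths are rejected, as in the documented example
  if (typeof length !== 'number' || Number.isNaN(length)) return null;
  if (length <= lengthSteps.min.lim) return lengthSteps.min.value;
  // intermediate steps, assumed sorted by increasing lim
  for (const step of lengthSteps.values) {
    if (length <= step.lim) return step.value;
  }
  // assumption: anything past the last intermediate step uses max.value
  return lengthSteps.max.value;
}

// Default steps as listed for new Filter() above
const defaultSteps = {
  values: [{ lim: 3000, value: 4 }],
  min: { lim: 1000, value: 1 },
  max: { lim: 6000, value: 7 }
};
console.log(configureMinOccur(500, defaultSteps));    // 1
console.log(configureMinOccur(2000, defaultSteps));   // 4
console.log(configureMinOccur(5000, defaultSteps));   // 7
console.log(configureMinOccur('test', defaultSteps)); // null
```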
Example (Example usage of 'configure' function)
let defaultFilter = new Filter();
defaultFilter.configure(500); // returns 1
defaultFilter.configure(5000); // returns 7
defaultFilter.configure('test'); // returns null
Indexator
Kind: global class
- Indexator
  - new Indexator([options])
  - instance
    - .tokenize(text) ⇒ Array
    - .translateTag(tag) ⇒ String
    - .sanitize(terms) ⇒ Array
    - .lemmatize(terms) ⇒ Array
    - .index(data) ⇒ Object
  - static
    - .compare(a, b) ⇒ Number
new Indexator([options])
Returns: Indexator - An instance of Indexator
| Param | Type | Description |
|---|---|---|
| [options] | Object | Options of the constructor |
| [options.filter] | Filter | Filter passed to the extractor of this instance of Indexator |
| [options.lexicon] | Object | Lexicon used by the tagger of this instance of Indexator |
| [options.stopwords] | Object | Stopwords used by this instance of Indexator |
| [options.lemmatizer] | Object | Lemmatizer used by the tagger of this instance of Indexator |
| [options.stemmer] | Object | Stemmer used by this instance of Indexator |
| [options.dictionary] | Object | Dictionary used by this instance of Indexator |
Example (Example usage of 'constructor' (with parameters))
let options = {
'filter': customFilter // assuming customFilter contains your custom settings
},
indexator = new Indexator(options);
// returns an instance of Indexator with a custom Filter
Example (Example usage of 'constructor' (with default values))
let indexator = new Indexator();
// returns an instance of Indexator with default options
indexator.tokenize(text) ⇒ Array
Extract tokens from a text
Kind: instance method of Indexator
Returns: Array - Array of tokens
| Param | Type | Description |
|---|---|---|
| text | String | Full text |
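A naive tokenizer that reproduces the documented `tokenize` example is sketched below. It keeps words (letters and digits, optionally joined by inner hyphens or apostrophes) and emits each punctuation mark as its own token. How tdm-teeft actually treats punctuation, hyphens or apostrophes is not documented here, so those rules are assumptions.

```javascript
// Hypothetical tokenizer sketch: words (letters, digits, inner hyphens or
// apostrophes) and individual punctuation marks become separate tokens.
function tokenize(text) {
  const tokenPattern = /[\p{L}\p{N}]+(?:['-][\p{L}\p{N}]+)*|[^\s\p{L}\p{N}]/gu;
  return text.match(tokenPattern) || []; // match() returns null when nothing matches
}

console.log(tokenize('my sample sentence')); // [ 'my', 'sample', 'sentence' ]
console.log(tokenize('It is a test.'));      // [ 'It', 'is', 'a', 'test', '.' ]
```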
Example (Example usage of 'tokenize' function)
let indexator = new Indexator();
indexator.tokenize('my sample sentence'); // returns ['my', 'sample', 'sentence']
indexator.translateTag(tag) ⇒ String
Translate a Tagger tag into the matching Lemmatizer tag
Kind: instance method of Indexator
Returns: String - The matching Lemmatizer tag (or false)
| Param | Type | Description |
|---|---|---|
| tag | String | Tag given by the Tagger |
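The documented `translateTag` results follow the prefixes of Penn Treebank tags (`RB*`, `JJ*`, `NN*`, `VB*`). The lookup below is a sketch under that assumption; the real method may cover other tags or use a different table.

```javascript
// Hypothetical sketch: map a Penn Treebank tag to a coarse lemmatizer
// part of speech by prefix; unknown tags yield false.
const TAG_PREFIXES = [
  ['RB', 'adv'],
  ['JJ', 'adj'],
  ['NN', 'noun'],
  ['VB', 'verb']
];

function translateTag(tag) {
  for (const [prefix, pos] of TAG_PREFIXES) {
    if (tag.startsWith(prefix)) return pos;
  }
  return false;
}

console.log(translateTag('NNP')); // noun
console.log(translateTag('VBG')); // verb
console.log(translateTag('DT'));  // false
```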
Example (Example usage of 'translateTag' function)
let indexator = new Indexator();
indexator.translateTag('RB'); // returns 'adv'
indexator.translateTag('JJ'); // returns 'adj'
indexator.translateTag('NN'); // returns 'noun'
indexator.translateTag('NNP'); // returns 'noun'
indexator.translateTag('VBG'); // returns 'verb'
indexator.translateTag('VBN'); // returns 'verb'
indexator.sanitize(terms) ⇒ Array
Sanitize a list of terms (by applying some filters)
Kind: instance method of Indexator
Returns: Array - List of sanitized terms
| Param | Type | Description |
|---|---|---|
| terms | Array | List of terms |
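In the documented `sanitize` example, rejected terms are replaced by a `{ term: '#', tag: '#' }` placeholder rather than removed, which presumably keeps term positions (and so the boundaries between candidate multi-word terms) intact. The sketch below reproduces that placeholder behavior. The rejection test is passed in as a function because the criteria tdm-teeft applies are not documented here; the "missing lemma" predicate in the usage lines is only an assumption that happens to match the example.

```javascript
// Hypothetical sanitize sketch: each rejected term is replaced by a '#'
// placeholder so the list keeps its original length and order.
function sanitize(terms, isRejected) {
  return terms.map(t => (isRejected(t) ? { term: '#', tag: '#' } : t));
}

// Assumed criterion, for illustration only: reject terms without a lemma.
const missingLemma = t => t.lemma === undefined;

const result = sanitize([
  { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
  { term: 'is', tag: 'VBZ' },
  { term: 'a', tag: 'DT' },
  { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' }
], missingLemma);
console.log(result.map(t => t.term)); // [ 'this', '#', '#', 'sample' ]
```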
Example (Example usage of 'sanitize' function)
let indexator = new Indexator();
indexator.sanitize([ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
{ term: 'is', tag: 'VBZ' },
{ term: 'a', tag: 'DT' },
{ term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
{ term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]);
// return [ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
// { term: '#', tag: '#' },
// { term: '#', tag: '#' },
// { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
// { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]
indexator.lemmatize(terms) ⇒ Array
Lemmatize a list of tagged terms (adds the lemma and stem properties)
Kind: instance method of Indexator
Returns: Array - List of tagged terms with a lemma
| Param | Type | Description |
|---|---|---|
| terms | Array | List of tagged terms |
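The `lemmatize` step can be pictured as adding `lemma` and `stem` properties to each tagged term. In the sketch below the lemmatizer and stemmer are passed in as plain functions; the toy implementations in the usage lines (a plural-stripping lemmatizer and a crude suffix-stripping stemmer) are hypothetical stand-ins, not the lemmatizer or stemmer that tdm-teeft ships with.

```javascript
// Hypothetical lemmatize sketch: returns copies of the tagged terms with
// `lemma` and `stem` added; the input objects are not modified.
function lemmatize(taggedTerms, lemmatizer, stemmer) {
  return taggedTerms.map(t => {
    const lemma = lemmatizer(t.term, t.tag);
    return { ...t, lemma, stem: stemmer(lemma) };
  });
}

// Toy stand-ins for illustration only (not tdm-teeft's lemmatizer/stemmer):
// strip a plural 's' from NNS nouns, then strip a few common suffixes.
const toyLemmatizer = (term, tag) => {
  const lower = term.toLowerCase();
  return tag === 'NNS' && lower.endsWith('s') ? lower.slice(0, -1) : lower;
};
const toyStemmer = word => word.replace(/(ing|ed|e)$/, '');

const input = [{ term: 'samples', tag: 'NNS' }, { term: 'test', tag: 'NN' }];
const out = lemmatize(input, toyLemmatizer, toyStemmer);
console.log(out.map(t => `${t.lemma}/${t.stem}`)); // [ 'sample/sampl', 'test/test' ]
```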
Example (Example usage of 'lemmatize' function)
let indexator = new Indexator();
indexator.lemmatize([ { term: 'this', tag: 'DT' },
{ term: 'is', tag: 'VBZ' },
{ term: 'a', tag: 'DT' },
{ term: 'sample', tag: 'NN' },
{ term: 'test', tag: 'NN' } ]);
// returns the same tagged terms with 'lemma' and 'stem' added,
// e.g. { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' }
indexator.index(data) ⇒ Object
Index a full text
Kind: instance method of Indexator
Returns: Object - A representation of the full text (the index plus additional information and statistics about tokens and terms)
| Param | Type | Description |
|---|---|---|
| data | String | Full text to be indexed |
Example (Example usage of 'index' function)
let indexator = new Indexator();
indexator.index('This is a sample sentence'); // returns an object representation of the indexation
Indexator.compare(a, b) ⇒ Number
Compare the specificity of two objects
Kind: static method of Indexator
Returns: Number - -1, 1, or 0
| Param | Type | Description |
|---|---|---|
| a | Object | First object |
| b | Object | Second object |
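The documented `compare` results amount to a descending order on `specificity`, so such a comparator can be passed straight to `Array.prototype.sort` to put the most specific terms first. A minimal standalone equivalent (a sketch, not the module's own code):

```javascript
// Comparator matching the documented results: higher specificity sorts first.
function compareSpecificity(a, b) {
  if (a.specificity > b.specificity) return -1;
  if (a.specificity < b.specificity) return 1;
  return 0;
}

const terms = [
  { term: 'a', specificity: 1 },
  { term: 'b', specificity: 3 },
  { term: 'c', specificity: 2 }
];
terms.sort(compareSpecificity);
console.log(terms.map(t => t.term)); // [ 'b', 'c', 'a' ]
```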
Example (Example usage of 'compare' function)
Indexator.compare({ 'term': 'a', 'specificity': 1 }, { 'term': 'b', 'specificity': 2 }); // return 1
Indexator.compare({ 'term': 'a', 'specificity': 1 }, { 'term': 'b', 'specificity': 1 }); // return 0
Indexator.compare({ 'term': 'a', 'specificity': 2 }, { 'term': 'b', 'specificity': 1 }); // return -1
Tagger
Kind: global class
- Tagger
  - new Tagger([options])
  - .tag(terms) ⇒ Array
new Tagger([options])
Returns: Tagger - An instance of Tagger
| Param | Type | Description |
|---|---|---|
| [options] | Object | Options of the constructor |
Example (Example usage of 'constructor' (with parameters))
let options = { ... }, // custom options, e.g. a custom lexicon
tagger = new Tagger(options);
// returns an instance of Tagger with a custom lexicon
Example (Example usage of 'constructor' (with default values))
let tagger = new Tagger();
// returns an instance of Tagger with the default lexicon
tagger.tag(terms) ⇒ Array
Tag terms
Kind: instance method of Tagger
Returns: Array - List of tagged terms
| Param | Type | Description |
|---|---|---|
| terms | Array | List of terms |
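A lexicon-based tagger can be sketched as a lookup of each term in a lexicon with a default tag for unknown words. The sketch below assumes a `Map` from word to Penn Treebank tag as the lexicon and `'NN'` as the fallback; both are assumptions, and tdm-teeft's default lexicon format and any extra tagging rules are not shown here.

```javascript
// Hypothetical tagger sketch: lexicon lookup (exact form, then lower case),
// defaulting to 'NN' for unknown words.
function tag(terms, lexicon, defaultTag = 'NN') {
  return terms.map(term => ({
    term,
    tag: lexicon.get(term) || lexicon.get(term.toLowerCase()) || defaultTag
  }));
}

// Tiny illustrative lexicon, not tdm-teeft's default one
const lexicon = new Map([['this', 'DT'], ['is', 'VBZ'], ['a', 'DT']]);
const tagged = tag(['This', 'is', 'a', 'test'], lexicon);
console.log(tagged.map(t => t.tag)); // [ 'DT', 'VBZ', 'DT', 'NN' ]
```

A `Map` is used instead of a plain object so that words such as "constructor" are not accidentally found on `Object.prototype`.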
Example (Example usage of 'tag' function)
let tagger = new Tagger();
tagger.tag(['this', 'is', 'a', 'test']); // return [{ 'term': 'this', 'tag': 'DT' }, { 'term': 'is', 'tag': 'VBZ' }, { 'term': 'a', 'tag': 'DT' }, { 'term': 'test', 'tag': 'NN' }]
TermExtractor
Kind: global class
- TermExtractor
  - new TermExtractor([options])
  - .extract(taggedTerms) ⇒ Object
  - ._startsWith(str, prefix) ⇒ Boolean
new TermExtractor([options])
Returns: TermExtractor - An instance of TermExtractor
| Param | Type | Description |
|---|---|---|
| [options] | Object | Options of the constructor |
| [options.tagger] | Tagger | An instance of Tagger |
| [options.filter] | Filter | An instance of Filter |
Example (Example usage of 'constructor' (with parameters))
let myTagger = new Tagger(), // assuming myTagger contains your custom settings
myFilter = new Filter(), // assuming myFilter contains your custom settings
termExtractor = new TermExtractor({ 'tagger': myTagger, 'filter': myFilter });
// returns an instance of TermExtractor with custom options
Example (Example usage of 'constructor' (with default values))
let termExtractor = new TermExtractor();
// returns an instance of TermExtractor with default options
termExtractor.extract(taggedTerms) ⇒ Object
Extract terms
Kind: instance method of TermExtractor
Returns: Object - Return all extracted terms
| Param | Type | Description |
|---|---|---|
| taggedTerms | Array | List of tagged terms |
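The documented `extract` output counts single nouns and noun sequences, with strength equal to the number of words in a term (`'sample test'` has strength 2). The sketch below follows that reading: it treats any tag starting with `'NN'` as a noun, records each noun and each run of two or more consecutive nouns, and applies no filtering. The noun test, the run rule and the name `extractTerms` are assumptions rather than tdm-teeft's implementation, which may for instance work on lemmas or accept adjectives.

```javascript
// Hypothetical extraction sketch: counts single nouns and runs of
// consecutive nouns; strength = number of words in the term.
function extractTerms(taggedTerms) {
  const counts = new Map();
  // record one occurrence of the term made of `words`
  const add = words => {
    const key = words.join(' ');
    const entry = counts.get(key) || { frequency: 0, strength: words.length };
    entry.frequency += 1;
    counts.set(key, entry);
  };
  let run = []; // current sequence of consecutive nouns
  const closeRun = () => {
    if (run.length > 1) add(run); // multi-word term
    run = [];
  };
  for (const { term, tag } of taggedTerms) {
    if (tag.startsWith('NN')) {
      const word = term.toLowerCase();
      add([word]); // single-word term
      run.push(word);
    } else {
      closeRun();
    }
  }
  closeRun(); // the text may end with a noun sequence
  return Object.fromEntries(counts);
}

const tagged = [
  { term: 'sample', tag: 'NN' }, { term: 'test', tag: 'NN' },
  { term: 'for', tag: 'IN' }, { term: 'this', tag: 'DT' },
  { term: 'module', tag: 'NN' }, { term: '.', tag: '.' },
  { term: 'sample', tag: 'NN' }, { term: 'test', tag: 'NN' }
];
const result = extractTerms(tagged);
console.log(result['sample test']); // { frequency: 2, strength: 2 }
console.log(result.module);         // { frequency: 1, strength: 1 }
```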
Example (Example usage of 'extract' function)
let termExtractor = new TermExtractor(),
myDefaultTagger = new Tagger(),
indexator = new Indexator(),
// tag() expects a list of terms, so the text is tokenized first
taggedTerms = myDefaultTagger.tag(indexator.tokenize('This is a sample test for this module. It index any fulltext. It is a sample test.'));
termExtractor.extract(taggedTerms);
// return
// { 'sample': { 'frequency': 2, 'strength': 1 }, 'test': { 'frequency': 2, 'strength': 1 },
// 'sample test': { 'frequency': 2, 'strength': 2 },
// 'module': { 'frequency': 1, 'strength': 1 },
// 'index': { 'frequency': 1, 'strength': 1 },
// 'fulltext': { 'frequency': 1, 'strength': 1 }
// }
termExtractor._startsWith(str, prefix) ⇒ Boolean
Check whether the given string starts with the given prefix
Kind: instance method of TermExtractor
Returns: Boolean - true if str starts with prefix, else false
| Param | Type | Description |
|---|---|---|
| str | String | String in which the prefix is searched |
| prefix | String | Prefix to look for |
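`_startsWith` is a private helper; in modern JavaScript the same check is available as `String.prototype.startsWith`. A standalone equivalent (a sketch, not the module's code), compared against the built-in:

```javascript
// Returns true when `str` begins with `prefix`, false otherwise.
function startsWith(str, prefix) {
  return str.slice(0, prefix.length) === prefix;
}

console.log(startsWith('NNP', 'NN')); // true
console.log(startsWith('VBZ', 'NN')); // false
console.log('NNP'.startsWith('NN'));  // true (built-in equivalent)
```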