Commit 2a43603

Do not break words with numbers and letters, divided by hyphen
Author: Arthur Zakirov
1 parent 399cc98, commit 2a43603

4 files changed: +38 -22 lines changed

Diff for: README.md

+21 -21

@@ -3,9 +3,27 @@
 ## Introduction

 The **pg_tsparser** module is the modified default text search parser from
-PostgreSQL 9.6.
-The difference between **tsparser** and **default** parsers is that **tsparser**
-gives also unbroken words by underscore character.
+PostgreSQL 9.6. The differences are:
+* **tsparser** gives unbroken words by underscore character
+* **tsparser** gives unbroken words with numbers and letters by hyphen character
+
+For example:
+
+```sql
+SELECT to_tsvector('english', 'pg_trgm') as def_parser,
+to_tsvector('english_ts', 'pg_trgm') as new_parser;
+def_parser | new_parser
+-----------------+-----------------------------
+'pg':1 'trgm':2 | 'pg':2 'pg_trgm':1 'trgm':3
+(1 row)
+
+SELECT to_tsvector('english', '123-abc') as def_parser,
+to_tsvector('english_ts', '123-abc') as new_parser;
+def_parser | new_parser
+-----------------+-----------------------------
+'123':1 'abc':2 | '123':2 '123-abc':1 'abc':3
+(1 row)
+```

 ## License

@@ -40,21 +58,3 @@ ALTER TEXT SEARCH CONFIGURATION english_ts
 word, hword, hword_part
 WITH english_stem;
 ```
-
-## Examples
-
-Example of difference between **tsparser** and **default**:
-
-```sql
-SELECT to_tsvector('english_ts', 'pg_trgm');
-to_tsvector
------------------------------
-'pg':2 'pg_trgm':1 'trgm':3
-(1 row)
-
-SELECT to_tsvector('english', 'pg_trgm');
-to_tsvector
------------------
-'pg':1 'trgm':2
-(1 row)
-```
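
Reviewer note: the README examples show the unbroken token at position 1 and its parts at the following positions. The C sketch below only mimics that output shape for a single token. It is not how tsparser.c works (the module uses a state machine and token types), and `emit_compound` is a hypothetical name chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of pg_tsparser.
 * Writes the unbroken token followed by each part split on '-' or '_'
 * into out, separated by single spaces.
 * Returns the number of lexemes written, or -1 if out is too small.
 */
int emit_compound(const char *token, char *out, size_t outsz)
{
    size_t used;
    int count = 1;
    const char *p = token;
    int n;

    if (outsz == 0)
        return -1;

    /* Position 1: the unbroken token. */
    n = snprintf(out, outsz, "%s", token);
    if (n < 0 || (size_t) n >= outsz)
        return -1;
    used = (size_t) n;

    /* No separator: the token stands alone. */
    if (strpbrk(token, "-_") == NULL)
        return count;

    while (*p != '\0')
    {
        size_t part = strcspn(p, "-_");   /* length of the next part */

        if (part > 0)
        {
            n = snprintf(out + used, outsz - used, " %.*s", (int) part, p);
            if (n < 0 || (size_t) n >= outsz - used)
                return -1;
            used += (size_t) n;
            count++;
        }
        p += part;
        if (*p != '\0')
            p++;                          /* skip the separator itself */
    }
    return count;
}
```

Under these assumptions, calling `emit_compound("123-abc", buf, sizeof buf)` fills `buf` with `123-abc 123 abc`, the same order as the `new_parser` column in the README example.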

Diff for: expected/pg_tsparser.out

+12

@@ -193,3 +193,15 @@ SELECT to_tsvector('english_ts', 'pg_trgm');
 'pg':2 'pg_trgm':1 'trgm':3
 (1 row)

+SELECT to_tsvector('english_ts', '12_abc');
+to_tsvector
+---------------------------
+'12':2 '12_abc':1 'abc':3
+(1 row)
+
+SELECT to_tsvector('english_ts', '12-abc');
+to_tsvector
+---------------------------
+'12':2 '12-abc':1 'abc':3
+(1 row)
+

Diff for: sql/pg_tsparser.sql

+2

@@ -22,3 +22,5 @@ ALTER TEXT SEARCH CONFIGURATION english_ts
 WITH english_stem;

 SELECT to_tsvector('english_ts', 'pg_trgm');
+SELECT to_tsvector('english_ts', '12_abc');
+SELECT to_tsvector('english_ts', '12-abc');

Diff for: tsparser.c

+3 -1

@@ -1132,6 +1132,8 @@ static const TParserStateActionItem actionTPS_InUnsignedInt[] = {
 {p_iseqC, '-', A_PUSH, TPS_InHostFirstAN, 0, NULL},
 {p_iseqC, '_', A_PUSH, TPS_InHostFirstAN, 0, NULL},
 {p_iseqC, '@', A_PUSH, TPS_InEmail, 0, NULL},
+{p_iseqC, '-', A_PUSH, TPS_InHyphenNumWordFirst, 0, NULL},
+{p_iseqC, '_', A_PUSH, TPS_InHyphenNumWordFirst, 0, NULL},
 {p_isasclet, 0, A_PUSH, TPS_InHost, 0, NULL},
 {p_isalpha, 0, A_NEXT, TPS_InNumWord, 0, NULL},
 {p_isspecial, 0, A_NEXT, TPS_InNumWord, 0, NULL},
@@ -1658,7 +1660,7 @@ static const TParserStateActionItem actionTPS_InParseHyphen[] = {
 {p_isEOF, 0, A_RERUN, TPS_Base, 0, NULL},
 {p_isasclet, 0, A_NEXT, TPS_InHyphenAsciiWordPart, 0, NULL},
 {p_isalpha, 0, A_NEXT, TPS_InHyphenWordPart, 0, NULL},
-{p_isdigit, 0, A_PUSH, TPS_InHyphenUnsignedInt, 0, NULL},
+{p_isdigit, 0, A_PUSH, TPS_InHyphenNumWordPart, 0, NULL},
 {p_iseqC, '-', A_PUSH, TPS_InParseHyphenHyphen, 0, NULL},
 {p_iseqC, '_', A_PUSH, TPS_InParseHyphenHyphen, 0, NULL},
 {NULL, 0, A_RERUN, TPS_Base, 0, NULL}
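
Reviewer note: the tsparser.c hunks edit per-state tables whose rows pair a character test with a next state, ending in a catch-all row with a NULL test. For readers unfamiliar with that style, here is a much reduced sketch of table-driven dispatch. The states, the `Rule` type, and `is_hyphenated_numword` are hypothetical and far simpler than `TParserStateActionItem`. The sketch takes the first matching row and does not model `A_PUSH`, `A_RERUN`, or any backtracking.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, simplified states; not the states used in tsparser.c. */
typedef enum { S_INT, S_SEP, S_WORD, S_ACCEPT, S_REJECT } State;

typedef int (*CharTest) (int c);

typedef struct
{
    CharTest test;   /* predicate on the current character; NULL matches anything */
    State    next;   /* state entered when the predicate matches */
} Rule;

static int is_digit(int c) { return isdigit(c) != 0; }
static int is_alpha(int c) { return isalpha(c) != 0; }
static int is_sep(int c)   { return c == '-' || c == '_'; }
static int is_end(int c)   { return c == '\0'; }

/* One table per state; the last row is always the catch-all. */
static const Rule rules_int[]  = {{is_digit, S_INT}, {is_sep, S_SEP}, {NULL, S_REJECT}};
static const Rule rules_sep[]  = {{is_alpha, S_WORD}, {NULL, S_REJECT}};
static const Rule rules_word[] = {{is_alpha, S_WORD}, {is_end, S_ACCEPT}, {NULL, S_REJECT}};

/* Indexed by State: S_INT, S_SEP, S_WORD. */
static const Rule *const tables[] = {rules_int, rules_sep, rules_word};

/* Returns 1 if s is digits, then one '-' or '_', then letters; else 0. */
int is_hyphenated_numword(const char *s)
{
    State st = S_INT;

    if (!is_digit((unsigned char) *s))
        return 0;

    for (;; s++)
    {
        const Rule *r = tables[st];
        int c = (unsigned char) *s;

        /* Scan the rows top to bottom and take the first one that matches. */
        while (r->test != NULL && !r->test(c))
            r++;

        st = r->next;
        if (st == S_ACCEPT)
            return 1;
        if (st == S_REJECT)
            return 0;
    }
}
```

In this sketch `is_hyphenated_numword("12-abc")` and `is_hyphenated_numword("12_abc")` return 1, while `"12"`, `"abc"`, and `"12-"` return 0. Every state either accepts or rejects on the terminating NUL, so the loop never reads past the end of the string.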
