Commit f97b04b

Update Jaro-Winkler description in README
The previous README made it seem like Jaro-Winkler was the ideal typo detector, when in actuality it is really only suited to typos caused by unsynchronized high-speed typing between both hands. It does not account for actual miskey errors, such as hitting the wrong key altogether or inadvertently pressing two keys instead of one. This is because Jaro-Winkler rewards only matching characters and transpositions, and does not favorably consider a string consisting strictly of additions or substitutions with letters not already part of the word's alphabet to be a "similar" change.
1 parent 1924ab8 commit f97b04b

1 file changed, +2 -2 lines changed

Diff for: README.md

@@ -250,10 +250,10 @@ Will produce:
 ```

 ## Jaro-Winkler
-Jaro-Winkler is a string edit distance that was developed in the area of record linkage (duplicate detection) (Winkler, 1990). The Jaro–Winkler distance metric is designed and best suited for short strings such as person names, and to detect typos.
+Jaro-Winkler is a string edit distance that was developed in the area of record linkage (duplicate detection) (Winkler, 1990). The Jaro–Winkler distance metric is designed and best suited for short strings such as person names, and to detect transposition typos.

 Jaro-Winkler computes the similarity between 2 strings, and the returned value lies in the interval [0.0, 1.0].
-It is (roughly) a variation of Damerau-Levenshtein, where the substitution of 2 close characters is considered less important then the substitution of 2 characters that a far from each other.
+It is (roughly) a variation of Damerau-Levenshtein, where the transposition of 2 close characters is considered less important than the transposition of 2 characters that are far from each other. Jaro-Winkler penalizes additions or substitutions that cannot be expressed as transpositions.

 The distance is computed as 1 - Jaro-Winkler similarity.
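
To make the point of this change concrete, here is a minimal Python sketch of the textbook Jaro-Winkler similarity. It is not the code from this repository: the function names, the prefix scale of 0.1, the 4-character prefix cap, and the 0.7 boost threshold are assumptions taken from the commonly cited formulation, and the library's own implementation may differ in detail.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0.0, 1.0] (textbook formulation, illustrative only)."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0

    # Characters count as matching only if they are equal and lie
    # within this many positions of each other.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order are
    # "half transpositions"; two of them make one transposition.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (
        matches / len(s1)
        + matches / len(s2)
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4, boost_threshold: float = 0.7) -> float:
    """Jaro similarity boosted for a shared prefix (assumed default parameters)."""
    sim = jaro(s1, s2)
    if sim <= boost_threshold:
        return sim

    # Length of the common prefix, capped at max_prefix characters.
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1

    return sim + prefix * prefix_scale * (1.0 - sim)


def jaro_winkler_distance(s1: str, s2: str) -> float:
    """Distance defined as 1 - similarity, as stated in the README."""
    return 1.0 - jaro_winkler(s1, s2)


if __name__ == "__main__":
    # A transposition ("hlelo") versus a wrong-key substitution ("hxllo").
    print(jaro_winkler("hello", "hlelo"))
    print(jaro_winkler("hello", "hxllo"))
```

Under these assumptions, the transposed variant `"hlelo"` scores closer to `"hello"` than the substituted variant `"hxllo"` does, because the substituted character removes a match outright while the swapped pair costs only a single transposition.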