starcatmeow/fastminhash

fastminhash

fastminhash is a fast N-gram MinHash implementation written in C and installable as a Python package.
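For intuition, here is a pure-Python sketch of what an N-gram MinHash computes. It slides a window of `n` tokens over the input, turns each window into one integer, hashes that integer with `(coeff1 * x + coeff2) % modulo`, and keeps the smallest result. `minhash_reference` is a hypothetical name, and its argument order simply mirrors the `minhash` call in the Usage section. Packing an n-gram into one integer by reading its tokens as digits in base `token_max_value + 1` is an assumption made for illustration. fastminhash's C code may combine tokens differently, so its outputs are not expected to match this sketch.

```python
from typing import Sequence


def minhash_reference(
    n: int,
    token_max_value: int,
    modulo: int,
    coeff1: int,
    coeff2: int,
    tokens: Sequence[int],
) -> int:
    """Smallest (coeff1 * x + coeff2) % modulo over every n-gram x in tokens.

    Hypothetical illustration only; not fastminhash's actual implementation.
    """
    if n <= 0 or len(tokens) < n:
        raise ValueError("need n >= 1 and at least n tokens")
    base = token_max_value + 1  # assumes every token lies in [0, token_max_value]
    hashes = []
    for start in range(len(tokens) - n + 1):
        x = 0
        for token in tokens[start:start + n]:
            x = x * base + token  # assumed packing: n-gram read as base-`base` digits
        hashes.append((coeff1 * x + coeff2) % modulo)
    return min(hashes)


# Same arguments as the Usage example; the result lies in [0, 100279).
value = minhash_reference(5, 100276, 100279, 12345, 54321,
                          [32352, 33513, 1864, 3626, 12763, 27125, 23981])
print(value)
```

Because each token is below the base, distinct n-grams pack to distinct integers, so two n-grams can only collide in the final `% modulo` step.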

Install

pip3 install fastminhash

Usage

from minhash import minhash, minhash_all

# Positional arguments, in the order used below:
#   n               - hash every n-gram (contiguous run of n tokens); here n = 5
#   token_max_value - maximum value of any token
#   modulo          - modulus of the hash function
#   coeff1, coeff2  - constants of the hash function (coeff1 * token + coeff2) % modulo
#   tokens          - the list of tokens
value = minhash(5, 100276, 100279, 12345, 54321, [32352, 33513, 1864, 3626, 12763, 27125, 23981])

# Compute several MinHash values in one call by passing a list of (coeff1, coeff2) pairs
hashes = minhash_all(5, 100276, 100279, [(12345, 54321), (23456, 65432)], [32352, 33513, 1864, 3626, 12763, 27125, 23981])
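MinHash values become useful when you compare them. If the same list of `(coeff1, coeff2)` pairs is applied to two token sequences, the fraction of positions where the two resulting signatures agree estimates the Jaccard similarity of their n-gram sets. A linear hash modulo a prime is only approximately min-wise independent, so treat the estimate as approximate. The following self-contained pure-Python sketch illustrates that workflow. `make_coeff_pairs`, `signature` and `estimate_jaccard` are hypothetical helpers. Drawing `coeff1` uniformly from `[1, modulo - 1]` and `coeff2` from `[0, modulo - 1]` is an assumed (common) choice. Packing each n-gram into one integer in base `token_max_value + 1` is likewise an illustrative assumption, not necessarily what fastminhash does. With the library installed, the same pairs could be passed to `minhash_all` in place of `signature`. The values would likely differ from this sketch but can be compared the same way.

```python
import random
from typing import List, Sequence, Tuple


def make_coeff_pairs(k: int, modulo: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw k random (coeff1, coeff2) pairs with coeff1 nonzero mod modulo."""
    rng = random.Random(seed)  # fixed seed so signatures stay comparable across runs
    return [(rng.randrange(1, modulo), rng.randrange(0, modulo)) for _ in range(k)]


def signature(
    n: int,
    token_max_value: int,
    modulo: int,
    coeff_pairs: Sequence[Tuple[int, int]],
    tokens: Sequence[int],
) -> List[int]:
    """One MinHash value per coefficient pair (illustrative, not fastminhash's code)."""
    if n <= 0 or len(tokens) < n:
        raise ValueError("need n >= 1 and at least n tokens")
    base = token_max_value + 1
    ngrams = []
    for start in range(len(tokens) - n + 1):
        x = 0
        for token in tokens[start:start + n]:
            x = x * base + token  # assumed packing: n-gram read as base-`base` digits
        ngrams.append(x)
    # For each pair, keep the smallest hash over all n-grams.
    return [min((a * x + b) % modulo for x in ngrams) for a, b in coeff_pairs]


def estimate_jaccard(sig_a: Sequence[int], sig_b: Sequence[int]) -> float:
    """Fraction of signature positions where the two minima agree."""
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("signatures must be non-empty and of equal length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


pairs = make_coeff_pairs(128, 100279)
doc_a = [32352, 33513, 1864, 3626, 12763, 27125, 23981]
doc_b = [32352, 33513, 1864, 3626, 12763, 27125, 99999]  # last token changed
sig_a = signature(5, 100276, 100279, pairs, doc_a)
sig_b = signature(5, 100276, 100279, pairs, doc_b)
print(estimate_jaccard(sig_a, sig_b))
```

The two token lists share 2 of their 4 distinct 5-grams, so their exact Jaccard similarity is 0.5 and the printed estimate should fall near it. Using more pairs reduces the variance of the estimate. Signatures are only comparable when built from the same pairs in the same order, which is why the pairs come from a fixed seed.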

