Isn’t simhash/minhash what you want?
svcs.cs.pdx.edu Git – simhash.git/summary
See also "How do you find high-similarity pairs of sentences in a large corpus?"
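If it helps to see what the two techniques do, here is a minimal Python sketch. It is not the code from the linked simhash repository. The choices are my own assumptions: word 3-gram shingles, 128 MinHash permutations built from a random universal hash family, and blake2b as the base hash. The function names are hypothetical. MinHash estimates the Jaccard similarity of two shingle sets. SimHash compresses each document to a 64-bit fingerprint so that near-duplicates differ in only a few bits.

```python
import hashlib
import random

def shingles(text, k=3):
    """Set of overlapping word k-grams (k=3 is an assumed default)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def _h64(s):
    """Stable 64-bit hash; Python's built-in hash() is salted per process."""
    digest = hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the MinHash estimate."""
    return len(a & b) / len(a | b)

# --- MinHash: each slot is the minimum of one random hash over the set ---
_PRIME = (1 << 61) - 1  # Mersenne prime for the h(x) = (a*x + b) mod p family

def make_minhash_params(num_perm=128, seed=1):
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME))
            for _ in range(num_perm)]

def minhash_signature(items, params):
    hashes = [_h64(x) for x in items]
    return [min((a * h + b) % _PRIME for h in hashes) for a, b in params]

def minhash_similarity(sig1, sig2):
    """Fraction of agreeing slots estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)

# --- SimHash: each bit is a majority vote over the items' hash bits ---
def simhash(items, bits=64):
    counts = [0] * bits
    for x in items:
        h = _h64(x)
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

# Usage: two sentences that differ by one word
a = shingles("the quick brown fox jumps over the lazy dog")
b = shingles("the quick brown fox jumped over the lazy dog")
params = make_minhash_params()
print("exact Jaccard:", jaccard(a, b))
print("MinHash estimate:",
      minhash_similarity(minhash_signature(a, params),
                         minhash_signature(b, params)))
print("SimHash Hamming distance:", hamming(simhash(a), simhash(b)))
```

For a large corpus you would not compare every pair of signatures. The usual next step is locality-sensitive hashing: band the MinHash signatures, or bucket SimHash fingerprints by bit blocks. Then only documents that collide in some bucket become candidate pairs.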