Thanks hsei,
Similarity doesn't designed for big amount of files, it can teoretically process a 200 000 of files, but this take a lot of time.
We now actively work on 4th algorithm, that has all properties of "precise", but can compare all elements stored on disk (like sql database), current implementation need contain all data in RAM, because every one compared with everything, it can't be optimized like binary search or other type of sorting.