Topic: A clearer reporting of "100.0" percent matches

AntiBotQuestion

  • Jr. Member
  • Posts: 4
A clearer reporting of "100.0" percent matches
« on: February 15, 2015, 22:32:02 »
It seems that I can get a "Content" match of "100.0%" even for two files with different lengths and different bitrates. I think that is unreasonable: audio that is not identical is not the same content (differing tags are fine, as long as tags are reported separately).

It might be that the Precise algorithm can tell the difference, but (1) if so, please disclose that, even though you prefer to keep the Precise algorithm secret; and (2) detecting identical audio content is an even less sophisticated method than what the non-Premium mode does. Not only is it available in less sophisticated deduplicators, but from a user's point of view it is counterintuitive to have to switch to the more sophisticated method to get a less sophisticated check.

Suggestion: for streams that decode to a bit-exact identical signal, display "exact" rather than "100.0%". It takes the same space in the table.
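A minimal sketch of the suggested labelling, assuming both files have already been decoded to integer PCM samples with the same sample rate and channel layout. The function name `match_label` and the idea of passing the algorithm's score in as a 0.0-1.0 value are hypothetical, not Similarity's actual code.

```python
def match_label(decoded_a, decoded_b, similarity):
    """Return the text to show in the results table.

    decoded_a, decoded_b: sequences of integer PCM samples (hypothetical
    output of an earlier decoding step, same rate and channel layout).
    similarity: the matching algorithm's score in the range 0.0-1.0.
    """
    # Bit-exact: same number of samples and every sample identical.
    if list(decoded_a) == list(decoded_b):
        return "exact"
    # Otherwise fall back to the usual percentage with one decimal place.
    return f"{similarity * 100:.1f}%"
```

With this rule, a pair whose samples differ but whose score rounds up (for example 0.9996) still shows "100.0%", so "exact" is reserved for streams that really are identical.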

(Reporting "exact" does mean that, unless you calculate with very high precision, you might need to round decoded MP3s at around -150 dB, so that a CBR file and a VBR file carrying the same signal are not reported as different.)
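The rounding idea can be sketched as a tolerance comparison on floating-point decodes, assuming samples are scaled to full scale = 1.0 (the names `TOLERANCE` and `same_signal` are hypothetical). A level of -150 dB relative to full scale is a linear amplitude of 10^(-150/20), about 3.2e-8. That is below the least significant bit of 24-bit audio (about -138 dBFS), so differences this small would vanish anyway once the decode is rendered to 24-bit or 16-bit integers.

```python
# Hypothetical threshold: differences below -150 dBFS count as rounding noise.
THRESHOLD_DB = -150.0
# Convert dB relative to full scale (1.0) to a linear amplitude: 10^(dB/20).
TOLERANCE = 10 ** (THRESHOLD_DB / 20)  # about 3.2e-8


def same_signal(decoded_a, decoded_b, tolerance=TOLERANCE):
    """Compare two float decodes (samples in -1.0..1.0) up to a tolerance."""
    # Different lengths can never be the same signal.
    if len(decoded_a) != len(decoded_b):
        return False
    # Every sample must agree to within the tolerance.
    return all(abs(a - b) <= tolerance for a, b in zip(decoded_a, decoded_b))
```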

AntiBotQuestion

  • Jr. Member
  • Posts: 4
Re: A clearer reporting of "100.0" percent matches
« Reply #1 on: February 22, 2015, 20:42:52 »

Quoting my first post: "It might be that the Precise algorithm can tell the difference"

Obviously not. It just reported a "100.0%" Content match and a "100.0%" Precise match for two files with different word lengths (20 bits used vs. 16 bits used).
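The "bits used" figure can be estimated from the decoded samples themselves. A sketch, assuming signed integer samples stored in a 24-bit container (the function name `bits_used` and the input format are hypothetical): audio padded from 16 bits leaves the lowest 8 bits zero in every sample, so the trailing zero bits shared by all samples reveal the word length actually in use. On a short excerpt this is only an estimate, since a genuinely 24-bit signal could happen to have all-even samples.

```python
def bits_used(samples, container_bits=24):
    """Estimate how many of the container's bits carry signal.

    samples: signed integer PCM samples in a `container_bits`-bit container
    (hypothetical input format).
    """
    mask = (1 << container_bits) - 1
    combined = 0
    for s in samples:
        # Two's-complement view within the container width, so negative
        # samples keep their low bits exactly as stored.
        combined |= s & mask
    if combined == 0:
        return 0  # digital silence: no bits in use
    # Isolate the lowest set bit; its index is the shared trailing-zero count.
    trailing_zeros = (combined & -combined).bit_length() - 1
    return container_bits - trailing_zeros
```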

Too bad; this piece of software looked promising. Now I have to check each possible match manually, which takes several operations per pair: I cannot even select a pair and drag it to an application that can actually compare them bit by bit.

Admin

  • Administrator
  • Hero Member
  • Posts: 664
    • https://www.smilarityapp.com
Re: A clearer reporting of "100.0" percent matches
« Reply #2 on: February 29, 2016, 16:25:13 »
Similarity doesn't compare files byte by byte; it uses audio data from decoders, and some decoders (FLAC, Apple QuickTime) never return raw data, only data already converted to some other format. Reading each file a second time to calculate a digital signature would make scanning take twice as long, all for a very marginal benefit. For that task you can always use any byte-by-byte duplicate finder that compares hashes.
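A byte-by-byte duplicate finder based on hashes, of the kind the reply recommends, can be sketched in a few lines. This is a generic illustration, not part of Similarity, and the function names are hypothetical. Files are grouped by size first, and only files that share a size are hashed with SHA-256, so most files are never read in full. Unlike the "exact" decoded-audio comparison requested in the first post, this treats two files that differ only in their tags as different.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of the file's raw bytes, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def byte_identical_groups(paths):
    """Return groups (lists of 2+ paths) of byte-identical files."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[Path(p).stat().st_size].append(p)
    by_hash = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a byte-identical twin
        for p in same_size:
            by_hash[file_digest(p)].append(p)
    return [group for group in by_hash.values() if len(group) > 1]
```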
Also, Similarity's content algorithms compare only the first 30-60 seconds of a file, no more.
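This is how two files of different lengths can both score 100.0%: if only an opening window is compared, anything after it is never seen. A minimal sketch, assuming a fixed window and an exact sample comparison. Similarity's actual content algorithms are not described in this thread, so the function names, the 60-second default, and the comparison itself are hypothetical stand-ins.

```python
def compared_window(samples, sample_rate, seconds=60):
    """Keep only the first `seconds` of audio (hypothetical window length)."""
    return samples[: int(sample_rate * seconds)]


def prefix_identical(samples_a, samples_b, sample_rate, seconds=60):
    """True if the compared windows match, even when total lengths differ."""
    return compared_window(samples_a, sample_rate, seconds) == compared_window(
        samples_b, sample_rate, seconds
    )
```

For example, with `sample_rate=10` and `seconds=3`, a 100-sample file and its first 50 samples compare as identical, because only the first 30 samples of each are examined.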