Author Topic: Similarity seems not optimized for 300,000+ mp3 files  (Read 77239 times)

ektorbarajas

  • Jr. Member
  • Posts: 45
Similarity seems not optimized for 300,000+ mp3 files
« Reply #225 on: January 17, 2016, 23:14:39 »
Hi.

I have a huge amount of mp3 files (300,000+) and usually use the following config for checking duplicates (collection vs new files):
Audio comparison method: precise 95%
Duration check: enable 95%
Skip video files

My collection is stored across several drives: my laptop drive and 2 EHDD (USB 3.0).
The Similarity cache shows a total of 393,693 files.

And every time I want to search for duplicates I have to deal with the following:
1) When I open the program, it takes between 1 and 3 minutes to load and for the GUI to appear.
2) Then, when I start the comparison, it takes 4 to 5 full days to complete. A couple of times I have compared all my files against each other (I have not created groups). The first time I launched this comparison, I thought it took so long because the cache was being built, but when I launched the exact same comparison again it took the same 4 to 5 days. Isn't it supposed to take much less? I had not added or deleted any files; I just launched the same comparison again to check the difference between building the cache and having the cache already created.

Are there any plans to further optimize Similarity, so that it really takes advantage of the cache, dramatically reduces the time, and processes a huge collection more efficiently?

Thanks

AntiBotQuestion

  • Jr. Member
  • Posts: 4
Re: Similarity seems not optimized for 300,000+ mp3 files
« Reply #226 on: February 15, 2016, 08:29:23 »
It takes far fewer than 300k tracks to be annoying, I tell you. Similarity keeps telling me I've got some twenty hours left, give or take, and it can do that for days. It is down to sixteen now, two days after I completed a scan and started a new one, just to test.
My disk is USB2 only. If the bottleneck were read speed, it would not be using 90 percent CPU all the time. For days.

ektorbarajas

Re: Similarity seems not optimized for 300,000+ mp3 files
« Reply #227 on: February 15, 2016, 12:55:09 »
That is the point, I don't know why the CPU usage is high but Similarity is taking ages to do its job.

My 2 EHDD are USB 3.0, but what concerns me is that it appears that the cache is useless (at least for a very high volume of files).

I can't believe that I performed a full scan, added several new files to the cache, and that running the exact same scan (with no new files added) takes the same amount of time as the initial scan.
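
One possible explanation, offered only as a back-of-the-envelope sketch: if Similarity compares every file against every other file (an assumption, nothing in this thread confirms how it works internally), then a cache of per-file data would save the decoding step but not the pairwise comparisons themselves, and those grow with the square of the collection size. The Python below just does that arithmetic for the 393,693 files mentioned above; the function names are made up for illustration and have nothing to do with Similarity itself.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n files: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def pairs_per_second(n: int, days: float) -> float:
    """Average comparison rate needed to cover all pairs in the given number of days."""
    return pair_count(n) / (days * 24 * 60 * 60)


if __name__ == "__main__":
    files = 393_693  # file count reported by the Similarity cache in this thread
    print(f"pairs to compare: {pair_count(files):,}")
    # 4.5 days is the midpoint of the reported 4 to 5 day run time
    print(f"rate for a 4.5-day run: {pairs_per_second(files, 4.5):,.0f} pairs/s")
```

If that assumption holds, a second run with a warm cache would still have to work through tens of billions of pairs, which would fit both the unchanged run time and the constantly high CPU usage reported above.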