Author Topic: How-To: One User's Tutorial for Cleaning Media Library Duplicates with Similarity (Read 129505 times)

StanleyTweedle · « **on:** March 25, 2009, 12:27:47 »

I'd like to share with the Similarity User Community my own technique for effective, efficient duplicate detection and removal, with confidence in every deletion.

Step 00:
Obtain the latest copy of Similarity from www . music-similarity . com .
Obtain the latest copy of freeware media player, "Foobar2000", available at foobar2000 . org
Install the software.
Prepare your media-library folders:
Placing ALL media files in a single "Parent" folder is recommended, from this user's experience. Pulling results from a single location, subfolders or not, is easier to manage than results from multiple Parent locations. Trust me-- put all mp3s, etc., into a single folder (subfolders are okay, and when you're finished removing dupes, you can re-arrange your library to its previous state).

[for example, my path:
Parent Folder - E:\myMusic\
Folder structure example:
E:\myMusic\_artist-1\
E:\myMusic\_artist-2\
etc,
etc.
...totalling about 7,000 mp3's, in hundreds of sub-folders, under a single parent, "myMusic"]
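Before pointing Similarity at the parent folder, it can help to confirm that everything really did end up under it. The following is only a minimal sketch in Python, not part of Similarity or Foobar2000; the function name and the list of extensions are assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path

# Assumption: which extensions count as "media" in your library.
AUDIO_EXTENSIONS = {".mp3", ".flac", ".ogg", ".wma", ".m4a", ".wav"}

def count_audio_files(parent):
    """Count audio files under `parent`, grouped by top-level subfolder."""
    parent = Path(parent)
    counts = Counter()
    for path in parent.rglob("*"):  # rglob walks all subfolders
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            relative = path.relative_to(parent)
            # Files sitting directly in the parent are grouped under ".".
            top = relative.parts[0] if len(relative.parts) > 1 else "."
            counts[top] += 1
    return counts

# Example use (the path is the one from this tutorial):
#   counts = count_audio_files(r"E:\myMusic")
#   print(sum(counts.values()), "audio files in", len(counts), "folders")
```

If the total is far from what you expect, some folders are probably still sitting outside the parent.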
Step 01:
Launch Similarity. The default options are recommended. Enable the experimental algorithm if you wish to have a very accurate report.*(1)
Click the folder icon. Select that single folder, under which you placed all of your other mp3 folders. Subfolders are automatically scanned, so DO NOT add anything but the "parent folder". (correct me, please, if I'm wrong)
[according to the illustrated folder structure, above, I select ONLY "E:\myMusic"]
Step 02:
START THE MEDIA SCAN:
How do I enable Scanning in Similarity?
As soon as the folder is selected, and the dialogue window is closed (by clicking the [OK] button), Similarity begins the scanning process. In fact, this is the very method for enabling the scan. **NOTE** If you close Similarity and re-open it, wondering why it's not doing anything-- you must first "delete" any folders from the "add-folder" dialogue, then re-add those folders. (odd, perhaps-- but that's how it works! hey, be patient-- it's still very "beta" software! Give the man some time to tend to such quirks. ;-)
Step 03:
SIMILARITY COMMENCE ROCKING (, and rolling)
so...Wait...and...wait... (tic, toc, tic, toc...)
Consider This: Similarity is actually "listening" to all of those files-- so it knows if there are duplicates (vs just looking at file-names, etc.). Imagine how long it would take for you to do the same!
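For contrast, here is a minimal Python sketch of the cheaper kind of check that does NOT listen to anything: it only groups files whose bytes are identical. This is not how Similarity works internally (the thread does not describe its algorithm), and the function names are made up for illustration. A check like this would miss the same song saved at two different bitrates, which is exactly the case the content scan is meant to catch.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's bytes, read in chunks to keep memory use low."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def exact_duplicates(parent):
    """Return groups of files under `parent` that are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in Path(parent).rglob("*"):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # Only digests shared by two or more files are duplicates.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```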

Scan Progress is indicated in the upper-right [Number of Files scanned], and lower-left [number of duplicates detected]
NOTE: Similarity WILL gradually report each duplicate file as the scan proceeds. Duplicates may be verified, and even deleted, while the scan is still running.
Step 04:
SAVE PLAYLIST
NOTE: This section begins a particular process-- unique to the author of this tutorial. The following procedure is not the recommended action per Similarity, but I find it to be my own preference. I recommend the reader try this method on only a few files, at first, so he or she might make the best personal judgment for further proceeding with this technique.

When Similarity finds approx 100 duplicates (observe, lower-left corner), you may wish to save the playlist. Before saving, sort the playlist items by column, by "content", such that the items marked "100%" are mostly near the "top" of the column. Save the playlist in an easily accessible folder (the "Save" icon, looks like a floppy disk).
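A saved playlist is essentially a list of file paths, so it is easy to sanity-check before loading it into a player. The sketch below is Python and rests on an assumption: that the playlist was saved in a plain M3U-style text format (one path per line, lines beginning with "#" being comments or directives). The thread does not say which format Similarity writes, and the function names here are hypothetical.

```python
from pathlib import Path

def read_playlist(playlist_path, encoding="utf-8-sig"):
    """Return the file paths listed in an M3U-style playlist.

    Assumption: one path per line; blank lines and lines starting with '#'
    (such as #EXTM3U or #EXTINF) are skipped.
    """
    text = Path(playlist_path).read_text(encoding=encoding, errors="replace")
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        entries.append(line)
    return entries

def missing_entries(entries):
    """Return the playlist entries that no longer exist on disk."""
    return [entry for entry in entries if not Path(entry).is_file()]
```

Entries reported as missing have most likely been moved or deleted since the playlist was saved.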

Step 05:
OPEN PLAYLIST IN FOOBAR2000
Foobar2000 has a built-in function to perform actions on files, including "delete". By loading the playlist into Foobar2000, you create an environment quite friendly for comparing the duplicates reported by Similarity. Simply go through the playlist and determine which file is "better" (and likewise, which to "delete"). In this manner, you can be confident that each deleted file is in fact the one better discarded.

Indeed, Similarity offers the "speaker" icons, which launch the file-in-focus for previewing its contents-- however, using the aforementioned method, it is much easier to perform an "A-B" comparison between files. Furthermore, the practice of comparison becomes less methodical, and more enjoyable, as the playlist can continue in the background and the user need not give 100% of his or her attention. In other words, let the playlist play-- enjoy your tunes, checking in only as the duplicates come up.
NOTE: It is recommended, regardless of playlists in Foobar2000, to USE SIMILARITY FOR THE DELETION action. Although Foobar2000 is capable of deleting files from the playlist, Similarity will not function as beautifully if files are modified outside of its own process. Experiment. See what you prefer, but begin by deleting from Similarity, NOT from within Foobar, unless you opt to "Move" files, as suggested after the next paragraph.

I find this is much more "bearable" than working only in Similarity. This is NOT to take any value from Similarity itself, but only to suggest a possible practice which may be more enjoyable for the user who has a large library containing many duplicates.

NOTE: Foobar2000, in addition to offering a "File-Operation -> Delete" command, has also a "move" operation. If the user is really paranoid about which files should be deleted, I recommend creating a "Safe_Delete" folder (i.e. E:\SafeDeleteMp3s\ ). Using a safety-net folder, instead of deleting every duplicate, the user might use "File-Operations -> move -> SafeDelete", so the duplicate mp3's are removed from a primary library of files, separated into a sort of "pre-delete" folder (not unlike recycle bin, but safe from system cleaner utilities which might otherwise automatically dump recycle bin contents). If the user is confident about deleting duplicates, then such a "safeDelete" folder is only superfluous, and probably a waste of effort. Use your own best judgment for how to handle your own files.
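The same safety-net idea can also be scripted. What follows is only a rough Python sketch, not a feature of Foobar2000 or Similarity: the function name `safe_delete` and its parameters are invented for illustration. It moves a file out of the library into the safety-net folder, keeps the original subfolder layout so the file can be put back later, and refuses to overwrite anything already there.

```python
import shutil
from pathlib import Path

def safe_delete(file_path, library_root, safe_root):
    """Move `file_path` from the library into `safe_root` instead of deleting it.

    The file keeps its path relative to `library_root`, so it can be restored.
    Returns the new location.
    """
    file_path = Path(file_path).resolve()
    library_root = Path(library_root).resolve()
    # Raises ValueError if the file is not inside the library folder.
    relative = file_path.relative_to(library_root)
    target = Path(safe_root) / relative
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(file_path), str(target))
    return target

# Example use with the folders named in this tutorial:
#   safe_delete(r"E:\myMusic\_artist-1\song.mp3", r"E:\myMusic", r"E:\SafeDeleteMp3s")
```

Keep in mind the earlier note: moving files outside of Similarity means its results list may still show files that are no longer where it found them.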

Step 06:
Return to step 03, repeating from Step 03 - Step 05, until all duplicate files are eliminated.
Good luck!

[Footnotes]
*(1) Enabling the "experimental algorithm" does not affect the "regular" scan duration, but only adds time to the end of the job, as the statistics it offers are "in addition-to" those which are revealed, for example, if a scan is performed without it. In other words-- Similarity performs the "Experimental" check only AFTER the "regular" scan is finished (in my observation). You'll get the same "regular" results either way, in the same amount of time. If you decide not to wait for the "experimental", then there is no harm-- the "regular" results are unaffected by a premature cancellation of an "experimental" scan.

Admin · « **Reply #1 on:** March 29, 2009, 20:16:29 »

Thanks for the very big comment.

Fuldessiosods · « **Reply #2 on:** April 05, 2009, 13:26:13 »

do not understand

StanleyTweedle · « **Reply #3 on:** April 08, 2009, 00:48:33 »

Quote from Fuldessiosods: "do not understand"

Hi, Fuldessiosods. Ah, yes: I see that I have rambled-on more than usual! not good, albeit-- much better than some less productive escapades, like "drunk dialing" [telephones], for example! hee hee....

I realize my text is rather verbose, and as such-- I tend to lose my readers' focus. Likewise, I would not argue for its educational quality.

I'm confident you will achieve success with Similarity, regardless of my convoluted creation above, but if you wish to try as I explained above, I will try to accommodate you.

---------

I would like to know the developer's (Admin's?) opinion of my method; whether there exists a more complicated way of doing things!
;-)

If others have found a way to utilize Similarity through a special process, I hope you share-- I look forward to reading someone else's story!

BTW: with the Admin's approval, I invite others to ask questions about my "how-to". If you are confused by my explanation, but wish to try, it is my pleasure to help if I am able.

eminn3m · « **Reply #4 on:** April 09, 2009, 21:17:35 »

Thank you very much for taking the time to explain your method. I've tried multiple programs and none quite did it for me. I'm hoping this will do the trick.

Summary:

1. Scan Files in Similarity.
For more accuracy, use the "experimental algorithm" option; the regular results will be displayed before this option kicks in.

2. Export to Playlist.

3. Import playlist to Foobar and choose which files to delete.

4. Delete the files you chose using either Similarity or Foobar, preferably in Similarity if you want to keep its database intact.

royrogers · « **Reply #5 on:** June 06, 2009, 12:39:37 »

Hi Stanley - I have just started to fiddle around with 'Similarity' and read your method with interest. I have also downloaded Foobar 2000 as you suggested. I could not understand, however, why you recommend returning to 'Similarity' to perform deletions rather than do it in Foobar. If one did do it in the latter, what would be the effect on 'Similarity'? Wouldn't one just delete the file without any lasting impact?
By the way, just to show how raw I am on this, if I do the deletions in 'Similarity', how do I actually tell it to delete all duplicate files in one go?
Regards
royrogers

Admin · « **Reply #6 on:** June 07, 2009, 18:04:35 »

royrogers
This will change over time; wait for new versions.

AVS · « **Reply #7 on:** June 22, 2009, 23:11:53 »

royrogers

"how do I actually tell it to delete all duplicate files in one go?"

Just mark duplicate files, then 'right click' with your mouse on the list and press 'delete marked'.

Guest · « **Reply #8 on:** November 10, 2009, 18:13:50 »

Thanks for this How-To

CoreyAnn · « **Reply #9 on:** January 09, 2010, 10:19:40 »

I can't figure out how to select/check ALL of the boxes next to the list of duplicates in order to delete ALL at once... Am I missing something?

StanleyTweedle · « **Reply #10 on:** July 08, 2011, 14:56:16 »

Readers, I'm really quite pleased to see someone found my /method/ useful. I'm flattered, to be honest. (as an instructor[1], it's not often I'd go about instructing this way, so I find it interesting on different levels...)

Please Note: As the Admin states, above, Similarity [would and] has changed since I authored this little /tutorial/ in the Spring of 2009. (I have to chuckle at how well the first respondent summarized, in just a few lines of text, what took me like three pages to spit-out! haha!...)

But, indeed-- this whole technique I've explained (first posted March 2009) is really out-of-date, in terms of relevancy to the current edition of Similarity. The application GUI underwent a massive change, maybe late 2009, or 2010, such that talk of "checking" for deleting, and much of the logistical details, are no longer relevant.

Of course, the /basic/ concept does remain viable; however, look at it this way: I've not used the technique I've described here, likely since the GUI changed, which was many, many versions ago.

Best wishes to the Readers, and good luck in your trimming of the fat!

Seacrest, out!
[1] I, instructor: www.ChordsAndScales.Info
