Hi, I've run some tests and I'm wondering whether OpenCL is being fully used. Here is my test (the song files are already in the similarity database):
Ryzen 5 3600 with GFX1650, "precise" algorithm only (work group size 1024): 2:49
Ryzen 5 3600 without OpenCL, "precise" algorithm only: 3:23
Ryzen 5 3600 with GT430, "precise" algorithm only (work group size 64, otherwise it fails): 3:57
So with the GT430, OpenCL actually makes the operation slower. With the GFX1650, it is equivalent to about +1.17 cores (my CPU has 6 cores), so I can count it as roughly 7.17 cores.
But on my new Ryzen 7 9750x (16 cores), following the same reasoning, the GFX1650 would only be worth about 0.78 of a core.
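For reference, here is roughly how the core-equivalent figure for the 6-core test can be estimated. This is only a sketch: it assumes throughput scales linearly with the number of CPU cores, and the helper names (`to_seconds`, `extra_core_equivalent`) are made up for illustration and are not part of the application. With the timings above it gives about +1.21 cores for the GFX1650, close to the +1.17 estimate, and a negative value for the GT430.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'M:SS' timing such as '3:23' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def extra_core_equivalent(cpu_only: str, with_gpu: str, cpu_cores: int) -> float:
    """Estimate how many extra CPU cores the GPU is worth.

    Assumption (not verified): run time is inversely proportional
    to the number of cores doing the work.
    """
    speedup = to_seconds(cpu_only) / to_seconds(with_gpu)
    return cpu_cores * (speedup - 1.0)


# Timings from the Ryzen 5 3600 (6 cores) test
print(round(extra_core_equivalent("3:23", "2:49", 6), 2))  # GFX1650 -> 1.21
print(round(extra_core_equivalent("3:23", "3:57", 6), 2))  # GT430   -> -0.86
```

A negative result simply means the run with the GPU was slower than the CPU-only run.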
In the OpenCL benchmark at
https://browser.geekbench.com/opencl-benchmarks the GFX1650 has a score of 38098. Maybe this could be improved, for example by offering a work group size of 2048 or more?