Thanks for your question. Sorry, but this is a very hard task. The current algorithms were not designed to do this, and they produce a very high number of false positives when configured for this problem.
It is possible to create such an algorithm, but it is very hard: we would need to separate the human speech into its own sub-band, run a speech recognition system on it, and compare the remaining audio sub-bands piecewise with an algorithm similar to 'precise'. The main problem, though, is that this is very rarely needed and it would be very slow.
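
To give a rough idea of the sub-band and piecewise steps, here is a minimal Python sketch. It is not the code of the existing algorithms and it is not the 'precise' algorithm. All names (`split_bands`, `piecewise_similarity`) are hypothetical, the fixed 300-3400 Hz band is only an assumed crude stand-in for real speech separation, and the speech recognition step is left out entirely.

```python
import cmath
import math


def dft(x, sign=-1):
    """Naive O(n^2) discrete Fourier transform (sign=+1 gives the unscaled inverse)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def split_bands(x, rate, lo=300.0, hi=3400.0):
    """Split x into (speech_band, rest) by keeping or zeroing DFT bins.

    Assumption: speech lives in lo..hi Hz. Real separation is much harder.
    """
    n = len(x)
    spec = dft(x)
    kept = []
    for k, value in enumerate(spec):
        freq = min(k, n - k) * rate / n  # frequency of bin k, mirrored bins included
        kept.append(value if lo <= freq <= hi else 0)
    speech = [v.real / n for v in dft(kept, sign=+1)]
    rest = [a - s for a, s in zip(x, speech)]
    return speech, rest


def piecewise_similarity(a, b, chunk):
    """Normalized correlation of a and b, computed chunk by chunk."""
    scores = []
    for start in range(0, min(len(a), len(b)) - chunk + 1, chunk):
        pa, pb = a[start:start + chunk], b[start:start + chunk]
        dot = sum(p * q for p, q in zip(pa, pb))
        na = math.sqrt(sum(p * p for p in pa))
        nb = math.sqrt(sum(q * q for q in pb))
        scores.append(dot / (na * nb) if na and nb else 0.0)
    return scores


# Toy input: the same 125 Hz "music" tone with two different "speech" tones on top.
rate, n = 8000, 256


def tone(freq):
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]


music = tone(125)
a = [m + s for m, s in zip(music, tone(1000))]
b = [m + s for m, s in zip(music, tone(2000))]

speech_a, rest_a = split_bands(a, rate)
speech_b, rest_b = split_bands(b, rate)
rest_scores = piecewise_similarity(rest_a, rest_b, chunk=64)
speech_scores = piecewise_similarity(speech_a, speech_b, chunk=64)
```

On this toy input the chunks outside the speech band should score close to 1 and the speech-band chunks close to 0. Real recordings are nowhere near this clean, and even this small example is slow because of the naive transform, which illustrates the speed problem mentioned above.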