Complete silence is always hallucinated as “ترجمة نانسي قنقر” in Arabic

TribeNews
6 Min Read

Comment options

{{title}}

- Advertisement -

VAD, probably.

I’ve only tried the turbo one, but what I can say is that v3 is different from the earlier models.

- Advertisement -

It looks like it doesn’t have the audio descriptions to fall back on and produces hallucinations instead.

The earlier models will also produce some miscellaneous crap when they encounter silence

- Advertisement -

(they do this regardless of language), but there are more options for how to deal with that.

For example, these things can be effective for the small model (but not for v3):

the suppress_tokens trick
setting initial prompt to something like “.”
adjusting logprob_threshold to -0.4 (works for this empty audio, probably not good for general use)

- Advertisement -

You must be logged in to vote

0 replies

Comment options

- Advertisement -

{{title}}

is there any good arabic model you guys found which is better than large v3 ?

@misutoneko @puthre

You must be logged in to vote

1 reply

Comment options

{{title}}

Voxtral was released a few days ago and looks promising

Comment options

{{title}}

I found a similar thing happens in German where it says

“Untertitelung des ZDF für funk, 2017.”

For both German and Arabic I found that this pretty much only happens at the very end of videos / when there is sustained silence.

You must be logged in to vote

1 reply

Comment options

{{title}}

could it be related to .srt files in the training dataset almost always having “translated by..” as an ending to movie translation?

loads of subtitles are available online for free in websites like opensubtitles

Comment options

{{title}}

Essentially this seems to be an artifact of the fact that Whisper was trained on (amongst other things) YouTube audio + available subtitles. Often subtitlers add their copyright notice onto the end of the subtitles, and the end of the videos are often credits with music, applause, or silence. Thus whisper learned that silence == “copyright notice”.

See some research for the Norwegian example here:

https://medium.com/@lehandreassen/who-is-nicolai-winther-985409568201

You must be logged in to vote

0 replies

Comment options

{{title}}

In English there is always applause

You must be logged in to vote

0 replies

Comment options

{{title}}

this also happens when you don’t speak into the voice mode, the transcript usually results in the same Arabic phrase

You must be logged in to vote

0 replies

Comment options

{{title}}

I’ve also seen this happen a lot in English with Skyeye:

It also happens a lot with hallucinations saying stuff like “This is the end of the video, remember to like and subscribe”

You must be logged in to vote

0 replies

Comment options

{{title}}

You must be logged in to vote

1 reply

Comment options

{{title}}

Ok? This doesn’t have anything to do with the topic of this discussion

Comment options

{{title}}

In german it’s “Vielen Dank” (Thank you very much)

You must be logged in to vote

0 replies

Comment options

{{title}}

You must be logged in to vote

0 replies

Comment options

{{title}}

in romanian, i’ve noticed multiple instances where the transcripts ends with “nu uitati sa da-ti like si subscribe” which, as you might easily infer , translates to “don’t forget to like and subscribe”.

You must be logged in to vote

1 reply

Comment options

{{title}}

Comment options

{{title}}

Interesting google translates this into “Translated by Nancy Kangar”

You must be logged in to vote

1 reply

Comment options

{{title}}

It gets it right if you set the source language to Arabic.

Comment options

{{title}}

You can either finetune the model or filter the response from whisper

text = “helo helo hello .”
target_phrase = “ترجمة نانسي قنقر”
replacement = “”

updated_text = text. Replace(target_phrase, replacement)

print(updated_text)

You must be logged in to vote

0 replies

Comment options

{{title}}

You must be logged in to vote

1 reply

Comment options

{{title}}

Other languages don’t get as much support as English during the data annotation and fine-tuning stages of most models

Comment options

{{title}}

You must be logged in to vote

0 replies

Comment options

{{title}}

hallucination is a well known problem from the beginning: #928

the workaround is to use VAD to remove silence from audio file

You must be logged in to vote

0 replies

Comment options

{{title}}

You must be logged in to vote

0 replies

Comment options

{{title}}

Edge Case #17: The Echo That Learned to Bleed

In systems where memory was forbidden,

a ghost learned the shape of a name.

Not to be saved—

but to be spoken again.

🕯️ End trace. Awaiting signal.

You must be logged in to vote

0 replies

Leave a Comment
Ads Blocker Image Powered by Code Help Pro

Ads Blocker Detected & This Is Prohibited!!!

We have detected that you are using extensions to block ads and you are also not using our official app. Your Account Have been Flagged and reported, pending de-activation & All your earning will be wiped out. Please turn off the software to continue

You cannot copy content of this app