Complete silence is always hallucinated as “ترجمة نانسي قنقر” in Arabic

VAD, probably.
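If a VAD is the fix, the idea is to decide whether a clip contains any signal before handing it to Whisper at all. Below is a minimal sketch of that gating step. It uses a plain RMS energy check as a crude stand-in for a real VAD (it is not one), and the function names, the `transcribe` callback, and the 0.01 threshold are all assumptions for illustration, not part of Whisper:

```python
import math
from typing import Callable, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of samples normalised to [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def looks_silent(samples: Sequence[float], threshold: float = 0.01) -> bool:
    """Crude energy gate: True when the clip's RMS is below `threshold`.

    The 0.01 default is an illustrative guess, not a tuned value.
    """
    return rms(samples) < threshold


def transcribe_if_speech(
    samples: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],
    threshold: float = 0.01,
) -> str:
    """Call `transcribe(samples)` only when the clip is not silent."""
    if looks_silent(samples, threshold):
        return ""  # skip the model entirely, so it cannot hallucinate
    return transcribe(samples)
```

A real VAD would handle low-level noise and music far better; the point here is only that silent input never reaches the model.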

I’ve only tried the turbo one, but what I can say is that v3 is different from the earlier models.

It looks like it doesn’t have the audio descriptions to fall back on and produces hallucinations instead.

The earlier models will also produce some miscellaneous crap when they encounter silence (they do this regardless of language), but there are more options for how to deal with that.

For example, these things can be effective for the small model (but not for v3):

the suppress_tokens trick
setting initial_prompt to something like “.”
adjusting logprob_threshold to -0.4 (works for this empty audio, probably not good for general use)
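For anyone who wants to try these, here is a rough sketch of passing the second and third options to the openai-whisper `transcribe` method. `initial_prompt` and `logprob_threshold` are existing keyword arguments there; the helper name `transcribe_quietly` and the `SILENCE_OPTIONS` dict are made up for this example. The `suppress_tokens` trick is not spelled out above, so it is left out rather than guessed at:

```python
from typing import Any, Dict

# Values taken from the list above. Both keys are keyword arguments of
# openai-whisper's `transcribe`; -0.4 is the value reported to work for
# empty audio and is probably too aggressive for general use.
SILENCE_OPTIONS: Dict[str, Any] = {
    "initial_prompt": ".",
    "logprob_threshold": -0.4,
}


def transcribe_quietly(model: Any, audio_path: str, **overrides: Any) -> str:
    """Run `model.transcribe` with the silence-related options applied.

    `model` is assumed to be the object returned by `whisper.load_model(...)`.
    Extra keyword arguments override the defaults in SILENCE_OPTIONS.
    """
    options = {**SILENCE_OPTIONS, **overrides}
    result = model.transcribe(audio_path, **options)
    return result["text"].strip()
```

Typical use would be something like `transcribe_quietly(whisper.load_model("small"), "silence.wav")`. A `suppress_tokens` value can be passed through `overrides` once you know which tokens you want to suppress.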

Has anyone found a good Arabic model that is better than large-v3?

@misutoneko @puthre

Voxtral was released a few days ago and looks promising

I found a similar thing happens in German where it says

“Untertitelung des ZDF für funk, 2017.”

For both German and Arabic I found that this pretty much only happens at the very end of videos / when there is sustained silence.
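Since it mostly shows up at the very end, one thing to try is dropping trailing segments that Whisper itself flags as probably silent. This is a sketch under assumptions: it relies on the `no_speech_prob` field that openai-whisper attaches to each entry in `result["segments"]`, the helper name is made up, and 0.6 is an illustrative cutoff. The model may still assign a low `no_speech_prob` to a hallucinated credit line, so this will not catch everything:

```python
from typing import Any, Dict, List


def drop_trailing_silence(
    segments: List[Dict[str, Any]], max_no_speech_prob: float = 0.6
) -> List[Dict[str, Any]]:
    """Remove segments at the end whose `no_speech_prob` exceeds the limit.

    Only the tail is trimmed; segments in the middle are left untouched.
    """
    kept = list(segments)  # copy so the caller's list is not modified
    while kept and kept[-1].get("no_speech_prob", 0.0) > max_no_speech_prob:
        kept.pop()
    return kept
```

You would call it on `result["segments"]` and then join the remaining `text` fields yourself.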

Could it be related to .srt files in the training dataset almost always ending a movie translation with “translated by..”?

Loads of subtitles are available online for free on websites like OpenSubtitles.

Essentially this seems to be an artifact of the fact that Whisper was trained on (amongst other things) YouTube audio plus the available subtitles. Subtitlers often add their copyright notice to the end of the subtitles, and the ends of videos are often credits with music, applause, or silence. Thus Whisper learned that silence == “copyright notice”.

See some research for the Norwegian example here:

https://medium.com/@lehandreassen/who-is-nicolai-winther-985409568201

In English there is always applause

This also happens when you don’t speak in voice mode: the transcript usually comes out as the same Arabic phrase.

I’ve also seen this happen a lot in English with Skyeye:

It also happens a lot with hallucinations saying stuff like “This is the end of the video, remember to like and subscribe”

Ok? This doesn’t have anything to do with the topic of this discussion

In German it’s “Vielen Dank” (Thank you very much).

In Romanian, I’ve noticed multiple instances where the transcript ends with “nu uitati sa da-ti like si subscribe”, which, as you might easily infer, translates to “don’t forget to like and subscribe”.

Interesting: Google translates this into “Translated by Nancy Kangar”.

It gets it right if you set the source language to Arabic.

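You can either fine-tune the model or filter the response from Whisper:

# Sample transcript that ends with the hallucinated phrase
text = "helo helo hello . ترجمة نانسي قنقر"
target_phrase = "ترجمة نانسي قنقر"
replacement = ""

# str.replace removes every occurrence of the phrase
updated_text = text.replace(target_phrase, replacement)

print(updated_text)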
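The same idea extends to more than one phrase. Here is a small sketch that strips several of the strings reported in this thread in one pass; the function name and the `KNOWN_HALLUCINATIONS` tuple are hypothetical, and the list is only what people have posted here, not a complete catalogue:

```python
from typing import Iterable

# Phrases reported in this thread; extend the tuple for your own data.
KNOWN_HALLUCINATIONS = (
    "ترجمة نانسي قنقر",
    "Untertitelung des ZDF für funk, 2017.",
)


def strip_hallucinations(
    text: str, phrases: Iterable[str] = KNOWN_HALLUCINATIONS
) -> str:
    """Remove every known phrase, then collapse leftover whitespace."""
    for phrase in phrases:
        text = text.replace(phrase, "")
    # split()/join() trims the ends and squeezes runs of spaces into one
    return " ".join(text.split())
```

Phrases like “Vielen Dank” are deliberately not in the list, because they are also things people really say and a blanket replace would delete legitimate speech.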

Other languages don’t get as much support as English during the data annotation and fine-tuning stages of most models
