Whisper is extremely slow by comparison, and while the ASR on macOS is actually decent now, those upgrades don't seem to be coming to iOS devices (including the 16 GB iPad Pros). Even though I am a native English speaker and people have no problem understanding me in real life, the Apple ASR model on iOS literally can't understand me well enough to get through 2 sentences properly.
I currently use the ~490 MB distilled Parakeet ASR model with a third-party app on macOS, and it is absolutely sufficient. If we extrapolate from my known transcription speed on an M3 Max (~250x realtime) and relative Geekbench Metal performance, an iPhone 15 Pro could transcribe audio about 47x faster than realtime. So if a user was yapping for 2 whole minutes, Parakeet would transcribe it in about 2.5 s. The first load, with the app in the foreground, would probably add another ~2-3 s.
https://huggingface.co/FluidInference/parakeet-tdt-0.6b-v2-coreml
It claims a maximum memory usage of 800 MB.
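For anyone who wants to check the estimate, here is a minimal sketch of the arithmetic. The function names are hypothetical, and the relative GPU ratio is not a measured Geekbench score: it is simply the value implied by the two speed figures in this post (47 / 250).

```python
# Rough check of the extrapolation in this post.
# Assumption: transcription speed scales linearly with relative GPU score.

def scaled_realtime_factor(reference_factor: float, relative_gpu_score: float) -> float:
    """Scale a known realtime factor by a relative GPU performance ratio."""
    return reference_factor * relative_gpu_score


def transcription_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Time needed to transcribe audio at a given multiple of realtime."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


M3_MAX_FACTOR = 250.0   # ~250x realtime, as observed on an M3 Max
IPHONE_FACTOR = 47.0    # estimated figure for an iPhone 15 Pro

# Ratio implied by the two figures above, not an actual benchmark result.
implied_ratio = IPHONE_FACTOR / M3_MAX_FACTOR

estimated_factor = scaled_realtime_factor(M3_MAX_FACTOR, implied_ratio)
seconds = transcription_seconds(2 * 60, estimated_factor)
print(f"{estimated_factor:.0f}x realtime, 2 minutes of audio in {seconds:.2f} s")
```

Swapping in real Geekbench Metal scores for the two devices would replace `implied_ratio` and give an independent check of the 47x figure.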
In Review
BoltAI Mobile
6 months ago

kernkraft