Open
33 changes: 28 additions & 5 deletions apps/desktop/src-tauri/src/captions.rs
@@ -53,6 +53,13 @@ impl Default for CaptionData {

lazy_static::lazy_static! {
static ref WHISPER_CONTEXT: Arc<Mutex<Option<Arc<WhisperContext>>>> = Arc::new(Mutex::new(None));
+ // std::sync::Mutex so the guard is held by the blocking thread itself.
+ // If the async future is dropped mid-transcription the blocking thread
+ // continues to hold this lock until it finishes, preventing a racing
+ // retry from spawning a second ML session concurrently.
+ // On Apple Silicon each WhisperState allocates ~700 MB of Metal (unified)
+ // memory; without serialisation rapid re-clicks exhausted 44 GB of RAM.
+ static ref TRANSCRIPTION_LOCK: std::sync::Mutex<()> = std::sync::Mutex::new(());
}

#[cfg(not(all(target_os = "macos", target_arch = "x86_64")))]
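
The comment on `TRANSCRIPTION_LOCK` describes holding a `std::sync::Mutex` guard on the blocking thread itself, so that overlapping requests run one at a time. A minimal sketch of that idea follows, assuming plain `std::thread` workers in place of `tokio::task::spawn_blocking`. `WORK_LOCK` and `max_concurrent_sessions` are hypothetical names for illustration and are not part of this PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for TRANSCRIPTION_LOCK in the diff.
// `Mutex::new` is usable in a `static` from Rust 1.63 onwards.
static WORK_LOCK: Mutex<()> = Mutex::new(());

/// Runs `workers` threads that each take the lock before doing simulated
/// heavy work, and returns the highest number of workers observed inside
/// the critical section at the same time.
pub fn max_concurrent_sessions(workers: usize) -> usize {
    let active = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let active = Arc::clone(&active);
            let peak = Arc::clone(&peak);
            thread::spawn(move || {
                // The guard lives on the worker thread, so it is released
                // only when the work finishes, whatever the caller does.
                let _guard = WORK_LOCK.lock().unwrap_or_else(|p| p.into_inner());
                let now = active.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(10)); // simulated model run
                active.fetch_sub(1, Ordering::SeqCst);
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    peak.load(Ordering::SeqCst)
}
```

Under these assumptions `max_concurrent_sessions(4)` reports a peak of 1, because each worker waits for the previous guard to drop before it enters the critical section.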
@@ -1111,9 +1118,12 @@ pub async fn transcribe_audio(
TranscriptionEngine::Parakeet => {
log::info!("Using Parakeet TDT engine");
let model_dir = model_path.clone();
- tokio::task::spawn_blocking(move || process_with_parakeet(&audio_path, &model_dir))
- .await
- .map_err(|e| format!("Parakeet task panicked: {e}"))?
+ tokio::task::spawn_blocking(move || {
+ let _guard = TRANSCRIPTION_LOCK.lock().unwrap_or_else(|p| p.into_inner());
+ process_with_parakeet(&audio_path, &model_dir)
+ })
+ .await
+ .map_err(|e| format!("Parakeet task panicked: {e}"))?
}
TranscriptionEngine::Whisper => {
let context = match get_whisper_context(&model_path).await {
@@ -1134,11 +1144,24 @@ pub async fn transcribe_audio(
.unwrap_or_default();

log::info!("Starting Whisper transcription in blocking task...");
- tokio::task::spawn_blocking(move || {
+ let result = tokio::task::spawn_blocking(move || {
+ let _guard = TRANSCRIPTION_LOCK.lock().unwrap_or_else(|p| p.into_inner());
process_with_whisper(&audio_path, context, &language, &transcription_hints)
})
.await
- .map_err(|e| format!("Whisper task panicked: {e}"))?
+ .map_err(|e| format!("Whisper task panicked: {e}"))?;
+
+ // Release the cached context so Metal buffers (~500 MB on Apple Silicon)
+ // are freed after each run rather than held until the editor closes.
+ // Gated to aarch64 only: on other platforms the cache improves
+ // repeated-transcription latency with no meaningful memory cost.
+ #[cfg(all(target_os = "macos", target_arch = "aarch64"))]
+ {
+ let mut ctx = WHISPER_CONTEXT.lock().await;
+ *ctx = None;
Comment on lines +1160 to +1161

Contributor


P2 Whisper Cache Always Evicted

This clears the cached Whisper context after every Whisper run on all platforms, although the memory issue described here is specific to Apple Silicon Metal buffers. On Windows, Linux, and Intel macOS, repeated subtitle generation now reloads the Whisper model from disk each time instead of reusing the warmed WHISPER_CONTEXT, causing avoidable latency and memory churn with no Metal-memory benefit.
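
One way to address this concern is to evict the cache only on the target where the memory cost applies. The sketch below is std-only and makes several assumptions: `Model`, `MODEL_CACHE`, `EVICT_AFTER_RUN`, `get_model`, `run_once`, and `is_cached` are hypothetical names standing in for the real context and cache, and it uses the `cfg!` macro rather than a `#[cfg(...)]` attribute so that both branches compile on every target.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a loaded Whisper context.
pub struct Model;

// Hypothetical cache playing the role of WHISPER_CONTEXT.
static MODEL_CACHE: Mutex<Option<Arc<Model>>> = Mutex::new(None);

/// True only when compiled for Apple Silicon macOS, the platform the
/// review comment identifies as affected by the Metal memory cost.
pub const EVICT_AFTER_RUN: bool =
    cfg!(all(target_os = "macos", target_arch = "aarch64"));

/// Returns the cached model, creating it on first use.
pub fn get_model() -> Arc<Model> {
    let mut cache = MODEL_CACHE.lock().unwrap_or_else(|p| p.into_inner());
    Arc::clone(cache.get_or_insert_with(|| Arc::new(Model)))
}

/// Simulates one transcription run, then applies the eviction policy.
pub fn run_once() {
    let model = get_model();
    // ... the real code would run transcription with `model` here ...
    drop(model);

    if EVICT_AFTER_RUN {
        // Drop the cached handle so its memory can be reclaimed.
        *MODEL_CACHE.lock().unwrap_or_else(|p| p.into_inner()) = None;
    }
}

/// Reports whether a model is currently cached.
pub fn is_cached() -> bool {
    MODEL_CACHE
        .lock()
        .unwrap_or_else(|p| p.into_inner())
        .is_some()
}
```

After `run_once()`, `is_cached()` equals `!EVICT_AFTER_RUN`: the model stays warm everywhere except Apple Silicon macOS.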

+ }
+
+ result
}
};
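
Both blocking closures in this diff acquire the lock with `unwrap_or_else(|p| p.into_inner())`, which recovers the guard from a poisoned mutex instead of panicking. The following std-only sketch illustrates that behaviour in isolation. `poison` and `bump` are hypothetical helpers, and the `u32` payload is there only to give the example something to observe.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Poisons `lock` by panicking on another thread while the guard is held.
pub fn poison(lock: &Arc<Mutex<u32>>) {
    let lock = Arc::clone(lock);
    let _ = thread::spawn(move || {
        let _guard = lock.lock().unwrap();
        panic!("simulated transcription panic");
    })
    .join(); // Err(..) because the thread panicked; ignored on purpose.
}

/// Acquires the lock even when it is poisoned, using the same pattern as
/// the diff, then increments and returns the protected counter.
pub fn bump(lock: &Mutex<u32>) -> u32 {
    let mut guard = lock.lock().unwrap_or_else(|p| p.into_inner());
    *guard += 1;
    *guard
}
```

A plain `lock().unwrap()` would panic on every later call once a single run had panicked. Recovering the guard is reasonable when, as with the `()` payload in the diff, the mutex protects no data that could be left in an inconsistent state.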
