AI Models

Understanding Whisper models and how to choose the right one for your needs

What is Whisper?

Notably uses OpenAI's Whisper models for speech recognition. Whisper is a state-of-the-art automatic speech recognition (ASR) system trained on 680,000 hours of multilingual data.

All processing runs 100% locally on your device. No audio data is ever sent to the cloud, ensuring complete privacy and offline functionality.

Available Models

Whisper comes in five different sizes, each with its own tradeoffs between accuracy, speed, and storage requirements:

Model	Size	Speed	Accuracy	Best For
tiny	~75 MB	Very Fast	Basic	Quick tests, low-resource devices
base	~140 MB	Fast	Good	Real-time transcription, balanced use
small	~460 MB	Moderate	Very Good	Recommended for most users
medium	~1.5 GB	Slow	Excellent	High-accuracy post-recording
large	~3 GB	Very Slow	Best	Maximum accuracy, complex audio
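To make the storage tradeoff concrete, here is a minimal Python sketch. It encodes the approximate sizes from the table and picks the most accurate model that fits in a given amount of free disk space.

The `WhisperModel` class and `largest_model_that_fits` function are hypothetical illustrations, not part of Notably. Actual download sizes may differ slightly from these rounded figures. The sketch also ignores speed, which matters just as much for real-time use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WhisperModel:
    name: str
    size_mb: int  # approximate download size from the table above


# Ordered from smallest/fastest to largest/most accurate.
# Sizes use 1 GB = 1000 MB, a simplifying assumption for illustration.
MODELS: List[WhisperModel] = [
    WhisperModel("tiny", 75),
    WhisperModel("base", 140),
    WhisperModel("small", 460),
    WhisperModel("medium", 1500),
    WhisperModel("large", 3000),
]


def largest_model_that_fits(free_mb: int) -> Optional[str]:
    """Return the most accurate model whose download fits in free_mb, or None."""
    fitting = [m for m in MODELS if m.size_mb <= free_mb]
    return fitting[-1].name if fitting else None


print(largest_model_that_fits(1000))  # small
print(largest_model_that_fits(50))    # None
```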

Choosing the Right Model

For Real-Time Transcription

Real-time transcription requires fast processing to keep up with live audio. Recommended models:

tiny: Fastest option, suitable for quick previews or low-resource devices
base: Best balance of speed and accuracy for real-time use (recommended)
small: Usable in real time on modern Macs; more accurate than base, but slower

For Post-Recording Transcription

Post-recording transcription is not time-critical, so it can use larger, slower models to produce more accurate results. Recommended models:

base: Quick results with good accuracy
small: Excellent balance for most meetings (recommended)
medium: High accuracy for important recordings
large: Maximum accuracy for complex audio or critical transcripts

Recommendation

Start with base for real-time and small for post-recording. This combination gives strong results while keeping processing speed and storage requirements reasonable. You can always switch models later or re-transcribe recordings with a different model.

Managing Models

Downloading Models

Models must be downloaded before use. To download a model:

1. Open Settings (⌘,)
2. Navigate to the Models tab
3. Browse available models and view their sizes
4. Click Download and monitor the progress indicator

Storage and Deletion

Models are stored locally on your device. You can:

View storage space used by each model
Delete models you no longer need to free up disk space
Re-download models at any time

Deleting a model does not affect existing transcriptions. It only prevents you from creating new transcriptions with that model until you download it again.
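The download and delete behavior described above can be modeled as a small in-memory store. This is a minimal sketch. The `ModelStore` class and its methods are hypothetical and do not reflect how Notably actually stores models on disk.

The sketch shows three things:

- Storage used is the sum of the downloaded model sizes.
- Deleting a model leaves saved transcripts untouched.
- A new transcription fails until the model is downloaded again.

```python
from typing import Dict, List, Tuple


class ModelStore:
    """Hypothetical model store illustrating download and delete behavior."""

    def __init__(self) -> None:
        self.downloaded: Dict[str, int] = {}  # model name -> size in MB
        self.transcripts: List[Tuple[str, str]] = []  # (recording, model used)

    def download(self, name: str, size_mb: int) -> None:
        self.downloaded[name] = size_mb

    def delete(self, name: str) -> None:
        # Removes only the model; saved transcripts are left untouched.
        self.downloaded.pop(name, None)

    def storage_used_mb(self) -> int:
        return sum(self.downloaded.values())

    def transcribe(self, recording: str, model: str) -> None:
        # New transcriptions require the model to be present locally.
        if model not in self.downloaded:
            raise RuntimeError(f"model '{model}' is not downloaded")
        self.transcripts.append((recording, model))


store = ModelStore()
store.download("small", 460)
store.transcribe("weekly-sync", "small")
store.delete("small")
print(store.transcripts)        # [('weekly-sync', 'small')]
print(store.storage_used_mb())  # 0
```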

Configuring Default Models

Notably allows you to configure separate default models for different use cases:

For Real-Time Transcription: Used during live recording
For Post-Recording Transcription: Used when transcribing completed recordings

Configure these in Settings → General. You can also choose different models when manually requesting a new transcription for any recording.
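The rule for picking a model can be sketched in a few lines: a model chosen manually for a specific transcription takes precedence, and otherwise the default for the current mode applies.

This is a minimal Python sketch under that assumption. `TranscriptionDefaults` and `model_for` are hypothetical names, and the defaults simply mirror the recommendation given earlier (base for real-time, small for post-recording).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptionDefaults:
    # Hypothetical settings object; defaults mirror the recommendation earlier.
    realtime_model: str = "base"
    post_recording_model: str = "small"


def model_for(defaults: TranscriptionDefaults, live: bool,
              override: Optional[str] = None) -> str:
    """Pick the model for a job: a manual choice wins, else the mode's default."""
    if override is not None:
        return override
    return defaults.realtime_model if live else defaults.post_recording_model


defaults = TranscriptionDefaults()
print(model_for(defaults, live=True))                     # base
print(model_for(defaults, live=False))                    # small
print(model_for(defaults, live=False, override="large"))  # large
```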

Performance Considerations

Hardware Requirements

Apple Silicon (M1/M2/M3): All models run efficiently; real-time transcription works well up to medium
Intel Macs: Smaller models recommended (tiny, base, small)
RAM: Large models may need 8 GB or more of RAM for smooth operation
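The hardware guidance above can be expressed as a simple filter over the model list. This is a hypothetical Python sketch, not a check that Notably performs.

The sketch makes two assumptions:

- Chip types are reduced to two labels, `"apple_silicon"` and `"intel"`.
- The 8 GB RAM note applies only to the `large` model. The guidance does not say exactly which models it covers.

```python
from typing import List

ALL_MODELS = ["tiny", "base", "small", "medium", "large"]


def suggested_models(chip: str, ram_gb: int) -> List[str]:
    """Hypothetical heuristic encoding the hardware guidance above."""
    if chip == "apple_silicon":
        candidates = list(ALL_MODELS)
    elif chip == "intel":
        candidates = ["tiny", "base", "small"]
    else:
        raise ValueError(f"unknown chip type: {chip!r}")
    # Assumption: the 8 GB note applies to the `large` model only.
    if ram_gb < 8:
        candidates = [m for m in candidates if m != "large"]
    return candidates


print(suggested_models("intel", 16))         # ['tiny', 'base', 'small']
print(suggested_models("apple_silicon", 4))  # ['tiny', 'base', 'small', 'medium']
```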

Processing Speed

Actual processing speed depends on your hardware:

Real-time factor: How many seconds of audio are processed per second of processing time
Base model on M1: Typically 4-6x real-time (processes 1 hour in 10-15 minutes)
Large model on M1: Typically 0.5-1x real-time (processes 1 hour in 1-2 hours)
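The figures above follow from dividing the audio length by the real-time factor. This short Python sketch reproduces that arithmetic. The function name `processing_minutes` is illustrative only. The speed factors are the example values quoted for M1, not measurements.

```python
def processing_minutes(audio_minutes: float, speed_factor: float) -> float:
    """Wall-clock minutes needed to process audio at a given real-time factor.

    speed_factor is seconds of audio processed per second (4.0 means 4x real-time).
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_minutes / speed_factor


# One hour of audio at the example factors quoted above.
for factor in (6, 4, 1, 0.5):
    print(f"{factor}x: {processing_minutes(60, factor):.0f} min")
# 6x: 10 min
# 4x: 15 min
# 1x: 60 min
# 0.5x: 120 min
```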

Multilingual Support

All Whisper models support multiple languages. You can configure your preferred transcription language in Settings → General → Transcription Language.

Larger models generally provide better accuracy for non-English languages, especially for languages with less training data.

For more information about Whisper models, visit the OpenAI Whisper repository.
