AI Models
Understanding Whisper models and how to choose the right one for your needs
What is Whisper?
Notably uses OpenAI's Whisper models for speech recognition. Whisper is a state-of-the-art automatic speech recognition (ASR) system trained on 680,000 hours of multilingual data.
All processing runs 100% locally on your device. No audio data is ever sent to the cloud, ensuring complete privacy and offline functionality.
Available Models
Whisper comes in five different sizes, each with its own tradeoffs between accuracy, speed, and storage requirements:
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| tiny | ~75 MB | Very Fast | Basic | Quick tests, low-resource devices |
| base | ~140 MB | Fast | Good | Real-time transcription, balanced use |
| small | ~460 MB | Moderate | Very Good | Recommended for most users |
| medium | ~1.5 GB | Slow | Excellent | High-accuracy post-recording |
| large | ~3 GB | Very Slow | Best | Maximum accuracy, complex audio |
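As a rough illustration of the storage tradeoff, the sizes in the table can be treated as a small lookup. The sketch below is not part of Notably; the names `MODEL_SIZES_MB`, `largest_model_within`, and `total_storage_mb` are hypothetical, and the sizes are the approximate figures from the table, with 1 GB rounded to 1000 MB.

```python
# Approximate download sizes in MB, copied from the table above.
# The names here are illustrative only, not an API provided by Notably.
MODEL_SIZES_MB = {
    "tiny": 75,
    "base": 140,
    "small": 460,
    "medium": 1500,
    "large": 3000,
}


def largest_model_within(budget_mb):
    """Return the largest model whose download fits in budget_mb, or None."""
    fitting = [name for name, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    # The dict is ordered smallest to largest, so the last fit is the largest.
    return fitting[-1] if fitting else None


def total_storage_mb(models):
    """Approximate disk space used by a set of downloaded models."""
    return sum(MODEL_SIZES_MB[name] for name in models)
```

For example, with about 500 MB to spare, `largest_model_within(500)` selects `small`, and keeping both `base` and `small` on disk takes roughly `total_storage_mb(["base", "small"])` = 600 MB.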
Choosing the Right Model
For Real-Time Transcription
Real-time transcription requires fast processing to keep up with live audio. Recommended models:
- tiny: Fastest option, suitable for quick previews or low-resource devices
- base: Best balance of speed and accuracy for real-time use (recommended)
- small: Usable in real time on modern Macs; more accurate than base, but slower
For Post-Recording Transcription
Post-recording transcription can take more time to produce highly accurate results. Recommended models:
- base: Quick results with good accuracy
- small: Excellent balance for most meetings (recommended)
- medium: High accuracy for important recordings
- large: Maximum accuracy for complex audio or critical transcripts
Recommendation
Start with base for real-time and small for post-recording. This provides excellent results while maintaining reasonable speed and storage requirements. You can always switch models later or retranscribe recordings with different models.
Managing Models
Downloading Models
Models must be downloaded before use. To download a model:
1. Open Settings (⌘,)
2. Navigate to the Models tab
3. Browse available models and view their sizes
4. Click download and monitor the progress indicator
Storage and Deletion
Models are stored locally on your device. You can:
- View storage space used by each model
- Delete models you no longer need to free up disk space
- Re-download models at any time
Deleting a model does not affect existing transcriptions. It only prevents you from creating new transcriptions with that model until you download it again.
Configuring Default Models
Notably allows you to configure separate default models for different use cases:
- For Real-Time Transcription: Used during live recording
- For Post-Recording Transcription: Used when transcribing completed recordings
Configure these in Settings → General. You can also choose different models when manually requesting a new transcription for any recording.
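The relationship between the two defaults and a per-recording choice can be sketched as follows. This is only a model of the behavior described above, not Notably's actual settings code; `ModelDefaults` and `resolve_model` are hypothetical names, and the default values assume the starting recommendation from this guide (base for real-time, small for post-recording).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelDefaults:
    """Hypothetical container for the two defaults; values follow this guide."""
    real_time: str = "base"
    post_recording: str = "small"


def resolve_model(defaults: ModelDefaults, use_case: str,
                  override: Optional[str] = None) -> str:
    """Pick the model for one transcription request.

    An explicit per-recording choice wins; otherwise the default
    configured for the use case applies.
    """
    if override is not None:
        return override
    if use_case == "real_time":
        return defaults.real_time
    if use_case == "post_recording":
        return defaults.post_recording
    raise ValueError(f"unknown use case: {use_case!r}")
```

With the defaults above, `resolve_model(ModelDefaults(), "real_time")` yields `base`, while a manual retranscription that passes `override="large"` uses `large` without changing either default.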
Performance Considerations
Hardware Requirements
- Apple Silicon (M1/M2/M3): All models run efficiently; real-time transcription works well with models up to medium
- Intel Macs: Smaller models recommended (tiny, base, small)
- RAM: Large models may require 8 GB or more of memory for smooth operation
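The hardware guidance above can be read as a simple filter over the model list. The following is a minimal sketch under that reading; `suggested_models` is a hypothetical helper, not something Notably exposes, and it encodes only the recommendations stated in this section.

```python
# Model names ordered smallest to largest, as in the Available Models table.
MODELS = ["tiny", "base", "small", "medium", "large"]


def suggested_models(apple_silicon, real_time=False):
    """Return the models this guide suggests for a given Mac.

    apple_silicon: True for M1/M2/M3 Macs, False for Intel Macs.
    real_time: True when the model is meant for live transcription.
    """
    if not apple_silicon:
        # Intel Macs: smaller models recommended (tiny, base, small).
        return MODELS[:3]
    if real_time:
        # Apple Silicon: real-time works well up to medium.
        return MODELS[:4]
    # Apple Silicon, post-recording: all models run efficiently.
    return list(MODELS)
```

The function says nothing about RAM; the 8 GB guidance for large models would need a separate check.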
Processing Speed
Actual processing speed depends on your hardware:
- Real-time factor: How many seconds of audio are processed per second of computation (higher is faster; 1x means processing takes as long as the recording itself)
- Base model on M1: Typically 4-6x real-time (processes 1 hour in 10-15 minutes)
- Large model on M1: Typically 0.5-1x real-time (processes 1 hour in 1-2 hours)
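The figures in parentheses follow from dividing the recording length by the real-time factor. A small sketch of that arithmetic, using hypothetical helper names and the typical M1 factors quoted above:

```python
def processing_minutes(audio_minutes, speed_factor):
    """Estimate transcription time for a recording.

    speed_factor is the real-time factor as defined in this guide:
    seconds of audio processed per second of computation.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_minutes / speed_factor


def keeps_up_with_live_audio(speed_factor):
    """A model can only follow live audio if it runs at 1x or faster."""
    return speed_factor >= 1.0


# One hour of audio with the typical M1 ranges quoted above.
base_range = (processing_minutes(60, 6), processing_minutes(60, 4))     # 10 to 15 minutes
large_range = (processing_minutes(60, 1), processing_minutes(60, 0.5))  # 60 to 120 minutes
```

These are estimates only: actual speed varies with hardware, audio content, and system load.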
Multilingual Support
All Whisper models support multiple languages. You can configure your preferred transcription language in Settings → General → Transcription Language.
Larger models generally provide better accuracy for non-English languages, especially for languages with less training data.
For more information about Whisper models, visit the OpenAI Whisper repository.