AI Models

Understanding Whisper models and how to choose the right one for your needs

What is Whisper?

Notably uses OpenAI's Whisper models for speech recognition. Whisper is a state-of-the-art automatic speech recognition (ASR) system trained on 680,000 hours of multilingual data.

All processing runs 100% locally on your device. No audio data is ever sent to the cloud, ensuring complete privacy and offline functionality.

Available Models

Whisper comes in five different sizes, each with its own tradeoffs between accuracy, speed, and storage requirements:

Model   Size     Speed      Accuracy   Best For
tiny    ~75 MB   Very Fast  Basic      Quick tests, low-resource devices
base    ~140 MB  Fast       Good       Real-time transcription, balanced use
small   ~460 MB  Moderate   Very Good  Recommended for most users
medium  ~1.5 GB  Slow       Excellent  High-accuracy post-recording
large   ~3 GB    Very Slow  Best       Maximum accuracy, complex audio

Choosing the Right Model

For Real-Time Transcription

Real-time transcription requires fast processing to keep up with live audio. Recommended models:

  • tiny: Fastest option, suitable for quick previews or low-resource devices
  • base: Best balance of speed and accuracy for real-time use (recommended)
  • small: Usable in real time on modern Macs; more accurate than base, but slower

For Post-Recording Transcription

Post-recording transcription can take more time to produce highly accurate results. Recommended models:

  • base: Quick results with good accuracy
  • small: Excellent balance for most meetings (recommended)
  • medium: High accuracy for important recordings
  • large: Maximum accuracy for complex audio or critical transcripts

Recommendation

Start with base for real-time and small for post-recording. This provides excellent results while maintaining reasonable speed and storage requirements. You can always switch models later or retranscribe recordings with different models.
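The guidance above can be sketched as a small lookup. This is an illustrative Python sketch, not Notably's actual selection logic: the names MODEL_SIZE_MB, RECOMMENDED, CANDIDATES, and pick_model are hypothetical, the sizes are the approximate figures from the Available Models table, and the storage-budget fallback is an assumption added for illustration.

```python
# Approximate model sizes in MB, from the Available Models table
# (medium ~1.5 GB and large ~3 GB, taking 1 GB as 1000 MB).
MODEL_SIZE_MB = {"tiny": 75, "base": 140, "small": 460, "medium": 1500, "large": 3000}

# Documented starting points for each transcription mode.
RECOMMENDED = {"real-time": "base", "post-recording": "small"}

# Models listed as suitable for each mode in this section.
CANDIDATES = {
    "real-time": ["tiny", "base", "small"],
    "post-recording": ["base", "small", "medium", "large"],
}


def pick_model(mode, max_size_mb=None):
    """Return the documented default for `mode`.

    If a storage budget is given and the default does not fit,
    fall back to the largest listed candidate that does
    (an assumption of this sketch, not documented behavior).
    """
    if mode not in CANDIDATES:
        raise ValueError(f"unknown mode: {mode!r}")
    default = RECOMMENDED[mode]
    if max_size_mb is None or MODEL_SIZE_MB[default] <= max_size_mb:
        return default
    fitting = [m for m in CANDIDATES[mode] if MODEL_SIZE_MB[m] <= max_size_mb]
    if not fitting:
        raise ValueError(f"no listed model for {mode!r} fits in {max_size_mb} MB")
    return max(fitting, key=MODEL_SIZE_MB.get)


print(pick_model("real-time"))            # documented default for live recording
print(pick_model("post-recording", 200))  # default does not fit, so fall back
```

With no budget, the function returns the documented starting point for the mode. With a budget too small for that default, it steps down to a smaller listed model.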

Managing Models

Downloading Models

Models must be downloaded before use. To download a model:

  1. Open Settings (⌘,)
  2. Navigate to the Models tab
  3. Browse available models and view their sizes
  4. Click download and monitor the progress indicator

Storage and Deletion

Models are stored locally on your device. You can:

  • View storage space used by each model
  • Delete models you no longer need to free up disk space
  • Re-download models at any time

Deleting a model does not affect existing transcriptions. It only prevents you from creating new transcriptions with that model until you download it again.
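To estimate how much disk space a set of models uses, or how much deleting one would free, you can add up the approximate sizes from the Available Models table. The sketch below is illustrative only: MODEL_SIZE_MB and total_size_mb are hypothetical names, and the real on-disk sizes shown in Settings may differ from these rounded figures.

```python
# Approximate model sizes in MB, from the Available Models table
# (medium ~1.5 GB and large ~3 GB, taking 1 GB as 1000 MB).
MODEL_SIZE_MB = {"tiny": 75, "base": 140, "small": 460, "medium": 1500, "large": 3000}


def total_size_mb(models):
    """Sum the approximate on-disk size of the given model names."""
    return sum(MODEL_SIZE_MB[name] for name in models)


downloaded = ["base", "small", "large"]
print(total_size_mb(downloaded))  # approximate space currently in use
print(total_size_mb(["large"]))   # approximate space freed by deleting large
```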

Configuring Default Models

Notably allows you to configure separate default models for different use cases:

  • For Real-Time Transcription: Used during live recording
  • For Post-Recording Transcription: Used when transcribing completed recordings

Configure these in Settings → General. You can also choose different models when manually requesting a new transcription for any recording.

Performance Considerations

Hardware Requirements

  • Apple Silicon (M1/M2/M3): All models run efficiently; real-time transcription works well up to medium
  • Intel Macs: Smaller models recommended (tiny, base, small)
  • RAM: Large models may require 8 GB or more for smooth operation

Processing Speed

Actual processing speed depends on your hardware:

  • Real-time factor: Seconds of audio processed per second of processing time (higher is faster; values below 1x are slower than live audio)
  • Base model on M1: Typically 4-6x real-time (processes 1 hour in 10-15 minutes)
  • Large model on M1: Typically 0.5-1x real-time (processes 1 hour in 1-2 hours)
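As a rough worked example of the arithmetic behind these figures, processing time is the audio duration divided by the real-time factor. The function name processing_minutes is hypothetical, and the factors used are the typical M1 figures quoted above, not measurements.

```python
def processing_minutes(audio_minutes, real_time_factor):
    """Estimate processing time for a recording.

    real_time_factor is the number of seconds of audio processed
    per second of processing time, so 4 means "4x real-time".
    """
    if real_time_factor <= 0:
        raise ValueError("real_time_factor must be positive")
    return audio_minutes / real_time_factor


# Base model on M1, typical range 4-6x real-time, for a 1-hour recording.
print(processing_minutes(60, 4))    # 15.0
print(processing_minutes(60, 6))    # 10.0

# Large model on M1, typical range 0.5-1x real-time, for a 1-hour recording.
print(processing_minutes(60, 0.5))  # 120.0
print(processing_minutes(60, 1))    # 60.0
```

A factor below 1x means the model cannot keep up with live audio, which is why the larger models are suggested for post-recording transcription.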

Multilingual Support

All Whisper models support multiple languages. You can configure your preferred transcription language in Settings → General → Transcription Language.

Larger models generally provide better accuracy for non-English languages, especially for languages with less training data.

For more information about Whisper models, visit the OpenAI Whisper repository.