Meta's new Llama 3.3 LLM is now accessible on macOS through Ollama, offering powerful capabilities in a more efficient package. While matching the performance of Llama 3.1 (405B) on key benchmarks, Llama 3.3 is significantly smaller and optimized for multilingual dialogue. The improved efficiency lowers the barrier to entry, though you'll still want a high-end Mac for the best experience. This guide walks you through setting up and running Llama 3.3 with Ollama, including recommended specs for smooth operation.
## Model Variants and System Requirements

Below is a comprehensive table of the available Llama 3.3 variants and their specific requirements:
| Model Variant | Size | Minimum RAM | Recommended RAM | Storage Needed | Recommended Mac Models |
|---|---|---|---|---|---|
| latest | 43GB | 64GB | 96GB | 50GB | Mac Studio M2 Max/Ultra |
| 70b | 43GB | 64GB | 96GB | 50GB | Mac Studio M2 Max/Ultra |
| 70b-instruct-fp16 | 141GB | 140GB | 192GB | 150GB | Mac Studio M2 Ultra |
| 70b-instruct-q2_K | 26GB | 32GB | 64GB | 30GB | MacBook Pro M2 Max |
| 70b-instruct-q3_K_M | 34GB | 32GB | 64GB | 40GB | MacBook Pro M2 Max |
| 70b-instruct-q3_K_S | 31GB | 32GB | 64GB | 35GB | MacBook Pro M2 Max |
| 70b-instruct-q4_0 | 40GB | 48GB | 96GB | 45GB | Mac Studio M2 Max |
| 70b-instruct-q4_1 | 44GB | 48GB | 96GB | 50GB | Mac Studio M2 Max |
| 70b-instruct-q4_K_M | 43GB | 48GB | 96GB | 50GB | Mac Studio M2 Max |
| 70b-instruct-q4_K_S | 40GB | 48GB | 96GB | 45GB | Mac Studio M2 Max |
| 70b-instruct-q5_0 | 49GB | 64GB | 96GB | 55GB | Mac Studio M2 Max/Ultra |
| 70b-instruct-q5_1 | 53GB | 64GB | 96GB | 60GB | Mac Studio M2 Max/Ultra |
| 70b-instruct-q5_K_M | 50GB | 64GB | 96GB | 55GB | Mac Studio M2 Max/Ultra |
| 70b-instruct-q6_K | 58GB | 96GB | 192GB | 65GB | Mac Studio M2 Ultra |
| 70b-instruct-q8_0 | 75GB | 96GB | 192GB | 80GB | Mac Studio M2 Ultra |
Note:
- All models require macOS Monterey (12.0) or later
- RAM requirements assume Apple's unified memory architecture
- Performance may degrade at the minimum RAM requirement due to memory swapping
- For optimal performance, use the recommended RAM
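Before pulling a variant, it can help to confirm that your Mac's unified memory meets the requirements in the table. A minimal sketch using only the Python standard library; the 64GB threshold in the example is the minimum for the default `llama3.3` variant from the table above:

```python
import os

def total_ram_gb():
    """Total physical memory in GB, via POSIX sysconf (works on macOS and Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3

def meets_requirement(min_ram_gb):
    """True if this machine has at least min_ram_gb of RAM."""
    return total_ram_gb() >= min_ram_gb

print(f"Total RAM: {total_ram_gb():.0f}GB")
print("Can run llama3.3 (64GB minimum):", meets_requirement(64))
```

Remember that unified memory is shared with the OS and other apps, so a machine at exactly the minimum will likely swap under load.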
## Installing Ollama on macOS

- Download the installer: Visit the Ollama website and download the macOS installer.
- Run the installer: Open the `.zip` file and move the Ollama app to the Applications folder.
- Verify the installation: Open Terminal and run:

```
ollama --version
```

If installed correctly, this command displays the version information.
## Step 1: Pull Llama 3.3

Download the desired variant of the Llama 3.3 model:

```
ollama pull llama3.3
```

For a specific variant, use the appropriate tag. For example:

```
ollama pull llama3.3:70b-instruct-q4_0
```
## Step 2: Run Llama 3.3

You can run the model in two ways:

### Option 1: Interactive Mode

Start the model in interactive mode to enter prompts manually:

```
ollama run llama3.3
```

### Option 2: Direct Command Prompt

Run the model with a predefined text prompt directly:

```
ollama run llama3.3 "Explain the applications of Llama 3.3."
```
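Beyond the CLI, Ollama serves a local HTTP API (on port 11434 by default), so you can drive the model from your own scripts. A minimal sketch using only the Python standard library; the request fields follow Ollama's documented `/api/generate` endpoint, and calling `generate()` assumes an Ollama server is already running locally:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_payload(model, prompt, stream=False):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode()

def generate(model, prompt):
    """Send a prompt to a locally running Ollama server and return the response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_generate_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("llama3.3", "Explain the applications of Llama 3.3.")` returns the full completion as a string; setting `stream=True` instead yields incremental JSON chunks, which is how the CLI prints tokens as they arrive.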
## Managing Models

- List installed models: `ollama list`
- Stop a running model: `ollama stop llama3.3`
- Remove a model: `ollama rm llama3.3`
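The same information is available programmatically: Ollama's local API exposes installed models via `GET /api/tags`, which returns JSON with a `models` list. A minimal sketch; it assumes a running local server, and the response shape in the helper matches Ollama's documented format:

```python
import json
import urllib.request

def model_names(tags_response):
    """Extract model names from an /api/tags response dict."""
    return [m["name"] for m in tags_response.get("models", [])]

def fetch_installed_models(base_url="http://localhost:11434"):
    """Query a running Ollama server for its installed models (GET /api/tags)."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.loads(resp.read()))
```

This is the programmatic equivalent of `ollama list`, which is handy for scripts that check whether a variant is already downloaded before pulling it.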
## Model Selection Guidelines

- For development and testing: use heavily quantized variants (q2_K, q3_K_S/M) on a MacBook Pro with 32GB+ RAM
- For general use: use q4_x variants on a Mac Studio with 64GB+ RAM
- For high-quality inference: use q5_x or q6_K variants on a Mac Studio with 96GB+ RAM
- For maximum quality: use the fp16 variant on a Mac Studio M2 Ultra with 192GB RAM
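The guidelines above can be sketched as a small lookup helper. The minimum-RAM figures come from the requirements table earlier in this guide; the function and dictionary names are just for illustration:

```python
# Minimum RAM (GB) per variant, taken from the requirements table above.
MIN_RAM_GB = {
    "70b-instruct-q2_K": 32,
    "70b-instruct-q3_K_S": 32,
    "70b-instruct-q3_K_M": 32,
    "70b-instruct-q4_0": 48,
    "70b-instruct-q4_K_M": 48,
    "70b-instruct-q5_K_M": 64,
    "70b-instruct-q6_K": 96,
    "70b-instruct-q8_0": 96,
    "70b-instruct-fp16": 140,
}

def best_variant(ram_gb):
    """Return a variant with the highest minimum-RAM requirement that fits ram_gb,
    i.e. the least-quantized option this machine can load; None if nothing fits."""
    fitting = [(ram, name) for name, ram in MIN_RAM_GB.items() if ram <= ram_gb]
    return max(fitting)[1] if fitting else None
```

For example, a 64GB Mac Studio maps to a q5 variant, while 192GB unlocks fp16. Treat this as a starting point: quality differences between adjacent quantization levels are workload-dependent, so it's worth comparing outputs on your own prompts.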
## Evaluate Llama 3.3

Llama 3.3 is now available on eval.supa.so, where you can compare and evaluate it against other models.