Getting Started with Llama 3.3 Using Ollama (macOS)

Meta's new Llama 3.3 LLM is now accessible on macOS through Ollama, offering powerful capabilities in a more efficient package. While matching the performance of the much larger Llama 3.1 (405B) on key benchmarks, Llama 3.3 is significantly smaller and optimized for multilingual dialogue. The improved efficiency lowers the barrier to entry, though you'll still want a high-end Mac for the best experience. This guide walks you through setting up and running Llama 3.3 using Ollama, with recommended specs for smooth operation.

Model Variants and System Requirements

Below is a comprehensive table of available Llama 3.3 variants with their specific requirements:

| Model Variant | Size | Minimum RAM | Recommended RAM | Storage Needed | Recommended Mac Models |
| --- | --- | --- | --- | --- | --- |
| latest | 43GB | 64GB | 96GB | 50GB | Mac Studio M2 Max/Ultra |
| 70b | 43GB | 64GB | 96GB | 50GB | Mac Studio M2 Max/Ultra |
| 70b-instruct-fp16 | 141GB | 140GB | 192GB | 150GB | Mac Studio M2 Ultra |
| 70b-instruct-q2_K | 26GB | 32GB | 64GB | 30GB | MacBook Pro M2 Max |
| 70b-instruct-q3_K_M | 34GB | 32GB | 64GB | 40GB | MacBook Pro M2 Max |
| 70b-instruct-q3_K_S | 31GB | 32GB | 64GB | 35GB | MacBook Pro M2 Max |
| 70b-instruct-q4_0 | 40GB | 48GB | 96GB | 45GB | Mac Studio M2 Max |
| 70b-instruct-q4_1 | 44GB | 48GB | 96GB | 50GB | Mac Studio M2 Max |
| 70b-instruct-q4_K_M | 43GB | 48GB | 96GB | 50GB | Mac Studio M2 Max |
| 70b-instruct-q4_K_S | 40GB | 48GB | 96GB | 45GB | Mac Studio M2 Max |
| 70b-instruct-q5_0 | 49GB | 64GB | 96GB | 55GB | Mac Studio M2 Max/Ultra |
| 70b-instruct-q5_1 | 53GB | 64GB | 96GB | 60GB | Mac Studio M2 Max/Ultra |
| 70b-instruct-q5_K_M | 50GB | 64GB | 96GB | 55GB | Mac Studio M2 Max/Ultra |
| 70b-instruct-q6_K | 58GB | 96GB | 192GB | 65GB | Mac Studio M2 Ultra |
| 70b-instruct-q8_0 | 75GB | 96GB | 192GB | 80GB | Mac Studio M2 Ultra |
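
Before picking a variant, you can check how much unified memory and free disk space your Mac actually has from Terminal. A quick sketch (on macOS, the `hw.memsize` sysctl key reports total RAM in bytes):

```shell
# Total unified memory in GB (macOS reports bytes)
mem_bytes=$(sysctl -n hw.memsize)
echo "Unified memory: $((mem_bytes / 1073741824)) GB"

# Free space on the system volume
df -h /
```

Compare the reported figures against the "Recommended RAM" and "Storage Needed" columns above before downloading a variant.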

Note:

  • All models require macOS Monterey (12.0) or later
  • RAM requirements assume Apple's unified memory architecture
  • Performance may degrade at the minimum RAM requirement due to memory swapping
  • For optimal performance, use the recommended RAM

Installing Ollama on macOS

  1. Download the Installer: Visit the Ollama website and download the macOS installer.
  2. Run the Installer: Open the .zip file and move the Ollama app to the Applications folder.
  3. Verify Installation: Open Terminal and run:
ollama --version

If installed correctly, this command will display the version information.

Step 1: Pull Llama 3.3

Download the desired variant of the Llama 3.3 model:

ollama pull llama3.3

For specific variants, use the appropriate tag. For example:

ollama pull llama3.3:70b-instruct-q4_0

Step 2: Run Llama 3.3

You can run the model in two ways:

Option 1: Interactive Mode

Start the model in interactive mode to enter prompts manually:

ollama run llama3.3

Option 2: Direct Command Prompt

Run the model with a predefined text prompt directly:

ollama run llama3.3 "Explain the applications of Llama 3.3."
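
Beyond the CLI, Ollama also serves a local HTTP API (on port 11434 by default), so you can send the same prompt programmatically. A minimal sketch using curl, assuming the model has been pulled and Ollama is running:

```shell
# Query the local Ollama server's generate endpoint directly
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.3",
  "prompt": "Explain the applications of Llama 3.3.",
  "stream": false
}'
```

With `"stream": false`, the server returns the full response in a single JSON object instead of streaming tokens line by line.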

Managing Models

  • List Installed Models:
ollama list
  • Stop a Running Model:
ollama stop llama3.3
  • Remove a Model:
ollama rm llama3.3
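
If you have experimented with several quantizations, the commands above can be combined into a small cleanup loop. A sketch, assuming the model name is the first column of `ollama list` output:

```shell
# Remove every locally installed llama3.3 variant
ollama list | awk 'NR > 1 && $1 ~ /^llama3\.3/ { print $1 }' | while read -r name; do
  ollama rm "$name"
done
```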

Model Selection Guidelines

  • For development and testing: Use highly quantized models (q2_K, q3_K) on a MacBook Pro with 32GB+ RAM
  • For general use: Use q4_x variants on a Mac Studio with 64GB+ RAM
  • For high-quality inference: Use q5_x or q6_K variants on a Mac Studio with 96GB+ RAM
  • For maximum quality: Use fp16 variant on Mac Studio M2 Ultra with 192GB RAM

Evaluate Llama 3.3

Llama 3.3 is now available on eval.supa.so, where you can compare and evaluate it against other models.

Wei-Ming Thor

I create practical guides on Software Engineering, Data Science, and Machine Learning.

Background

Full-stack engineer who builds web and mobile apps. Now exploring Machine Learning and Data Engineering.

Writing unmaintainable code since 2010.

Skill/languages

Best: JavaScript, Python
Others: Android, iOS, C, React Native, Ruby, PHP

Work

Engineering Manager

Location

Kuala Lumpur, Malaysia
