Getting Started with Llama 3.3 Using Ollama (macOS)

Meta's new Llama 3.3 LLM is now accessible on macOS through Ollama, offering powerful capabilities in a more efficient package. While matching the performance of the much larger Llama 3.1 (405B) on key benchmarks, Llama 3.3 is significantly smaller and optimized for multilingual dialogue. The improved efficiency lowers the barrier to entry, though you'll still want a high-end Mac for the best experience. This guide walks you through setting up and running Llama 3.3 using Ollama, with recommended specs for smooth operation.

Model Variants and System Requirements

Below is a comprehensive table of available Llama 3.3 variants with their specific requirements:

| Model Variant | Size | Minimum RAM | Recommended RAM | Storage Needed | Recommended Mac Models |
| --- | --- | --- | --- | --- | --- |
| latest | 43GB | 64GB | 96GB | 50GB | Mac Studio M2 Max/Ultra |
| 70b | 43GB | 64GB | 96GB | 50GB | Mac Studio M2 Max/Ultra |
| 70b-instruct-fp16 | 141GB | 140GB | 192GB | 150GB | Mac Studio M2 Ultra |
| 70b-instruct-q2_K | 26GB | 32GB | 64GB | 30GB | MacBook Pro M2 Max |
| 70b-instruct-q3_K_M | 34GB | 32GB | 64GB | 40GB | MacBook Pro M2 Max |
| 70b-instruct-q3_K_S | 31GB | 32GB | 64GB | 35GB | MacBook Pro M2 Max |
| 70b-instruct-q4_0 | 40GB | 48GB | 96GB | 45GB | Mac Studio M2 Max |
| 70b-instruct-q4_1 | 44GB | 48GB | 96GB | 50GB | Mac Studio M2 Max |
| 70b-instruct-q4_K_M | 43GB | 48GB | 96GB | 50GB | Mac Studio M2 Max |
| 70b-instruct-q4_K_S | 40GB | 48GB | 96GB | 45GB | Mac Studio M2 Max |
| 70b-instruct-q5_0 | 49GB | 64GB | 96GB | 55GB | Mac Studio M2 Max/Ultra |
| 70b-instruct-q5_1 | 53GB | 64GB | 96GB | 60GB | Mac Studio M2 Max/Ultra |
| 70b-instruct-q5_K_M | 50GB | 64GB | 96GB | 55GB | Mac Studio M2 Max/Ultra |
| 70b-instruct-q6_K | 58GB | 96GB | 192GB | 65GB | Mac Studio M2 Ultra |
| 70b-instruct-q8_0 | 75GB | 96GB | 192GB | 80GB | Mac Studio M2 Ultra |
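
Before picking a variant, you can check how much unified memory and free disk space your Mac actually has from Terminal. A quick sketch (on macOS, the `hw.memsize` sysctl key reports total RAM in bytes):

```shell
# Total unified memory in GB (macOS reports bytes)
mem_bytes=$(sysctl -n hw.memsize)
echo "Unified memory: $((mem_bytes / 1073741824)) GB"

# Free space on the system volume
df -h /
```

Compare the reported figures against the "Recommended RAM" and "Storage Needed" columns above before downloading a variant.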

Note:

  • All models require macOS Monterey (12.0) or later
  • RAM requirements assume Apple's unified memory architecture
  • Performance may degrade at the minimum RAM requirement due to memory swapping
  • For optimal performance, use the recommended RAM

Installing Ollama on macOS

  1. Download the Installer: Visit the Ollama website and download the macOS installer.
  2. Run the Installer: Open the .zip file and move the Ollama app to the Applications folder.
  3. Verify Installation: Open Terminal and run:
ollama --version

If installed correctly, this command will display the version information.

Step 1: Pull Llama 3.3

Download the desired variant of the Llama 3.3 model:

ollama pull llama3.3

For specific variants, use the appropriate tag. For example:

ollama pull llama3.3:70b-instruct-q4_0

Step 2: Run Llama 3.3

You can run the model in two ways:

Option 1: Interactive Mode

Start the model in interactive mode to enter prompts manually:

ollama run llama3.3

Option 2: Direct Command Prompt

Run the model with a predefined text prompt directly:

ollama run llama3.3 "Explain the applications of Llama 3.3."
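
Beyond the CLI, Ollama also serves a local HTTP API (on port 11434 by default), so you can send the same prompt programmatically. A minimal sketch using curl, assuming the model has been pulled and Ollama is running:

```shell
# Query the local Ollama server's generate endpoint directly
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.3",
  "prompt": "Explain the applications of Llama 3.3.",
  "stream": false
}'
```

With `"stream": false`, the server returns the full response in a single JSON object instead of streaming tokens line by line.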

Managing Models

  • List Installed Models:
ollama list
  • Stop a Running Model:
ollama stop llama3.3
  • Remove a Model:
ollama rm llama3.3
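
If you have experimented with several quantizations, the commands above can be combined into a small cleanup loop. A sketch, assuming the model name is the first column of `ollama list` output:

```shell
# Remove every locally installed llama3.3 variant
ollama list | awk 'NR > 1 && $1 ~ /^llama3\.3/ { print $1 }' | while read -r name; do
  ollama rm "$name"
done
```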

Model Selection Guidelines

  • For development and testing: Use highly quantized models (q2_K, q3_K) on a MacBook Pro with 32GB+ RAM
  • For general use: Use q4_x variants on a Mac Studio with 64GB+ RAM
  • For high-quality inference: Use q5_x or q6_K variants on a Mac Studio with 96GB+ RAM
  • For maximum quality: Use fp16 variant on Mac Studio M2 Ultra with 192GB RAM

Evaluate Llama 3.3

Llama 3.3 is now available on eval.supa.so, where you can compare and evaluate it against other models.

Wei-Ming Thor

I create practical guides on Software Engineering, Data Science, and Machine Learning.

Background

Full-stack engineer who builds web and mobile apps. Now exploring Machine Learning and Data Engineering.

Writing unmaintainable code since 2010.

Skill/languages

Best: JavaScript, Python
Others: Android, iOS, C, React Native, Ruby, PHP

Work

Engineering Manager

Location

Kuala Lumpur, Malaysia
