Run a Local Text-Generation Pipeline on Apple Silicon

Run a small Hugging Face text-generation pipeline on an Apple Silicon Mac with MPS acceleration and CPU fallback.

Apple Silicon Macs can run Hugging Face text-generation pipelines through PyTorch’s Metal Performance Shaders (MPS) backend. Start with a small model so you can verify the workflow before downloading larger weights.

Create an Environment

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install torch transformers
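
Before installing anything larger, it can help to confirm that the virtual environment uses a native arm64 Python, since an interpreter running under Rosetta typically reports `x86_64` and is unlikely to see the MPS device. The sketch below uses only the standard library; the helper names `interpreter_summary` and `looks_native_apple_silicon` are illustrative and not part of any library.

```python
import platform
import sys


def interpreter_summary() -> dict:
    """Collect the interpreter facts worth checking before installing torch."""
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "system": platform.system(),    # "Darwin" on macOS
        "machine": platform.machine(),  # "arm64" for a native Apple Silicon build
        "in_venv": sys.prefix != sys.base_prefix,  # True inside an active venv
    }


def looks_native_apple_silicon(summary: dict) -> bool:
    """Return True when the interpreter is a native arm64 build on macOS."""
    return summary["system"] == "Darwin" and summary["machine"] == "arm64"


if __name__ == "__main__":
    info = interpreter_summary()
    for key, value in info.items():
        print(f"{key}: {value}")
    if not looks_native_apple_silicon(info):
        print("Note: this interpreter is not a native arm64 macOS build.")
```

Run it with the virtual environment active. If `machine` is not `arm64` on an Apple Silicon Mac, recreating the environment with a native Python build is usually the first thing to try.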

Check MPS Support

import torch

print("MPS built:", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

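If is_available() returns False, continue on the CPU; you still have a useful local test environment. When is_built() is also False, the installed PyTorch build was compiled without MPS support. When is_built() is True but is_available() is False, the macOS version is older than 12.3 or the machine has no MPS-capable device.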
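
The two flags can be turned into a device choice plus a short explanation, followed by a tiny tensor operation as a smoke test. This is a minimal sketch: `pick_device` is a hypothetical helper name, and torch is imported inside a guard so the decision logic can be read and tested even where torch is not installed.

```python
from typing import Tuple


def pick_device(is_built: bool, is_available: bool) -> Tuple[str, str]:
    """Map the two MPS flags to a device string and a human-readable reason."""
    if is_built and is_available:
        return "mps", "MPS is built into this PyTorch install and usable."
    if not is_built:
        return "cpu", "This PyTorch build was compiled without MPS support."
    return "cpu", "MPS is built in, but macOS or the hardware does not expose it."


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment.")
    else:
        device, reason = pick_device(
            torch.backends.mps.is_built(),
            torch.backends.mps.is_available(),
        )
        print(f"Using {device}: {reason}")
        # Smoke test: allocate a small tensor on the chosen device and add it.
        x = torch.ones(2, 2, device=device)
        print("Sum of x + x:", (x + x).sum().item())
```

If the tensor operation completes on `mps`, basic allocation and arithmetic work on the GPU. It does not guarantee that every operation a given model needs is supported.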

Run Text Generation

Create generate.py:

import torch
from transformers import pipeline

device = "mps" if torch.backends.mps.is_available() else "cpu"

generator = pipeline(
    task="text-generation",
    model="openai-community/gpt2",
    device=device,
)

result = generator(
    "A useful local AI workflow starts with",
    max_new_tokens=60,
    do_sample=True,
    temperature=0.8,
)

print("Device:", device)
print(result[0]["generated_text"])

Run it with python generate.py.

GPT-2 is a small demonstration model, not a modern assistant model. Its role is to prove that Python, Transformers, the model download, and MPS selection work.
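
Once generation works, a rough timing makes it easier to compare a run on MPS with a run on the CPU. The sketch below is one way to do that with the standard library; `timed` and `tokens_per_second` are hypothetical helper names, and the workload in the demo is a stand-in so the snippet runs without downloading a model.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn with the given arguments and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def tokens_per_second(new_tokens: int, seconds: float) -> float:
    """Rough throughput; returns 0.0 when the elapsed time is not positive."""
    return new_tokens / seconds if seconds > 0 else 0.0


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a model download.
    result, seconds = timed(sum, range(1_000_000))
    print(f"result={result} seconds={seconds:.4f}")
    print(f"{tokens_per_second(60, 2.0):.1f} tokens/s for 60 tokens in 2.0 s")
```

In generate.py, the generator call could be wrapped as `result, seconds = timed(generator, "A useful local AI workflow starts with", max_new_tokens=60)`. Treat the figure as approximate: `max_new_tokens` is an upper limit, and the first call usually includes one-time warm-up cost.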

Use CPU Fallback

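MPS does not support every PyTorch operation. If a model fails with an error saying an operation is not implemented for the MPS device, rerun the script with CPU fallback enabled. With PYTORCH_ENABLE_MPS_FALLBACK=1 set, PyTorch runs the unsupported operations on the CPU, which is slower than native MPS execution: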

PYTORCH_ENABLE_MPS_FALLBACK=1 python generate.py
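
The same flag can also be set from inside the script instead of on the command line. The sketch below assumes the variable has to be in place before torch is first imported, which is how this setting is commonly reported to behave; `fallback_enabled` is a hypothetical helper used only to confirm the value.

```python
import os

# Assumption: the flag is read when torch initializes, so set it before
# the first "import torch" anywhere in the process.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def fallback_enabled(env=os.environ) -> bool:
    """Return True when the CPU fallback flag is set to "1"."""
    return env.get("PYTORCH_ENABLE_MPS_FALLBACK") == "1"


if __name__ == "__main__":
    print("MPS CPU fallback enabled:", fallback_enabled())
    # In generate.py, "import torch" and the pipeline code would follow here.
```

Placing these lines at the very top of generate.py keeps the behavior the same no matter how the script is launched.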

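Apple Silicon uses unified memory, so the CPU and GPU draw from the same pool of RAM, and model weights compete with macOS and every other running application. Increase model size gradually and leave headroom for the system and the Python process.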
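
A back-of-the-envelope estimate helps when deciding how far to scale up: weight memory is roughly the parameter count multiplied by the bytes per parameter. The sketch below covers weights only and ignores activations, the generation cache, and framework overhead, so real usage will be higher. The helper name `weight_memory_gib` is illustrative, and the 7-billion-parameter figure is a hypothetical larger model, not a recommendation.

```python
# Bytes needed to store one parameter in each common dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: int, dtype: str = "float32") -> float:
    """Estimate the memory for model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    # GPT-2 (small) has roughly 124 million parameters.
    print(f"GPT-2 in float32: {weight_memory_gib(124_000_000):.2f} GiB")
    # Hypothetical 7-billion-parameter model stored in half precision.
    print(f"7B model in float16: {weight_memory_gib(7_000_000_000, 'float16'):.2f} GiB")
```

Comparing the estimate with the Mac's installed memory, minus what macOS and other applications already use, gives a first indication of whether a model is worth downloading.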

Next Steps

Continue with the upcoming hardware guide before downloading a larger model.

This post is licensed under CC BY 4.0 by the author.