Run a Local Text-Generation Pipeline on Apple Silicon

Run a small Hugging Face text-generation pipeline on an Apple Silicon Mac with MPS acceleration and CPU fallback.

Posted Jun 15, 2026 Updated Jun 15, 2026

By Sean

Apple Silicon Macs can run Hugging Face text-generation pipelines through PyTorch’s Metal Performance Shaders (MPS) backend. Start with a small model so you can verify the workflow before downloading larger weights.

Create an Environment

  
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install torch transformers
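
Before checking MPS, it can help to confirm that the interpreter inside the virtual environment is a native arm64 build on a recent macOS. An x86_64 Python running under Rosetta can be one reason the MPS checks later report False. The sketch below uses only the standard library. The function name describe_platform and its optional arguments are illustrative, not part of any library.

```python
import platform


def describe_platform(system=None, machine=None, mac_version=None):
    """Summarize whether this looks like native arm64 Python on macOS 12.3+.

    The arguments default to the values reported by the running interpreter;
    passing them explicitly makes the helper easy to try with sample values.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    mac_version = platform.mac_ver()[0] if mac_version is None else mac_version

    # Turn "14.5" or "12.3.1" into a tuple of ints; empty on non-macOS systems.
    parts = tuple(int(p) for p in mac_version.split(".") if p.isdigit())

    return {
        "is_macos": system == "Darwin",
        "is_arm64": machine == "arm64",
        "macos_at_least_12_3": parts >= (12, 3),
    }


if __name__ == "__main__":
    for key, value in describe_platform().items():
        print(f"{key}: {value}")
```

If is_arm64 prints False on an Apple Silicon Mac, the interpreter is probably an Intel build, and recreating the environment with a native Python is worth trying before debugging anything else.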

Check MPS Support

  
import torch

print("MPS built:", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

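If is_available() returns False, continue on the CPU. You still have a useful local test environment. The two flags also tell you why: when is_built() is False, the installed PyTorch build was compiled without MPS support; when is_built() is True but is_available() is False, the likely cause is the macOS version (MPS requires macOS 12.3 or later) or the hardware.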
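
The two flags can be folded into a single decision with a human-readable reason, which is handy when logging why a run ended up on the CPU. This is a minimal sketch: choose_device and detect_device are hypothetical helper names, and the reason strings are my own wording rather than PyTorch messages.

```python
def choose_device(mps_built: bool, mps_available: bool) -> tuple[str, str]:
    """Map the two MPS flags to a device string and a short reason."""
    if mps_available:
        return "mps", "MPS is available"
    if not mps_built:
        return "cpu", "this PyTorch build was compiled without MPS support"
    return "cpu", "MPS is built in, but macOS or the hardware does not expose it"


def detect_device() -> tuple[str, str]:
    """Ask PyTorch for both flags and pick a device.

    torch is imported inside the function so that choose_device can be
    exercised on its own, without PyTorch installed.
    """
    import torch

    return choose_device(
        torch.backends.mps.is_built(),
        torch.backends.mps.is_available(),
    )
```

Calling device, reason = detect_device() in a script returns the device string to use along with the explanation to print.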

Run Text Generation

Create generate.py:

  
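import torch
from transformers import pipeline

# Prefer the Apple GPU through MPS; otherwise run on the CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# Build the pipeline and place the model on the selected device.
generator = pipeline(
    task="text-generation",
    model="openai-community/gpt2",
    device=device,
)

result = generator(
    "A useful local AI workflow starts with",
    max_new_tokens=60,  # cap on generated tokens, not counting the prompt
    do_sample=True,  # sample instead of greedy decoding, so output varies per run
    temperature=0.8,  # values below 1.0 make sampling more conservative
)

print("Device:", device)
# The pipeline returns a list of dicts; generated_text includes the prompt.
print(result[0]["generated_text"])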

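Run it with python generate.py. The first run downloads the model weights into the local Hugging Face cache, so it takes longer than later runs.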

GPT-2 is a small demonstration model, not a modern assistant model. Its role is to prove that Python, Transformers, the model download, and MPS selection work.
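
To see what temperature=0.8 does in generate.py, it may help to look at the arithmetic in isolation. The sketch below is a plain-Python illustration of temperature-scaled softmax, not the Transformers implementation; the function name and the sample logits are made up for the example.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    sample_logits = [2.0, 1.0, 0.0]  # illustrative scores for three tokens
    for t in (1.0, 0.8):
        probs = softmax_with_temperature(sample_logits, t)
        print(t, [round(p, 3) for p in probs])
```

A temperature below 1.0 shifts probability toward the highest-scoring token, so sampled text becomes less varied. Values above 1.0 flatten the distribution instead.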

Use CPU Fallback

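MPS does not support every PyTorch operation. When a model hits an unsupported operation, PyTorch typically raises a NotImplementedError that names the missing operator. Retry with CPU fallback enabled, which runs only the unsupported operations on the CPU and keeps the rest of the model on MPS: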

PYTORCH_ENABLE_MPS_FALLBACK=1 python generate.py
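
The same variable can be set from inside a script instead of on the command line. The sketch below assumes, as far as I know, that PyTorch reads the variable when it loads, so the call has to happen before import torch. The helper name enable_mps_fallback is hypothetical.

```python
import os


def enable_mps_fallback(environ=os.environ) -> bool:
    """Turn on MPS CPU fallback unless a value was already chosen.

    Returns True when fallback ends up enabled. An existing value, such as
    an explicit "0" set in the shell, is left untouched.
    """
    environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
    return environ["PYTORCH_ENABLE_MPS_FALLBACK"] == "1"


# Call this at the very top of the script.
enable_mps_fallback()
# import torch  # import PyTorch only after the call above
```

Using setdefault rather than plain assignment lets a value exported in the shell still win, which keeps the command-line form above usable for overriding the script.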

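Apple Silicon uses unified memory: the CPU and GPU share one pool of RAM, so model weights placed on MPS compete with macOS and every other running application. Increase model size gradually and leave headroom for macOS and the Python process.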
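
A rough way to judge whether a model fits is to multiply its parameter count by the bytes stored per parameter. The sketch below covers weights only and ignores activations, the generation cache, and framework overhead, so treat the result as a lower bound. The function name is illustrative, and the parameter counts are approximate.

```python
def estimate_weight_memory_gib(num_params: int, bytes_per_param: int = 4) -> float:
    """Estimate memory for model weights alone, in GiB.

    bytes_per_param is 4 for float32 and 2 for float16 or bfloat16.
    """
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    # GPT-2 (small) has roughly 124 million parameters.
    print(f"GPT-2, float32: {estimate_weight_memory_gib(124_000_000):.2f} GiB")
    # A hypothetical 7-billion-parameter model stored in 16-bit precision.
    print(f"7B, float16: {estimate_weight_memory_gib(7_000_000_000, 2):.2f} GiB")
```

By this estimate GPT-2 needs under half a GiB for its weights, while a 7-billion-parameter model at 16-bit precision needs roughly 13 GiB before any working memory is counted.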

Next Steps

Continue with the upcoming hardware guide before downloading a larger model.

This post is licensed under CC BY 4.0 by the author.