Why Run Models Locally?

Module 5 of the workshop includes a hands-on exercise where you run an open-weight language model on your own hardware and compare it against a cloud-based tool on a real coding task. Before you do that, it is worth understanding why local models matter – not just for the exercise, but as a practical tool in your work.

Privacy

When you use a cloud AI tool – GitHub Copilot, Claude Code, or ChatGPT – your code is sent to a third-party server. For autocomplete tools, this happens continuously and automatically. Whatever sits in the files and context the tool reads, including proprietary code, API keys, database schemas, and business logic, travels over the wire.

Local models keep everything on your machine. The code never leaves. This matters in any context where your work is sensitive, confidential, or regulated.

Resilience and Independence

Cloud providers can have outages, impose rate limits without notice, change their pricing, or block access from certain regions (Anthropic’s models, for instance, are only offered in supported countries and are unavailable elsewhere). If your workflow depends entirely on one provider, a single outage or policy change can bring your day to a halt.

Local models are available whenever your machine is running, cost the same to run from one day to the next, and are not subject to someone else’s policy decisions.

Cost

Many current AI coding tools are heavily subsidized by venture capital, and several of the companies running them are not yet profitable. Prices will likely rise as investors demand returns. Local models, once downloaded, carry no per-token fee; the only running cost is electricity.
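To put a rough number on "just electricity", the sketch below converts power draw and generation speed into an electricity cost per million tokens. The function name and all three input values are hypothetical placeholders, not measurements from this workshop; substitute the wattage, tokens per second, and tariff you observe on your own machine.

```python
def electricity_cost_per_million_tokens(power_watts, tokens_per_second, price_per_kwh):
    """Estimate the electricity cost of generating one million tokens locally.

    All arguments are values you supply yourself; nothing here is measured.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    seconds = 1_000_000 / tokens_per_second
    # watts * seconds = joules; one kWh is 3,600,000 joules
    kwh = power_watts * seconds / 3_600_000
    return kwh * price_per_kwh


# Hypothetical inputs: 60 W under load, 20 tokens/s, 0.30 per kWh
cost = electricity_cost_per_million_tokens(60, 20, 0.30)
print(f"{cost:.2f} per million tokens")  # prints "0.25 per million tokens"
```

The estimate ignores the purchase price of the hardware and any idle power draw, so treat it as a lower bound on the real cost of running locally.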

Environmental Efficiency

The Qwen3.5 and Qwen3.6 models we use in this workshop are strikingly efficient for their capability. On Apple Silicon, inference can use a fraction of the energy of a cloud request, which also carries datacentre cooling, networking, and server overhead.

The Tradeoffs

Local models are not a free lunch:

Performance ceiling. The most capable local models (27B parameters at Q4, that is, 4-bit quantization) are roughly GPT-4-class, but only on sufficient hardware. Smaller models on modest hardware are noticeably weaker than top cloud models.
Setup complexity. There is a one-time setup cost. The guides in this section make it as smooth as possible.
Provenance concerns. Many of the best open-weight models are trained by Chinese companies (Alibaba’s Qwen, DeepSeek). This may be relevant in government-adjacent or security-sensitive workplaces.
Hardware requirements. Running larger models well requires a modern machine with enough memory. See your platform’s guide for specifics.

These are real tradeoffs worth knowing. The point of Exercise 2 is to experience them directly rather than reason about them in the abstract.