How to Run AI on Your Own Computer — No Cloud, No Subscription

Run AI models locally on your PC or Mac with LM Studio or Ollama. Free, private, offline. Step-by-step setup guide for beginners.

AI Tutorials · · Updated · 5 min read · beginner · 20 min

Quick answer

You can run AI models locally on your own computer using free tools like LM Studio or Ollama. LM Studio gives you a visual interface — download it, pick a model like Llama 3.3 8B, and start chatting. Ollama is command-line based. Both work offline with no subscriptions, no data leaving your machine, and no usage limits.

Why Run AI Locally?

Three reasons people ditch the cloud:

  1. Privacy — your prompts and data never leave your machine
  2. Cost — no subscription, no usage limits, completely free after download
  3. Offline access — works on a plane, in a cabin, anywhere

The trade-off is power. Cloud models like Claude and ChatGPT have hundreds of billions of parameters running on data centre GPUs. Local models are smaller and slower. But for many tasks — writing help, summarising documents, brainstorming, basic coding — they’re more than enough.

What You Need

Minimum (runs 7B models)

  • 8GB RAM
  • Modern CPU (Intel 8th gen+ or AMD Ryzen 3000+)
  • 10GB free disk space
  • No GPU required
  • 16GB RAM
  • Any dedicated GPU with 6GB+ VRAM
  • 30GB free disk space

Ideal (runs 70B models)

  • 32GB+ RAM
  • NVIDIA RTX 3060 12GB or better
  • 100GB+ free disk space

Apple Silicon note: M1, M2, M3, and M4 Macs are unusually good at local AI. Their unified memory architecture means the GPU can access all system RAM. An M2 MacBook Air with 16GB can run 13B models faster than most Windows laptops with discrete GPUs.

Option 1: LM Studio (Visual Interface)

LM Studio is the easiest way to get started. It’s a desktop app with a clean interface — no terminal required.

Installation

Download from lmstudio.ai. Available for Windows, macOS, and Linux. Install like any normal app.

Downloading Your First Model

  1. Open LM Studio and click the Discover tab
  2. Search for “Llama 3.3 8B Instruct”
  3. Pick the Q4_K_M quantisation (best balance of quality and size, about 5GB)
  4. Click Download and wait

What’s quantisation? Full AI models are enormous. Quantisation compresses them by reducing number precision. Q4 means 4-bit — roughly 4x smaller than the original with minimal quality loss. Always start with Q4_K_M.

Your First Conversation

  1. Click the Chat tab
  2. Select your downloaded model from the dropdown
  3. Type a message and press Enter

That’s it. You’re running AI locally.

Settings Worth Changing

SettingWhat It DoesRecommended
Context LengthHow much text the model remembers4096 (raise if you have RAM)
TemperatureCreativity vs consistency0.7 for chat, 0.2 for factual tasks
GPU OffloadLayers processed by GPUMax your GPU VRAM allows
System PromptPersistent instructionsSet your own or leave default

Option 2: Ollama (Command Line)

Ollama is leaner. No GUI — just your terminal. It’s faster to set up and better for automation.

Installation

Mac/Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows: Download from ollama.com and run the installer.

Download and Run

ollama pull llama3.3
ollama run llama3.3

You’re now in a chat session. Type your prompt and press Enter. Type /bye to exit.

Useful Commands

ollama list              # See downloaded models
ollama pull mistral      # Download Mistral (great for coding)
ollama rm llama3.3       # Delete a model
ollama serve             # Start API server on localhost:11434

The API server is powerful. It means you can use local AI from any tool that supports the OpenAI API format — including some automation workflows.

Which Models to Try

ModelParametersSizeBest For
Llama 3.3 8B8B~5GBGeneral chat, first model
Mistral 7B7B~4GBCoding, concise answers
Phi-3 Mini3.8B~2GBWeak hardware, fast responses
Llama 3.3 70B70B~40GBNear-cloud quality (needs 64GB RAM)
CodeLlama 34B34B~20GBDedicated coding model
Gemma 2 9B9B~6GBGoogle’s open model, strong reasoning

Start with Llama 3.3 8B. It’s the most capable small model and runs on almost anything.

Local AI vs Cloud AI — Honest Comparison

Local AICloud AI (Claude, ChatGPT)
CostFree forever$20/month+
PrivacyCompleteData processed on company servers
OfflineYesNo
SpeedDepends on hardwareConsistently fast
QualityGood to great (model dependent)State of the art
Context window4K-32K typical128K-200K
MultimodalLimitedImages, audio, video, files
UpdatesManual model downloadsAutomatic

The honest take: local AI is a complement to cloud AI, not a replacement. Use local for private data, offline work, and unlimited usage. Use Claude or ChatGPT when you need maximum capability.

Common Issues

“It’s really slow” — You’re probably running a model too large for your RAM. Drop to a smaller model or lower quantisation (Q3 instead of Q4).

“The answers are worse than ChatGPT” — Expected with 7-8B models. Try a 13B or 70B model if your hardware allows. Also make sure you’re using an “Instruct” model, not a base model.

“It’s using all my RAM” — AI models load entirely into memory. Close other apps or use a smaller model. Check your GPU offload settings — offloading to GPU frees system RAM.

“GPU not detected” — In LM Studio, check Settings > Hardware. For NVIDIA, ensure CUDA drivers are installed. For AMD, ROCm support is improving but still less reliable.

What’s Next

Once you’re comfortable with local AI:

  • Try building automations that use your local model’s API
  • Experiment with system prompts to customise behaviour
  • If you want cloud-level intelligence without the subscription, check the best free AI tools for generous free tiers
  • Learn about RAG to make your local model answer questions about your own documents

Frequently asked questions

Can I run AI on my laptop for free?
Yes. Tools like LM Studio and Ollama let you download and run open-source AI models completely free. You need at least 8GB of RAM for small models (7-8B parameters) and 16GB+ for larger ones. No GPU required for basic models, though a dedicated GPU dramatically improves speed.
Is local AI as good as ChatGPT or Claude?
Smaller local models (7-8B parameters) are noticeably less capable than ChatGPT or Claude for complex reasoning. However, 70B+ models running locally can rival GPT-4 class performance. The sweet spot for most people is a 13B model — capable enough for most tasks, runs on consumer hardware.
What computer do I need to run AI locally?
Minimum: 8GB RAM and a modern CPU (last 5 years). This runs 7B models slowly but works. Recommended: 16GB RAM and any dedicated GPU with 6GB+ VRAM. Ideal: 32GB RAM with an NVIDIA RTX 3060 or better. Apple Silicon Macs (M1-M4) are excellent for local AI due to unified memory.
Is running AI locally private?
Yes, completely. Nothing leaves your computer. No internet connection required after downloading the model. Your prompts, documents, and conversations stay entirely on your machine. This is why local AI is popular with lawyers, doctors, and anyone handling sensitive data.
LM Studio vs Ollama — which should I use?
LM Studio if you want a visual interface with a chat window, model browser, and easy settings. Ollama if you're comfortable with the terminal and want to integrate AI into scripts or other tools. Both use the same underlying models — the difference is the interface.

Want to keep learning?

Explore our guided learning paths or try building something with AI right now.

Enjoyed this article?

Subscribe for more AI insights delivered to your inbox every week.

No spam. Unsubscribe anytime.