BOF.team
Status: active · Wave 1 · Foundations

Local LLM Primer & Private Client Demo

Deploy OpenAI's gpt-oss-20b via Ollama on local GPU hardware and serve it through LibreChat, covering LLM fundamentals, hardware evaluation, and hands-on deployment for private inference.

llm · local-inference · hardware

Overview

This project demonstrates a fully local AI stack by deploying OpenAI's gpt-oss-20b, an open-weight Mixture-of-Experts (MoE) reasoning model, on local hardware and serving it through LibreChat, an open-source chat interface with native Ollama support. The workshop covers LLM fundamentals, hardware evaluation, model selection, and hands-on deployment, giving attendees both the conceptual grounding and the practical skills to stand up their own local inference environment with a production-quality front end.
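Once the model is pulled (e.g. `ollama pull gpt-oss:20b`), Ollama exposes a local HTTP API that any client, including LibreChat, can call. A minimal sketch, assuming Ollama's default endpoint on `localhost:11434` and its standard `/api/generate` route; the model tag is an assumption and should match whatever you pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port

def build_request(model: str, prompt: str) -> dict:
    """Non-streaming request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> str:
    """POST a prompt to the local Ollama server and return the completion text."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With the server running (`ollama serve`) and the model pulled:
# generate("gpt-oss:20b", "Explain Mixture of Experts in one sentence.")
```

Because everything stays on `localhost`, no prompt or completion ever leaves the machine, which is the core privacy argument of the stack.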

Key Concepts

  • How LLMs work: model architecture, parameters, memory requirements, reasoning, Mixture of Experts (MoE), and activated experts
  • Common hardware profiles: CPU vs. GPU inference, VRAM/RAM requirements, and realistic local configurations
  • Picking the right model: interpreting common benchmarks, right-sizing for your application, and avoiding over/under-provisioning
  • Running a model locally: installing and configuring Ollama, serving a model, and connecting it to LibreChat
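The VRAM arithmetic behind these hardware profiles is simple back-of-the-envelope math: weights occupy roughly (parameters × bits per parameter / 8) bytes, plus headroom for the KV cache and activations. Note that for an MoE model, *all* experts must reside in memory even though only a few activate per token, so sizing uses total parameters, not active ones. A sketch, where the 1.2× overhead factor is an assumption (real usage varies with context length and quantization format):

```python
def estimate_vram_gb(params_b: float, bits_per_param: float,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB: weight bytes plus ~20% for KV cache/activations.

    params_b is the TOTAL parameter count in billions (for MoE, total, not active).
    """
    weight_gb = params_b * bits_per_param / 8  # billions of params * bytes each
    return round(weight_gb * overhead, 1)

# A 20B-parameter model at 4-bit quantization:
print(estimate_vram_gb(20, 4))   # 12.0 -> fits a 16 GB GPU
# The same model at fp16:
print(estimate_vram_gb(20, 16))  # 48.0 -> far beyond consumer GPUs
```

This is why quantization is the lever that brings a 20B-class model onto a single consumer GPU.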

Learning Outcomes

By the end of this workshop, attendees will be able to:

  • Explain how an LLM processes input and generates output, and why model size and hardware specs matter
  • Evaluate hardware for local inference and identify minimum viable configurations
  • Interpret common model benchmarks and use them to select an appropriate open-weight model for a given use case
  • Stand up a fully local inference stack with LibreChat as the chat interface
  • Articulate the privacy, compliance, and security benefits of local inference for organizations with data residency requirements or air-gapped environments

Deliverables

A documented, reproducible local AI deployment using Ollama and LibreChat, demonstrating competency in private, self-hosted inference. Directly applicable to organizations with data residency requirements or air-gapped security postures.
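One way to wire the two pieces together is LibreChat's custom-endpoint configuration pointed at Ollama's OpenAI-compatible API. A sketch of a `librechat.yaml` fragment; the field names follow LibreChat's custom-endpoint convention and should be verified against the installed version's documentation:

```yaml
# librechat.yaml — register the local Ollama server as a custom endpoint
endpoints:
  custom:
    - name: "Ollama"
      # Ollama exposes an OpenAI-compatible API under /v1 on its default port
      baseURL: "http://localhost:11434/v1"
      apiKey: "ollama"   # Ollama ignores the key, but the field must be set
      models:
        default: ["gpt-oss:20b"]
```

Checking this file into the deliverable repo is what makes the deployment reproducible end to end.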

Applied Skills

  • LLM architecture fundamentals (transformer internals, quantization, MoE)
  • Hardware evaluation and VRAM/RAM sizing for inference workloads
  • Model benchmarking and right-sizing for application requirements
  • Local deployment using Ollama with LibreChat as the client interface