Overview
This project demonstrates a fully local AI stack by deploying OpenAI's gpt-oss-20b, an open-weight, MoE reasoning model, on local hardware and serving it through LibreChat, an open-source chat interface with native Ollama support. The workshop covers LLM fundamentals, hardware evaluation, model selection, and hands-on deployment — giving attendees both the conceptual grounding and practical skills to stand up their own local inference environment with a production-quality front end.
Key Concepts
- How LLMs work: model architecture, parameters, memory requirements, reasoning, Mixture of Experts (MoE), and activated experts
- Common hardware profiles: CPU vs. GPU inference, VRAM/RAM requirements, and realistic local configurations
- Picking the right model: interpreting common benchmarks, right-sizing for your application, and avoiding over/under-provisioning
- Running a model locally: installing and configuring Ollama, serving a model, and connecting it to LibreChat
Learning Outcomes
By the end of this workshop, attendees will be able to:
- Explain how an LLM processes input and generates output, and why model size and hardware specs matter
- Evaluate hardware for local inference and identify minimum viable configurations
- Interpret common model benchmarks and use them to select an appropriate open-weight model for a given use case
- Stand up a fully local inference stack with LibreChat as the chat interface
- Articulate the privacy, compliance, and security benefits of local inference for organizations with data residency requirements or air-gapped environments
Deliverables
A documented, reproducible local AI deployment using Ollama and LibreChat, demonstrating competency in private, self-hosted inference. Directly applicable to organizations with data residency requirements or air-gapped security postures.
Applied Skills
- LLM architecture fundamentals (transformer internals, quantization, MoE)
- Hardware evaluation and VRAM/RAM sizing for inference workloads
- Model benchmarking and right-sizing for application requirements
- Local deployment using Ollama with LibreChat as the client interface