Overview · LLM Switchboard

Stop guessing which model to use.

LLM Switchboard catalogs every open model on Groq and NVIDIA (build.nvidia.com), plus 303 models you can run on your own hardware. It scores them on the dimensions that matter for your job and routes each request to the best fit. Cloud or local. One API. No lock-in.

⚡ 45 cloud models: 18 Groq · 27 NVIDIA

⬇ 303 local models under 25 GB: run on your own hardware

Fastest (cloud): GPT-OSS 20B at 1.0k t/s

Benchmarks graded by trustworthiness (BQS)

⬇ Run it on your own hardware: no API key, $0

Browse 303 models →

303 open models under 25 GB, picked by benchmark, with one-command Ollama/Docker setup and a built-in sandbox to test them in-browser.

By category: reasoning 80 · coding 35 · vision 46 · STT 45 · TTS 38 · embeddings 59

What's inside

⚡

Smart router

Classifies the job, filters by your constraints, and ranks models with a transparent score. Callable as a REST API or an importable module.
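
To make the classify, filter, rank flow concrete, here is a minimal sketch of how a router like this could work. It is not the actual LLM Switchboard code: the `Model` fields, the catalog entries, the keyword classifier, and the 0.6/0.4 weights are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    provider: str          # "groq", "nvidia", or "local"
    job_types: frozenset   # jobs this model is suited to
    tokens_per_sec: float
    quality: float         # 0..1, stand-in for a benchmark-derived score
    usd_per_mtok: float    # price per million tokens (0 for local)


# Hypothetical catalog: names and numbers are illustrative, not measurements.
CATALOG = [
    Model("model-a", "groq", frozenset({"chat", "coding"}), 900.0, 0.70, 0.10),
    Model("model-b", "nvidia", frozenset({"chat", "reasoning"}), 200.0, 0.90, 0.60),
    Model("model-c", "local", frozenset({"coding"}), 60.0, 0.80, 0.0),
]


def classify(prompt: str) -> str:
    """Toy keyword classifier standing in for a real job classifier."""
    text = prompt.lower()
    if any(word in text for word in ("bug", "function", "compile", "stack trace")):
        return "coding"
    if any(word in text for word in ("prove", "step by step", "derive")):
        return "reasoning"
    return "chat"


def route(prompt, max_usd_per_mtok=None, providers=None, w_quality=0.6, w_speed=0.4):
    """Return candidates ranked best-first, each with a visible score breakdown."""
    job = classify(prompt)
    # Filter: keep only models that fit the job and the caller's constraints.
    candidates = [
        m for m in CATALOG
        if job in m.job_types
        and (max_usd_per_mtok is None or m.usd_per_mtok <= max_usd_per_mtok)
        and (providers is None or m.provider in providers)
    ]
    if not candidates:
        raise LookupError(f"no model satisfies the constraints for job {job!r}")
    # Rank: weighted sum of quality and speed (speed normalized to the fastest candidate).
    top_speed = max(m.tokens_per_sec for m in candidates)
    ranked = []
    for m in candidates:
        quality_part = w_quality * m.quality
        speed_part = w_speed * m.tokens_per_sec / top_speed
        ranked.append({
            "model": m.name,
            "job": job,
            "score": round(quality_part + speed_part, 3),
            "breakdown": {"quality": round(quality_part, 3), "speed": round(speed_part, 3)},
        })
    ranked.sort(key=lambda r: r["score"], reverse=True)
    return ranked
```

Calling `route("Fix this bug in my function")` classifies the job as coding and returns the matching models best-first. Returning the per-term breakdown alongside the total is what keeps the score transparent: a caller can see why one model outranked another.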

◑

Business use cases

Ready-made recipes — MEDDIC, BANT, SDR chat, AI voice, compliance triage — each with the routed model, an example result, and a live test.

▤

Benchmark intelligence

Not just model scores — we grade the benchmarks themselves with the Benchmark² framework, so you trust the right signal.

◢

Groq + NVIDIA catalogs

Live context windows, pricing and speed for every hosted model, normalized into one schema.
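
As a rough illustration of what "normalized into one schema" can mean, the sketch below maps two differently shaped catalog payloads onto one record type. The raw field names (`price_in`, `max_context_tokens`, `usd_per_1k_tokens`, and so on) are hypothetical and are not the real Groq or NVIDIA response formats; the `HostedModel` fields are likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HostedModel:
    """One normalized record per hosted model, whatever the provider."""
    provider: str
    model_id: str
    context_window: int                 # tokens
    usd_per_mtok_in: Optional[float]    # price per million input tokens
    usd_per_mtok_out: Optional[float]   # price per million output tokens
    tokens_per_sec: Optional[float]


def from_groq(raw: dict) -> HostedModel:
    # Assumed payload shape: prices already quoted per million tokens.
    return HostedModel(
        provider="groq",
        model_id=raw["id"],
        context_window=int(raw["context_window"]),
        usd_per_mtok_in=raw.get("price_in"),
        usd_per_mtok_out=raw.get("price_out"),
        tokens_per_sec=raw.get("tps"),
    )


def from_nvidia(raw: dict) -> HostedModel:
    # Assumed payload shape: a single price quoted per 1K tokens,
    # converted here to the per-million unit used by the schema.
    per_1k = raw.get("usd_per_1k_tokens")
    per_mtok = None if per_1k is None else per_1k * 1000
    return HostedModel(
        provider="nvidia",
        model_id=raw["name"],
        context_window=int(raw["max_context_tokens"]),
        usd_per_mtok_in=per_mtok,
        usd_per_mtok_out=per_mtok,
        tokens_per_sec=raw.get("tokens_per_second"),
    )
```

Missing values stay `None` rather than defaulting to zero, so a downstream ranker can tell "free" apart from "price unknown".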

⬇

Run locally

303 open models under 25 GB for reasoning, coding, vision, STT, TTS and embeddings, with copy-paste Ollama/Docker commands, benchmarks, and a live Hugging Face discovery button.

⬡

Agent harnesses & skills

A buyer's guide to LangGraph, CrewAI, DeerFlow, the SDKs — plus the portable SKILL.md ecosystem.

◇

Vector databases

From pgvector to Milvus to Alibaba's zvec — pick the right memory layer for RAG and agents.

Talk to the SDR (live demo)

This is use case #3, embedded

The homepage SDR chat is just an LLM Switchboard recipe: job type "chat", policy "fastest that passes", pinned to Groq for sub-second replies. Drop a key in .env and it answers live; without one, it shows which model it would call.
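
The recipe described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped recipe: the `RECIPE` keys, the `min_quality` threshold used to define "passes", the model list, and the `GROQ_API_KEY` variable name are all hypothetical, and the sketch reads the key from the process environment rather than parsing .env itself.

```python
import os

# Hypothetical recipe definition mirroring the description:
# job type chat, policy "fastest that passes", pinned to Groq.
RECIPE = {
    "job_type": "chat",
    "policy": "fastest_that_passes",
    "provider": "groq",
    "min_quality": 0.6,   # assumed meaning of "passes"
}

# Illustrative models; names and numbers are made up.
MODELS = [
    {"name": "fast-small", "provider": "groq", "job_types": {"chat"}, "tokens_per_sec": 1000, "quality": 0.5},
    {"name": "fast-medium", "provider": "groq", "job_types": {"chat"}, "tokens_per_sec": 600, "quality": 0.7},
    {"name": "slow-large", "provider": "groq", "job_types": {"chat"}, "tokens_per_sec": 250, "quality": 0.9},
    {"name": "other-host", "provider": "nvidia", "job_types": {"chat"}, "tokens_per_sec": 800, "quality": 0.8},
]


def pick_model(recipe, models):
    """Fastest model that fits the provider pin, the job type, and the quality bar."""
    passing = [
        m for m in models
        if m["provider"] == recipe["provider"]
        and recipe["job_type"] in m["job_types"]
        and m["quality"] >= recipe["min_quality"]
    ]
    if not passing:
        raise LookupError("no model passes the recipe's constraints")
    return max(passing, key=lambda m: m["tokens_per_sec"])


def answer(message, env=None):
    """Answer live if a key is configured; otherwise report the model that would be called."""
    env = os.environ if env is None else env
    model = pick_model(RECIPE, MODELS)
    if not env.get("GROQ_API_KEY"):
        return {"mode": "dry-run", "would_call": model["name"]}
    # A real implementation would send `message` to the provider here;
    # the network call is deliberately left out of this sketch.
    return {"mode": "live", "model": model["name"]}
```

With no key set, `answer("Hi")` returns a dry-run result naming the chosen model, which matches the fallback behavior the demo describes. The fastest model overall is skipped here because it falls below the assumed quality bar.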

See all business recipes →