Groq models

LPU-accelerated open models. Built for speed.

Groq runs open models on its LPU inference engine. Its edge is throughput (often 5–20× typical GPU speed). Live list: GET https://api.groq.com/openai/v1/models.
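As a minimal sketch of querying the live list endpoint mentioned above, the snippet below uses only the Python standard library. It assumes the endpoint returns the usual OpenAI-compatible shape (a JSON object with a `data` array of entries carrying an `id`), that the key is read from an environment variable named `GROQ_API_KEY`, and that a custom `User-Agent` is acceptable. The helper names (`build_request`, `model_ids`, `fetch_model_ids`) are hypothetical and not part of any Groq SDK.

```python
import json
import os
import urllib.request

MODELS_URL = "https://api.groq.com/openai/v1/models"


def build_request(api_key: str) -> urllib.request.Request:
    """Build an authenticated GET request for the model list."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        # Assumption: an explicit User-Agent avoids gateways that reject
        # the default urllib one.
        "User-Agent": "catalog-check/0.1",
    }
    return urllib.request.Request(MODELS_URL, headers=headers, method="GET")


def model_ids(payload: dict) -> list[str]:
    """Extract sorted model IDs from an OpenAI-style list payload."""
    return sorted(item["id"] for item in payload.get("data", []))


def fetch_model_ids(api_key: str) -> list[str]:
    """Call the endpoint and return the model IDs it reports."""
    with urllib.request.urlopen(build_request(api_key), timeout=10) as resp:
        return model_ids(json.load(resp))


if __name__ == "__main__":
    key = os.environ.get("GROQ_API_KEY")  # assumed variable name
    if key:
        for model_id in fetch_model_ids(key):
            print(model_id)
    else:
        print("Set GROQ_API_KEY to query the live catalog.")
```

Comparing the returned IDs against the catalog is a quick way to spot models that have been added or retired since the page was last refreshed.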
| Model | Model ID | Maker | Type | Context | $ In | $ Out | Speed (t/s) | Capability | Best for |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-OSS 20B | openai/gpt-oss-20b | OpenAI (open-weight) | chat | 131k | $0.03 | $0.14 | 1.0k | 74 | Fastest reasoning-capable model on Groq (~1000 t/s advertised). |
| GPT-OSS Safeguard 20B (preview) | openai/gpt-oss-safeguard-20b | OpenAI (open-weight) | guard | 131k | $0.07 | $0.30 | 1.0k | 35 | Safety- and policy-tuned reasoning variant of gpt-oss-20b. |
| Llama 3.1 8B Instant | llama-3.1-8b-instant | Meta | chat | 131k | $0.05 | $0.08 | 840 | 57 | Cheapest and one of the fastest models on Groq. |
| Qwen3 32B (preview) | qwen/qwen3-32b | Alibaba Cloud (Qwen) | chat | 131k | $0.08 | $0.28 | 662 | 72 | Strong multilingual and reasoning performance with toggleable thinking mode. |
| Llama 4 Scout (17Bx16E) (preview) | meta-llama/llama-4-scout-17b-16e-instruct | Meta | chat | 131k | $0.11 | $0.34 | 594 | 71 | The only native vision/multimodal model in the self-serve catalog (replaced the retired Llama 3.2 Vision models and the deprecated Llama 4 Maverick). |
| GPT-OSS 120B | openai/gpt-oss-120b | OpenAI (open-weight) | chat | 131k | $0.04 | $0.18 | 500 | 87 | Flagship open-weight reasoning model on Groq and the official replacement for the deprecated Kimi K2-0905, Llama 4 Maverick, and DeepSeek-R1-distill. |
| Qwen3.6 27B (preview) | qwen/qwen3.6-27b | Alibaba Cloud (Qwen) | chat | 262k | $0.29 | $3.17 | 500 | 72 | Newer Qwen3.6 generation with improved reasoning. |
| Llama 3.3 70B Versatile | llama-3.3-70b-versatile | Meta | chat | 131k | $0.59 | $0.79 | 394 | 82 | General-purpose dense workhorse with strong instruction following and broad knowledge. |
| Llama Guard 4 12B (preview) | meta-llama/llama-guard-4-12b | Meta | guard | 164k | $0.18 | $0.18 | 325 | 35 | Content-moderation/safety classifier that scores prompts and responses (incl. |
| Kimi K2 | moonshotai/kimi-k2-instruct-0905 | Moonshot AI | chat | 262k | $0.60 | $2.50 | 200 | 94 | Largest context window on GroqCloud (256K) and the priciest model on the platform. |
| Compound | groq/compound | Groq | chat | 131k | free | free | 80 | 62 | Turnkey agentic system that auto-uses real-time web search and code execution. |
| Compound Mini | groq/compound-mini | Groq | chat | 131k | free | free | 80 | 62 | Lighter, lower-latency variant of Compound for single-step tool use. |
| Llama Prompt Guard 2 86M (preview) | meta-llama/llama-prompt-guard-2-86m | Meta | guard | 512 | $0.04 | $0.04 | - | 35 | Small multilingual (8-language) classifier that flags prompt-injection and jailbreak attempts before they reach your main model. |
| Llama Prompt Guard 2 22M (preview) | meta-llama/llama-prompt-guard-2-22m | Meta | guard | 512 | $0.02 | $0.02 | - | 35 | DeBERTa-xsmall-based classifier; ~75% lower latency/compute than the 86M with minimal quality loss. |
| Whisper Large v3 | whisper-large-v3 | OpenAI (open-weight) | audio | - | free | free | - | 25 | High-accuracy multilingual speech-to-text. |
| Whisper Large v3 Turbo | whisper-large-v3-turbo | OpenAI (open-weight) | audio | - | free | free | - | 25 | Faster, cheaper Whisper variant. |
| Orpheus TTS (English) | canopylabs/orpheus-3b-0.1-ft-en | Canopy Labs | audio | - | free | free | - | 25 | Low-latency English text-to-speech. |
| Orpheus TTS (Arabic - Saudi) | canopylabs/orpheus-3b-0.1-ft-ar | Canopy Labs | audio | - | free | free | - | 25 | Saudi-Arabic text-to-speech. |
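To make the price and speed columns easier to compare, here is a rough sketch that turns a few of the catalog rows above into a per-request cost and generation-time estimate. The catalog does not state its units, so the sketch assumes that "$ In" and "$ Out" are USD per 1M tokens and that "Speed t/s" is output tokens per second; it also ignores prompt processing and queueing time. `Pricing`, `CATALOG`, and `estimate` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    usd_in: float        # assumed: USD per 1M input tokens
    usd_out: float       # assumed: USD per 1M output tokens
    tokens_per_s: float  # assumed: output tokens per second


# A few rows copied from the catalog; extend as needed.
CATALOG = {
    "openai/gpt-oss-20b": Pricing(0.03, 0.14, 1000),
    "llama-3.1-8b-instant": Pricing(0.05, 0.08, 840),
    "llama-3.3-70b-versatile": Pricing(0.59, 0.79, 394),
}


def estimate(model_id: str, tokens_in: int, tokens_out: int) -> tuple[float, float]:
    """Return (cost in USD, generation time in seconds) for one workload."""
    p = CATALOG[model_id]
    cost = (tokens_in * p.usd_in + tokens_out * p.usd_out) / 1_000_000
    seconds = tokens_out / p.tokens_per_s
    return cost, seconds


if __name__ == "__main__":
    for model_id in CATALOG:
        cost, seconds = estimate(model_id, 2_000_000, 500_000)
        print(f"{model_id}: ${cost:.3f}, {seconds:.0f}s of generation")
```

Under those assumptions, 2M input tokens plus 0.5M output tokens on llama-3.3-70b-versatile comes to about $1.58. Treat the figures as relative comparisons between models, not as billing predictions.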