Groq models · LLM Switchboard

Groq runs open models on its LPU inference engine. Its edge is throughput, often 5–20× that of typical GPU serving. Live list: GET https://api.groq.com/openai/v1/models.
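
A minimal sketch of querying that live list with Python's standard library. It assumes bearer-token authentication and an OpenAI-style response body of the form `{"data": [{"id": ...}, ...]}`; neither is stated on this page, so treat both as assumptions. The helper names and the `User-Agent` value are illustrative only.

```python
import json
import urllib.request

MODELS_URL = "https://api.groq.com/openai/v1/models"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build the GET request for the live model list (no network I/O here)."""
    return urllib.request.Request(
        MODELS_URL,
        headers={
            # Assumption: the endpoint expects a bearer token.
            "Authorization": f"Bearer {api_key}",
            # Arbitrary client label (hypothetical value).
            "User-Agent": "model-list-sketch/0.1",
        },
        method="GET",
    )


def extract_model_ids(payload: dict) -> list[str]:
    """Return the model IDs from an OpenAI-style list response, sorted."""
    # Assumption: entries live under "data" and each carries an "id" field.
    return sorted(item["id"] for item in payload.get("data", []))


def fetch_model_ids(api_key: str, timeout: float = 10.0) -> list[str]:
    """Call the endpoint and return the IDs it currently reports."""
    request = build_models_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_model_ids(json.load(response))
```

Calling `fetch_model_ids(os.environ["GROQ_API_KEY"])` (the environment variable name is also an assumption) would return the current IDs, which can then be compared against the table below to spot retired or newly added models.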

Model and API ID	Maker	Type	Context (tokens)	$ In / 1M tokens	$ Out / 1M tokens	Speed (tokens/s)	Capability	Best for
GPT-OSS 20B openai/gpt-oss-20b	OpenAI (open-weight)	chat	131k	$0.03	$0.14	1.0k	74	Fastest reasoning-capable model on Groq (~1000 t/s advertised).
GPT-OSS Safeguard 20B preview openai/gpt-oss-safeguard-20b	OpenAI (open-weight)	guard	131k	$0.07	$0.30	1.0k	35	Safety- and policy-tuned reasoning variant of gpt-oss-20b.
Llama 3.1 8B Instant llama-3.1-8b-instant	Meta	chat	131k	$0.05	$0.08	840	57	Cheapest and one of the fastest models on Groq.
Qwen3 32B preview qwen/qwen3-32b	Alibaba Cloud (Qwen)	chat	131k	$0.08	$0.28	662	72	Strong multilingual and reasoning performance with toggleable thinking mode.
Llama 4 Scout (17Bx16E) preview meta-llama/llama-4-scout-17b-16e-instruct	Meta	chat	131k	$0.11	$0.34	594	71	The only native vision/multimodal model in the self-serve catalog (replaced the retired Llama 3.2 Vision models and the deprecated Llama 4 Maverick).
GPT-OSS 120B openai/gpt-oss-120b	OpenAI (open-weight)	chat	131k	$0.04	$0.18	500	87	Flagship open-weight reasoning model on Groq and the official replacement for the deprecated Kimi K2-0905, Llama 4 Maverick, and DeepSeek-R1-distill.
Qwen3.6 27B preview qwen/qwen3.6-27b	Alibaba Cloud (Qwen)	chat	262k	$0.29	$3.17	500	72	Newer Qwen3.6 generation with improved reasoning.
Llama 3.3 70B Versatile llama-3.3-70b-versatile	Meta	chat	131k	$0.59	$0.79	394	82	General-purpose dense workhorse with strong instruction following and broad knowledge.
Llama Guard 4 12B preview meta-llama/llama-guard-4-12b	Meta	guard	164k	$0.18	$0.18	325	35	Content-moderation/safety classifier that scores prompts and responses (incl.
Kimi K2 moonshotai/kimi-k2-instruct-0905	Moonshot AI	chat	262k	$0.60	$2.50	200	94	256K-token context window (262,144 tokens), tied with Qwen3.6 27B for the largest in this table, and the highest input price listed.
Compound groq/compound	Groq	chat	131k	free	free	80	62	Turnkey agentic system that auto-uses real-time web search and code execution.
Compound Mini groq/compound-mini	Groq	chat	131k	free	free	80	62	Lighter, lower-latency variant of Compound for single-step tool use.
Llama Prompt Guard 2 86M preview meta-llama/llama-prompt-guard-2-86m	Meta	guard	512	$0.04	$0.04	—	35	Small multilingual (8-language) classifier that flags prompt-injection and jailbreak attempts before they reach your main model.
Llama Prompt Guard 2 22M preview meta-llama/llama-prompt-guard-2-22m	Meta	guard	512	$0.02	$0.02	—	35	DeBERTa-xsmall-based classifier; ~75% lower latency/compute than the 86M with minimal quality loss.
Whisper Large v3 whisper-large-v3	OpenAI (open-weight)	audio	—	free	free	—	25	High-accuracy multilingual speech-to-text.
Whisper Large v3 Turbo whisper-large-v3-turbo	OpenAI (open-weight)	audio	—	free	free	—	25	Faster, cheaper Whisper variant.
Orpheus TTS (English) canopylabs/orpheus-3b-0.1-ft-en	Canopy Labs	audio	—	free	free	—	25	Low-latency English text-to-speech.
Orpheus TTS (Arabic - Saudi) canopylabs/orpheus-3b-0.1-ft-ar	Canopy Labs	audio	—	free	free	—	25	Saudi-Arabic text-to-speech.
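
The price and speed columns can be combined into a rough per-request estimate. The sketch below assumes that "$ In" and "$ Out" are USD per 1M tokens and that "Speed t/s" describes output generation only; it ignores queueing, network latency, and time to first token. The `ModelRow` type and `estimate` function are hypothetical names, and the three rows are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRow:
    usd_in_per_m: float   # "$ In" column, assumed USD per 1M input tokens
    usd_out_per_m: float  # "$ Out" column, assumed USD per 1M output tokens
    tokens_per_s: float   # "Speed t/s" column, assumed output tokens per second


# Values copied from the table; update them when the catalog changes.
ROWS = {
    "llama-3.1-8b-instant": ModelRow(0.05, 0.08, 840),
    "openai/gpt-oss-120b": ModelRow(0.04, 0.18, 500),
    "llama-3.3-70b-versatile": ModelRow(0.59, 0.79, 394),
}


def estimate(model_id: str, input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (cost in USD, generation time in seconds) for one request."""
    row = ROWS[model_id]
    cost = (
        input_tokens * row.usd_in_per_m + output_tokens * row.usd_out_per_m
    ) / 1_000_000
    seconds = output_tokens / row.tokens_per_s
    return cost, seconds


if __name__ == "__main__":
    for model_id in ROWS:
        cost, seconds = estimate(model_id, input_tokens=10_000, output_tokens=2_000)
        print(f"{model_id}: ${cost:.5f}, about {seconds:.1f} s of generation")
```

Under these assumptions, a request with 10,000 input tokens and 2,000 output tokens on llama-3.1-8b-instant costs (10,000 × 0.05 + 2,000 × 0.08) / 1,000,000 = $0.00066.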