Groq serves open models on its LPU (Language Processing Unit) inference engine; its edge is throughput, often quoted at 5–20× typical GPU serving speed. Live list:
`GET https://api.groq.com/openai/v1/models`.

| Model (API ID) | Maker | Type | Context (tokens) | $ In / 1M tok | $ Out / 1M tok | Speed (tok/s) | Capability | Best for |
|---|---|---|---|---|---|---|---|---|
| GPT-OSS 20B `openai/gpt-oss-20b` | OpenAI (open-weight) | chat | 131k | $0.03 | $0.14 | 1,000 | Fastest reasoning-capable model on Groq (~1,000 tok/s advertised). | |
| GPT-OSS Safeguard 20B (preview) `openai/gpt-oss-safeguard-20b` | OpenAI (open-weight) | guard | 131k | $0.07 | $0.30 | 1,000 | Safety- and policy-tuned reasoning variant of `gpt-oss-20b`. | |
| Llama 3.1 8B Instant `llama-3.1-8b-instant` | Meta | chat | 131k | $0.05 | $0.08 | 840 | Lowest output price among the paid chat models listed here, and one of the fastest on Groq. | |
| Qwen3 32B (preview) `qwen/qwen3-32b` | Alibaba Cloud (Qwen) | chat | 131k | $0.08 | $0.28 | 662 | Strong multilingual and reasoning performance with a toggleable thinking mode. | |
| Llama 4 Scout (17Bx16E) (preview) `meta-llama/llama-4-scout-17b-16e-instruct` | Meta | chat | 131k | $0.11 | $0.34 | 594 | The only native vision/multimodal model in the self-serve catalog (replaced the retired Llama 3.2 Vision models and the deprecated Llama 4 Maverick). | |
| GPT-OSS 120B `openai/gpt-oss-120b` | OpenAI (open-weight) | chat | 131k | $0.04 | $0.18 | 500 | Flagship open-weight reasoning model on Groq and the official replacement for the deprecated Kimi K2-0905, Llama 4 Maverick, and DeepSeek-R1-distill. | |
| Qwen3.6 27B (preview) `qwen/qwen3.6-27b` | Alibaba Cloud (Qwen) | chat | 262k | $0.29 | $3.17 | 500 | Newer Qwen3.6 generation with improved reasoning. | |
| Llama 3.3 70B Versatile `llama-3.3-70b-versatile` | Meta | chat | 131k | $0.59 | $0.79 | 394 | General-purpose dense workhorse with strong instruction following and broad knowledge. | |
| Llama Guard 4 12B (preview) `meta-llama/llama-guard-4-12b` | Meta | guard | 164k | $0.18 | $0.18 | 325 | Content-moderation/safety classifier that scores prompts and responses (incl. …). | |
| Kimi K2 `moonshotai/kimi-k2-instruct-0905` | Moonshot AI | chat | 262k | $0.60 | $2.50 | 200 | Tied with Qwen3.6 27B for the largest context window listed here (262k tokens, i.e. 256K), with the highest input price in this list. | |
| Compound `groq/compound` | Groq | chat | 131k | free | free | 80 | Turnkey agentic system that automatically uses real-time web search and code execution. | |
| Compound Mini `groq/compound-mini` | Groq | chat | 131k | free | free | 80 | Lighter, lower-latency variant of Compound for single-step tool use. | |
| Llama Prompt Guard 2 86M (preview) `meta-llama/llama-prompt-guard-2-86m` | Meta | guard | 512 | $0.04 | $0.04 | — | Small multilingual (8-language) classifier that flags prompt-injection and jailbreak attempts before they reach your main model. | |
| Llama Prompt Guard 2 22M (preview) `meta-llama/llama-prompt-guard-2-22m` | Meta | guard | 512 | $0.02 | $0.02 | — | DeBERTa-xsmall-based classifier; ~75% lower latency/compute than the 86M, with minimal quality loss. | |
| Whisper Large v3 `whisper-large-v3` | OpenAI (open-weight) | audio | — | free | free | — | High-accuracy multilingual speech-to-text. | |
| Whisper Large v3 Turbo `whisper-large-v3-turbo` | OpenAI (open-weight) | audio | — | free | free | — | Faster, cheaper Whisper variant. | |
| Orpheus TTS (English) `canopylabs/orpheus-3b-0.1-ft-en` | Canopy Labs | audio | — | free | free | — | Low-latency English text-to-speech. | |
| Orpheus TTS (Arabic, Saudi) `canopylabs/orpheus-3b-0.1-ft-ar` | Canopy Labs | audio | — | free | free | — | Saudi-Arabic text-to-speech. | |
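
The live list endpoint can also be queried from code, and the per-token prices turn into a per-request cost with simple arithmetic. Below is a minimal standard-library Python sketch, assuming the key sits in a `GROQ_API_KEY` environment variable (a naming convention chosen here, not required by the API) and that the endpoint returns the OpenAI-style `{"data": [{"id": ...}, ...]}` list. The helper names `parse_model_ids`, `fetch_model_ids`, and `request_cost_usd` are hypothetical; the cost helper assumes the chat prices above are USD per 1M tokens.

```python
import json
import os
import urllib.request
from typing import List, Optional

# OpenAI-compatible model-list endpoint quoted in the text above.
MODELS_URL = "https://api.groq.com/openai/v1/models"


def parse_model_ids(payload: dict) -> List[str]:
    """Return sorted model IDs from an OpenAI-style list response.

    Assumes the shape {"object": "list", "data": [{"id": "..."}, ...]}.
    """
    return sorted(entry["id"] for entry in payload.get("data", []))


def fetch_model_ids(api_key: Optional[str] = None, timeout: float = 10.0) -> List[str]:
    """Fetch the live model list (network call; needs a Groq API key).

    GROQ_API_KEY is an assumed environment-variable name used by this sketch.
    """
    key = api_key or os.environ["GROQ_API_KEY"]
    request = urllib.request.Request(
        MODELS_URL,
        headers={
            "Authorization": f"Bearer {key}",
            # Explicit User-Agent; some edge proxies reject urllib's default one.
            "User-Agent": "groq-model-list-sketch/0.1",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_model_ids(json.load(response))


def request_cost_usd(tokens_in: int, tokens_out: int,
                     price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost of one request in USD, given input/output prices per 1M tokens."""
    return (tokens_in * price_in_per_m + tokens_out * price_out_per_m) / 1_000_000
```

For example, `request_cost_usd(2_000, 500, 0.59, 0.79)` prices a 2,000-token prompt with a 500-token reply on `llama-3.3-70b-versatile` at about $0.0016. Parsing is kept separate from the network call so the response handling can be checked offline against a saved payload.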