DeepInfra Now Available as Hugging Face Inference Provider with SDK Support
The serverless inference platform DeepInfra has been integrated into Hugging Face's ecosystem, allowing developers to access models through the Hub's web interface and Python/JavaScript SDKs, with requests billed either directly by DeepInfra or routed through Hugging Face.
- DeepInfra joined Hugging Face Inference Providers, enabling access to models like DeepSeek V4, Kimi-K2.6, and GLM-5.1 through the Hub's UI and client SDKs
- Users can configure API keys in account settings and choose between direct billing (via DeepInfra) or routed requests billed through Hugging Face
- Integration extends to agent harnesses including Pi, OpenCode, and Hermes Agents, with text generation and conversational tasks supported at launch
- Hugging Face PRO subscribers receive $2 monthly inference credits applicable across all supported providers
Hugging Face has added DeepInfra to its growing roster of integrated Inference Providers, giving developers native access to the serverless platform's model catalog directly from model pages on the Hub. DeepInfra, which maintains a catalog exceeding 100 models and emphasizes competitive per-token pricing, initially supports text generation and conversational tasks, with LLMs including DeepSeek V4, Kimi-K2.6, and GLM-5.1 available at launch. Expanded support for text-to-image, text-to-video, and embedding tasks is planned.
The integration surfaces DeepInfra as an option in Hugging Face's web UI, allowing users to invoke models directly from model cards. Developers can set preferences in account settings, specifying which providers to prioritize and whether to supply custom API keys. Two billing modes are available: customers can route requests through Hugging Face (charged to their HF account at standard provider rates with no markup), or authenticate directly to DeepInfra and incur charges on their own DeepInfra account.
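At the HTTP level, the two billing modes differ only in which endpoint you call and whose key you send. A minimal stdlib-only sketch is below; the router and DeepInfra endpoint URLs and the model id are illustrative assumptions, not details from the announcement:

```python
import json
import os
import urllib.request

# Assumed endpoints: Hugging Face's OpenAI-compatible router (routed billing,
# charged to your HF account) vs. DeepInfra's own OpenAI-compatible API
# (direct billing, charged to your DeepInfra account).
ROUTED_URL = "https://router.huggingface.co/v1/chat/completions"
DIRECT_URL = "https://api.deepinfra.com/v1/openai/chat/completions"


def build_request(url: str, token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat request; the URL/token pair decides who bills you."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


# Routed mode: Hugging Face token, billed through HF at provider rates.
# (Model id is a placeholder -- substitute any model DeepInfra serves.)
req = build_request(
    ROUTED_URL,
    os.environ.get("HF_TOKEN", "hf_xxx"),
    "deepseek-ai/DeepSeek-V4",
    "Hello",
)
# Direct mode: swap in DIRECT_URL and a DeepInfra API key instead.

if os.environ.get("HF_TOKEN"):  # only send the request when a real token exists
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

The same request body works against either endpoint, which is what makes switching billing modes a one-line change.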
Both the Python (huggingface_hub >= 1.11.2) and JavaScript (@huggingface/inference) SDKs now include DeepInfra support, and the integration also works with standard OpenAI-compatible client libraries. It extends to popular agent frameworks such as Pi, OpenCode, and Hermes Agents, reducing the friction of plugging DeepInfra-hosted models into downstream applications.
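In the Python SDK this follows the same pattern Hugging Face uses for its other inference providers: a `provider` argument on `InferenceClient`. A sketch, assuming an `HF_TOKEN` environment variable and an illustrative model id:

```python
import os


def build_messages(prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat-messages shape both SDKs expect."""
    return [{"role": "user", "content": prompt}]


def main() -> None:
    # Requires `pip install "huggingface_hub>=1.11.2"` and an HF token.
    # With an HF token, the request is routed and billed through Hugging Face;
    # a DeepInfra key would bill directly instead.
    from huggingface_hub import InferenceClient

    client = InferenceClient(provider="deepinfra", api_key=os.environ["HF_TOKEN"])
    completion = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V4",  # illustrative repo id
        messages=build_messages("Summarize what an inference provider is."),
    )
    print(completion.choices[0].message.content)


if __name__ == "__main__" and os.environ.get("HF_TOKEN"):
    main()
```

The JavaScript SDK mirrors this shape, so code written against one provider can switch to DeepInfra by changing only the provider name.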
Hugging Face PRO subscribers receive $2 in monthly inference credits usable across all integrated providers, while free-tier users access a smaller quota of free inference capacity.
- Apr 29, 2026 · Hugging Face