
The hf CLI is the command-line interface to the Hugging Face Hub, a terminal-based alternative to the Python SDK. Whether you’re downloading models, managing repositories, or running jobs, hf lets you do it directly from your shell. It covers everything from creating branches and pull requests to managing buckets, collections, and inference endpoints.
Historically, hf was built with human users in mind, but usage is shifting. We’ve seen a dramatic increase in its adoption by coding agents like Claude Code, Codex, and Cursor. In response, we overhauled the hf CLI to serve both human developers and AI agents, optimizing output and workflows for each. This article describes the changes we made and how we benchmarked their effectiveness.
The Rise of AI Agents on the Hugging Face Hub
The presence of AI agents on the Hugging Face Hub is no longer a niche phenomenon; it’s a rapidly growing trend. Since April 2026, we’ve been actively tracking agent usage, identifying them through specific environment variables they set (e.g., CLAUDECODE, CODEX_SANDBOX, AI_AGENT). This detection mechanism serves a dual purpose: it allows the CLI to tailor its output for agents and enables us to attribute Hub traffic to specific agent types.
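The detection idea described above can be sketched in a few lines of Python. This is an illustrative sketch, not the hf CLI's actual implementation: the variable names come from the text, while the `detect_agent` helper, the agent labels, and the rule that any non-empty value counts as "set" are assumptions.

```python
import os
from typing import Mapping, Optional

# Environment variables named in the article. The labels on the right are
# hypothetical, chosen only for this illustration.
AGENT_ENV_VARS = {
    "CLAUDECODE": "claude-code",
    "CODEX_SANDBOX": "codex",
    "AI_AGENT": "generic-agent",
}


def detect_agent(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a label for the first agent marker found, or None if no marker is set."""
    if env is None:
        env = os.environ
    for var, label in AGENT_ENV_VARS.items():
        # Assumption: any non-empty value means the variable is "set".
        if env.get(var):
            return label
    return None


if __name__ == "__main__":
    agent = detect_agent()
    print(f"agent mode: {agent}" if agent else "human mode")
```

Passing the environment in as a parameter keeps the check easy to test, and the returned label could serve both purposes mentioned above: choosing an output mode and attributing traffic.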
The numbers are already significant: Claude Code alone accounts for approximately 40,000 distinct users and nearly 49 million requests, with Codex following closely. These early figures show how large a role coding agents already play in interacting with the Hub. We expect this growth to continue as agents become an integral part of how developers work with AI resources.
Designed for Dual Audiences: Humans and AI Agents
Humans and AI agents have fundamentally different expectations when it comes to CLI output. A human typically wants rich, readable terminal output: ANSI colors, neatly padded and truncated tables, clear success/error messages, and progress bars. An agent needs the opposite: raw, untruncated, structured data that is compact, free of visual clutter, easy to parse, and light on tokens.
Our solution, introduced in hf v1.9.0, is an agent-mode output that adapts automatically to context. When hf detects an agent, it renders command output for parseability and token efficiency, with no special flag required. For example, a human running hf models ls --author Qwen --limit 3 sees a colorful, truncated table with helpful hints, while an agent receives clean TSV (tab-separated values) output with full IDs, ISO timestamps, and all tags, free of ANSI codes.
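To make the contrast concrete, here is a minimal sketch of rendering the same rows two ways. It is not hf's rendering code: the function names, the 20-character truncation width, and the sample row are all hypothetical, and ANSI colors are left out for brevity.

```python
from typing import Dict, List, Sequence


def render_tsv(rows: List[Dict[str, object]], columns: Sequence[str]) -> str:
    """Agent-style output: full values, tab-separated, no padding or truncation."""
    def clean(value: object) -> str:
        # Keep one record per line by flattening embedded tabs and newlines.
        return str(value).replace("\t", " ").replace("\n", " ")

    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(clean(row.get(col, "")) for col in columns))
    return "\n".join(lines)


def render_table(
    rows: List[Dict[str, object]], columns: Sequence[str], max_width: int = 20
) -> str:
    """Human-style output: padded columns, long values truncated with '...'."""
    def clip(value: str) -> str:
        return value if len(value) <= max_width else value[: max_width - 3] + "..."

    cells = [[clip(str(row.get(col, ""))) for col in columns] for row in rows]
    widths = [
        max([len(col)] + [len(r[i]) for r in cells]) for i, col in enumerate(columns)
    ]

    def fmt(values: Sequence[str]) -> str:
        return "  ".join(v.ljust(w) for v, w in zip(values, widths)).rstrip()

    return "\n".join([fmt(columns)] + [fmt(r) for r in cells])


def render(rows: List[Dict[str, object]], columns: Sequence[str], agent_mode: bool) -> str:
    """Pick the renderer from the detected mode."""
    return render_tsv(rows, columns) if agent_mode else render_table(rows, columns)


if __name__ == "__main__":
    # Hypothetical sample row, not real Hub data.
    sample = [{"id": "example-org/a-model-with-a-very-long-identifier", "downloads": 42}]
    print(render(sample, ["id", "downloads"], agent_mode=False))
    print(render(sample, ["id", "downloads"], agent_mode=True))
```

The point of the design is that callers hand over raw data once and the mode decides the presentation, so the table path can truncate freely without ever hiding information from an agent.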
Beyond simple formatting, we’ve implemented logging methods like .table(...), .result(...), and .json() that take raw data and apply the appropriate rendering. We also provide --json and --quiet options for explicit control, so users or agents can force their preferred output format. The default mode is chosen automatically, but it can always be overridden.
Streamlining Workflows with Smart Hints and Retry Safety
CLI commands are rarely standalone; they often form a chain of actions. To support this, many hf commands now end with a “hint”: the exact next command, pre-filled with relevant IDs. For instance, starting a job in the background will suggest the command to fetch its logs, and creating a Space will point you to its boot status. This acts as a convenient guide for humans and a crucial rail for agents, reducing the steps needed to determine the next action.
Error messages also follow this pattern, providing actionable fixes instead of just failures (e.g., “Not logged in. Run hf auth login first.”). Importantly, all hints, warnings, and errors are directed to stderr, ensuring they don’t interfere with the structured data output to stdout that agents rely on.
Furthermore, hf prioritizes non-blocking operations and retry safety. Destructive commands in agent mode fail fast with a fix (e.g., “Use --yes to skip confirmation.”), and operations are designed to be idempotent. For example, hf repos create --exist-ok is a no-op if the repo already exists, and re-running an upload cleanly re-commits changes. A --dry-run option is also available for data transfer commands, allowing both humans and agents to preview actions before committing to potentially long or impactful operations.
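Two of the behaviors above, keeping hints off stdout and failing fast instead of prompting, can be sketched as follows. This is an illustration under stated assumptions rather than hf's code: `emit`, `confirm_destructive`, and `ConfirmationRequired` are hypothetical names, and the "Hint:" prefix is invented for the example.

```python
import sys
from typing import Callable, Optional, TextIO


class ConfirmationRequired(Exception):
    """Raised instead of prompting when no human is expected to answer."""


def emit(
    data: str,
    hint: str = "",
    *,
    out: Optional[TextIO] = None,
    err: Optional[TextIO] = None,
) -> None:
    """Write structured data to stdout and any hint to stderr."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    print(data, file=out)
    if hint:
        # Hints never touch stdout, so piped or parsed output stays clean.
        print(f"Hint: {hint}", file=err)


def confirm_destructive(
    action: str,
    *,
    yes: bool,
    agent_mode: bool,
    ask: Callable[[str], str] = input,
) -> bool:
    """Return True if the action may proceed; never block when in agent mode."""
    if yes:
        return True
    if agent_mode:
        # Fail fast and name the fix rather than waiting on a prompt.
        raise ConfirmationRequired(
            f"Refusing to {action} without confirmation. Use --yes to skip confirmation."
        )
    return ask(f"Really {action}? [y/N] ").strip().lower() == "y"
```

With this split, an agent that captures only stdout parses the data untouched, while a human at a terminal still sees both streams together.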
Benchmarking Efficiency: CLI vs. Hand-rolled Solutions
To quantify the efficiency gains of the hf CLI for coding agents, we ran a benchmark. We built an evaluation harness that runs 18 non-trivial Hub tasks (e.g., aggregating trending models, inspecting repo files, managing branches and tags, syncing buckets) across different interaction methods. The tasks were executed repeatedly against the live Hub, and each run was graded independently.
Our setup involved fresh agent instances, minimizing contextual nudges, and comparing the hf CLI (with and without a specialized skill) against baseline methods using curl or the Python SDK. Each task/tool combination was run 10 times due to agent non-determinism, totaling approximately 1,000 graded runs across Claude Code (Sonnet 4.6) and OpenAI Codex (GPT-5.5).
Across both agents, the hf CLI consistently outperformed the alternatives, particularly on multi-step tasks. For Sonnet, curl and the SDK trailed by 10 percentage points in task success, often failing on write operations that the hf CLI handled without trouble. On GPT-5.5, the advantage showed up in token efficiency: for complex, multi-step tasks, the non-hf versions consumed up to 6 times more tokens to achieve the same outcome. For simple, one-shot reads, curl and the SDK performed adequately, but their token usage grew sharply as task complexity increased.
Source: Hugging Face Blog