Originally published on 2026-02-06
Original article (Japanese): Agentic CLI Design: CLIをAIエージェント向けプロトコルとして設計する7つの原則 (Agentic CLI Design: Seven Principles for Designing CLIs as Protocols for AI Agents)
CLI tools have long been designed as interfaces for humans working in a terminal. With the rise of LLMs (Large Language Models) and AI agents (programs that autonomously invoke tools to advance a task), CLIs are being asked to play a new role: a "protocol/API that agents can safely, reliably, and repeatedly invoke."
Recently, I have had more opportunities to have agents run CLI commands. Problems that rarely bother humans occur routinely with agents: stalling at a confirmation prompt, logs mixed into stdout so that the output cannot be parsed, and accidentally repeating the same operation.
In this article, I summarize a design concept I propose called "Agentic CLI Design." It redefines the CLI from "a UI operated by humans" to "a protocol invoked by agents" and sets out seven design principles that keep a CLI working under the assumptions of failure, re-execution, and non-interactivity.
What is Agentic CLI Design?
Agentic CLI Design consists of design principles for CLIs that allow LLMs/agents to execute commands safely and reliably in a non-interactive, iterative, and failure-prone environment.
Rather than optimizing for feel or ease of use for humans, it focuses on ensuring that machines can read the output, decide what to do, re-execute, and recover.
Success Conditions
Agentic CLI Design succeeds when the CLI lets agents operate under the following conditions:
- No confusion: Options are clearly presented, so the agent can decide on the next action.
- No destruction: Safe by default, with explicit confirmation required for destructive operations.
- No blockage: Commands complete non-interactively, with clear timeout/retry policies.
- Repeatable: Commands are idempotent, so re-execution is safe.
- Self-repairing: Execution is observable, so the agent can determine a recovery procedure from an error.
7 Principles
Agentic CLI Design is composed of the following seven principles (Principle 1 to Principle 7).
Principle 1: Machine-readable
Principle: Output is structured and provided in a format that machines can reliably parse.
Design Checks:
- Options for --json / --output json|yaml|text are available.
- Strict adherence to standard output (stdout) = results / standard error output (stderr) = logs/progress (do not mix).
- Errors are also structured (preferably in JSON).
- Schema is stable (breaking changes managed via schemaVersion, etc.).
Example:
kubectl, the Kubernetes CLI, supports JSON output with -o json. The AWS CLI also has --output json.
# Structured output from kubectl
kubectl get pods -o json
# JSON on stdout; stderr goes to a separate file so diagnostics cannot corrupt the JSON
aws ec2 describe-instances --output json 2>describe-instances.err
Human-friendly "readable tables" are secondary. Agents need to be able to reliably parse JSON or YAML.
Minimum Recommended Response (JSON):
On success:
{
"ok": true,
"type": "items.list",
"schemaVersion": 1,
"data": {
"items": [
{"id": "...", "createdAt": "2026-02-05T08:00:00Z"}
],
"nextCursor": "..."
}
}
On failure:
{
"ok": false,
"type": "items.list",
"schemaVersion": 1,
"error": {
"code": "rate_limited",
"message": "...",
"retryAfterMs": 1200
}
}
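The two envelopes above could be produced by a small helper like the following. This is a minimal Python sketch, not a reference implementation: the function names success, failure, and emit are illustrative assumptions, and the only fixed parts are the field names taken from the JSON examples above.

```python
import json
import sys


def success(type_, data, schema_version=1):
    """Build the success envelope with the fixed top-level fields."""
    return {"ok": True, "type": type_, "schemaVersion": schema_version, "data": data}


def failure(type_, code, message, schema_version=1, **extra):
    """Build the failure envelope; extra carries hints such as retryAfterMs."""
    error = {"code": code, "message": message, **extra}
    return {"ok": False, "type": type_, "schemaVersion": schema_version, "error": error}


def emit(envelope, log=None, out=sys.stdout, err=sys.stderr):
    """Write exactly one JSON document to stdout; log text goes to stderr."""
    if log:
        print(log, file=err)
    json.dump(envelope, out)
    out.write("\n")
```

Because both envelopes share ok, type, and schemaVersion, an agent can branch on ok first and only then look at data or error.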
Principle 2: Non-interactive by default
Principle: Do not assume interactive prompts, allowing for completion in headless environments (running without screens or interactive operations, such as CI or job runners).
Design Checks:
- Options for --yes / --force / --no-confirm / --non-interactive are available.
- Must be able to complete in environments without TTY.
- If interaction is necessary, it must be explicitly opted in.
Example:
Pre-approval flags, such as the one Terraform provides, make agent operation easier.
# Execute without interaction
terraform apply -auto-approve
# Answer yes to every prompt in advance
apt-get install -y package-name
Agents cannot respond to "Y/N?" prompts. All choices must be specified in advance via options. It is also crucial that the process does not stop in environments without TTY.
Authentication (OAuth/headless) Key Points:
- If possible, prioritize Device Authorization Grant (RFC 8628).
- Provide auth status --json for agents to confirm prerequisites.
- Support migration to headless environments with auth export / auth import.
- When --non-interactive is set, return "error + next steps" without asking for confirmation.
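The "error + next steps" behavior can be sketched as a confirmation helper that never blocks when no TTY is attached. This is a minimal Python sketch under assumptions: the function name confirm, the error code confirmation_required, and the nextSteps field are hypothetical names chosen for illustration.

```python
import sys


def confirm(question, assume_yes=False, non_interactive=False, stdin=sys.stdin):
    """Return (proceed, error). Prompts only when a TTY is attached and allowed."""
    if assume_yes:
        return True, None
    if non_interactive or not stdin.isatty():
        # Never block: tell the caller what to do instead of prompting.
        return False, {
            "code": "confirmation_required",
            "message": question,
            "nextSteps": ["re-run with --yes to approve"],
        }
    # Interactive path: the prompt goes to stderr so stdout stays parseable.
    print(f"{question} [y/N]: ", end="", file=sys.stderr, flush=True)
    return stdin.readline().strip().lower() == "y", None
```

With this shape, a headless run fails fast with a structured error instead of waiting forever on input.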
Principle 3: Idempotent & Replayable
Principle: It is safe to execute the same command multiple times, and the results are predictable.
Idempotence means that repeating the same operation does not change the result. Agents may "hit the same command again" due to timeouts or network interruptions. Therefore, a design that avoids accidents during re-execution is necessary.
Design Checks:
- Accept dedupe-key / client-request-id for sending/creating.
- Allow choosing behaviors for "already created": --if-exists skip|update|error.
- Clearly indicate paging for retrieval: --limit, --cursor, --all.
Example:
# Declarative apply: creates the resource if missing, updates it if changed, and is safe to re-run with the same manifest
kubectl apply -f deployment.yaml
# For a CLI hitting an HTTP API, explicitly provide a request ID (deduplication key)
curl -sS -X POST https://api.example.com/v1/items \
-H 'Content-Type: application/json' \
-H 'Idempotency-Key: 01JHXXXX...' \
-d '{"name":"example"}'
Because such retries are routine for agents, every write command needs a defined, safe behavior on re-execution.
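How a dedupe key and an --if-exists policy might interact can be sketched as follows. This is an illustrative Python sketch, assuming an in-memory dict stands in for the backing service; create_item, the state layout, and the status values are hypothetical names, not part of any real CLI.

```python
def create_item(state, dedupe_key, name, attrs, if_exists="error"):
    """Create an item; a repeated dedupe_key replays the first response."""
    if if_exists not in ("skip", "update", "error"):
        raise ValueError(f"unknown --if-exists policy: {if_exists}")
    seen, items = state["seen"], state["items"]
    if dedupe_key in seen:
        return seen[dedupe_key]  # retry of the same request: no second write
    if name not in items:
        items[name] = dict(attrs)
        response = {"ok": True, "data": {"status": "created", "name": name}}
    elif if_exists == "skip":
        response = {"ok": True, "data": {"status": "skipped", "name": name}}
    elif if_exists == "update":
        items[name].update(attrs)
        response = {"ok": True, "data": {"status": "updated", "name": name}}
    else:
        response = {"ok": False, "error": {"code": "already_exists", "message": name}}
    seen[dedupe_key] = response
    return response
```

The dedupe key answers "is this the same request again?", while --if-exists answers "what if a different request targets an existing resource?". Keeping the two separate makes both cases predictable.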
Principle 4: Safe-by-default
Principle: Destructive operations are not executed by default and require explicit confirmation.
Design Checks:
- Destructive operations can enforce --dry-run / --confirm <id>.
- Deletion requires --force, ensuring no accidents by default.
- Minimize permissions/scope, returning "next steps" when insufficient.
Example:
# Preview the changes without applying them
terraform plan
# Applying asks for approval unless -auto-approve is passed
terraform apply
# Prepare a preview before destructive operations
kubectl diff -f deployment.yaml
Agents can accidentally perform deletions. Multiple layers of confirmation are necessary for destructive operations.
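One way to tie --dry-run to --confirm <id> is to derive the confirm id from the plan itself, so an approval is only valid for exactly what was previewed. The following is a minimal Python sketch under that assumption; plan_delete, delete, and the confirmId field are hypothetical names, and a dict stands in for the real data store.

```python
import hashlib
import json


def plan_delete(items, prefix):
    """Dry run: list what would be deleted and derive a confirm id from it."""
    targets = sorted(name for name in items if name.startswith(prefix))
    digest = hashlib.sha256(json.dumps(targets).encode("utf-8")).hexdigest()
    return {"targets": targets, "confirmId": digest[:12]}


def delete(items, prefix, confirm_id=None):
    """Delete only when confirm_id matches the plan computed right now."""
    plan = plan_delete(items, prefix)
    if confirm_id != plan["confirmId"]:
        return {"ok": False, "error": {
            "code": "confirmation_required",
            "message": "missing or stale confirm id; run the dry run again",
        }}
    for name in plan["targets"]:
        del items[name]
    return {"ok": True, "data": {"deleted": plan["targets"]}}
```

If the set of targets changes between the dry run and the real run, the id no longer matches and nothing is deleted.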
Principle 5: Observable & Debuggable
Principle: The execution status can be observed, and recovery procedures can be determined in case of errors.
Design Checks:
- Options for --verbose / --debug / --log-format json are available.
- Ability to pass correlation IDs with --trace-id.
- Classify exit codes to facilitate automatic recovery:
  - Example: 0=success / 2=argument error / 3=authentication error / 4=retry recommended.
Example:
# Output detailed logs
kubectl apply -f deployment.yaml --v=9
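# Branch on the exit code (tool is a hypothetical CLI; 4 = retry recommended)
tool items list --output json
status=$?
if [ "$status" -eq 4 ]; then
  echo "Retryable error, waiting..." >&2
  sleep 5
  tool items list --output json
fi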
Agents will determine the "next step" from error messages. Structured exit codes and errors are crucial.
Recommended Exit Code Classification:
- 0: Success
- 2: Argument error / usage error
- 3: Authentication / permission error
- 4: Retry recommended (rate limit / transient)
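The classification above can be connected to structured errors with a small mapping. This is a Python sketch under assumptions: the error code vocabulary in EXIT_BY_ERROR_CODE is hypothetical, and using exit code 1 for unclassified failures is my own choice, since the list above does not assign it.

```python
import time

EXIT_OK, EXIT_USAGE, EXIT_AUTH, EXIT_RETRY = 0, 2, 3, 4

# Hypothetical error vocabulary mapped onto the classification above.
EXIT_BY_ERROR_CODE = {
    "invalid_argument": EXIT_USAGE,
    "unauthenticated": EXIT_AUTH,
    "permission_denied": EXIT_AUTH,
    "rate_limited": EXIT_RETRY,
    "unavailable": EXIT_RETRY,
}


def exit_code_for(envelope):
    """Translate a result envelope into a process exit code."""
    if envelope.get("ok"):
        return EXIT_OK
    code = envelope.get("error", {}).get("code")
    return EXIT_BY_ERROR_CODE.get(code, 1)  # 1 = unclassified (assumption)


def run_with_retry(command, max_attempts=3, sleep=time.sleep):
    """Re-run command only while it reports the retry-recommended class."""
    for attempt in range(1, max_attempts + 1):
        envelope = command()
        status = exit_code_for(envelope)
        if status != EXIT_RETRY or attempt == max_attempts:
            return status, envelope
        # Honor the server hint when present; fall back to one second.
        sleep(envelope["error"].get("retryAfterMs", 1000) / 1000)
```

The caller side stays simple: argument and authentication errors are returned immediately, and only the retry class triggers a wait.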
Principle 6: Context-efficient
Principle: Do not waste the context window of LLMs.
Design Checks:
- Use --fields / --select (projection) to retrieve only necessary fields.
- Handle large data with --output ndjson (streaming format with one JSON per line).
- Default to summaries, with details provided via get / --include-*.
- Implement server-side filtering (since/until/query/type…).
Example:
# Retrieve only necessary fields
kubectl get pods -o custom-columns=NAME:.metadata.name,STATUS:.status.phase
# Handle large data with paging
aws s3api list-objects-v2 --bucket my-bucket --max-items 100
Agents can hit token limits if they cram too much data into the context window (the maximum input LLMs can reference at once). A design that retrieves only the minimum necessary data is required.
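Field projection combined with NDJSON output might look like the following. This is a minimal Python sketch; project and to_ndjson are hypothetical helper names, and the dotted-path syntax for --fields is an assumption made for illustration.

```python
import json


def project(record, fields):
    """Keep only the requested dotted paths, e.g. 'metadata.name'."""
    result = {}
    for path in fields:
        value = record
        for key in path.split("."):
            if not isinstance(value, dict) or key not in value:
                value = None  # missing paths become null rather than errors
                break
            value = value[key]
        result[path] = value
    return result


def to_ndjson(records, fields):
    """Yield one compact JSON document per line for streaming consumers."""
    for record in records:
        yield json.dumps(project(record, fields), separators=(",", ":"))
```

Each output line is independently parseable, so an agent can stop reading as soon as it has what it needs.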
Principle 7: Introspectable
Principle: The CLI itself should output specifications in a machine-readable format, allowing agents to self-discover.
Design Checks:
- Provide commands --json (command list and argument list).
- Provide schema --command ... --output json-schema (command-level JSON Schema, defining the structure of JSON).
- Provide --help --json (examples, exit codes, error vocabulary).
- Example of top-level fixed fields for --output json: schemaVersion, type, ok.
Example:
# This is an example of desirable self-describing design
tool commands --output json
tool schema --command items.list --output json-schema
tool help items.list --output json
With the Model Context Protocol (MCP), each tool definition carries an input schema, whereas a CLI is often a black box to an agent. By having the CLI output its own specification in a machine-readable format, agents can discover how to use it by themselves.
Recommended Set of Introspection Commands:
- tool commands --json
- tool schema --command <subcommand...> --output json-schema
- tool help --json (or --help --json for each command)
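The payloads behind these introspection commands could be generated from a single command registry. This is an illustrative Python sketch: the COMMANDS registry and its contents are hypothetical, and I assume here that the schema describes a command's arguments (the article's "command-level JSON Schema" could equally describe its output).

```python
import json

# Hypothetical command registry; a real CLI would build this from its parser.
COMMANDS = {
    "items.list": {
        "summary": "List items",
        "args": {
            "limit": {"type": "integer", "minimum": 1, "default": 50},
            "cursor": {"type": "string"},
        },
        "required": [],
    },
    "items.create": {
        "summary": "Create an item",
        "args": {"name": {"type": "string"}, "dedupe-key": {"type": "string"}},
        "required": ["name"],
    },
}


def list_commands():
    """Payload for `tool commands --json`: names, summaries, argument names."""
    return [
        {"name": name, "summary": spec["summary"], "args": sorted(spec["args"])}
        for name, spec in sorted(COMMANDS.items())
    ]


def input_schema(command):
    """Payload for `tool schema --command ... --output json-schema`."""
    spec = COMMANDS[command]
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": command,
        "type": "object",
        "properties": spec["args"],
        "required": spec["required"],
        "additionalProperties": False,
    }


if __name__ == "__main__":
    print(json.dumps(list_commands(), indent=2))
```

Deriving both outputs from one registry keeps the help text, the schema, and the actual parser from drifting apart.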
Anti-patterns
The following patterns commonly break agents, seen from the perspective of Agentic CLI Design.
Logs/Progress Mixed with stdout
# ❌ Bad Example
echo "Processing..."
echo '{"result": "success"}'
Agents will fail when trying to parse JSON. Please output logs to stderr.
JSON Structure Changes Based on Conditions
# ❌ Bad Example
# On success: {"data": {...}}
# On failure: {"error": "..."}
Please clarify success/failure with an ok field and unify the structure.
Default Interaction Causes Blockage in Headless Environments
# ❌ Bad Example
read -p "Continue? (y/n): " answer
This will cause blockage in CI environments or job runners. Please provide a --yes option.
Destructive Commands Can Be Executed by Default
# ❌ Bad Example
rm -rf /data/*
Please provide two-step confirmations with --dry-run and --confirm.
--all Results in Huge JSON Output
# ❌ Bad Example
curl https://api.example.com/items?all=true
This will explode the context window. Please implement paging (--limit / --cursor).
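Cursor-based paging, including an --all that pages internally, can be sketched as follows. This is a minimal Python sketch assuming a list stands in for the data source and the cursor is simply an offset encoded as a string; list_page and list_all are hypothetical names, and real services usually use opaque cursors.

```python
def list_page(items, limit, cursor=None):
    """Return one page plus nextCursor (None when the listing is exhausted)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    start = int(cursor) if cursor else 0
    page = items[start:start + limit]
    end = start + len(page)
    next_cursor = str(end) if end < len(items) else None
    return {"items": page, "nextCursor": next_cursor}


def list_all(items, limit):
    """What --all could do internally: follow cursors, yielding item by item."""
    cursor = None
    while True:
        page = list_page(items, limit, cursor)
        yield from page["items"]
        cursor = page["nextCursor"]
        if cursor is None:
            return
```

Because list_all is a generator, it pairs naturally with a streaming output format: the full result never has to exist as one huge JSON document.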
Authentication Requires a Browser, Failing in Remote/Container Environments
# ❌ Bad Example
open https://auth.example.com/login
Please prioritize Device Authorization Grant (RFC 8628).
Scorecard (Review Checklist)
The following is a checklist that can score the seven principles of Agentic CLI Design on a scale of 0/1/2 points. It is structured with specific items to facilitate easy adaptation to other projects.
Principle 1: Machine-readable
- [ ] --output json is available.
- [ ] stdout = results / stderr = logs is adhered to.
- [ ] Errors are structured (JSON recommended).
- [ ] schemaVersion is present / compatibility policy is documented.
Principle 2: Non-interactive
- [ ] --non-interactive is available (can be auto ON without TTY).
- [ ] All operations requiring interaction are opt-in (i.e., default is non-interactive).
- [ ] Vocabulary for --yes / --force / --no-confirm is unified.
Principle 3: Idempotent & Replayable
- [ ] Writing commands have --client-request-id / --dedupe-key equivalents.
- [ ] There is a policy for --if-exists.
- [ ] --cursor / --limit / --all are available (with --all implementing internal paging).
Principle 4: Safe-by-default
- [ ] Destructive operations can use --dry-run.
- [ ] Actual execution requires additional guards like --confirm <id> / --force.
Principle 5: Observable & Debuggable
- [ ] --debug is available (logs to stderr).
- [ ] --log-format json is available.
- [ ] Accepts --trace-id.
- [ ] Exit code classification is present (2/3/4, etc.).
Principle 6: Context-efficient
- [ ] --fields / --select is available.
- [ ] --output ndjson is available.
- [ ] Heavy fields are opt-in via --include-*.
Principle 7: Introspectable
- [ ] commands --json is available.
- [ ] schema --command ... --output json-schema is available.
This Scorecard can be used for reviewing CLIs or as acceptance criteria.
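Tallying the Scorecard can be automated with a few lines. This Python sketch rests on an assumption the article does not spell out: I map each principle to 0 points when no item is checked, 2 when every item is checked, and 1 otherwise. The function names are hypothetical.

```python
def score_principle(checks):
    """Map a principle's checklist to 0 (none), 1 (some), or 2 (all) points."""
    passed = sum(1 for done in checks.values() if done)
    if passed == 0:
        return 0
    return 2 if passed == len(checks) else 1


def score_cli(scorecard):
    """Return per-principle points and the total (maximum 2 per principle)."""
    points = {name: score_principle(checks) for name, checks in scorecard.items()}
    return {"points": points, "total": sum(points.values()), "max": 2 * len(points)}
```

A scorecard is then just a nested dict, for example {"Machine-readable": {"--output json": True, "schemaVersion": False}}, which is easy to keep in a repository and diff over time.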
Released AgentSkill
This article describes principles, and principles alone can leave you unsure how to implement them. Therefore, I have released an AgentSkill (a manual for agents) that supports Agentic CLI Design.
What it includes (having these elements generally stabilizes agent operations):
- Recipes by task (shortest command sequences).
- Guardrails (flow like --dry-run → confirmation → --confirm).
- Recommended defaults (--output json, --non-interactive, paging, etc.).
- Typical success/failure output examples (JSON).
- Recovery procedures for errors (retries, authentication, insufficient permissions, etc.).
I believe that using this AgentSkill as a foundation is the fastest way to solidify "procedures and vocabulary for safe usage" for your CLI.
CLI vs MCP: Considerations for Differentiation
The Model Context Protocol (MCP) is an open protocol for connecting AI models with external tools. MCP and CLI do not compete; each suits different cases:
Cases Suitable for CLI
- Existing CLI tools are available: GitHub CLI (gh), kubectl, aws cli, etc.
- Stateless operations: Operations that can be completed in a single command.
- Integration with Unix pipes: Integration with existing shell scripts.
Cases Suitable for MCP
- No existing CLI tools: Custom services or APIs.
- Stateful operations: Operations that require maintaining state across multiple calls.
- Real-time streaming: MCP supports streaming responses.
- Custom business logic: Applying unique rules to tool access.
Agentic CLI Design is a set of design principles for optimizing existing CLI tools for agents. When creating new tools, consider both MCP and CLI.
Conclusion
Agentic CLI Design is a design concept that redefines CLI from "a UI operated by humans" to "a protocol invoked by agents."
By being mindful of the seven principles (Principle 1 to Principle 7), you can design CLIs that allow agents to operate "without confusion, without destruction, without blockage, repeatedly, and while self-repairing."
Using this Scorecard when reviewing existing CLI tools (gh, kubectl, aws cli, etc.) can help identify areas for improvement for agents.
If you are interested, please take the time to score your CLI tool using the Scorecard.