tumf

Posted on Feb 6 • Originally published at blog.tumf.dev

Agentic CLI Design: 7 Principles for Designing CLI as a Protocol for AI Agents

#ai #llm #devops

Originally published on 2026-02-06
Original article (Japanese): Agentic CLI Design: 7 Principles for Designing the CLI as a Protocol for AI Agents

CLI tools have long been designed as interfaces for humans working at a terminal. With the rise of LLMs (Large Language Models) and AI agents (programs that autonomously invoke tools to make progress on a task), CLIs are being asked to play a new role: a "protocol/API that agents can invoke safely, reliably, and repeatedly."

Recently, I have had more and more occasions to let agents run CLI commands. Problems a human would barely notice, such as "stopping at a confirmation prompt," "logs mixed into stdout, making the output unparseable," and "accidentally repeating the same operation," are routine when the caller is an agent.

In this article, I summarize a design concept I call "Agentic CLI Design." It redefines the CLI from "a UI operated by humans" to "a protocol invoked by agents" and sets out seven design principles so that a CLI keeps working under the assumption that calls will fail, be re-executed, and run without interaction.

What is Agentic CLI Design?

Agentic CLI Design consists of design principles for CLIs that allow LLMs/agents to execute commands safely and reliably in a non-interactive, iterative, and failure-prone environment.

Rather than optimizing for how the tool feels or how easy it is for humans to use, it focuses on ensuring that machines can read the output, decide what to do next, re-execute safely, and recover.

Success Conditions

Agentic CLI Design succeeds when an agent using the CLI meets the following criteria:

No confusion: The available options are clearly presented, so the agent can decide on its next action.
No destruction: Defaults are safe, and destructive operations require explicit confirmation.
No blockage: Commands complete without interaction, with clear timeout/retry policies.
Repeatable: Commands are idempotent, so re-executing them is safe.
Self-repairing: Behavior is observable, so the agent can work out a recovery procedure from an error.

7 Principles

Agentic CLI Design is composed of the following seven principles (Principle 1 to Principle 7).

Principle 1: Machine-readable

Principle: Output is structured and provided in a format that machines can reliably parse.

Design Checks:

Options for --json / --output json|yaml|text are available.
Strictly separate the streams: standard output (stdout) carries results, and standard error (stderr) carries logs/progress. Never mix them.
Errors are also structured (preferably in JSON).
Schema is stable (breaking changes managed via schemaVersion, etc.).

Example:

kubectl, the Kubernetes CLI, supports JSON output (-o json), and the AWS CLI offers --output json.

# Structured output on success
kubectl get pods -o json

# JSON goes to stdout; keep stderr separate (merging it with 2>&1 can corrupt the JSON)
aws ec2 describe-instances --output json

Human-friendly "readable tables" are secondary. Agents need to be able to reliably parse JSON or YAML.

Minimum Recommended Response (JSON):

On success:

{
  "ok": true,
  "type": "items.list",
  "schemaVersion": 1,
  "data": {
    "items": [
      {"id": "...", "createdAt": "2026-02-05T08:00:00Z"}
    ],
    "nextCursor": "..."
  }
}

On failure:

{
  "ok": false,
  "type": "items.list",
  "schemaVersion": 1,
  "error": {
    "code": "rate_limited",
    "message": "...",
    "retryAfterMs": 1200
  }
}
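A minimal sketch of how a shell-based CLI could emit this envelope, assuming a hypothetical `items.list` command. The helper names `emit_ok` and `emit_error` and the `SIMULATE_RATE_LIMIT` switch are illustrative, not part of any library. Both paths print one JSON object with the same top-level fields on stdout, while human-oriented progress goes to stderr.

```shell
# Hypothetical helpers for a CLI that always emits the same top-level envelope.
SCHEMA_VERSION=1

# emit_ok TYPE DATA_JSON: print a success envelope on stdout.
emit_ok() {
  printf '{"ok":true,"type":"%s","schemaVersion":%d,"data":%s}\n' \
    "$1" "$SCHEMA_VERSION" "$2"
}

# emit_error TYPE CODE MESSAGE RETRY_AFTER_MS: print a failure envelope on stdout.
# MESSAGE is assumed to need no JSON escaping (no quotes or backslashes).
emit_error() {
  printf '{"ok":false,"type":"%s","schemaVersion":%d,"error":{"code":"%s","message":"%s","retryAfterMs":%d}}\n' \
    "$1" "$SCHEMA_VERSION" "$2" "$3" "$4"
}

# Hypothetical command body: progress on stderr, the result envelope on stdout.
items_list() {
  echo "fetching items..." >&2
  if [ "${SIMULATE_RATE_LIMIT:-0}" = 1 ]; then   # illustrative failure switch
    emit_error items.list rate_limited "Too many requests" 1200
    return 4  # "retry recommended" in the exit code scheme of Principle 5
  fi
  emit_ok items.list '{"items":[{"id":"a1","createdAt":"2026-02-05T08:00:00Z"}],"nextCursor":null}'
}
```

Because success and failure share `ok`, `type`, and `schemaVersion`, an agent can check `ok` first and only then look at `data` or `error`.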

Principle 2: Non-interactive by default

Principle: Do not assume interactive prompts, allowing for completion in headless environments (running without screens or interactive operations, such as CI or job runners).

Design Checks:

Options for --yes / --force / --no-confirm / --non-interactive are available.
Must be able to complete in environments without TTY.
If interaction is necessary, it must be explicitly opted in.

Example:

Flags that answer confirmation prompts in advance, such as Terraform's -auto-approve, make agent operation easier.

# Execute without interaction
terraform apply -auto-approve

# Answer "yes" in advance; DEBIAN_FRONTEND=noninteractive also suppresses debconf prompts
DEBIAN_FRONTEND=noninteractive apt-get install -y package-name

Agents cannot respond to "Y/N?" prompts. All choices must be specified in advance via options. It is also crucial that the process does not stop in environments without TTY.
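A minimal sketch of how a shell-based CLI could honor these rules, assuming hypothetical `--yes` and `--non-interactive` flags that set the `ASSUME_YES` and `NON_INTERACTIVE` variables. It switches to non-interactive mode automatically when stdin is not a terminal, and in that mode it fails fast with a hint instead of waiting on a prompt.

```shell
# ASSUME_YES / NON_INTERACTIVE would be set by hypothetical --yes / --non-interactive flags.
ASSUME_YES=${ASSUME_YES:-0}
NON_INTERACTIVE=${NON_INTERACTIVE:-0}

# No TTY on stdin (CI, job runners, agents): never prompt.
if [ ! -t 0 ]; then
  NON_INTERACTIVE=1
fi

# confirm MESSAGE: succeed if the action may proceed.
confirm() {
  if [ "$ASSUME_YES" = 1 ]; then
    return 0
  fi
  if [ "$NON_INTERACTIVE" = 1 ]; then
    # Do not block: report the problem and the next step, then exit with a usage error.
    echo "error: confirmation required for: $1 (re-run with --yes)" >&2
    return 2
  fi
  printf '%s [y/N]: ' "$1" >&2
  read -r answer
  [ "$answer" = y ] || [ "$answer" = Y ]
}
```

Interaction stays available to humans at a terminal, but it is never the only way to proceed.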

Authentication (OAuth/headless) Key Points:

If possible, prioritize Device Authorization Grant (RFC 8628).
Provide auth status --json for agents to confirm prerequisites.
Support migration to headless environments with auth export / auth import.
When --non-interactive, return "error + next steps" without asking for confirmation.
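As a sketch of the `auth status --json` and "error + next steps" points, here is a hypothetical status check that never opens a browser or prompts. The `TOOL_TOKEN` variable and the `tool auth login --device` command are assumptions for illustration; the exit code 3 follows the authentication class proposed in Principle 5.

```shell
# Hypothetical `tool auth status --json`. TOOL_TOKEN and `tool auth login --device`
# are illustrative names, not a real tool's interface.
auth_status_json() {
  if [ -n "${TOOL_TOKEN:-}" ]; then
    printf '%s\n' '{"ok":true,"type":"auth.status","schemaVersion":1,"data":{"authenticated":true,"source":"env"}}'
    return 0
  fi
  # No prompt and no browser: report the problem plus next steps, exit 3 (auth error).
  printf '%s\n' '{"ok":false,"type":"auth.status","schemaVersion":1,"error":{"code":"not_authenticated","message":"No credentials found","nextSteps":["run: tool auth login --device","or set TOOL_TOKEN"]}}'
  return 3
}
```

An agent can run this first and branch on the exit code before attempting any real work.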

Principle 3: Idempotent & Replayable

Principle: It is safe to execute the same command multiple times, and the results are predictable.

Idempotence means that performing the same operation several times has the same effect as performing it once. Agents may run the same command again after a timeout or network interruption, so the design must ensure that re-execution does not cause accidents such as creating a duplicate resource.

Design Checks:

Commands that send or create something accept a dedupe key / client request ID (e.g., --dedupe-key / --client-request-id).
Allow choosing behaviors for "already created": --if-exists skip|update|error.
Clearly indicate paging for retrieval: --limit --cursor --all.

Example:

# Declarative and idempotent: creates the resources if missing, otherwise updates them to match the manifest
kubectl apply -f deployment.yaml

# For a CLI hitting an HTTP API, explicitly provide a request ID (deduplication key)
curl -sS -X POST https://api.example.com/v1/items \
  -H 'Content-Type: application/json' \
  -H 'Idempotency-Key: 01JHXXXX...' \
  -d '{"name":"example"}'

With a deduplication key, a server that supports it can recognize a retried request and return the original result instead of creating a second item, so re-executing after a timeout becomes safe.
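The `--if-exists skip|update|error` policy can be sketched with a hypothetical `create_item` command that uses a local directory as a stand-in for the remote service (the `STORE` path and the function name are assumptions). Running it twice with `skip` leaves exactly one item, and the result states what happened so the agent does not have to guess.

```shell
# Hypothetical `tool items create NAME --if-exists skip|update|error`,
# with a local directory standing in for the remote service.
STORE=${STORE:-$(mktemp -d)}

# create_item NAME POLICY VALUE: print a result envelope; POLICY defaults to "error".
create_item() {
  name=$1 policy=${2:-error} value=${3:-}
  file="$STORE/$name"
  if [ -e "$file" ]; then
    case "$policy" in
      skip)
        action=skipped ;;
      update)
        printf '%s\n' "$value" > "$file"
        action=updated ;;
      error)
        printf '{"ok":false,"type":"items.create","schemaVersion":1,"error":{"code":"already_exists","message":"%s already exists"}}\n' "$name"
        return 1 ;;  # generic failure: retrying will not help
      *)
        echo "invalid --if-exists value: $policy" >&2
        return 2 ;;  # usage error
    esac
  else
    printf '%s\n' "$value" > "$file"
    action=created
  fi
  printf '{"ok":true,"type":"items.create","schemaVersion":1,"data":{"name":"%s","action":"%s"}}\n' "$name" "$action"
}
```

Reporting `created` vs `skipped` lets an agent that retried after a timeout confirm that its first attempt actually succeeded.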

Principle 4: Safe-by-default

Principle: Destructive operations are not executed by default and require explicit confirmation.

Design Checks:

Destructive operations can enforce --dry-run / --confirm <id>.
Deletion requires --force, ensuring no accidents by default.
Minimize permissions/scope, returning "next steps" when insufficient.

Example:

# Dry run: preview the changes and save exactly what was reviewed
terraform plan -out=tfplan

# Execution is a separate, explicit step that applies only the saved plan
terraform apply tfplan

# Prepare a preview before destructive operations
kubectl diff -f deployment.yaml

Agents can trigger deletions by mistake, so destructive operations need more than one layer of protection, for example a preview followed by an explicit confirmation.
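The `--dry-run` then `--confirm <id>` flow can be sketched as follows, under these assumptions: a hypothetical `delete_items` command working on a local directory, a confirmation ID derived from the exact target list with POSIX `cksum`, and deletion refused unless the caller passes that ID back. If the targets change between preview and execution, the ID no longer matches and nothing is deleted.

```shell
# Hypothetical `tool items delete PATTERN [--dry-run | --confirm ID]` over a local directory.

# plan_id DIR PATTERN: stable ID for the exact set of matching file names.
plan_id() {
  ( cd "$1" && ls -1 | grep -E -- "$2" ) | cksum | cut -d ' ' -f 1
}

delete_items() {
  dir=$1 pattern=$2 mode=${3:-} given_id=${4:-}
  id=$(plan_id "$dir" "$pattern")
  case "$mode" in
    --dry-run)
      # Preview only: targets on stderr, the confirmation ID on stdout.
      ( cd "$dir" && ls -1 | grep -E -- "$pattern" ) >&2
      printf '{"ok":true,"type":"items.delete","schemaVersion":1,"data":{"dryRun":true,"confirmId":"%s"}}\n' "$id"
      ;;
    --confirm)
      if [ "$given_id" != "$id" ]; then
        printf '%s\n' '{"ok":false,"type":"items.delete","schemaVersion":1,"error":{"code":"confirm_mismatch","message":"targets changed; run --dry-run again"}}'
        return 2
      fi
      ( cd "$dir" && ls -1 | grep -E -- "$pattern" | while read -r f; do rm -f -- "$f"; done )
      printf '%s\n' '{"ok":true,"type":"items.delete","schemaVersion":1,"data":{"deleted":true}}'
      ;;
    *)
      # Safe default: without an explicit mode, nothing is deleted.
      echo "refusing to delete without --dry-run or --confirm ID" >&2
      return 2
      ;;
  esac
}
```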

Principle 5: Observable & Debuggable

Principle: The execution status can be observed, and recovery procedures can be determined in case of errors.

Design Checks:

Options for --verbose / --debug / --log-format json are available.
Ability to pass correlation IDs with --trace-id.
Classify exit codes to facilitate automatic recovery:
- Example: 0=success / 2=argument error / 3=authentication error / 4=retry recommended.

Example:

# Output detailed logs
kubectl apply -f deployment.yaml --v=9

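# Branch on the exit code (tool is a hypothetical CLI using the codes below)
tool items list --output json
status=$?
if [ "$status" -eq 4 ]; then
  echo "Retryable error, waiting..." >&2
  sleep 5
  tool items list --output json
fi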

Agents will determine the "next step" from error messages. Structured exit codes and errors are crucial.

Recommended Exit Code Classification:

0: Success
2: Argument error / usage error
3: Authentication / permission error
4: Retry recommended (rate limit / transient)
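A sketch of acting on this classification from an agent or wrapper script: retry only when the exit code is 4, with exponential backoff and a bounded number of attempts, and stop immediately on 2 or 3, since retrying cannot fix a bad argument or missing credentials. The function name `run_with_retry` and the `BASE_DELAY` variable are illustrative assumptions.

```shell
# run_with_retry MAX_ATTEMPTS CMD [ARGS...]
# Re-runs CMD only while it exits with 4 ("retry recommended").
BASE_DELAY=${BASE_DELAY:-1}   # seconds; doubled after each retryable failure

run_with_retry() {
  max=$1; shift
  attempt=1
  delay=$BASE_DELAY
  while :; do
    "$@"
    rc=$?
    case "$rc" in
      0) return 0 ;;
      4)
        if [ "$attempt" -ge "$max" ]; then
          echo "giving up after $attempt attempts" >&2
          return 4
        fi
        echo "retryable failure (attempt $attempt), sleeping ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2)) ;;
      *) return "$rc" ;;  # 2 = usage, 3 = auth, others: retrying will not help
    esac
  done
}
```

For example, `run_with_retry 5 tool items list --output json` would make at most five attempts. A real wrapper could also honor a `retryAfterMs` hint from the error envelope instead of a fixed backoff.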

Principle 6: Context-efficient

Principle: Do not waste the context window of LLMs.

Design Checks:

Use --fields/--select (projection) to retrieve only necessary fields.
Handle large data with --output ndjson (streaming format with one JSON per line).
Default to summaries, with details provided via get/--include-*.
Implement server-side filtering (since/until/query/type…).

Example:

# Retrieve only necessary fields
kubectl get pods -o custom-columns=NAME:.metadata.name,STATUS:.status.phase

# Handle large data with paging
aws s3api list-objects-v2 --bucket my-bucket --max-items 100

Agents can hit token limits if they cram too much data into the context window (the maximum input LLMs can reference at once). A design that retrieves only the minimum necessary data is required.
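A sketch of why NDJSON helps here: each line is a complete JSON object, so the consumer can stop after the first N records (here with `head`) instead of loading one giant array into the context window. `list_items_ndjson` is a hypothetical command that generates fake records for illustration.

```shell
# Hypothetical `tool items list --output ndjson`: one JSON object per line.
list_items_ndjson() {
  count=$1
  i=1
  while [ "$i" -le "$count" ]; do
    printf '{"id":"item-%d","status":"active"}\n' "$i"
    i=$((i + 1))
  done
}

# The agent reads only what it needs; head stops reading after 3 lines.
list_items_ndjson 1000 | head -n 3
```

A single JSON array would have to be read to its closing bracket before it could be parsed; NDJSON can be truncated at any line boundary and still be valid.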

Principle 7: Introspectable

Principle: The CLI itself should output specifications in a machine-readable format, allowing agents to self-discover.

Design Checks:

Provide commands --json (command list and argument list).
Provide schema --command ... --output json-schema (command-level JSON Schema, defining the structure of JSON).
Provide --help --json (examples, exit codes, error vocabulary).
Example of top-level fixed fields for --output json:
- schemaVersion, type, ok.

Example:

# This is an example of desirable self-describing design
tool commands --output json
tool schema --command items.list --output json-schema
tool help items.list --output json

In the Model Context Protocol (MCP), each tool definition includes a JSON Schema for its inputs, so agents can learn how to call it; CLIs, by contrast, are often black boxes. If the CLI outputs its own specification in a machine-readable format, agents can discover it themselves.

Recommended Set of Introspection Commands:

tool commands --json
tool schema --command <subcommand...> --output json-schema
tool help --json (or --help --json for each command)
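A minimal sketch of `tool commands --json` for a hypothetical tool: the command catalog is a static JSON document printed on stdout, so an agent can discover subcommands, flags, and exit codes before invoking anything. The command names, flags, and field names are illustrative assumptions, not a standard format.

```shell
# Hypothetical `tool commands --json`: a static, versioned command catalog.
commands_json() {
  cat <<'EOF'
{"ok":true,"type":"commands","schemaVersion":1,"data":{"commands":[
  {"name":"items.list","args":["--limit","--cursor","--fields","--output"],"exitCodes":[0,2,3,4]},
  {"name":"items.create","args":["--name","--if-exists","--dedupe-key"],"exitCodes":[0,2,3,4]},
  {"name":"items.delete","args":["--dry-run","--confirm"],"exitCodes":[0,2,3]}
]}}
EOF
}

# Minimal dispatcher: only `tool commands --json` is implemented here.
tool() {
  case "$1 $2" in
    "commands --json") commands_json ;;
    *) echo "unknown command: $*" >&2; return 2 ;;
  esac
}
```

Keeping the catalog under the same `ok`/`type`/`schemaVersion` envelope as every other command means the agent needs only one parser.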

Anti-patterns

The following are examples of "commonly broken" aspects from the perspective of Agentic CLI Design.

Logs/Progress Mixed with stdout

# ❌ Bad Example
echo "Processing..."
echo '{"result": "success"}'

Agents will fail when trying to parse JSON. Please output logs to stderr.
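The fix is small: send progress to stderr and keep stdout for the result only. A sketch, where `run` is a stand-in for a real command:

```shell
# ✅ Good example: progress on stderr, result on stdout.
run() {
  echo "Processing..." >&2
  echo '{"result": "success"}'
}

# Capturing stdout yields only the JSON line; the log still reaches the terminal or a log file.
result=$(run 2>/dev/null)
```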

JSON Structure Changes Based on Conditions

# ❌ Bad Example
# On success: {"data": {...}}
# On failure: {"error": "..."}

Please clarify success/failure with an ok field and unify the structure.

Default Interaction Causes Blockage in Headless Environments

# ❌ Bad Example
read -p "Continue? (y/n): " answer

This will cause blockage in CI environments or job runners. Please provide a --yes option.

Destructive Commands Can Be Executed by Default

# ❌ Bad Example
rm -rf /data/*

Please provide two-step confirmations with --dry-run and --confirm.

`--all` Results in Huge JSON Output

# ❌ Bad Example
curl 'https://api.example.com/items?all=true'

This will explode the context window. Please implement paging (--limit / --cursor).
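A sketch of cursor-based paging, assuming a hypothetical `fetch_page CURSOR LIMIT` that prints up to LIMIT records and sets a next cursor (here a plain numeric offset over fake data). The caller controls how much it pulls per call, and an `--all` mode, if offered, is just this loop run internally and streamed as NDJSON rather than returned as one giant array.

```shell
# Hypothetical data source: TOTAL fake records, served page by page.
TOTAL=${TOTAL:-7}

# fetch_page CURSOR LIMIT: print up to LIMIT records as NDJSON on stdout and set
# NEXT_CURSOR (empty when there are no more pages). The cursor is just an offset here.
fetch_page() {
  cursor=$1 limit=$2
  end=$((cursor + limit))
  if [ "$end" -gt "$TOTAL" ]; then end=$TOTAL; fi
  i=$cursor
  while [ "$i" -lt "$end" ]; do
    printf '{"id":"item-%d"}\n' "$i"
    i=$((i + 1))
  done
  if [ "$end" -lt "$TOTAL" ]; then NEXT_CURSOR=$end; else NEXT_CURSOR=; fi
}

# list_all LIMIT: what an --all flag would do internally, one bounded page at a time.
list_all() {
  if [ "$1" -lt 1 ]; then
    echo "--limit must be at least 1" >&2
    return 2
  fi
  NEXT_CURSOR=0
  while [ -n "$NEXT_CURSOR" ]; do
    fetch_page "$NEXT_CURSOR" "$1"
  done
}
```

An agent would normally call `fetch_page` once, inspect the results, and request the next page only if it actually needs more.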

Authentication Requires a Browser, Failing in Remote/Container Environments

# ❌ Bad Example
open https://auth.example.com/login

Please prioritize Device Authorization Grant (RFC 8628).

Scorecard (Review Checklist)

The following checklist scores each of the seven principles of Agentic CLI Design on a scale of 0, 1, or 2 points. The items are deliberately concrete so that the checklist can be adapted easily to other projects.

Principle 1: Machine-readable

[ ] --output json is available.
[ ] stdout = results / stderr = logs is adhered to.
[ ] Errors are structured (JSON recommended).
[ ] schemaVersion is present / compatibility policy is documented.

Principle 2: Non-interactive

[ ] --non-interactive is available (ideally switched on automatically when there is no TTY).
[ ] All operations requiring interaction are opt-in (i.e., default is non-interactive).
[ ] Vocabulary for --yes/--force/--no-confirm is unified.

Principle 3: Idempotent & Replayable

[ ] Writing commands have --client-request-id / --dedupe-key equivalents.
[ ] There is a policy for --if-exists.
[ ] --cursor/--limit/--all are available (with --all implementing internal paging).

Principle 4: Safe-by-default

[ ] Destructive operations can use --dry-run.
[ ] Actual execution requires additional guards like --confirm <id> / --force.

Principle 5: Observable & Debuggable

[ ] --debug is available (logs to stderr).
[ ] --log-format json is available.
[ ] Accepts --trace-id.
[ ] Exit code classification is present (2/3/4, etc.).

Principle 6: Context-efficient

[ ] --fields/--select is available.
[ ] --output ndjson is available.
[ ] Heavy fields are opt-in via --include-*.

Principle 7: Introspectable

[ ] commands --json is available.
[ ] schema --command ... --output json-schema is available.

This Scorecard can be used for reviewing CLIs or as acceptance criteria.

Released AgentSkill

This article presents principles, which can still leave you unsure how to implement them. To help with that, I have released an AgentSkill (a manual for agents) that supports Agentic CLI Design.

tumf/skills: agentic-cli-design

What it includes (having these elements generally stabilizes agent operations):

Recipes by task (shortest command sequences).
Guardrails (flow like --dry-run → confirmation → --confirm).
Recommended defaults (--output json, --non-interactive, paging, etc.).
Typical success/failure output examples (JSON).
Recovery procedures for errors (retries, authentication, insufficient permissions, etc.).

I believe that using this AgentSkill as a starting point is the fastest way to establish "procedures and vocabulary for safe usage" for your CLI.

CLI vs MCP: Considerations for Differentiation

The Model Context Protocol (MCP) is a standard protocol for connecting AI models with external tools. MCP and CLI are not competing; they can be differentiated as follows:

Cases Suitable for CLI

Existing CLI tools are available: GitHub CLI (gh), kubectl, aws cli, etc.
Stateless operations: Operations that can be completed in a single command.
Integration with Unix pipes: Integration with existing shell scripts.

Cases Suitable for MCP

No existing CLI tools: Custom services or APIs.
Stateful operations: Operations that require maintaining state across multiple calls.
Real-time streaming: MCP supports streaming responses.
Custom business logic: Applying unique rules to tool access.

Agentic CLI Design is a set of design principles for optimizing existing CLI tools for agents. When creating new tools, consider both MCP and CLI.

Conclusion

Agentic CLI Design is a design concept that redefines CLI from "a UI operated by humans" to "a protocol invoked by agents."

By being mindful of the seven principles (Principle 1 to Principle 7), you can design CLIs that allow agents to operate "without confusion, without destruction, without blockage, repeatedly, and while self-repairing."

Applying this Scorecard to existing CLI tools (gh, kubectl, aws cli, etc.) can show where they fall short for agents.

If you are interested, please take the time to score your CLI tool using the Scorecard.
