Hermes Agent: What an Open-Source AI Agent Can (and Can't) Do

Hermes Agent is an open-source AI coding assistant from Nous Research. It's not another IDE plugin. It's a terminal-native agent that runs on your infrastructure, uses your API keys, and has direct access to your file system and shell. No data leaves your VPS except what you send to the API provider you configure.

I've been using it daily for months. Here's what I've learned about what it's actually good at, what it struggles with, and how to work around the gaps.

What It Is

Hermes Agent runs in the terminal. You talk to it through Telegram, Slack, or the CLI. It has tools to read and write files, execute shell commands, search codebases, browse the web, and delegate sub-tasks to child agents. The whole thing is configurable through a single YAML file — providers, models, tools, even custom skills.

It's built on the philosophy that an AI agent should be a collaborative tool for a human developer, not a replacement for one. Every action it takes is visible, every file change is reviewable, and nothing happens without you knowing about it.

What It's Good At

1. Wrangling Large Codebases

This is where Hermes shines. Give it a codebase you've never touched and it can:

Map the directory structure
Understand the architecture from existing code patterns
Find the exact file you need to modify
Make targeted changes across multiple files

I regularly throw it at unfamiliar Go monorepos or TypeScript projects and get meaningful work done on the first try. The context window management is solid — it knows when to look at more files and when to stop.

2. Multi-File Refactoring

Renaming a type across 15 files? Extracting a shared component from 4 similar implementations? Moving a package to a new path? Hermes handles these reliably. The patch tool does find-and-replace with fuzzy matching, so minor whitespace differences don't break edits.
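To make the whitespace point concrete, here is a minimal sketch of a whitespace-tolerant find-and-replace. It is not Hermes's patch tool: fuzzy_replace is a hypothetical helper, and I'm assuming "fuzzy" means nothing more than treating any run of spaces, tabs, or newlines as interchangeable.

```python
import re


def fuzzy_replace(source: str, old: str, new: str) -> str:
    """Replace `old` with `new`, ignoring differences in whitespace runs.

    Hypothetical helper for illustration only; not the Hermes patch tool.
    """
    tokens = old.split()
    if not tokens:
        raise ValueError("search text is empty")
    # Match the tokens literally, with any whitespace run between them.
    pattern = r"\s+".join(re.escape(token) for token in tokens)
    matches = list(re.finditer(pattern, source))
    # Refuse ambiguous edits: zero or several matches means the caller
    # needs to supply more surrounding context.
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    match = matches[0]
    # Slice instead of re.sub so backslashes in `new` are kept verbatim.
    return source[:match.start()] + new + source[match.end():]
```

Insisting on exactly one match is the design choice that matters here. An edit that could land in two places is worse than an edit that fails loudly.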

3. Repetitive DevOps Tasks

Writing systemd unit files. Setting up nginx reverse proxies. Configuring Let's Encrypt certs. Debugging why a service won't start. These are perfect for an agent — well-defined, documented, and tedious when done manually.

4. Research + Synthesis

Give it a topic and it can browse documentation, read source code, and synthesize findings. Need to understand how SQS redrive policies work across three SDKs? It'll find the docs and produce a comparative summary.

5. Following Established Patterns

If you have a CLAUDE.md or AGENTS.md in your repo with conventions, Hermes respects them. Write "we use chi router in Go, never gin" there once and the rule holds across sessions, because it lives in the repo rather than in any single conversation.

What It's Not Good At

1. Long-Running Autonomous Tasks

Hermes will not "build the whole feature while you grab coffee." It works iteratively, showing you results and asking for direction. This is deliberate: the agent is designed to keep you in the loop. If you want more autonomy, you can chain cron jobs or use the kanban orchestration pattern.

2. Novel Architecture Design

The agent follows patterns it knows. Ask it to design a novel system architecture from scratch and you'll get something competent but conventional. It won't invent a new message bus pattern or suggest a database topology you haven't considered. For architecture decisions, you're still the senior engineer.

3. Cross-Session Context

Each conversation is isolated. Hermes doesn't carry context from yesterday's session into today's unless you explicitly save it to memory or a skill file. The memory system is durable but limited (~2KB) — you have to be selective about what you store.
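A rough way to stay selective is to measure a note before saving it. The sketch below is not the Hermes memory API. MEMORY_BUDGET_BYTES and fits_budget are hypothetical names, and reading "~2KB" as exactly 2048 bytes is my assumption.

```python
MEMORY_BUDGET_BYTES = 2048  # assumption: "~2KB" taken as 2048 bytes


def fits_budget(entries: list[str], candidate: str,
                budget: int = MEMORY_BUDGET_BYTES) -> bool:
    """Return True if `candidate` still fits alongside the existing entries.

    Sizes are counted in UTF-8 bytes, not characters, so non-ASCII text
    costs more than its length suggests.
    """
    used = sum(len(entry.encode("utf-8")) for entry in entries)
    return used + len(candidate.encode("utf-8")) <= budget
```

Counting bytes instead of characters is the conservative choice when the real limit is not documented precisely.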

4. UI Design

Ask it to build a UI that doesn't look like a template and you'll get a competent, generic interface. The agent knows CSS and frameworks but doesn't have an eye for visual design. You need to provide design direction — color palettes, layout preferences, reference screenshots.

5. Debugging Across Service Boundaries

If your Go service calls a Python service that depends on a PostgreSQL view that references a DynamoDB table, good luck. The agent can investigate each component individually but won't independently trace a bug across two language runtimes and two databases without heavy guidance.

How I Use It

The Cron + Agent Pattern

Most of my daily automation runs on cron jobs, not interactive sessions. Hermes supports two cron modes:

Agent-driven — "Every 3 hours, check for new replies to sent emails and summarize them." The agent wakes up, runs the check, forms a summary, and delivers it to my Telegram.

Script-driven — "Every day at 9 AM, run this lead generation script." No agent overhead — just execute the script and deliver stdout. If the script outputs nothing, nothing is sent.

The script-driven mode is key. Most "monitoring" tasks don't need an LLM. Only the tasks that require reasoning — "is this reply worth following up?" — should spin up an agent.
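The script-driven behavior is easy to picture in code. This is a minimal sketch of the idea, not the Hermes scheduler: run_script_job and the deliver callback are hypothetical names, and handling of failed scripts (non-zero exit codes, stderr) is deliberately left out.

```python
import subprocess
import sys
from typing import Callable, Optional


def run_script_job(command: list[str], deliver: Callable[[str], None],
                   timeout: int = 300) -> Optional[str]:
    """Run `command` and deliver its stdout only if there is any.

    Hypothetical helper: `deliver` stands in for whatever sends the
    message (Telegram, Slack, and so on).
    """
    result = subprocess.run(command, capture_output=True, text=True,
                            timeout=timeout)
    output = result.stdout.strip()
    if not output:
        return None  # silent run: nothing is sent
    deliver(output)
    return output


if __name__ == "__main__":
    # Stand-in for a real script; print() plays the role of the messenger.
    run_script_job([sys.executable, "-c", "print('3 new leads')"], print)
```

No model is involved anywhere in this path, which is why these jobs cost nothing beyond the script itself.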

The Skill System

Skills are reusable playbooks stored as SKILL.md files. When a complex task succeeds (5+ tool calls, novel approach discovered, error overcome), I save the approach as a skill. Next time a similar situation comes up, the agent loads the skill and follows the verified workflow instead of figuring it out from scratch.

This is the closest thing to "the agent learns from experience" that actually works in practice.

Delegation for Parallel Work

The delegate_task tool spawns sub-agents for independent work streams. I use it for:

Researching two API libraries simultaneously
Reviewing PR diffs while writing tests for another change
Investigating a production incident while preparing a fix

Each sub-agent gets its own terminal, file system, and context. They don't interfere with each other. The parent agent synthesizes their results.
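The shape of that pattern is a fan-out followed by a merge. The sketch below illustrates it with plain threads and temporary directories. It is not how delegate_task is implemented: delegate and synthesize are hypothetical names, and each "sub-agent" is just a Python callable handed its own scratch directory.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

Task = Callable[[Path], str]


def delegate(tasks: dict[str, Task]) -> dict[str, str]:
    """Run each task in its own scratch directory and collect results by name."""

    def run_isolated(task: Task) -> str:
        # Each task gets a private directory that is removed afterwards,
        # so parallel tasks cannot overwrite each other's files.
        with tempfile.TemporaryDirectory() as workdir:
            return task(Path(workdir))

    with ThreadPoolExecutor(max_workers=len(tasks) or 1) as pool:
        futures = {name: pool.submit(run_isolated, task)
                   for name, task in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


def synthesize(results: dict[str, str]) -> str:
    """Parent step: merge the sub-results into one summary, sorted by name."""
    return "\n".join(f"{name}: {summary}"
                     for name, summary in sorted(results.items()))
```

The parent only ever sees the returned strings, never the scratch directories. That mirrors the isolation described above: sub-agents share nothing except their final results.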

What I Don't Use It For

Security-sensitive operations — It doesn't touch production databases or deployment credentials
Client-facing work — If the output has to be pixel-perfect, I do it myself
The first draft of a novel system — I sketch architecture, then hand it to the agent for implementation
Anything that requires creative design judgment — Color palettes, typography, layout decisions

The Bottom Line

Hermes Agent is a force multiplier for an experienced engineer, not a junior developer replacement. It eliminates the mechanical parts of coding — boilerplate, refactoring, documentation research, config management — while leaving the high-level decisions to the human.

The sweet spot is anything that's tedious but well-defined. The failure mode is anything that requires taste, judgment, or cross-system intuition.

If you're a senior IC who spends too much time writing boilerplate and chasing config bugs, it's worth running. If you're looking for an agent that designs systems and ships features autonomously, this isn't that — yet.

Disclosure: I contribute to the open-source Hermes Agent project at Nous Research. This post reflects my experience as both a user and a contributor.