Most AI coding tools (Claude Code, Gemini CLI, Copilot) have the same limitation: they can only read/write files
and run shell commands. They can't actually interact with your desktop.
xAgent CLI is different. It can control your mouse, keyboard, and any application on your computer.
What it can do:
GUI Automation
- Click buttons at precise coordinates
- Type text into any field
- Navigate websites automatically
- Control ANY desktop application
Example:
xagent gui --url https://twitter.com/login
> Click the username field at (400, 280)
> Type "myaccount"
> Press Tab
> Type "mypassword"
> Click the login button at (450, 420)
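For readers wondering how steps like these could be driven in code, here is a minimal sketch, not xAgent's actual implementation. It assumes the steps end up as Playwright-style mouse and keyboard calls. The names Action, PageLike, parseStep, and runSteps are hypothetical; PageLike only mirrors the mouse.click, keyboard.type, and keyboard.press methods that Playwright's Page exposes, so a real Page should be usable in its place.

```typescript
// Hypothetical action model for illustration; not taken from the xAgent source.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "press"; key: string };

// Minimal structural subset of a Playwright-style page object (assumption).
interface PageLike {
  mouse: { click(x: number, y: number): Promise<void> };
  keyboard: {
    type(text: string): Promise<void>;
    press(key: string): Promise<void>;
  };
}

// Turn one instruction line, e.g. `Click the login button at (450, 420)`,
// into a structured action. Unknown lines are rejected instead of guessed at.
function parseStep(line: string): Action {
  const click = line.match(/^Click .* at \((\d+),\s*(\d+)\)$/i);
  if (click) {
    return { kind: "click", x: Number(click[1]), y: Number(click[2]) };
  }
  const typed = line.match(/^Type "(.*)"$/i);
  if (typed) {
    return { kind: "type", text: typed[1] };
  }
  const press = line.match(/^Press (\S+)$/i);
  if (press) {
    return { kind: "press", key: press[1] };
  }
  throw new Error(`Unrecognized step: ${line}`);
}

// Execute the steps in order against the page.
async function runSteps(page: PageLike, lines: string[]): Promise<void> {
  for (const line of lines) {
    const action = parseStep(line.trim());
    switch (action.kind) {
      case "click":
        await page.mouse.click(action.x, action.y);
        break;
      case "type":
        await page.keyboard.type(action.text);
        break;
      case "press":
        await page.keyboard.press(action.key);
        break;
    }
  }
}
```

Keeping the parser separate from the executor means the step list can be validated before anything is clicked, which matters when the target is a real login form.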
Free Frontier Models
- MiniMax M2.1 (reasoning & coding)
- GLM-4.7 (Zhipu AI, multimodal)
- Kimi K2 (1T parameters, MoE)
- Qwen3 Coder (coding)
No API keys. All free.
Developer Features
- Code analysis & debugging
- Project architecture understanding
- Context compression for large repos
- SubAgent system for complex tasks
Why I built this:
I wanted an AI that could actually do things for me, not just suggest code. One that could log in to websites,
fill forms, organize files, and automate workflows.
Tech stack:
- Node.js 20+, TypeScript 5.3
- Ink for terminal UI
- Puppeteer/Playwright for browser automation
Try it:
npm i -g @xagent-ai/cli
xagent start
Cross-platform: Windows, macOS, Linux.
MIT licensed. Would love your feedback!