Control any platform with Skills
Agent Skills are a format for extending AI coding agents with specialized capabilities. Midscene provides Agent Skills that let AI coding tools such as Claude Code and Cline drive UI automation through CLI commands, with no MCP server setup required.
Unlike MCP integration, Skills work by running CLI commands directly in the terminal. The AI agent acts as the brain: it takes screenshots, analyzes the UI, and decides which actions to perform next.
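The observe, decide, act loop described above can be sketched roughly as follows. This is only an illustration of the control flow, not Midscene's implementation: `take_screenshot`, `decide`, and `perform` are hypothetical callables standing in for the CLI invocations and for the agent's own reasoning.

```python
from typing import Callable, List, Optional


def run_agent_loop(
    take_screenshot: Callable[[], str],
    decide: Callable[[str], Optional[str]],
    perform: Callable[[str], None],
    max_steps: int = 10,
) -> List[str]:
    """Observe the screen, choose an action, perform it, and repeat.

    All three callables are hypothetical placeholders:
    - take_screenshot() returns some description of the current screen
    - decide(screen) returns the next action, or None when the task is done
    - perform(action) carries the action out (e.g. by running a CLI command)
    """
    performed: List[str] = []
    for _ in range(max_steps):  # cap the loop so a confused agent cannot run forever
        screen = take_screenshot()
        action = decide(screen)
        if action is None:  # the agent considers the task complete
            break
        perform(action)
        performed.append(action)
    return performed
```

The step cap is a design choice for this sketch: without it, an agent that never decides it is finished would keep taking screenshots indefinitely.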
Supported platforms
Installation
Make sure Node.js is installed, then run:
Skills repository: github.com/web-infra-dev/midscene-skills
Model configuration
Midscene skills require a vision model with strong visual grounding capabilities. Configure the following environment variables — either as system environment variables or in a .env file in the current working directory (Midscene loads .env automatically):
For supported models and configuration details, see Model strategy and Common model configuration.
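Before starting an agent session, it can help to confirm that the model variables are actually visible. The following is a minimal sketch, not part of Midscene: the variable names in `REQUIRED_VARS` are assumptions for illustration (use the names given in the model configuration docs), and it assumes system environment variables take precedence over `.env` entries, as is typical for dotenv-style loaders.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping

# Assumed variable names, for illustration only; check the model
# configuration docs for the names your Midscene version expects.
REQUIRED_VARS = (
    "MIDSCENE_MODEL_BASE_URL",
    "MIDSCENE_MODEL_API_KEY",
    "MIDSCENE_MODEL_NAME",
    "MIDSCENE_MODEL_FAMILY",
)


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")  # drop surrounding quotes
    return values


def missing_vars(environ: Mapping[str, str], dotenv_text: str = "") -> List[str]:
    """Return the required variables that are unset or empty in both sources."""
    # Assumption: real environment variables override .env entries.
    merged = {**parse_dotenv(dotenv_text), **environ}
    return [name for name in REQUIRED_VARS if not merged.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text(encoding="utf-8") if env_file.is_file() else ""
    missing = missing_vars(os.environ, text)
    print("Missing: " + ", ".join(missing) if missing else "All variables set")
```

Run it from the same working directory in which the agent will run, since that is where the `.env` file is looked up.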
Use skills
In your AI coding assistant, invoke a Skill with the following command:
Example: Coding Agent self-verifies after writing code
In this example, we ask Claude Code to develop an Electron Todo app. After writing the code, it uses the desktop-computer-automation Skill to launch the app, interact with the UI, and take screenshots to verify that the feature works as expected, with no manual intervention or test scripts needed.
Prompt:
The coding agent autonomously completes the entire workflow: write the Todo component → launch the Electron app → connect to the desktop → take screenshots to understand the UI → interact via natural language → take screenshots to verify results. The developer only describes the intent, and Skills give the agent the ability to "see the screen and move the mouse", letting it verify its own code just like a human would.
More use cases
Skills go beyond local desktop testing. By combining different Skills, you can cover a wide range of automation scenarios:
- Desktop app testing — Verify functionality of Electron, Qt, WPF and other desktop applications
- Remote computer control — Operate applications on remote machines via remote desktop connections for remote ops and debugging
- Mobile app testing — Use @midscene/android and @midscene/ios Skills to test mobile apps on real devices or simulators
- Cross-app workflows — Chain operations across multiple apps, e.g. fetch data from browser → paste into Excel → take screenshot and send to Slack
- CI/CD integration — Run desktop automation in headless mode on Linux CI via Xvfb, no physical display needed
- Daily task automation — Batch form filling, scheduled screenshot monitoring, automatic file organization, etc.
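For the CI/CD case in the list above, one common pattern on Linux is to wrap the automation command in `xvfb-run` when no display is available. The helper below is a sketch under that assumption, not something Midscene provides. `with_virtual_display` is a hypothetical name, and the wrapped command is whatever starts your automation.

```python
from typing import List, Mapping, Sequence


def with_virtual_display(command: Sequence[str], environ: Mapping[str, str]) -> List[str]:
    """Prefix the command with xvfb-run when no X display is configured.

    xvfb-run starts a temporary virtual X server for the wrapped command;
    the -a flag asks it to pick a free server number automatically.
    """
    if environ.get("DISPLAY"):
        # A display already exists (e.g. a developer machine); run unchanged.
        return list(command)
    return ["xvfb-run", "-a", *command]
```

A CI script could then call `subprocess.run(with_virtual_display(cmd, os.environ), check=True)`, where `cmd` is a placeholder for your own automation entry point and Xvfb is assumed to be installed on the runner.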
More
Please refer to the Skills Repository for more details.

