Midscene.js - joyful automation by AI
Driving UI automation on all platforms with vision-based models
📣 v1.0 Release Announcement
We have released v1.0. It is currently published on npm.
For the latest documentation and code, please visit https://midscenejs.com/ and the main branch.
For historical documentation, please visit https://v0.midscenejs.com/.
v1.0 changelog: https://midscenejs.com/changelog
Features
Write Automation with Natural Language
- Describe your goals and steps, and Midscene will plan and operate the user interface for you.
- Use the JavaScript SDK or YAML to write your automation scripts.
Web & Mobile App & Any Interface
- Web Automation: Integrate with Puppeteer or Playwright, or use Bridge Mode to control your desktop browser.
- Android Automation: Use the JavaScript SDK with adb to control your local Android device.
- iOS Automation: Use the JavaScript SDK with WebDriverAgent to control your local iOS devices and simulators.
- Any Interface Automation: Use the JavaScript SDK to control your own interface.
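At minimum, a vision-driven agent needs two things from any surface it controls: a screenshot to look at and a way to perform low-level actions such as taps and text input. The sketch below assumes a hypothetical `DeviceInterface` contract and a fake `RecordingDevice` that only logs actions; neither is part of Midscene, and the SDK's actual contract for custom interfaces differs, so check the documentation before implementing one.

```typescript
// Hypothetical contract for illustration: NOT Midscene's real interface,
// just the minimum a vision-driven agent needs from a surface it controls.
interface DeviceInterface {
  size(): { width: number; height: number };
  screenshot(): string; // e.g. a base64-encoded image
  tap(x: number, y: number): void;
  input(text: string): void;
}

// A fake device that records every action instead of touching real hardware.
class RecordingDevice implements DeviceInterface {
  readonly log: string[] = [];
  private readonly width: number;
  private readonly height: number;

  constructor(width: number, height: number) {
    this.width = width;
    this.height = height;
  }

  size() {
    return { width: this.width, height: this.height };
  }

  screenshot(): string {
    // A real implementation would capture and encode the screen here.
    return "";
  }

  tap(x: number, y: number): void {
    // Reject coordinates that fall outside the screen.
    if (x < 0 || y < 0 || x >= this.width || y >= this.height) {
      throw new RangeError(`tap (${x}, ${y}) is off-screen`);
    }
    this.log.push(`tap ${x},${y}`);
  }

  input(text: string): void {
    this.log.push(`input ${text}`);
  }
}

const device = new RecordingDevice(1080, 1920);
device.tap(540, 960);
device.input("hello");
console.log(device.log);
```

Keeping the contract this small is what lets the same vision-based planning run on web, mobile, or a custom surface: only the screenshot and action primitives change.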
For Developers
- Three kinds of APIs:
  - Interaction API: interact with the user interface.
  - Data Extraction API: extract data from the user interface and the DOM.
  - Utility API: utility functions such as aiAssert(), aiLocate(), and aiWaitFor().
- MCP: Midscene provides MCP services that expose atomic Midscene Agent actions as MCP tools, so upper-layer agents can inspect and operate UIs with natural language.
- Caching for Efficiency: Replay your script with the cache to get results faster.
- Debugging Experience: Midscene.js offers a visual replay report file, a built-in playground, and a Chrome Extension to simplify debugging. These are the tools most developers truly need.
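Why replaying with a cache is faster can be sketched in a few lines: if a previous run already resolved where "the search box" is, a replay can reuse that answer instead of asking the model again. `LocateCache`, `locate`, and `locateWithModel` below are hypothetical names, and the fake model call stands in for a real vision-model request; this is not Midscene's cache format or implementation.

```typescript
// Hypothetical sketch of prompt-keyed caching; NOT Midscene's cache format.
type Point = { x: number; y: number };

class LocateCache {
  private readonly entries = new Map<string, Point>();
  hits = 0;
  misses = 0;

  get(prompt: string): Point | undefined {
    const point = this.entries.get(prompt);
    if (point) this.hits++;
    else this.misses++;
    return point;
  }

  set(prompt: string, point: Point): void {
    this.entries.set(prompt, point);
  }
}

// Stand-in for a slow, token-consuming vision-model call.
let modelCalls = 0;
function locateWithModel(prompt: string): Point {
  modelCalls++;
  // Fake but deterministic answer derived from the prompt.
  return { x: prompt.length * 10, y: 100 };
}

// Consult the cache first; only ask the model on a miss.
function locate(prompt: string, cache: LocateCache): Point {
  const cached = cache.get(prompt);
  if (cached) return cached;
  const point = locateWithModel(prompt);
  cache.set(prompt, point);
  return point;
}

const cache = new LocateCache();
locate("the search box", cache); // first run: calls the model
locate("the search box", cache); // replay: served from the cache
console.log(modelCalls, cache.hits, cache.misses); // 1 1 1
```

Keying on the prompt alone is the simplest possible choice. A cached location goes stale when the UI changes, so a real cache needs some way to detect that and fall back to the model.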
Showcases
Autonomously fill in the GitHub registration form in a web browser and pass all field validations.
Plus these real-world showcases:
- iOS Automation - Meituan coffee order
- iOS Automation - Auto-like the first @midscene_ai tweet
- Android Automation - DCar: Xiaomi SU7 specs
- Android Automation - Booking a Tokyo hotel for Christmas
- MCP Integration - Midscene MCP UI prepatch release
Zero-code quick experience
- Chrome Extension: Try Midscene in the browser immediately through the Chrome Extension, without writing any code.
- Android Playground: There is also a built-in Android playground to control your local Android device.
- iOS Playground: There is also a built-in iOS playground to control your local iOS device.
Driven by Visual Language Model
Midscene.js is all-in on the pure-vision route for UI actions: element localization and interactions are based on screenshots only. It supports vision-language models such as Qwen3-VL, Doubao-1.6-vision, gemini-3-flash, and UI-TARS. For data extraction and page understanding, you can still opt in to including the DOM when needed.
- Pure-vision localization for UI actions; the DOM extraction mode is removed.
- Works across web, mobile, desktop, and even <canvas> surfaces.
- Far fewer tokens by skipping the DOM for actions, which cuts cost and speeds up runs.
- DOM can still be included for data extraction and page understanding when needed.
- Strong open-source options for self-hosting.
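With pure-vision localization, an action ends up as coordinates rather than a DOM selector. A minimal sketch of that last step, assuming the model returns a bounding box as `[left, top, right, bottom]` in screenshot pixels (real models use various output formats) and that the screenshot may be captured at a higher resolution than the logical screen. `centerOf` and `toScreen` are hypothetical helpers, not Midscene functions.

```typescript
// Assumption: the model returns a box as [left, top, right, bottom]
// in screenshot pixels. Real models use various formats.
type BBox = [number, number, number, number];
type Size = { width: number; height: number };
type Point = { x: number; y: number };

// The center of the box is where a click or tap would land.
function centerOf([left, top, right, bottom]: BBox): Point {
  return { x: (left + right) / 2, y: (top + bottom) / 2 };
}

// Screenshots are often captured at a higher resolution than the logical
// screen (e.g. on high-DPI displays), so scale before dispatching the action.
function toScreen(p: Point, shot: Size, logical: Size): Point {
  return {
    x: Math.round((p.x * logical.width) / shot.width),
    y: Math.round((p.y * logical.height) / shot.height),
  };
}

const box: BBox = [200, 400, 400, 480]; // as if returned by the model
const shot: Size = { width: 2160, height: 3840 };
const logicalScreen: Size = { width: 1080, height: 1920 };
const target = toScreen(centerOf(box), shot, logicalScreen);
console.log(target); // { x: 150, y: 220 }
```

Because nothing here reads the DOM, the same mapping applies to a web page, a phone screen, or a <canvas> surface.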
Read more about Model Strategy
Two styles of automation
Auto planning
AI autonomously plans and executes the flow to complete the task.
Workflow style
Split complex logic into multiple steps to improve the stability of the automation code.
For more details about the workflow style, please refer to "Use JavaScript to Optimize the AI Automation Code".
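The difference between the two styles can be sketched against a fake agent. The `Agent` interface, its `act`/`query`/`assert` methods, and the `ScriptedAgent` fake are hypothetical stand-ins loosely mirroring the interaction, data extraction, and utility APIs listed above (such as `aiAssert()`); they are not the SDK's actual method names or signatures, so consult the API reference for those.

```typescript
// Hypothetical agent surface for illustration only; Midscene's real method
// names and signatures differ (see the API reference).
interface Agent {
  act(instruction: string): Promise<void>;
  query<T>(demand: string): Promise<T>;
  assert(condition: string): Promise<void>;
}

// Fake agent: records every call and answers queries from canned data.
class ScriptedAgent implements Agent {
  readonly calls: string[] = [];
  private readonly answers: Record<string, unknown>;
  constructor(answers: Record<string, unknown>) {
    this.answers = answers;
  }
  async act(instruction: string): Promise<void> {
    this.calls.push(`act: ${instruction}`);
  }
  async query<T>(demand: string): Promise<T> {
    this.calls.push(`query: ${demand}`);
    return this.answers[demand] as T;
  }
  async assert(condition: string): Promise<void> {
    this.calls.push(`assert: ${condition}`);
  }
}

const RESULTS_DEMAND = "list of result items with name and price";

// Auto planning: one goal; the AI decides every intermediate step.
async function autoPlanning(agent: Agent): Promise<void> {
  await agent.act("search for 'headphones' and add the cheapest result to the cart");
}

// Workflow style: small, checkable steps glued together with ordinary code.
async function workflowStyle(agent: Agent): Promise<string> {
  await agent.act("type 'headphones' in the search box and press Enter");
  const items = await agent.query<{ name: string; price: number }[]>(RESULTS_DEMAND);
  if (items.length === 0) throw new Error("no results found");
  // Plain code, not the model, picks the cheapest item.
  const cheapest = items.reduce((a, b) => (b.price < a.price ? b : a));
  await agent.act(`click the result named '${cheapest.name}'`);
  await agent.act("click the 'Add to cart' button");
  await agent.assert("the cart badge shows 1 item");
  return cheapest.name;
}

const agent = new ScriptedAgent({
  [RESULTS_DEMAND]: [
    { name: "Model A", price: 59 },
    { name: "Model B", price: 39 },
  ],
});
workflowStyle(agent).then((picked) => console.log(picked, agent.calls.length)); // Model B 5
```

The workflow version costs more lines, but each step can be checked on its own, and deterministic decisions such as picking the cheapest item stay in code rather than in the model's plan.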
Resources
- Home Page and Documentation: https://midscenejs.com
- Sample Projects: https://github.com/web-infra-dev/midscene-example
- API Reference: https://midscenejs.com/api.html
- GitHub: https://github.com/web-infra-dev/midscene
Credits
We would like to thank the following projects:
- Rsbuild and Rslib for the build tool.
- UI-TARS for the open-source agent model UI-TARS.
- Qwen2.5-VL for the open-source VL model Qwen2.5-VL.
- scrcpy and yume-chan allow us to control Android devices from the browser.
- appium-adb for the JavaScript bridge to adb.
- appium-webdriveragent for operating XCTest with JavaScript.
- YADB for the yadb tool which improves the performance of text input.
- Puppeteer for browser automation and control.
- Playwright for browser automation, control, and testing.
License
Midscene.js is MIT licensed.

