
New: Midscene 1.0 is released

Midscene.js

Vision-model-driven UI automation for every platform

11k+ GitHub stars

No. 2 on GitHub Trending

Documentation Showcases

Platforms

Web, iOS, Android, and more

Control browsers and mobile apps with natural language across multiple platforms

Unified API design for seamless cross-platform automation

Web

Integrate with Puppeteer or Playwright, or use Bridge Mode to control desktop browsers.

iOS

Control iOS devices with WebDriver using natural language

Android

Control Android devices with adb using natural language

Any Interface

Automation on any interface, beyond DOM / Accessibility limitations.
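
To picture what a unified cross-platform API could look like in practice, here is a minimal sketch in which one flow is written against a small agent interface and then run on agents for different platforms. The `UiAgent` interface, the `RecordingAgent` class, and `searchFlow` are hypothetical stand-ins. Only the method names `aiAct` and `aiAssert` come from the API list on this page, and their real signatures may differ.

```typescript
// Hypothetical interface: only the names aiAct and aiAssert appear on this
// page; the signatures below are assumptions made for illustration.
interface UiAgent {
  aiAct(instruction: string): Promise<void>;
  aiAssert(assertion: string): Promise<void>;
}

type Platform = "web" | "ios" | "android";

// Stand-in agent that records calls instead of driving a real device.
class RecordingAgent implements UiAgent {
  readonly calls: string[] = [];
  readonly platform: Platform;

  constructor(platform: Platform) {
    this.platform = platform;
  }

  async aiAct(instruction: string): Promise<void> {
    this.calls.push(`${this.platform}:act:${instruction}`);
  }

  async aiAssert(assertion: string): Promise<void> {
    this.calls.push(`${this.platform}:assert:${assertion}`);
  }
}

// The flow depends only on the interface, so it does not care which
// platform the agent controls.
async function searchFlow(agent: UiAgent, keyword: string): Promise<void> {
  await agent.aiAct(`type "${keyword}" in the search box and submit`);
  await agent.aiAssert("a list of search results is visible");
}
```

Calling `searchFlow(new RecordingAgent("web"), "headphones")` and `searchFlow(new RecordingAgent("android"), "headphones")` runs the same natural-language steps against both stand-ins. A real setup would pass in a platform-specific agent instead.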

MODEL STRATEGY

Vision Models | Multi-model combination | Adapt open-source models

Vision models boost action precision

Multi-model setups raise completion rates

Open-source options that still perform

Doubao Seed

Doubao Seed vision model optimized for visual understanding and UI element recognition with excellent performance.

Qwen3-VL

Qwen vision-language model with high-quality image understanding and UI element recognition at competitive pricing.

Gemini-3-Pro

Advanced Gemini multimodal model with powerful vision capabilities and comprehensive UI automation support.

Multi-model combo

Supports using different models for planning and interaction to improve task completion rates
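
The idea of routing planning and interaction to different models can be sketched as a small lookup. This is an illustration only: the `ModelConfig` shape, the `modelFor` helper, and the model identifier strings are assumptions, not Midscene's actual configuration options.

```typescript
// Hypothetical configuration shape; not the library's real options.
type Intent = "planning" | "interaction";

interface ModelConfig {
  // Model used when no override is set for an intent.
  defaultModel: string;
  // Optional per-intent overrides.
  overrides: Partial<Record<Intent, string>>;
}

// Pick the override for this intent if one exists, else the default model.
function modelFor(intent: Intent, config: ModelConfig): string {
  return config.overrides[intent] ?? config.defaultModel;
}

// Placeholder identifiers: one model plans the task, while the default
// model handles element-level interaction.
const exampleConfig: ModelConfig = {
  defaultModel: "qwen3-vl",
  overrides: { planning: "gemini-3-pro" },
};
```

With this configuration, `modelFor("planning", exampleConfig)` resolves to the planning override and `modelFor("interaction", exampleConfig)` falls back to the default model.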

DEVELOPER EXPERIENCE

Rich APIs & Tools

Practical APIs to control automation flows and execution strategies

Supports extending your own UI action agents

Helps developers ship UI automation tasks faster

Rich APIs

Enables both smart automation workflows and fine-grained atomic control.

MCP Server

Exposes device operations as an MCP Server for collaboration with various models.

Reports & Playground

Provides intuitive visual reports that help developers trace each step of the automation process

Flexible Integration

Supports writing automation flows in YAML and customizing Agent execution strategies
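
A declarative flow with a pluggable execution strategy can be sketched in TypeScript as a list of steps handed to a runner. This mirrors the kind of thing a YAML flow file expresses, but it is not Midscene's YAML schema: the `Step` shape, the `Strategy` names, and `runFlow` are all hypothetical, and the executor is synchronous only to keep the sketch short.

```typescript
// Hypothetical step shape; this is not Midscene's actual YAML schema.
type Step =
  | { kind: "act"; prompt: string }
  | { kind: "assert"; prompt: string };

// An execution strategy decides what happens after a step fails.
type Strategy = "stopOnFailure" | "continueOnFailure";

// The executor stands in for an agent: it returns true when a step succeeds.
type Executor = (step: Step) => boolean;

interface FlowResult {
  executed: number; // how many steps were attempted
  failed: string[]; // prompts of the steps that failed
}

function runFlow(steps: Step[], execute: Executor, strategy: Strategy): FlowResult {
  const result: FlowResult = { executed: 0, failed: [] };
  for (const step of steps) {
    result.executed += 1;
    if (!execute(step)) {
      result.failed.push(step.prompt);
      // Under stopOnFailure, the remaining steps are skipped.
      if (strategy === "stopOnFailure") break;
    }
  }
  return result;
}

// The flow itself is plain data, so it could equally be loaded from a file.
const loginFlow: Step[] = [
  { kind: "act", prompt: "enter the username and password" },
  { kind: "assert", prompt: "the dashboard is visible" },
  { kind: "act", prompt: "open the settings page" },
];
```

Keeping the flow as data and the strategy as a parameter means the same steps can be rerun under a different failure policy without being rewritten.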

View All APIs

aiAct, aiLocate, aiAssert...

Explore the complete API documentation for more automation capabilities.
