There are three main capabilities: action, query, and assert.
- Action (.ai, .aiAction): execute a series of actions by describing the steps.
- Query (.aiQuery): extract customized data from the UI. Describe the JSON format you want, and the AI returns the answer based on its understanding of the page.
- Assert (.aiAssert): perform assertions on the page.

All of these methods accept a natural-language prompt as their parameter, which can greatly reduce the cost of maintaining scripts.
To quickly experience the main features of Midscene, you can use the Midscene Chrome extension. It allows you to use Midscene on any webpage without writing any code.
Install the Midscene extension from the Chrome Web Store.
For instructions, please refer to Quick Experience.
Maintaining automation scripts with Midscene can be a very different experience. For example, a script that searches for headphones on a website can be written as a few natural-language prompts.
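A minimal sketch of that headphone search follows. Only the method names `aiAction`, `aiQuery`, and `aiAssert` come from the text above. The `UiAgent` interface, the `RecordingAgent` stand-in, the `Item` type, and the prompt wording are assumptions made for illustration, so that the flow can run without a browser or a model. A real agent would come from one of the Midscene integrations.

```typescript
// Hypothetical minimal shape of a Midscene-style agent. Only the method names
// are taken from the documentation; the signatures are assumed.
interface UiAgent {
  aiAction(prompt: string): Promise<void>;
  aiQuery<T>(prompt: string): Promise<T>;
  aiAssert(prompt: string): Promise<void>;
}

interface Item {
  itemTitle: string;
  price: number;
}

// Stand-in agent (not part of Midscene): records each prompt and returns
// canned data instead of driving a real page.
class RecordingAgent implements UiAgent {
  readonly prompts: string[] = [];

  async aiAction(prompt: string): Promise<void> {
    this.prompts.push("action: " + prompt);
  }

  async aiQuery<T>(prompt: string): Promise<T> {
    this.prompts.push("query: " + prompt);
    // Canned result standing in for what a model would extract from the page.
    const canned: Item[] = [{ itemTitle: "Example headphones", price: 59 }];
    return canned as unknown as T;
  }

  async aiAssert(prompt: string): Promise<void> {
    this.prompts.push("assert: " + prompt);
  }
}

// The search flow, expressed only as natural-language prompts.
async function searchHeadphones(agent: UiAgent): Promise<Item[]> {
  // Action: describe the steps to perform.
  await agent.aiAction('type "Headphones" in the search box, press Enter');

  // Query: describe the JSON shape of the data to extract.
  const items = await agent.aiQuery<Item[]>(
    "{itemTitle: string, price: number}[], the items in the result list and their prices",
  );

  // Assert: state what should be true of the page.
  await agent.aiAssert("the result list shows at least one item");
  return items;
}
```

Because the script describes intent rather than page structure, the same three calls would be expected to keep working when the page layout changes, which is where the maintenance saving comes from.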
There are several ways to integrate Midscene into your code project:
Midscene provides a visual report after each run. With this report, you can review the animated replay and view the details of each step in the process. What's more, there is a playground in the report file for you to adjust your prompt without re-running all your scripts.
The default model is currently OpenAI GPT-4o, but you can switch to a different multimodal model, such as Gemini or Qwen, if needed.
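As a rough illustration of the "default model, optionally overridden" behaviour, the sketch below resolves model settings from a key/value map such as `process.env`. The helper `resolveModelSettings`, the `ModelSettings` type, and the variable names `MIDSCENE_MODEL_NAME` and `OPENAI_BASE_URL` are assumptions for this example. Check the model configuration documentation for the names Midscene actually reads.

```typescript
// Hypothetical helper, not part of Midscene: picks model settings from a
// plain key/value map (for example, process.env).
interface ModelSettings {
  modelName: string;
  baseUrl: string | undefined;
}

type Env = Record<string, string | undefined>;

function resolveModelSettings(env: Env): ModelSettings {
  return {
    // Fall back to the default model named in the text when nothing is set.
    // The variable name is an assumption.
    modelName: env["MIDSCENE_MODEL_NAME"] || "gpt-4o",
    // Left undefined when unset, meaning "use the provider's default endpoint".
    // The variable name is an assumption.
    baseUrl: env["OPENAI_BASE_URL"] || undefined,
  };
}
```

With an empty map, this returns the GPT-4o default. Setting both variables would point the same scripts at a different multimodal model and endpoint without changing any prompts.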
All data gathered from pages is sent directly to OpenAI or to the custom model provider, according to your configuration. No other third-party platform has access to the data.
For more details, please refer to Data Privacy.