No. Midscene is an automation assistance SDK whose key feature is action stability: the same actions are performed in each run. To maintain this stability, we encourage you to provide detailed instructions so the AI understands each step of your task.
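For example, describe the flow step by step instead of giving a single vague goal. A minimal sketch, assuming an `agent` created from one of Midscene's integrations (such as the Puppeteer or Playwright agent):

```typescript
// Sketch: `agent` is assumed to come from a Midscene integration
// (e.g. PuppeteerAgent). Each step is described explicitly so the AI
// performs the same actions on every run.
await agent.aiAction('type "Headphones" in the search box');
await agent.aiAction('press Enter to trigger the search');
await agent.aiAction('click the first item in the result list');
```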
Related Docs: Prompting Tips
There are some limitations with Midscene. We are still working on them.
Please refer to Choose a model.
The screenshot will be sent to the AI model. If you are using GPT-4o, some key information extracted from the DOM will also be sent.
If you are worried about data privacy issues, please refer to Data Privacy.
When using a general-purpose LLM in Midscene.js, the running time may increase by a factor of 3 to 10 compared with a traditional Playwright script, for instance from 5 seconds to 20 seconds. This extra token and time cost is the trade-off for more stable results.
There are two ways to improve the running time:
This usually happens when the viewport deviceScaleFactor does not match your system settings. Setting it to 2 on macOS will solve the issue.
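For example, with the Puppeteer integration you can set the factor when configuring the viewport (a minimal sketch; the width and height values are only illustrative, and Playwright accepts the same option via browser.newContext()):

```typescript
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setViewport({
  width: 1280,
  height: 800,
  deviceScaleFactor: 2, // match the Retina display scaling on macOS
});
```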
The report files are saved in ./midscene-run/report/ by default.
It mainly relies on UI parsing and multimodal AI. Here is a flowchart that describes the core process of the interaction between Midscene and the AI model.