FAQ
Platform-Specific FAQ
The following platform-specific FAQs are maintained in their respective documentation:
- Web Browser - Playwright
- Web Browser - Puppeteer
- Web Browser - Chrome Extension
- Web Browser - Bridge Mode
- Android
- iOS
- HarmonyOS
- PC Desktop
What data is sent to the AI model?
Screenshots are sent to the AI model. In some cases, such as when you set the domIncluded option to true when calling aiAsk or aiQuery, DOM information is sent as well.
If you are concerned about data privacy, please refer to Data Privacy.
My model provider requires adding specific headers to requests
You can use defaultHeaders in the MIDSCENE_MODEL_INIT_CONFIG_JSON environment variable to specify headers to include in the request. For example:
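A minimal sketch of what the variable's value could look like when written by hand. The header name X-Example-Header and its value are placeholders, not something Midscene or any provider defines. The check only confirms that the string is valid JSON containing a defaultHeaders object.

```typescript
// Value for MIDSCENE_MODEL_INIT_CONFIG_JSON, written by hand.
// "X-Example-Header" is a hypothetical header; use the ones your provider requires.
const initConfigJson: string =
  '{"defaultHeaders":{"X-Example-Header":"example-value"}}';

// Sanity check before using the value: it must parse as JSON,
// and defaultHeaders must be an object mapping header names to strings.
const parsedConfig = JSON.parse(initConfigJson) as {
  defaultHeaders?: Record<string, string>;
};
const headerNames: string[] = Object.keys(parsedConfig.defaultHeaders ?? {});
```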
If your provider documentation calls this field extra_headers or extraHeaders, Midscene also accepts those aliases and normalizes them to defaultHeaders. When multiple aliases are present, the priority is: defaultHeaders > extra_headers > extraHeaders.
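The priority order can be pictured with the sketch below. It is an illustration, not Midscene's source code, and it assumes that the highest-priority alias present wins outright rather than being merged with the others, which the text above does not specify. The names normalizeHeaders and InitConfig are hypothetical.

```typescript
// Illustrative only: NOT Midscene's implementation.
type HeaderMap = Record<string, string>;

interface InitConfig {
  defaultHeaders?: HeaderMap;
  extra_headers?: HeaderMap;
  extraHeaders?: HeaderMap;
  [key: string]: unknown;
}

// Assumption: the highest-priority alias that is present wins outright
// (no merging between aliases), and the alias keys are removed afterwards.
function normalizeHeaders(config: InitConfig): InitConfig {
  const { defaultHeaders, extra_headers, extraHeaders, ...rest } = config;
  // Priority: defaultHeaders > extra_headers > extraHeaders
  const chosen = defaultHeaders ?? extra_headers ?? extraHeaders;
  return chosen === undefined ? rest : { ...rest, defaultHeaders: chosen };
}

// Both lower-priority aliases are present, so extra_headers is used.
const normalized = normalizeHeaders({
  extra_headers: { "X-A": "1" },
  extraHeaders: { "X-B": "2" },
});
```

Under that assumption, normalized contains only defaultHeaders with the X-A entry.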
You can generate the JSON string with JSON serialization to avoid mistakes when writing it by hand:
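One possible way to do this is sketched below. JSON.stringify produces the value, and a small helper wraps it in single quotes for a POSIX shell. The helper names buildInitConfigJson and shellSingleQuote and the header X-Example-Header are made up for illustration.

```typescript
// Build the value with JSON.stringify instead of hand-writing the JSON.
function buildInitConfigJson(headers: Record<string, string>): string {
  return JSON.stringify({ defaultHeaders: headers });
}

// Wrap a value in single quotes for a POSIX shell, escaping embedded single quotes.
function shellSingleQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// "X-Example-Header" is a placeholder header name.
const json = buildInitConfigJson({ "X-Example-Header": "example-value" });
const exportLine =
  "export MIDSCENE_MODEL_INIT_CONFIG_JSON=" + shellSingleQuote(json);
// exportLine:
// export MIDSCENE_MODEL_INIT_CONFIG_JSON='{"defaultHeaders":{"X-Example-Header":"example-value"}}'
```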
How do I use Azure OpenAI Service?
When using Azure OpenAI Service, first choose the model and fill in the regular model configuration from Model configuration. Azure only requires changing the model service URL and API Key to the Azure form:
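As a rough sketch, the two Azure-specific values could be assembled like this. The host pattern https://<resource>.openai.azure.com is an assumption about how your resource is exposed, and my-resource and the key are placeholders. The helper name azureModelEnv is hypothetical.

```typescript
// Assumption: the Azure resource is reachable at https://<resource>.openai.azure.com.
// The base URL stops at /openai/v1: no /chat/completions suffix and no api-version.
function azureModelEnv(
  resourceName: string,
  apiKey: string,
): Record<string, string> {
  return {
    MIDSCENE_MODEL_BASE_URL: `https://${resourceName}.openai.azure.com/openai/v1`,
    MIDSCENE_MODEL_API_KEY: apiKey,
  };
}

// Placeholder resource name and key.
const azureEnv = azureModelEnv("my-resource", "<your-azure-api-key>");
// azureEnv.MIDSCENE_MODEL_BASE_URL is
// "https://my-resource.openai.azure.com/openai/v1"
```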
Other settings, such as MIDSCENE_MODEL_NAME and MIDSCENE_MODEL_FAMILY, should still follow the corresponding model section in Model configuration. Azure is just a model provider with a different authentication scheme, not a special model.
This uses the normal OpenAI-compatible path and sends POST /openai/v1/chat/completions with Authorization: Bearer .... Do not append /chat/completions to MIDSCENE_MODEL_BASE_URL, and do not add api-version for /openai/v1 endpoints.
If an Azure-compatible gateway only accepts the api-key header, use this fallback:
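A sketch of the fallback values, assuming MIDSCENE_MODEL_BASE_URL stays as configured for your gateway. The helper name azureApiKeyHeaderEnv is hypothetical, and the key is a placeholder.

```typescript
// Build the fallback settings: a dummy API key plus the real key in an api-key header.
function azureApiKeyHeaderEnv(realKey: string): Record<string, string> {
  return {
    // Dummy value; the real key travels in the api-key header below.
    MIDSCENE_MODEL_API_KEY: "placeholder",
    MIDSCENE_MODEL_INIT_CONFIG_JSON: JSON.stringify({
      defaultHeaders: { "api-key": realKey },
    }),
  };
}

// Placeholder key for illustration.
const fallbackEnv = azureApiKeyHeaderEnv("<your-azure-api-key>");
// fallbackEnv.MIDSCENE_MODEL_INIT_CONFIG_JSON is
// '{"defaultHeaders":{"api-key":"<your-azure-api-key>"}}'
```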
In this fallback, MIDSCENE_MODEL_API_KEY="placeholder" only satisfies the OpenAI SDK constructor check. The real key is sent through defaultHeaders.api-key.
Azure AD / keyless auth (DefaultAzureCredential) is not supported. Use an API key.
How can I reduce the running time?
There are several ways to reduce the running time:
- Use an instant action interface such as agent.aiTap('Login Button') instead of agent.ai('Click Login Button').
- Use a lower resolution if possible. This reduces the input token cost.
- Switch to a faster model service.
- Use caching to speed up debugging. Read more about it in Caching.
How do I configure the midscene_run directory?
Midscene saves runtime artifacts (reports, logs, cache, etc.) in the midscene_run directory. By default, this directory is created in the current working directory.
You can customize the directory location using the MIDSCENE_RUN_DIR environment variable, which accepts both relative and absolute paths:
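The sketch below illustrates the difference between the two kinds of values. It assumes POSIX-style paths and that a relative value is resolved against the current working directory. It is not Midscene's own resolution logic, and resolveRunDir and all paths shown are hypothetical.

```typescript
// Assumptions: POSIX-style paths; relative values resolve against cwd;
// an unset or empty value falls back to the default "midscene_run".
function resolveRunDir(envValue: string | undefined, cwd: string): string {
  const dir = envValue && envValue.length > 0 ? envValue : "midscene_run";
  if (dir.startsWith("/")) return dir; // absolute path: used as-is
  const base = cwd.endsWith("/") ? cwd.slice(0, -1) : cwd;
  const relative = dir.startsWith("./") ? dir.slice(2) : dir;
  return `${base}/${relative}`;
}

const cwd = "/home/user/project"; // example working directory
const byDefault = resolveRunDir(undefined, cwd);
// "/home/user/project/midscene_run"
const fromRelative = resolveRunDir("./test-output/midscene", cwd);
// "/home/user/project/test-output/midscene"
const fromAbsolute = resolveRunDir("/tmp/midscene_run", cwd);
// "/tmp/midscene_run"
```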
The directory contains the following subdirectories:
- report/ - Test report files (HTML format)
- log/ - Debug log files
- cache/ - Cache files (see Caching)
For more configuration options, see Model configuration.
How do I control the report player's default replay style via a link?
You can override the default values of the Focus on cursor and Show element markers toggles by adding query parameters to the report URL. These toggles determine whether the report highlights the cursor position and the element markers. Use focusOnCursor and showElementMarkers with values such as true, false, 1, or 0. For example: ...?focusOnCursor=false&showElementMarkers=true.
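If you generate report links in a script, a small helper along these lines could append the parameters. This is a sketch: withReplayStyle and the report path are hypothetical, and the helper does not handle URLs that contain a # fragment.

```typescript
interface ReplayStyle {
  focusOnCursor?: boolean;
  showElementMarkers?: boolean;
}

// Append the replay-style query parameters to a report URL.
// Limitation: URLs containing a "#" fragment are not handled.
function withReplayStyle(reportUrl: string, style: ReplayStyle): string {
  const params: string[] = [];
  if (style.focusOnCursor !== undefined) {
    params.push(`focusOnCursor=${style.focusOnCursor}`);
  }
  if (style.showElementMarkers !== undefined) {
    params.push(`showElementMarkers=${style.showElementMarkers}`);
  }
  if (params.length === 0) return reportUrl;
  const separator = reportUrl.includes("?") ? "&" : "?";
  return reportUrl + separator + params.join("&");
}

// Hypothetical report path.
const reportLink = withReplayStyle("midscene_run/report/example.html", {
  focusOnCursor: false,
  showElementMarkers: true,
});
// "midscene_run/report/example.html?focusOnCursor=false&showElementMarkers=true"
```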
Inaccurate Element Positioning
If you encounter inaccurate element positioning when using Midscene, follow these steps to troubleshoot and resolve the issue:
1. Upgrade to the Latest Version
Make sure you are using the latest version of Midscene, as new versions typically include optimizations and improvements for positioning accuracy.
2. Use Better Vision Models
Midscene's element positioning relies on the AI model's visual understanding, so be sure to choose a model with vision capabilities.
Generally, newer model versions and models with more parameters perform better than older versions and smaller models. For example, Qwen3-VL performs better than Qwen2.5-VL, and its plus version performs better than the flash version.
For more model selection suggestions, please refer to Model Strategy.
3. Check Model Family Configuration
Verify that the MIDSCENE_MODEL_FAMILY parameter is set correctly in your model configuration. An incorrect MIDSCENE_MODEL_FAMILY value affects Midscene's adaptation logic for the model. See Model Configuration for details.
4. Optimize prompts with visual features and position information
If the positioning result lands on unrelated elements at random and varies significantly between runs, the model usually cannot understand the semantics behind the icon button.
For example, aiTap('profile center') is a functional description, and the model may not know the specific appearance of a profile icon. In contrast, aiTap('person avatar icon') is a visual description, so the model can locate the element based on its visual characteristics.
Solution: describe the element by combining its visual features with its position, for example aiTap('person avatar icon in the top-right corner').
5. Enable deepLocate
If the positioning result lands near the target element but is still off by a few pixels, the model has probably identified the right target but still has some positioning deviation.
Solution: enabling deepLocate can significantly improve positioning accuracy.
For more information about deepLocate, please refer to the API documentation.
6. Increase the browser DPR to 2 on web
If you are running Midscene in a web browser, you can try increasing the DPR to 2. In CI environments, the default DPR is often 1. Raising it to 2 makes the page clearer, which usually improves positioning for small elements.
Keep in mind that this will consume more tokens.
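To get a feel for the cost, the sketch below compares screenshot sizes at DPR 1 and DPR 2 for an example 1280x720 viewport. It assumes the screenshot is captured at physical resolution. How pixel count translates into tokens depends on the model, so only the pixel ratio is computed.

```typescript
interface Size {
  width: number;
  height: number;
}

// Screenshot size in physical pixels for a given device pixel ratio (DPR).
function screenshotPixels(viewport: Size, dpr: number): Size {
  return { width: viewport.width * dpr, height: viewport.height * dpr };
}

const exampleViewport: Size = { width: 1280, height: 720 }; // example values
const atDpr1 = screenshotPixels(exampleViewport, 1); // 1280 x 720
const atDpr2 = screenshotPixels(exampleViewport, 2); // 2560 x 1440

// Doubling the DPR quadruples the number of pixels in the screenshot.
const pixelRatio =
  (atDpr2.width * atDpr2.height) / (atDpr1.width * atDpr1.height); // 4
```

In Playwright, the DPR is set with the deviceScaleFactor option of the browser context. Puppeteer exposes the same option name in the viewport settings.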
Does the Doubao phone use Midscene under the hood?
No.

