Configure model and provider

Midscene uses the OpenAI SDK to call AI services. The SDK fixes the input and output schema of those calls, but it does not restrict you to OpenAI's services: you can use any model service that supports the same interface, and most platforms and tools do.

This article shows how to configure the AI service provider and how to choose a different model. You may want to read Choose a model first to learn more about selecting a model.

Configs

Common configs

These are the most common configs. Only OPENAI_API_KEY is required.

Name	Description
`OPENAI_API_KEY`	Required. Your OpenAI API key (e.g. "sk-abcdefghijklmnopqrstuvwxyz")
`OPENAI_BASE_URL`	Optional. Custom API endpoint URL. Use it to switch to a provider other than OpenAI (e.g. "https://some_service_name.com/v1")
`MIDSCENE_MODEL_NAME`	Optional. Model name to use instead of `gpt-4o`
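
To make the table concrete, here is a minimal TypeScript sketch of how the three variables could be resolved into client settings. `resolveModelSettings`, `ModelSettings`, and `Env` are hypothetical names used for illustration only; they are not Midscene APIs, and Midscene's own resolution logic may differ. The fallback URL assumes the OpenAI SDK's public default endpoint, and `gpt-4o` is the model name mentioned in the table.

```typescript
// Hypothetical types and helper for illustration; not part of Midscene.
type Env = Record<string, string | undefined>;

interface ModelSettings {
  apiKey: string;
  baseURL: string;
  model: string;
}

function resolveModelSettings(env: Env): ModelSettings {
  const apiKey = env["OPENAI_API_KEY"];
  if (!apiKey) {
    // OPENAI_API_KEY is the only required variable in the table.
    throw new Error("OPENAI_API_KEY is required");
  }
  return {
    apiKey,
    // Assumption: an unset or empty value falls back to OpenAI's public endpoint.
    baseURL: env["OPENAI_BASE_URL"] || "https://api.openai.com/v1",
    // Assumption: an unset or empty value falls back to gpt-4o.
    model: env["MIDSCENE_MODEL_NAME"] || "gpt-4o",
  };
}
```

In a Node.js script you could call `resolveModelSettings(process.env)` to check which endpoint and model a given environment would select before running any automation.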

Extra configs to use Qwen 2.5 VL model:

Name	Description
`MIDSCENE_USE_QWEN_VL`	Set to "1" to use the adapter for the Qwen 2.5 VL model

Extra configs to use UI-TARS model:

Name	Description
`MIDSCENE_USE_VLM_UI_TARS`	Version of the UI-TARS model. Supported values are `1.0`, `1.5`, and `DOUBAO` (the Volcengine version)

Extra configs to use Gemini 2.5 Pro model:

Name	Description
`MIDSCENE_USE_GEMINI`	Set to "1" to use the adapter for the Gemini 2.5 Pro model

For more information about the models, see Choose a model.

Advanced configs

Some advanced configs are also supported. Usually you don't need to use them.

Name	Description
`OPENAI_USE_AZURE`	Optional. Set to "true" to use Azure OpenAI Service. See the Using Azure OpenAI Service section below for details.
`MIDSCENE_OPENAI_INIT_CONFIG_JSON`	Optional. Custom JSON config for OpenAI SDK initialization
`MIDSCENE_OPENAI_HTTP_PROXY`	Optional. HTTP/HTTPS proxy configuration (e.g. `http://127.0.0.1:8080` or `https://proxy.example.com:8080`). This option has higher priority than `MIDSCENE_OPENAI_SOCKS_PROXY`
`MIDSCENE_OPENAI_SOCKS_PROXY`	Optional. SOCKS proxy configuration (e.g. "socks5://127.0.0.1:1080")
`MIDSCENE_PREFERRED_LANGUAGE`	Optional. The preferred language for the model response. The default is `Chinese` if the current timezone is GMT+8 and `English` otherwise.
`MIDSCENE_REPLANNING_CYCLE_LIMIT`	Optional. The maximum number of replanning cycles, default is 10
`OPENAI_MAX_TOKENS`	Optional. Maximum tokens for model response, default is 2048
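
The rules stated in this table (HTTP proxy before SOCKS proxy, numeric defaults, and the timezone-based language default) can be sketched in TypeScript as follows. The helpers `pickProxy`, `intSetting`, and `defaultLanguage` are hypothetical and only restate the table; they are not Midscene functions, and the timezone check assumes the offset is detected in the style of `Date.prototype.getTimezoneOffset()`, which returns -480 for GMT+8.

```typescript
// Hypothetical helpers that restate the table; not part of Midscene.
type Env = Record<string, string | undefined>;

// The HTTP/HTTPS proxy has higher priority than the SOCKS proxy.
function pickProxy(env: Env): string | undefined {
  return (
    env["MIDSCENE_OPENAI_HTTP_PROXY"] ||
    env["MIDSCENE_OPENAI_SOCKS_PROXY"] ||
    undefined
  );
}

// Read a positive integer setting, falling back to the documented default.
function intSetting(env: Env, key: string, fallback: number): number {
  const parsed = parseInt(env[key] || "", 10);
  return !isNaN(parsed) && parsed > 0 ? parsed : fallback;
}

// offsetMinutes follows getTimezoneOffset(): -480 means GMT+8.
function defaultLanguage(offsetMinutes: number): "Chinese" | "English" {
  return offsetMinutes === -480 ? "Chinese" : "English";
}
```

For example, `intSetting(process.env, "OPENAI_MAX_TOKENS", 2048)` and `intSetting(process.env, "MIDSCENE_REPLANNING_CYCLE_LIMIT", 10)` would yield the effective limits under these assumptions.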

Debug configs

Setting the following configs prints more logs for debugging. The logs are also written to the ./midscene_run/log folder.

Name	Description
`DEBUG=midscene:ai:profile:stats`	Optional. Set this to print the AI service time cost, token usage, etc. in comma-separated format, useful for analysis
`DEBUG=midscene:ai:profile:detail`	Optional. Set this to print the AI token usage details
`DEBUG=midscene:ai:call`	Optional. Set this to print the AI response details
`DEBUG=midscene:android:adb`	Optional. Set this to print the adb command calling details

Two ways to configure environment variables

Pick one of the following ways to configure environment variables.

1. Set environment variables in your system

# replace with your own key
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"

# if you are not using the default OpenAI model, you need to configure more parameters
# export MIDSCENE_MODEL_NAME="..."

2. Set environment variables using dotenv

This is what we used in our demo project.

Dotenv is a zero-dependency module that loads environment variables from a .env file into process.env.

# install dotenv
npm install dotenv --save

Create a .env file in your project root directory, and add the following content. There is no need to add export before each line.

OPENAI_API_KEY=sk-abcdefghijklmnopqrstuvwxyz

Import the dotenv module in your script. It will automatically read the environment variables from the .env file.

import 'dotenv/config';
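
Whichever way you set the variables, it can help to fail fast when a required one is missing. The sketch below is one possible check; `missingEnv` is a hypothetical helper written for this example, not something provided by dotenv or Midscene.

```typescript
// Hypothetical helper for illustration; not part of dotenv or Midscene.
type Env = Record<string, string | undefined>;

// Return the names of required variables that are unset or empty.
function missingEnv(env: Env, required: string[]): string[] {
  return required.filter((key) => !env[key]);
}

// Possible usage in a Node.js script, after `import 'dotenv/config'`:
//   const missing = missingEnv(process.env, ["OPENAI_API_KEY"]);
//   if (missing.length > 0) {
//     throw new Error("Missing environment variables: " + missing.join(", "));
//   }
```

Running the check right after the dotenv import surfaces a misplaced or misnamed .env file before any AI call is attempted.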

Using Azure OpenAI Service

There are some extra configs when using Azure OpenAI Service.

Use Azure AD token provider

This mode cannot be used in the Chrome extension.

# this is always true when using Azure OpenAI Service
export MIDSCENE_USE_AZURE_OPENAI=1

export MIDSCENE_AZURE_OPENAI_SCOPE="https://cognitiveservices.azure.com/.default"
export AZURE_OPENAI_ENDPOINT="..."
export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"

Use API key authentication

export MIDSCENE_USE_AZURE_OPENAI=1
export AZURE_OPENAI_ENDPOINT="..."
export AZURE_OPENAI_KEY="..."
export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
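
The two Azure setups differ in only one variable: the first sets `MIDSCENE_AZURE_OPENAI_SCOPE`, the second sets `AZURE_OPENAI_KEY`. As a rough sketch, the variable lists from the two snippets can be turned into a pre-flight check. `missingAzureVars` and the mode names `tokenProvider` and `apiKey` are hypothetical labels chosen for this example; they are not Midscene identifiers, and the check assumes every variable shown in each snippet is needed.

```typescript
// Hypothetical pre-flight check built from the two snippets; not part of Midscene.
type Env = Record<string, string | undefined>;
type AzureMode = "tokenProvider" | "apiKey";

// Variables that appear in both snippets.
const AZURE_COMMON_VARS: string[] = [
  "MIDSCENE_USE_AZURE_OPENAI",
  "AZURE_OPENAI_ENDPOINT",
  "AZURE_OPENAI_API_VERSION",
  "AZURE_OPENAI_DEPLOYMENT",
];

// The one variable that is specific to each snippet.
const AZURE_MODE_VARS: Record<AzureMode, string[]> = {
  tokenProvider: ["MIDSCENE_AZURE_OPENAI_SCOPE"],
  apiKey: ["AZURE_OPENAI_KEY"],
};

// Return the variables for the chosen mode that are unset or empty.
function missingAzureVars(env: Env, mode: AzureMode): string[] {
  return AZURE_COMMON_VARS.concat(AZURE_MODE_VARS[mode]).filter(
    (key) => !env[key],
  );
}
```

An empty result from `missingAzureVars(process.env, "apiKey")` would mean that every variable from the second snippet is present.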

Set config by JavaScript

You can also override the config with JavaScript. Remember to call this before running any Midscene code.

import { overrideAIConfig } from "@midscene/web/puppeteer";
// or import { overrideAIConfig } from "@midscene/web/playwright";
// or import { overrideAIConfig } from "@midscene/android";


overrideAIConfig({
  MIDSCENE_MODEL_NAME: "...",
  // ...
});
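
As a fuller illustration, the object passed to `overrideAIConfig` can be assembled from the same names used for environment variables. The sketch below builds such an object for a Qwen 2.5 VL setup. `buildQwenOverride` is a hypothetical helper, and the sketch assumes that `overrideAIConfig` accepts every key listed in the tables above with string values, which this page only shows for `MIDSCENE_MODEL_NAME`.

```typescript
// Hypothetical helper for illustration; not part of Midscene.
// Assumption: override keys mirror the environment variable names.
function buildQwenOverride(
  apiKey: string,
  baseURL: string,
  modelName: string,
): Record<string, string> {
  return {
    OPENAI_API_KEY: apiKey,
    OPENAI_BASE_URL: baseURL,
    MIDSCENE_MODEL_NAME: modelName,
    // "1" enables the Qwen 2.5 VL adapter, as described in the configs table.
    MIDSCENE_USE_QWEN_VL: "1",
  };
}

// Possible usage, with placeholder values you would replace with your own:
//   overrideAIConfig(
//     buildQwenOverride("sk-...", "https://some_service_name.com/v1", "..."),
//   );
```

Keeping the values as strings matches how the same settings would look when read from the environment.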