# Config Model and Provider

Midscene uses the OpenAI SDK to call AI services. The SDK constrains the input and output schema of the AI service, but that doesn't mean you can only use OpenAI's services: any model service that supports the same interface will work, and most platforms and tools do.

In this article, we will show you how to configure the AI service provider and how to choose a different model. If you are not sure which model to pick, read Choose a model first.

## Configs

### Common configs

These are the most common configs; among them, `OPENAI_API_KEY` is required.

| Name | Description |
|------|-------------|
| `OPENAI_API_KEY` | Required. Your OpenAI API key (e.g. `sk-abcdefghijklmnopqrstuvwxyz`) |
| `OPENAI_BASE_URL` | Optional. Custom URL for the API endpoint. Use it to switch to a provider other than OpenAI (e.g. `https://some_service_name.com/v1`) |
| `MIDSCENE_MODEL_NAME` | Optional. Specify a model name other than the default `gpt-4o` |

Extra configs to use the Qwen 2.5 VL model:

| Name | Description |
|------|-------------|
| `MIDSCENE_USE_QWEN_VL` | Set to "1" to use the adapter for the Qwen 2.5 VL model |

Extra configs to use the UI-TARS model:

| Name | Description |
|------|-------------|
| `MIDSCENE_USE_VLM_UI_TARS` | Version of the UI-TARS model; supported values are `1.0`, `1.5`, and `DOUBAO` (Volcengine version) |

Extra configs to use the Gemini 2.5 Pro model:

| Name | Description |
|------|-------------|
| `MIDSCENE_USE_GEMINI` | Set to "1" to use the adapter for the Gemini 2.5 Pro model |

For more information about the models, see Choose a model.

### Advanced configs

Some advanced configs are also supported. Usually you don't need to use them.

| Name | Description |
|------|-------------|
| `OPENAI_USE_AZURE` | Optional. Set to "true" to use Azure OpenAI Service. See more details in the following section |
| `MIDSCENE_OPENAI_INIT_CONFIG_JSON` | Optional. Custom JSON config for OpenAI SDK initialization |
| `MIDSCENE_OPENAI_SOCKS_PROXY` | Optional. SOCKS proxy configuration (e.g. `socks5://127.0.0.1:1080`) |
| `OPENAI_MAX_TOKENS` | Optional. Maximum tokens for the model response |
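
As a concrete illustration of `MIDSCENE_OPENAI_INIT_CONFIG_JSON`: conceptually, the JSON string is parsed and merged into the OpenAI SDK constructor options, roughly like the sketch below. This is an illustration of the idea, not Midscene's actual source.

```typescript
import OpenAI from "openai";

// Rough sketch (not Midscene's actual internals): the JSON string is
// parsed and spread into the OpenAI SDK constructor options.
const extraInit = process.env.MIDSCENE_OPENAI_INIT_CONFIG_JSON
  ? JSON.parse(process.env.MIDSCENE_OPENAI_INIT_CONFIG_JSON)
  : {};

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: process.env.OPENAI_BASE_URL,
  ...extraInit, // e.g. { "defaultHeaders": { "X-Title": "..." } }
});
```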

### Debug configs

By setting the following configs, you can see more logs for debugging. They will also be written to the `./midscene_run/log` folder.

| Name | Description |
|------|-------------|
| `DEBUG=midscene:ai:profile:stats` | Optional. Print the AI service time cost, token usage, etc. in a comma-separated format, useful for analysis |
| `DEBUG=midscene:ai:profile:detail` | Optional. Print the details of AI token usage |
| `DEBUG=midscene:ai:call` | Optional. Print the details of AI responses |
| `DEBUG=midscene:android:adb` | Optional. Print the details of adb command invocations |
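
These look like conventional `DEBUG` environment-variable filters in the style of the `debug` npm package (an assumption about Midscene's logging internals); multiple namespaces can be combined with commas. A minimal sketch of enabling them programmatically:

```typescript
// Assumption: Midscene's loggers follow the DEBUG convention of the
// `debug` npm package. Set this before any Midscene module is imported,
// since such loggers read DEBUG at initialization time.
process.env.DEBUG = "midscene:ai:profile:stats,midscene:ai:call";
```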

## Two ways to configure environment variables

Pick one of the following ways to configure environment variables.

### 1. Set environment variables in your system

```bash
# replace with your own key
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```

### 2. Set environment variables using dotenv

This is what we used in our demo project.

Dotenv is a zero-dependency module that loads environment variables from a `.env` file into `process.env`.

```bash
# install dotenv
npm install dotenv --save
```

Create a `.env` file in your project root directory, and add the following content. There is no need to add `export` before each line.

```
OPENAI_API_KEY=sk-abcdefghijklmnopqrstuvwxyz
```

Import the dotenv module in your script. It will automatically read the environment variables from the `.env` file.

```typescript
import 'dotenv/config';
```
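
If your `.env` file is not in the project root, dotenv's standard `config()` options let you point at it explicitly (the path shown is a placeholder):

```typescript
import dotenv from "dotenv";

// Load environment variables from a non-default location.
dotenv.config({ path: "./config/.env" });
```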

## Using Azure OpenAI Service

There are some extra configs when using Azure OpenAI Service.

### Use Azure AD token provider

This mode cannot be used in the Chrome extension.

```bash
# this is always true when using Azure OpenAI Service
export MIDSCENE_USE_AZURE_OPENAI=1

export MIDSCENE_AZURE_OPENAI_SCOPE="https://cognitiveservices.azure.com/.default"
export AZURE_OPENAI_ENDPOINT="..."
export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
```
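
This token-provider mode corresponds to the standard Azure keyless pattern: derive a bearer-token provider from your Azure credentials and hand it to the Azure OpenAI client. Below is a conceptual sketch of what happens under the hood, not Midscene's actual source; it relies on `@azure/identity`, which is also why this mode cannot run in the Chrome extension.

```typescript
import { AzureOpenAI } from "openai";
import { DefaultAzureCredential, getBearerTokenProvider } from "@azure/identity";

// Conceptual sketch: tokens come from your Azure login (az CLI, managed
// identity, etc.) instead of a static API key.
const credential = new DefaultAzureCredential();
const scope = "https://cognitiveservices.azure.com/.default";
const azureADTokenProvider = getBearerTokenProvider(credential, scope);

const client = new AzureOpenAI({
  endpoint: process.env.AZURE_OPENAI_ENDPOINT,
  apiVersion: "2024-05-01-preview",
  deployment: "gpt-4o",
  azureADTokenProvider,
});
```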

### Use an API key

```bash
export MIDSCENE_USE_AZURE_OPENAI=1
export AZURE_OPENAI_ENDPOINT="..."
export AZURE_OPENAI_KEY="..."
export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
```

## Set config by JavaScript

You can also override the config in JavaScript. Remember to do this before running any Midscene code.

```typescript
import { overrideAIConfig } from "@midscene/web/puppeteer";
// or import { overrideAIConfig } from "@midscene/web/playwright";
// or import { overrideAIConfig } from "@midscene/android";

overrideAIConfig({
  MIDSCENE_MODEL_NAME: "...",
  // ...
});
```
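
For example, in a test setup file you might assemble the config at runtime. The sketch below assumes `overrideAIConfig` accepts the same keys as the environment variables in the tables above; `MY_OPENAI_KEY` is a hypothetical variable name for wherever you keep secrets.

```typescript
import { overrideAIConfig } from "@midscene/web/puppeteer";

// Call this once, before any agent is created. MY_OPENAI_KEY is a
// hypothetical secret-store variable used for illustration.
overrideAIConfig({
  OPENAI_API_KEY: process.env.MY_OPENAI_KEY ?? "",
  OPENAI_BASE_URL: "https://dashscope.aliyuncs.com/compatible-mode/v1",
  MIDSCENE_MODEL_NAME: "qwen-vl-max-latest",
  MIDSCENE_USE_QWEN_VL: "1",
});
```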

## Example: Using `gpt-4o` from OpenAI

Configure the environment variables:

```bash
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://endpoint.some_other_provider.com/v1" # config this if you want to use a different endpoint
export MIDSCENE_MODEL_NAME="gpt-4o-2024-11-20" # optional, the default is "gpt-4o"
```

## Example: Using `qwen-vl-max-latest` from Aliyun

Configure the environment variables:

```bash
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
export MIDSCENE_USE_QWEN_VL=1
```

## Example: Using a self-hosted `ui-tars-72b-sft`

Configure the environment variables:

```bash
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="http://localhost:1234/v1"
export MIDSCENE_MODEL_NAME="ui-tars-72b-sft"
export MIDSCENE_USE_VLM_UI_TARS=1
```

## Example: Using `claude-3-opus-20240229` from Anthropic

When `MIDSCENE_USE_ANTHROPIC_SDK=1` is set, Midscene uses the Anthropic SDK (`@anthropic-ai/sdk`) to call the model.

Configure the environment variables:

```bash
export MIDSCENE_USE_ANTHROPIC_SDK=1
export ANTHROPIC_API_KEY="....."
export MIDSCENE_MODEL_NAME="claude-3-opus-20240229"
```

## Example: Config request headers (e.g. for OpenRouter)

```bash
export OPENAI_BASE_URL="https://openrouter.ai/api/v1"
export OPENAI_API_KEY="..."
export MIDSCENE_MODEL_NAME="..."
export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"defaultHeaders":{"HTTP-Referer":"...","X-Title":"..."}}'
```
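
If shell quoting gets awkward, you can also build the JSON string in code before any Midscene code runs; this small sketch has the same effect as the `export` above (the header values are placeholders):

```typescript
// Set before any Midscene code runs; equivalent to the export above.
process.env.MIDSCENE_OPENAI_INIT_CONFIG_JSON = JSON.stringify({
  defaultHeaders: {
    "HTTP-Referer": "https://your-site.example", // placeholder values
    "X-Title": "your-app-name",
  },
});
```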

## Troubleshooting LLM Service Connectivity Issues

If you want to troubleshoot connectivity issues, you can use the `connectivity-test` folder in our example project: https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test

Put your `.env` file in the `connectivity-test` folder, and run the test with `npm i && npm run test`.
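
If you just want a quick one-off probe without cloning the repo, a minimal script against any OpenAI-compatible endpoint looks roughly like the sketch below. This is our own sketch, not the contents of the `connectivity-test` folder.

```typescript
import "dotenv/config";
import OpenAI from "openai";

// Minimal connectivity probe: send one chat completion to the configured
// endpoint. Auth or network problems surface as thrown errors.
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: process.env.OPENAI_BASE_URL,
});

const res = await client.chat.completions.create({
  model: process.env.MIDSCENE_MODEL_NAME ?? "gpt-4o",
  messages: [{ role: "user", content: "Hello, can you hear me?" }],
});
console.log(res.choices[0]?.message.content);
```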