Common Model Configuration
Ways to set environment variables
Midscene reads all model configuration from environment variables. Below are common approaches, but feel free to adopt any method used in your project.
Method 1: Set variables in the system
The Midscene Chrome extension also accepts variables in this export KEY="value" format.
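For example, in your shell profile (all values below are placeholders; OPENAI_API_KEY, OPENAI_BASE_URL, and MIDSCENE_MODEL_NAME are the common variables, with the full list in Model configuration):

```shell
# Placeholder values for illustration -- replace with your provider's details
export OPENAI_API_KEY="sk-replace-with-your-key"
export OPENAI_BASE_URL="https://your-provider.example.com/v1"
export MIDSCENE_MODEL_NAME="your-model-name"
```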
Method 2: Create a .env file (for CLI tools)
Create a .env file in the directory where you run the project. Midscene CLI tools load this file automatically.
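For example, a minimal .env might look like this (placeholder values):

```shell
# .env -- picked up automatically by Midscene CLI tools
OPENAI_API_KEY="sk-replace-with-your-key"
OPENAI_BASE_URL="https://your-provider.example.com/v1"
MIDSCENE_MODEL_NAME="your-model-name"
```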
Keep in mind:
- You do not need to prefix each line with export.
- Only the Midscene CLI automatically reads this file. For the JavaScript SDK, load it manually as shown in Method 3.
Method 3: Load variables via dotenv
Dotenv is a zero-dependency npm package that loads variables from .env into Node.js process.env.
Our demo project uses this method.
Create a .env file in the project root and add (no export prefix):
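For instance (placeholder values):

```shell
OPENAI_API_KEY="sk-replace-with-your-key"
OPENAI_BASE_URL="https://your-provider.example.com/v1"
MIDSCENE_MODEL_NAME="your-model-name"
```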
Import dotenv in your script; it will read .env automatically:
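A minimal sketch using dotenv's standard config() API; call it before any code that reads process.env:

```javascript
// Load variables from .env into process.env (standard dotenv usage)
import dotenv from "dotenv";

dotenv.config();

// Midscene code that runs after this point sees the variables
console.log(process.env.MIDSCENE_MODEL_NAME);
```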
Common model configurations
Doubao Seed vision models
Obtain an API key from Volcano Engine and set:
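A sketch assuming the OpenAI-compatible Ark endpoint on Volcano Engine; the key is a placeholder, and the exact model or endpoint ID comes from your Volcano Engine console:

```shell
OPENAI_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
OPENAI_API_KEY="your-volcano-engine-api-key"
# Use the Doubao Seed vision model/endpoint ID from your console
MIDSCENE_MODEL_NAME="your-doubao-seed-model-id"
```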
Qwen3-VL
Using Alibaba Cloud's qwen3-vl-plus as an example:
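A sketch assuming Alibaba Cloud's OpenAI-compatible DashScope endpoint (the key is a placeholder; whether your Midscene version also needs a model-family flag is covered in Model configuration):

```shell
OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
OPENAI_API_KEY="your-dashscope-api-key"
MIDSCENE_MODEL_NAME="qwen3-vl-plus"
```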
Qwen2.5-VL
Using Alibaba Cloud's qwen-vl-max-latest as an example:
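A sketch assuming the same DashScope endpoint; MIDSCENE_USE_QWEN_VL is the switch older Midscene releases use to enable Qwen-VL coordinate handling, so verify the flag against Model configuration for your version:

```shell
OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
OPENAI_API_KEY="your-dashscope-api-key"
MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
MIDSCENE_USE_QWEN_VL=1
```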
Gemini-3-Pro
After requesting an API key from Google Gemini, configure:
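A sketch assuming Google's OpenAI-compatible endpoint; the model ID is a placeholder, so check Google's model list for the exact Gemini 3 Pro identifier:

```shell
OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
OPENAI_API_KEY="your-gemini-api-key"
MIDSCENE_MODEL_NAME="your-gemini-3-pro-model-id"
```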
UI-TARS
Use the deployed doubao-1.5-ui-tars on Volcano Engine:
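A sketch assuming the Ark endpoint on Volcano Engine; the MIDSCENE_MODEL_FAMILY value is the documented one for this deployment, while the key and endpoint ID are placeholders from your Volcano Engine console:

```shell
OPENAI_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
OPENAI_API_KEY="your-volcano-engine-api-key"
MIDSCENE_MODEL_NAME="your-doubao-1.5-ui-tars-endpoint-id"
MIDSCENE_MODEL_FAMILY="vlm-ui-tars-doubao-1.5"
```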
About MIDSCENE_MODEL_FAMILY
This variable selects the UI-TARS version. Supported values:
- vlm-ui-tars – for the 1.0 release
- vlm-ui-tars-doubao – for the 1.5 release deployed on Volcano Engine (equivalent to vlm-ui-tars-doubao-1.5)
- vlm-ui-tars-doubao-1.5 – for the 1.5 release deployed on Volcano Engine
The legacy configurations MIDSCENE_USE_VLM_UI_TARS=DOUBAO or MIDSCENE_USE_VLM_UI_TARS=1.5 are still supported but deprecated. Please migrate to MIDSCENE_MODEL_FAMILY.
Migration mapping:
- MIDSCENE_USE_VLM_UI_TARS=1.0 → MIDSCENE_MODEL_FAMILY="vlm-ui-tars"
- MIDSCENE_USE_VLM_UI_TARS=1.5 → MIDSCENE_MODEL_FAMILY="vlm-ui-tars-doubao-1.5"
- MIDSCENE_USE_VLM_UI_TARS=DOUBAO → MIDSCENE_MODEL_FAMILY="vlm-ui-tars-doubao"
GPT-4o
Starting with version 1.0, Midscene no longer supports GPT-series models as the default model. See Model strategy for details.
Multi-model example: GPT-5.1 planning/insight + Qwen3-VL for vision
Use GPT-5.1 for Planning and/or Insight to handle heavy reasoning, while Qwen3-VL focuses on visual grounding. You can enable either role or both; toggle them based on your workload.
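One way this split could look as an .env sketch. The default variables configure Qwen3-VL for grounding; the per-role override names shown are hypothetical placeholders, so take the real variable names from Model configuration:

```shell
# Default model: Qwen3-VL for visual grounding (DashScope endpoint assumed)
OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
OPENAI_API_KEY="your-dashscope-api-key"
MIDSCENE_MODEL_NAME="qwen3-vl-plus"

# HYPOTHETICAL override names for the planning/insight roles;
# the actual variable names are listed in Model configuration
MIDSCENE_PLANNING_MODEL_NAME="gpt-5.1"
MIDSCENE_INSIGHT_MODEL_NAME="gpt-5.1"
```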
More
For additional, advanced model settings, see Model configuration.
Troubleshooting model service connectivity issues
To troubleshoot connectivity issues, use the connectivity-test folder in our example project: https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test
Put your .env file in the connectivity-test folder and run the test with npm i && npm run test.
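The steps above as shell commands, assuming a fresh checkout of the example repository:

```shell
git clone https://github.com/web-infra-dev/midscene-example.git
cd midscene-example/connectivity-test
# place your .env file here, then:
npm i && npm run test
```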

