Common Model Configuration
Ways to set environment variables
Midscene reads all model configuration from environment variables. Below are common approaches, but feel free to adopt any method used in your project.
Method 1: Set variables in the system
The Midscene Chrome extension also accepts variables in this export KEY="value" format.
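For example, in your shell profile (all values below are placeholders; OPENAI_API_KEY, OPENAI_BASE_URL, and MIDSCENE_MODEL_NAME are the common variables, with the full list in Model configuration):

```shell
# Placeholder values for illustration -- replace with your provider's details
export OPENAI_API_KEY="sk-replace-with-your-key"
export OPENAI_BASE_URL="https://your-provider.example.com/v1"
export MIDSCENE_MODEL_NAME="your-model-name"
```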
Method 2: Create a .env file (for CLI tools)
Create a .env file in the directory where you run the project. Midscene CLI tools load this file automatically.
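For example, a minimal .env might look like this (placeholder values):

```shell
# .env -- picked up automatically by Midscene CLI tools
OPENAI_API_KEY="sk-replace-with-your-key"
OPENAI_BASE_URL="https://your-provider.example.com/v1"
MIDSCENE_MODEL_NAME="your-model-name"
```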
Keep in mind:
- You do not need to prefix each line with export.
- Only the Midscene CLI automatically reads this file. For the JavaScript SDK, load it manually as shown in Method 3.
Method 3: Load variables via dotenv
Dotenv is a zero-dependency npm package that loads variables from .env into Node.js process.env.
Our demo project uses this method.
Create a .env file in the project root and add (no export prefix):
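For instance (placeholder values):

```shell
OPENAI_API_KEY="sk-replace-with-your-key"
OPENAI_BASE_URL="https://your-provider.example.com/v1"
MIDSCENE_MODEL_NAME="your-model-name"
```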
Import dotenv in your script; it will read .env automatically:
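A minimal sketch using dotenv's standard config() API; call it before any code that reads process.env:

```javascript
// Load variables from .env into process.env (standard dotenv usage)
import dotenv from "dotenv";

dotenv.config();

// Midscene code that runs after this point sees the variables
console.log(process.env.MIDSCENE_MODEL_NAME);
```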
Common model configurations
Doubao Seed vision models
Obtain an API key from Volcano Engine and set:
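A sketch assuming the OpenAI-compatible Ark endpoint on Volcano Engine; the key is a placeholder, and the exact model or endpoint ID comes from your Volcano Engine console:

```shell
OPENAI_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
OPENAI_API_KEY="your-volcano-engine-api-key"
# Use the Doubao Seed vision model/endpoint ID from your console
MIDSCENE_MODEL_NAME="your-doubao-seed-model-id"
```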
Qwen3-VL
Using Alibaba Cloud's qwen3-vl-plus as an example:
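A sketch assuming Alibaba Cloud's OpenAI-compatible DashScope endpoint (the key is a placeholder; whether your Midscene version also needs a model-family flag is covered in Model configuration):

```shell
OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
OPENAI_API_KEY="your-dashscope-api-key"
MIDSCENE_MODEL_NAME="qwen3-vl-plus"
```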
Qwen2.5-VL
Using Alibaba Cloud's qwen-vl-max-latest as an example:
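A sketch assuming the same DashScope endpoint; MIDSCENE_USE_QWEN_VL is the switch older Midscene releases use to enable Qwen-VL coordinate handling, so verify the flag against Model configuration for your version:

```shell
OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
OPENAI_API_KEY="your-dashscope-api-key"
MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
MIDSCENE_USE_QWEN_VL=1
```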
Gemini-3-Pro
After requesting an API key from Google Gemini, configure:
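A sketch assuming Google's OpenAI-compatible endpoint; the model ID is a placeholder, so check Google's model list for the exact Gemini 3 Pro identifier:

```shell
OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
OPENAI_API_KEY="your-gemini-api-key"
MIDSCENE_MODEL_NAME="your-gemini-3-pro-model-id"
```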
UI-TARS
Use the deployed doubao-1.5-ui-tars on Volcano Engine:
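A sketch assuming the Ark endpoint on Volcano Engine; the MIDSCENE_MODEL_FAMILY value is the documented one for this deployment, while the key and endpoint ID are placeholders from your Volcano Engine console:

```shell
OPENAI_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
OPENAI_API_KEY="your-volcano-engine-api-key"
MIDSCENE_MODEL_NAME="your-doubao-1.5-ui-tars-endpoint-id"
MIDSCENE_MODEL_FAMILY="vlm-ui-tars-doubao-1.5"
```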
About MIDSCENE_MODEL_FAMILY
This variable selects the UI-TARS version. Supported values:
- vlm-ui-tars – for the 1.0 release
- vlm-ui-tars-doubao – for the 1.5 release deployed on Volcano Engine (equivalent to vlm-ui-tars-doubao-1.5)
- vlm-ui-tars-doubao-1.5 – for the 1.5 release deployed on Volcano Engine
The legacy configurations MIDSCENE_USE_VLM_UI_TARS=DOUBAO or MIDSCENE_USE_VLM_UI_TARS=1.5 are still supported but deprecated. Please migrate to MIDSCENE_MODEL_FAMILY.
Migration mapping:
- MIDSCENE_USE_VLM_UI_TARS=1.0 → MIDSCENE_MODEL_FAMILY="vlm-ui-tars"
- MIDSCENE_USE_VLM_UI_TARS=1.5 → MIDSCENE_MODEL_FAMILY="vlm-ui-tars-doubao-1.5"
- MIDSCENE_USE_VLM_UI_TARS=DOUBAO → MIDSCENE_MODEL_FAMILY="vlm-ui-tars-doubao"
GPT-4o
Starting with version 1.0, Midscene no longer supports GPT-series models as the default model. See Model strategy for details.
Multi-model example: GPT-5.1 planning/insight + Qwen3-VL for vision
Use GPT-5.1 for Planning and/or Insight to handle heavy reasoning, while Qwen3-VL focuses on visual grounding. You can enable either role or both; toggle them based on your workload.
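One way this split could look as an .env sketch. The default variables configure Qwen3-VL for grounding; the per-role override names shown are hypothetical placeholders, so take the real variable names from Model configuration:

```shell
# Default model: Qwen3-VL for visual grounding (DashScope endpoint assumed)
OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
OPENAI_API_KEY="your-dashscope-api-key"
MIDSCENE_MODEL_NAME="qwen3-vl-plus"

# HYPOTHETICAL override names for the planning/insight roles;
# the actual variable names are listed in Model configuration
MIDSCENE_PLANNING_MODEL_NAME="gpt-5.1"
MIDSCENE_INSIGHT_MODEL_NAME="gpt-5.1"
```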
More
For additional, advanced model settings, see Model configuration.
Troubleshooting model service connectivity issues
To troubleshoot connectivity issues, use the connectivity-test folder in our example project: https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test
Put your .env file in the connectivity-test folder and run the test with npm i && npm run test.
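The steps above as shell commands, assuming a fresh checkout of the example repository:

```shell
git clone https://github.com/web-infra-dev/midscene-example.git
cd midscene-example/connectivity-test
# place your .env file here, then:
npm i && npm run test
```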

