Model configuration

Midscene reads all model configuration from operating-system environment variables.

Midscene integrates the OpenAI SDK by default for AI calls. The SDK defines the parameter shape, and most model providers (or deployment tools) offer compatible endpoints.

This doc focuses on Midscene model configuration. For how we choose models, see Model strategy. For quick recipes for popular models, see Common model configuration.

Required settings

You need to set a default model for Midscene; see Model strategy for details.

Name | Description
--- | ---
MIDSCENE_MODEL_API_KEY | Model API key, e.g., "sk-abcd..."
MIDSCENE_MODEL_BASE_URL | API endpoint URL, usually ending with a version segment such as /v1. Do not append /chat/completions here; the underlying SDK adds it automatically
MIDSCENE_MODEL_NAME | Model name
MIDSCENE_MODEL_FAMILY | Model family; determines how coordinates returned by the model are handled
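
For example, a minimal setup for an OpenAI-compatible endpoint could look like the following; the API key, base URL, and model values are placeholders, so substitute your provider's actual values:

# Placeholder values; replace with your provider's credentials and endpoint
export MIDSCENE_MODEL_API_KEY="sk-abcd..."
export MIDSCENE_MODEL_BASE_URL="https://your-provider.example.com/v1"
export MIDSCENE_MODEL_NAME="qwen3-vl-plus"
export MIDSCENE_MODEL_FAMILY="qwen3-vl"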

Advanced settings (optional)

Name | Description
--- | ---
MIDSCENE_MODEL_TIMEOUT | Timeout in milliseconds for AI API calls made with the default model. Defaults to the OpenAI SDK default (10 minutes)
MIDSCENE_MODEL_MAX_TOKENS | max_tokens for responses, default 2048
MIDSCENE_MODEL_HTTP_PROXY | HTTP/HTTPS proxy, e.g., http://127.0.0.1:8080 or https://proxy.example.com:8080. Takes precedence over MIDSCENE_MODEL_SOCKS_PROXY
MIDSCENE_MODEL_SOCKS_PROXY | SOCKS proxy, e.g., socks5://127.0.0.1:1080
MIDSCENE_MODEL_INIT_CONFIG_JSON | JSON string that overrides the OpenAI SDK initialization config
MIDSCENE_RUN_DIR | Directory for run artifacts such as reports and logs. Defaults to midscene_run in the current working directory; accepts absolute or relative paths
MIDSCENE_PREFERRED_LANGUAGE | Optional. Preferred response language. Defaults to Chinese if the timezone is GMT+8, otherwise English
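
As a sketch, the optional variables are set the same way. The values below are purely illustrative, and the MIDSCENE_MODEL_INIT_CONFIG_JSON example assumes the underlying OpenAI SDK accepts the option shown:

# Illustrative values only
export MIDSCENE_MODEL_TIMEOUT="60000"                       # 60-second timeout instead of the SDK default
export MIDSCENE_MODEL_MAX_TOKENS="4096"                     # allow longer responses than the 2048 default
export MIDSCENE_MODEL_HTTP_PROXY="http://127.0.0.1:8080"    # takes precedence over any SOCKS proxy
export MIDSCENE_MODEL_INIT_CONFIG_JSON='{"maxRetries": 3}'  # assumes the underlying OpenAI SDK supports this option
export MIDSCENE_RUN_DIR="./artifacts/midscene"              # relative path; reports and logs land here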

Note: Control replanning behavior with the agent option replanningCycleLimit (defaults to 20, or 40 for vlm-ui-tars), not with environment variables.

Configure a dedicated Insight model

Set the following if the Insight intent needs a different model:

Name | Description
--- | ---
MIDSCENE_INSIGHT_MODEL_API_KEY | API key
MIDSCENE_INSIGHT_MODEL_BASE_URL | API endpoint URL (omit the trailing /chat/completions)
MIDSCENE_INSIGHT_MODEL_NAME | Model name
MIDSCENE_INSIGHT_MODEL_TIMEOUT | Optional; timeout in milliseconds for Insight intent AI API calls
MIDSCENE_INSIGHT_MODEL_HTTP_PROXY | Optional; same effect as MIDSCENE_MODEL_HTTP_PROXY
MIDSCENE_INSIGHT_MODEL_SOCKS_PROXY | Optional; same effect as MIDSCENE_MODEL_SOCKS_PROXY
MIDSCENE_INSIGHT_MODEL_INIT_CONFIG_JSON | Optional; same effect as MIDSCENE_MODEL_INIT_CONFIG_JSON
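
For instance, you might keep the default model for everything else and point only the Insight intent at a different endpoint. All values below are placeholders:

# Only the Insight intent uses this model; other calls keep the default configuration
export MIDSCENE_INSIGHT_MODEL_API_KEY="sk-insight..."
export MIDSCENE_INSIGHT_MODEL_BASE_URL="https://another-provider.example.com/v1"
export MIDSCENE_INSIGHT_MODEL_NAME="your-insight-model"
export MIDSCENE_INSIGHT_MODEL_TIMEOUT="120000"   # optional: 2-minute timeout for Insight calls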

Configure a dedicated Planning model

Set the following if the Planning intent needs a different model:

Name | Description
--- | ---
MIDSCENE_PLANNING_MODEL_API_KEY | API key
MIDSCENE_PLANNING_MODEL_BASE_URL | API endpoint URL (omit the trailing /chat/completions)
MIDSCENE_PLANNING_MODEL_NAME | Model name
MIDSCENE_PLANNING_MODEL_TIMEOUT | Optional; timeout in milliseconds for Planning intent AI API calls
MIDSCENE_PLANNING_MODEL_HTTP_PROXY | Optional; same effect as MIDSCENE_MODEL_HTTP_PROXY
MIDSCENE_PLANNING_MODEL_SOCKS_PROXY | Optional; same effect as MIDSCENE_MODEL_SOCKS_PROXY
MIDSCENE_PLANNING_MODEL_INIT_CONFIG_JSON | Optional; same effect as MIDSCENE_MODEL_INIT_CONFIG_JSON
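
The Planning override works the same way; a minimal sketch with placeholder values:

# Only the Planning intent uses this model; other intents fall back to the default
export MIDSCENE_PLANNING_MODEL_API_KEY="sk-planning..."
export MIDSCENE_PLANNING_MODEL_BASE_URL="https://another-provider.example.com/v1"
export MIDSCENE_PLANNING_MODEL_NAME="your-planning-model"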

Debug logging switches

Enable the following variables to print richer debug logs. Regardless of these switches, logs are always saved under ./midscene_run/log.

Name | Description
--- | ---
DEBUG=midscene:ai:profile:stats | Prints model latency, token usage, etc., comma-separated for easier analysis
DEBUG=midscene:ai:profile:detail | Prints detailed token-usage logs
DEBUG=midscene:ai:call | Prints AI response details
DEBUG=midscene:android:adb | Prints Android adb command details
DEBUG=midscene:* | Prints every debug log
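
For example, to print latency and token statistics together with adb details for a single run; the script name below is just a stand-in for however you start Midscene, and the comma-separated namespaces rely on the standard behavior of the debug package:

# Combine namespaces with commas; ./your-midscene-script.js is a placeholder
DEBUG=midscene:ai:profile:stats,midscene:android:adb node ./your-midscene-script.js

# Or print everything
DEBUG=midscene:* node ./your-midscene-script.js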

The following environment variables are deprecated but still supported. We recommend migrating to the new configuration approach.

Planning model configuration

Name | Description | New approach
--- | --- | ---
MIDSCENE_USE_DOUBAO_VISION | Deprecated. Enables Doubao vision model | Use MIDSCENE_MODEL_FAMILY="doubao-vision"
MIDSCENE_USE_QWEN3_VL | Deprecated. Enables Qwen3-VL model | Use MIDSCENE_MODEL_FAMILY="qwen3-vl"
MIDSCENE_USE_QWEN_VL | Deprecated. Enables Qwen2.5-VL model | Use MIDSCENE_MODEL_FAMILY="qwen2.5-vl"
MIDSCENE_USE_GEMINI | Deprecated. Enables Gemini model | Use MIDSCENE_MODEL_FAMILY="gemini"
MIDSCENE_USE_VLM_UI_TARS | Deprecated. Enables UI-TARS model | Use MIDSCENE_MODEL_FAMILY="vlm-ui-tars*"

General configuration

Name | Description | New approach
--- | --- | ---
OPENAI_API_KEY | Deprecated but supported | Prefer MIDSCENE_MODEL_API_KEY
OPENAI_BASE_URL | Deprecated but supported | Prefer MIDSCENE_MODEL_BASE_URL
MIDSCENE_OPENAI_INIT_CONFIG_JSON | Deprecated but supported | Prefer MIDSCENE_MODEL_INIT_CONFIG_JSON
MIDSCENE_OPENAI_HTTP_PROXY | Deprecated but supported | Prefer MIDSCENE_MODEL_HTTP_PROXY
MIDSCENE_OPENAI_SOCKS_PROXY | Deprecated but supported | Prefer MIDSCENE_MODEL_SOCKS_PROXY
OPENAI_MAX_TOKENS | Deprecated but supported | Prefer MIDSCENE_MODEL_MAX_TOKENS
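
A before-and-after sketch of the migration (keys and URLs are placeholders):

# Before (deprecated, still works)
export OPENAI_API_KEY="sk-abcd..."
export OPENAI_BASE_URL="https://your-provider.example.com/v1"

# After (preferred)
export MIDSCENE_MODEL_API_KEY="sk-abcd..."
export MIDSCENE_MODEL_BASE_URL="https://your-provider.example.com/v1"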

Configure settings via JavaScript

You can configure models for each agent in JavaScript. See the API reference for details.

const agent = new Agent(page, {
  // Configure via modelConfig
  modelConfig: {
    MIDSCENE_MODEL_TIMEOUT: '60000', // 60 seconds
    MIDSCENE_MODEL_NAME: 'qwen3-vl-plus',
    // ... other configurations
  }
});

FAQ

How can I monitor token usage?

Set DEBUG=midscene:ai:profile:stats to print usage and latency.

You can also find usage statistics inside the generated report files.

Using LangSmith

LangSmith is a platform for debugging large language model applications. Midscene provides auto-integration support: just install the dependency and set the environment variables below.

Step 1: Install dependency

npm install langsmith

Step 2: Set environment variables

# Enable Midscene's LangSmith auto-integration
export MIDSCENE_LANGSMITH_DEBUG=1

# LangSmith configuration
export LANGCHAIN_API_KEY="your-langchain-api-key-here"
export LANGCHAIN_TRACING=true
export LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
# export LANGCHAIN_ENDPOINT="https://eu.api.smith.langchain.com" # If signed up in the EU region

After starting Midscene, you should see logs similar to:

DEBUGGING MODE: langsmith wrapper enabled

Notes:

  • LangSmith and Langfuse can be enabled simultaneously.
  • Node.js only; browser environments will throw errors.
  • If you use createOpenAIClient, it overrides the env-based auto-integration.

For finer-grained control (e.g., enabling LangSmith only for specific tasks), use createOpenAIClient to wrap the client manually.

Using Langfuse

Langfuse is another popular LLM observability platform. Integration is similar to LangSmith.

Step 1: Install dependency

npm install langfuse

Step 2: Set environment variables

# Enable Midscene's Langfuse auto-integration
export MIDSCENE_LANGFUSE_DEBUG=1

# Langfuse configuration
export LANGFUSE_PUBLIC_KEY="your-langfuse-public-key-here"
export LANGFUSE_SECRET_KEY="your-langfuse-secret-key-here"
export LANGFUSE_BASE_URL="https://cloud.langfuse.com" # 🇪🇺 EU region
# export LANGFUSE_BASE_URL="https://us.cloud.langfuse.com" # 🇺🇸 US region

After starting Midscene, you should see logs similar to:

DEBUGGING MODE: langfuse wrapper enabled

Notes:

  • LangSmith and Langfuse can be enabled simultaneously.
  • Node.js only; browser environments will throw errors.
  • If you use createOpenAIClient, it overrides the env-based auto-integration.