Browser automation capabilities using Puppeteer, supporting both local and remote browser connections.
Add the following to the Claude Desktop configuration file (`claude_desktop_config.json`):

    {
      "mcpServers": {
        "bytedance-ui-tars-desktop": {
          "command": "node",
          "args": [
            "~/.mcp/UI-TARS-desktop/index.js"
          ]
        }
      }
    }

Get the source and run locally:

    git clone https://github.com/bytedance/UI-TARS-desktop.git ~/.mcp/UI-TARS-desktop
    cd ~/.mcp/UI-TARS-desktop

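
The Claude Desktop entry shown earlier can also be merged into an existing config file by script rather than by hand. The following is a minimal sketch, not part of the project: the helper name `add_mcp_server` is hypothetical, the config path is whatever you pass in, and expanding `~` to an absolute path is an assumption made in case the client launches `node` without a shell.

```python
import json
import os
from pathlib import Path


def add_mcp_server(config_path, name, command, args):
    """Merge one server entry into an MCP client config file.

    Existing servers and unrelated top-level keys are preserved.
    """
    path = Path(config_path).expanduser()

    # Start from the existing config if there is one, else from an empty object.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}

    # Assumption: a leading "~" may not be expanded by the client, so store
    # absolute paths instead.
    expanded_args = [os.path.expanduser(arg) for arg in args]

    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": expanded_args}

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_mcp_server(path_to_config, "bytedance-ui-tars-desktop", "node", ["~/.mcp/UI-TARS-desktop/index.js"])` would write the same entry as the JSON above, with the path made absolute. Back up the config file first, since the sketch rewrites it in place.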
TARS is a multimodal AI agent stack that currently ships two projects, Agent TARS and UI-TARS-desktop:
| Agent TARS | UI-TARS-desktop |
|---|---|
| Agent TARS is a general multimodal AI agent stack that brings the power of GUI Agent and Vision into your terminal, computer, browser, and product. It primarily ships with a CLI and a Web UI. It aims to provide a workflow that is closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world MCP tools. | UI-TARS Desktop is a desktop application that provides a native GUI Agent based on the UI-TARS model. It primarily ships local and remote computer operators as well as browser operators. |
Instruction: Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline.
https://github.com/user-attachments/assets/772b0eef-aef7-4ab9-8cb0-9611820539d8
| Booking Hotel | Generate Chart with extra MCP Servers |
|---|---|
| Instruction: I am in Los Angeles from September 1st to September 6th, with a budget of $5,000. Please help me book a Ritz-Carlton hotel closest to the airport on booking.com and compile a transportation guide for me | Instruction: Draw me a chart of Hangzhou's weather for one month |
For more use cases, please check out #842.

    # Launch with `npx`
    npx @agent-tars/cli@latest

    # Install globally (requires Node.js >= 22)
    npm install @agent-tars/cli@latest -g

    # Run with your preferred model provider
    agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
    agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key

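
When launching the CLI from a script, the Node.js requirement and the provider flags above can be checked and assembled before the process starts. This is a rough sketch under stated assumptions: the helper names and the `AGENT_TARS_API_KEY` environment variable are hypothetical, and only the `--provider`, `--model`, and `--apiKey` flags and the Node.js >= 22 requirement come from the commands above.

```python
import os
import re

MIN_NODE_MAJOR = 22  # taken from the install note: Node.js >= 22


def node_version_ok(version_text, minimum=MIN_NODE_MAJOR):
    """Return True if a `node --version` string such as 'v22.3.0' meets the minimum major version."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    return bool(match) and int(match.group(1)) >= minimum


def build_agent_tars_command(provider, model, api_key_env="AGENT_TARS_API_KEY", env=None):
    """Build the argument list for the agent-tars CLI.

    The API key is read from an environment variable (hypothetical name) so it
    does not have to be hard-coded in a script.
    """
    env = os.environ if env is None else env
    api_key = env.get(api_key_env)
    if not api_key:
        raise KeyError(f"set {api_key_env} before launching agent-tars")
    return ["agent-tars", "--provider", provider, "--model", model, "--apiKey", api_key]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell quoting problems if the key contains special characters.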
Visit the comprehensive Quick Start guide for detailed setup instructions.
🌟 Explore Agent TARS Universe 🌟
UI-TARS Desktop is a native GUI agent for your local computer, driven by UI-TARS and Seed-1.5-VL/1.6 series models.
📑 Paper | 🤗 Hugging Face Models | 🫨 Discord | 🤖 ModelScope

🖥️ Desktop Application | 👓 Midscene (use in browser)
| Instruction | Local Operator | Remote Operator |
|---|---|---|
| Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting. | | |
| Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub? | | |
See Quick Start
See CONTRIBUTING.md.
This project is licensed under the Apache License 2.0.
If you find our paper and code useful in your research, please consider giving a star :star: and citation :pencil:

    @article{qin2025ui,
      title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
      author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
      journal={arXiv preprint arXiv:2501.12326},
      year={2025}
    }

Official Microsoft Playwright MCP server, enabling LLMs to interact with web pages through structured accessibility snapshots
Automates browser-based workflows using LLMs and computer vision: navigate pages, fill forms, extract data, handle authentication, and automate any website via natural language
Automate your local Chrome browser
An MCP server using Playwright for browser automation and web scraping
Automate browser interactions in the cloud (e.g. web navigation, data extraction, form filling, and more)
An MCP Server that autonomously debugs web applications with browser-use browser agents