[AI Agent] Browser Use WebUI + DeepSeek

Previous post: an introduction to Browser Use

Introduction (the author's own English version is below)

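Browser Use WebUI is a tool that provides a user-friendly web interface through which AI agents can interact directly with websites. It lets an AI agent access and operate websites through a web browser.

The main features are:

WebUI:
Provides a user-friendly web interface that makes it easy to interact with the browser agent.
Extended LLM support:
Supports integration with a range of large language models (LLMs), including Gemini, OpenAI, Azure OpenAI, Anthropic, DeepSeek, and Ollama.
Custom browser support:
Lets you use your own browser, which avoids having to log in or authenticate again on each site. This feature also supports high-resolution screen recording.
Custom agent:
Implements a custom agent that uses optimized prompts to improve browser use.

The tool runs on Python 3.11 or later and uses Playwright to drive the browser. You configure your own browser and talk to the AI agent through the web interface, which means you can give instructions to an AI agent without writing a single line of code. (Super exciting!!)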

Preparation

Python >= 3.11

git clone https://github.com/browser-use/web-ui.git
cd web-ui
pip install -r requirements.txt
playwright install

Check the folder contents with the ls command. If a template named ".env.example" is there, you are good to go.

ls -a -l

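Open ".env.example" in vim and you can see which LLMs are supported.

We use DeepSeek here, so enter your API key on the "DEEP_SEEK_API_KEY=" line and save with :wq. (The reason for choosing DeepSeek is simple: it is cheap, lol.)
Then create the .env file with the cp command.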

cp .env.example .env

That completes the initial setup.

Launching the WEB-UI

You can also start the Web-UI by building the Docker image, but Docker is not used here. Enter the following command to start it.

python webui.py --ip 127.0.0.1 --port 7788 --dark-mode

Then enter localhost:7788 in your browser and the Web-UI screen appears.

Agent Settings

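Here you can set the maximum number of execution steps, whether to use a vision LLM, and so on. Since we are using DeepSeek, be sure to uncheck Use Vision; otherwise the agent will not run.

LLM configuration

To use DeepSeek, change LLM Provider to DeepSeek. No other settings are needed (the .env file was set up in the previous step, so the API endpoint and API key do not have to be entered).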

Browser Settings

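Here you can configure whether to record, where recordings are stored, the browser window size, headless mode, and so on.

Run Agent

This is where you give instructions to the AI agent.

Results

Execution results, errors, and so on are all shown on this page.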

Live Demo

YouTube version

Closing

UI testing has become much easier, but junior-level engineers and people who cling to old technology will surely lose their jobs. I wish everyone a good future.

Introduction

“Browser Use WebUI” – A User-Friendly Tool for Seamless AI-Agent Interaction with Websites

Key Features:

WebUI:
Offers a user-friendly interface for effortless interaction with browser agents.
Advanced LLM Support:
Compatible with leading large language models (LLMs), including Gemini, OpenAI, Azure OpenAI, Anthropic, DeepSeek, and Ollama.
Custom Browser Support:
Lets users operate their own browsers to avoid re-login or authentication issues. This feature also supports high-resolution screen recordings.
Custom Agents:
Implements optimized prompts, enhancing browser interactions and improving AI-agent performance.

This tool runs on Python 3.11 or later and uses Playwright for browser control. Users can configure their own browser and interact with AI agents via a simple web interface. No coding required!

Initial Setup

Python >= 3.11

git clone https://github.com/browser-use/web-ui.git
cd web-ui
pip install -r requirements.txt
playwright install
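Since the install only works on a recent enough interpreter, it can help to check the version before running pip. The snippet below is a minimal sketch of such a check; the helper name `meets_minimum` is made up for illustration, and the 3.11 threshold is simply the requirement stated above.

```python
import sys

# Minimum version taken from the requirement stated in this post.
MINIMUM = (3, 11)


def meets_minimum(version, minimum=MINIMUM):
    """Return True if a (major, minor, ...) version tuple satisfies the minimum."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    ok = meets_minimum(sys.version_info)
    label = "OK" if ok else "too old for this setup"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {label}")
```

Run it with the same interpreter you plan to use for `pip install`, so that the check and the install see the same Python.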

Use the ls command to confirm the presence of a “.env.example” template file.

ls -a -l

Open the file with vim to verify support for the necessary LLMs.

Add your API key to the DEEP_SEEK_API_KEY= line, save the file with :wq, and create a .env file using the cp command. (Why DeepSeek? Simply because it is cheap!)

cp .env.example .env
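A quick way to confirm that the key actually made it into .env is to parse the file and look for an empty value. This is only a rough sketch: `parse_env` and `missing_keys` are hypothetical helpers written for this post, they handle plain KEY=VALUE lines only, and the variable name is the one quoted above, so check it against your own .env.example.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text, required=("DEEP_SEEK_API_KEY",)):
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        print("missing:", missing_keys(env_path.read_text(encoding="utf-8")))
    else:
        print(".env not found; run the cp command first")
```

An empty list means every required key has a value; it does not verify that the key itself is valid.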

Once these steps are complete, your initial setup is ready.

Launching the Web-UI

You can also run the Web-UI by building the Docker image, but Docker is not used here. Start it with the following command instead:

python webui.py --ip 127.0.0.1 --port 7788 --dark-mode
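If the page does not load, it is worth checking whether anything is listening on the address and port passed to the command. Here is a small sketch using only the standard library; `is_listening` is a made-up helper name, and the defaults mirror the --ip and --port values used above.

```python
import socket


def is_listening(host="127.0.0.1", port=7788, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("Web-UI reachable:", is_listening())
```

A successful connection only shows that some process accepts connections on that port; it does not prove that the process is the Web-UI.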

Then, open your browser and navigate to localhost:7788. You’ll see the main interface with the following settings:

Main Features and Settings

Agent Settings

Configure the maximum number of execution steps and toggle vision-based LLM features. (For DeepSeek, make sure to uncheck "Use Vision": it is not supported, and the agent will not run with it enabled.)
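The rule about "Use Vision" can be written down as a small validation step. The sketch below is not the Web-UI's own code: the `AgentSettings` class, the default of 100 steps, and the `TEXT_ONLY_PROVIDERS` set are all assumptions made up to illustrate the check described above.

```python
from dataclasses import dataclass

# Providers assumed, for this sketch only, not to accept image input.
TEXT_ONLY_PROVIDERS = {"deepseek"}


@dataclass
class AgentSettings:
    provider: str
    max_steps: int = 100      # hypothetical default, not taken from the tool
    use_vision: bool = True   # mirrors the "Use Vision" checkbox


def check_settings(settings):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if settings.max_steps < 1:
        problems.append("max_steps must be at least 1")
    if settings.use_vision and settings.provider.lower() in TEXT_ONLY_PROVIDERS:
        problems.append(f"uncheck Use Vision for provider '{settings.provider}'")
    return problems


if __name__ == "__main__":
    print(check_settings(AgentSettings(provider="deepseek")))
```

Returning a list rather than raising on the first problem lets every issue be reported at once, which suits a settings form.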

LLM Configuration

Set the LLM Provider to DeepSeek. If your .env file is correctly set up, no additional configuration is needed.
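The "nothing else to fill in" behaviour amounts to falling back to the environment when the UI field is left empty. The following sketch shows one way such a fallback could look; `resolve_api_key` is a hypothetical function, and giving the UI field precedence over .env is an assumption rather than something confirmed from the Web-UI source.

```python
import os


def resolve_api_key(ui_value, env_var="DEEP_SEEK_API_KEY", environ=None):
    """Prefer a key typed into the UI; otherwise fall back to the environment."""
    environ = os.environ if environ is None else environ
    candidate = (ui_value or "").strip() or environ.get(env_var, "").strip()
    if not candidate:
        raise ValueError(f"no API key: fill in the UI field or set {env_var} in .env")
    return candidate


if __name__ == "__main__":
    try:
        key = resolve_api_key("")
        print("key found, length", len(key))
    except ValueError as exc:
        print(exc)
```

Passing `environ` explicitly keeps the function easy to test without touching the real environment.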

Browser Settings

Customize settings for screen recording, storage location, browser size, headless mode, and more.
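For readers curious how these options map onto Playwright itself, the sketch below builds the keyword arguments one would pass to Playwright's `launch()` and `new_context()`. The helper `playwright_options` and its default sizes are invented for illustration; how the Web-UI wires its form fields internally is not shown in this post.

```python
def playwright_options(headless=False, width=1280, height=1100, record_dir=None):
    """Build kwargs for Playwright's launch() and new_context() calls."""
    launch_kwargs = {"headless": headless}
    context_kwargs = {"viewport": {"width": width, "height": height}}
    if record_dir:
        # Playwright records video only when a target directory is given.
        context_kwargs["record_video_dir"] = record_dir
        context_kwargs["record_video_size"] = {"width": width, "height": height}
    return launch_kwargs, context_kwargs


if __name__ == "__main__":
    launch_kwargs, context_kwargs = playwright_options(record_dir="./recordings")
    print(launch_kwargs)
    print(context_kwargs)
```

With Playwright installed, these would be used as `browser = p.chromium.launch(**launch_kwargs)` followed by `browser.new_context(**context_kwargs)` inside a `sync_playwright()` block.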

Run Agent

Use this section to provide specific instructions to the AI agent.

Results

View detailed execution results, including any errors, on this page.

Live Demo

YouTube version

Final Thoughts

This tool makes UI testing much simpler, but it may also accelerate the displacement of junior engineers and of those who resist adopting modern technologies. Here's wishing everyone a bright future. Embrace innovation and grow with it!
