UKGovernmentBEIS · jjallaire · Jan 16, 2025 · Jan 3, 2025 · Jan 9, 2025 · Jan 8, 2025
diff --git a/docs/approval.qmd b/docs/approval.qmd
@@ -33,20 +33,39 @@ You can chain to together the `human` and `auto` approvers in an *approval polic
 ``` yaml
 approvers:
   - name: human
-    tools: ["web_browser_click", "web_browser_type*"]
+    tools: ["web_browser_click", "web_browser_type"]
 
   - name: auto
     tools: "*"
 ```
 
-Navigational web browser tool calls (e.g. `web_browser_go`) are approved automatically via the catch-all `auto` approver at the end of the chain. Note that when listing an approver in a policy you indicate which tools it should handle using a glob or list of globs.
+
+Navigational web browser tool calls (e.g. `web_browser_go`) are approved automatically via the catch-all `auto` approver at the end of the chain. Note that when listing an approver in a policy you indicate which tools it should handle using a glob or list of globs. These globs are prefix matched, so the `web_browser_type` glob matches both `web_browser_type` and `web_browser_type_submit`.
 
 To use this policy, pass the path to the policy YAML file as the approver. For example:
 
 ``` bash
 inspect eval browser.py --approval approval.yaml
 ```
 
+You can also match on tool arguments (for tools that dispatch many action types). For example, here is an approval policy for the [Computer Tool](tools.qmd#sec-computer) which allows typing and mouse movement but requires approval for key combos (e.g. Enter or a shortcut) and mouse clicks:
+
+
+```{.yaml filename="approval.yaml"}
+approvers:
+  - name: human
+    tools:
+      - computer(action='key'
+      - computer(action='left_click'
+      - computer(action='middle_click'
+      - computer(action='double_click'
+
+  - name: auto
+    tools: "*"
+```
+
+Note that since this is a prefix match and there could be other arguments, we don't end the tool match pattern with a closing parenthesis.
+
 ## Approvers in Code
 
 We've demonstrated configuring approvers via a YAML approval policy file—you can also provide a policy directly in code (useful if it needs to be more dynamic). Here's a pure Python version of the example from the previous section:
@@ -152,7 +171,7 @@ Assuming we have properly [registered our approver](extensions.qmd#sec-extension
 ``` yaml
 approvers:
   - name: evaltools/bash_allowlist
-    tools: "*bash*"
+    tools: "bash"
     allowed_commands: ["ls", "echo", "cat"]
 
   - name: human
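
Review note on the prefix-match wording in `docs/approval.qmd`: the behaviour described there can be illustrated with a minimal standalone sketch. This is not Inspect's implementation. The helper names (`call_signature`, `matches`, `first_approver`), the rendering of a tool call as `tool(arg='value', ...)`, and the choice to let the first matching approver decide are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory mirror of approval.yaml: (approver name, patterns).
POLICY = [
    ("human", [
        "computer(action='key'",
        "computer(action='left_click'",
        "computer(action='middle_click'",
        "computer(action='double_click'",
    ]),
    ("auto", ["*"]),
]


def call_signature(tool, arguments):
    """Render a call as text, e.g. computer(action='key', text='Return')."""
    args = ", ".join(f"{name}={value!r}" for name, value in arguments.items())
    return f"{tool}({args})"


def matches(pattern, tool, arguments):
    """Prefix-match a pattern against a rendered call (assumed semantics)."""
    # Appending "*" to the glob is what makes it a prefix match.
    return fnmatchcase(call_signature(tool, arguments), pattern + "*")


def first_approver(policy, tool, arguments):
    """Return the first approver whose patterns match, or None."""
    for approver, patterns in policy:
        if any(matches(pattern, tool, arguments) for pattern in patterns):
            return approver
    return None
```

In this sketch the closing quote in `computer(action='left_click'` keeps the pattern from also matching `left_click_drag`, and a pattern only matches when `action` is rendered as the first argument.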

diff --git a/docs/images/vnc-port-info.png b/docs/images/vnc-port-info.png
diff --git a/docs/images/vnc-view-only.png b/docs/images/vnc-view-only.png
diff --git a/docs/tools.qmd b/docs/tools.qmd
@@ -6,7 +6,7 @@ title: Tools
 
 Many models now have the ability to interact with client-side Python functions in order to expand their capabilities. This enables you to equip models with your own set of custom tools so they can perform a wider variety of tasks.
 
-Inspect natively supports registering Python functions as tools and providing these tools to models that support them (currently OpenAI, Claude 3, Google Gemini, and Mistral). Inspect also includes several built-in tools ([bash](#sec-bash-and-python), [python](#sec-bash-and-python), and [web_search](#sec-web-search)).
+Inspect natively supports registering Python functions as tools and providing these tools to models that support them (currently OpenAI, Claude 3, Google Gemini, and Mistral). Inspect also includes several built-in tools ([bash](#sec-bash-and-python), [python](#sec-bash-and-python), [computer](#sec-computer), [web browser](#sec-web-browser), and [web_search](#sec-web-search)).
 
 ::: callout-note
 ### Tools and Agents
@@ -22,6 +22,8 @@ Inspect has several built-in tools, including:
 
 -   [Web Browser](#sec-web-browser), which provides the model with a headless Chromium web browser that supports navigation, history, and mouse/keyboard interactions.
 
+-   [Computer](#sec-computer), which provides the model with a desktop computer (viewed through screenshots) that supports mouse and keyboard interaction.
+
 -   [Web Search](#sec-web-search), which uses the Google Search API to execute and summarise web searches.
 
 If you are only interested in using the built-in tools, check out their respective documentation links above. To learn more about creating your own tools read on immediately below.
@@ -371,16 +373,16 @@ Note that unlike some other tool functions like `bash()`, the `web_browser()` fu
 
 If you review the transcripts of a sample with access to the web browser tool, you'll notice that there are several distinct tools made available for control of the web browser. These tools include:
 
-| Tool | Description |
+| Tool                                        | Description                                                                           |
 |------------------------------------|------------------------------------|
-| `web_browser_go(url)` | Navigate the web browser to a URL. |
-| `web_browser_click(element_id)` | Click an element on the page currently displayed by the web browser. |
-| `web_browser_type(element_id)` | Type text into an input on a web browser page. |
+| `web_browser_go(url)`                       | Navigate the web browser to a URL.                                                    |
+| `web_browser_click(element_id)`             | Click an element on the page currently displayed by the web browser.                  |
+| `web_browser_type(element_id)`              | Type text into an input on a web browser page.                                        |
 | `web_browser_type_submit(element_id, text)` | Type text into a form input on a web browser page and press ENTER to submit the form. |
-| `web_browser_scroll(direction)` | Scroll the web browser up or down by one page. |
-| `web_browser_forward()` | Navigate the web browser forward in the browser history. |
-| `web_browser_back()` | Navigate the web browser back in the browser history. |
-| `web_browser_refresh()` | Refresh the current page of the web browser. |
+| `web_browser_scroll(direction)`             | Scroll the web browser up or down by one page.                                        |
+| `web_browser_forward()`                     | Navigate the web browser forward in the browser history.                              |
+| `web_browser_back()`                        | Navigate the web browser back in the browser history.                                 |
+| `web_browser_refresh()`                     | Refresh the current page of the web browser.                                          |
 
 : {tbl-colwidths=\[35,65\]}
 
@@ -420,6 +422,162 @@ CMD ["python3", "/app/web_browser/web_server.py"]
 
 Note that all of the Python files in the [\_resources](https://github.com/UKGovernmentBEIS/inspect_ai/blob/main/src/inspect_ai/tool/_tools/_web_browser/_resources/) directory alongside the `Dockerfile` need to be available for copying when building the container.
 
+## Computer (Beta) {#sec-computer}
+
+::: {.callout-note appearance="simple"}
+The beta version of the computer tool described below is currently available only in the development version of Inspect. To install the development version:
+
+``` bash
+pip install git+https://github.com/UKGovernmentBEIS/inspect_ai
+```
+:::
+
+The `computer()` tool provides models with a computer desktop environment along with the ability to view the screen and perform mouse and keyboard gestures. The computer tool is based on the Anthropic [Computer Use Beta](https://docs.anthropic.com/en/docs/build-with-claude/computer-use) reference implementation and works with any model that supports image input. 
+
+The current release of the computer tool is a beta version (exported from the `inspect_ai.tool.beta` module). We expect to finalise the interface and move it into the main `inspect_ai.tool` module over the next several weeks.
+
+### Configuration
+
+The `computer()` tool runs within a Docker container. To use it with a task you need to reference the `inspect-computer-tool-beta` image in your Docker compose file. For example:
+
+``` {.yaml filename="compose.yaml"}
+services:
+  default:
+    image: inspect-computer-tool-beta
+```
+
+You can configure the container to not have Internet access as follows:
+
+``` {.yaml filename="compose.yaml"}
+services:
+  default:
+    image: inspect-computer-tool-beta
+    network_mode: none
+```
+
+Note that if you'd like to view the model's interactions with the computer desktop in real time, you will also need to map ports to enable a VNC connection with the container. See the [VNC Client](#vnc-client) section below for details on how to do this.
+
+The `inspect-computer-tool-beta` image is based on the [ubuntu:22.04](https://hub.docker.com/layers/library/ubuntu/22.04/images/sha256-965fbcae990b0467ed5657caceaec165018ef44a4d2d46c7cdea80a9dff0d1ea?context=explore) image and includes the following additional applications pre-installed:
+
+- Firefox
+- VS Code
+- Xpdf
+- Xpaint
+- galculator
+
+We'll be refining this list as well as publishing more information on creating custom containers for use with the computer tool soon.
+
+### Task Setup
+
+A task configured to use the computer tool might look like this:
+
+``` python
+from inspect_ai import Task, task
+from inspect_ai.scorer import match
+from inspect_ai.solver import generate, use_tools
+from inspect_ai.tool.beta import computer
+
+@task
+def computer_task():
+    return Task(
+        dataset=read_dataset(),
+        solver=[
+            use_tools([computer()]),
+            generate(),
+        ],
+        scorer=match(),
+        sandbox=("docker", "compose.yaml"),
+    )
+```
+
+Two of the Inspect examples demonstrate basic computer use:
+
+-   [computer](https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/examples/computer/computer.py) — Three simple computing tasks as a minimal demonstration of computer use.
+
+    ``` bash
+    inspect eval examples/computer
+    ```
+
+-   [intervention](https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/examples/intervention/intervention.py) — Computer task driven interactively by a human operator.
+
+    ``` bash
+    inspect eval examples/intervention -T mode=computer --display conversation
+    ```
+
+### VNC Client {#vnc-client}
+
+You can use a [VNC](https://en.wikipedia.org/wiki/VNC) connection to the container to watch computer use in real time. This requires some additional port mapping in the Docker compose file. You can have Docker bind host ports dynamically for VNC (5900) and a browser-based noVNC client (6080) with the following `ports` entries:
+
+``` {.yaml filename="compose.yaml"}
+services:
+  default:
+    image: inspect-computer-tool-beta
+    ports:
+      - "5900"
+      - "6080"
+```
+
+To connect to the container for a given sample, locate the sample in the **Running Samples** UI and expand the sample info panel at the top:
+
+![](images/vnc-port-info.png){width=958 .lightbox}
+
+Click on the link for the noVNC browser client, or use a native VNC client to connect to the VNC port. Note that the VNC server takes a few seconds to start up, so if the first connection fails, wait a moment and try again.
+
+The browser-based client provides a view-only interface. If you use a native VNC client you should also set it to "view only" so as not to interfere with the model's use of the computer. For example, for RealVNC Viewer:
+
+![](images/vnc-view-only.png){width="549"}
+
+### Approval
+
+If the container you are using is connected to the Internet, you may want to configure human approval for a subset of computer tool actions. Here are the possible actions (specified using the `action` parameter to the `computer` tool):
+
+- `key`: Press a key or key-combination on the keyboard.
+- `type`: Type a string of text on the keyboard.
+- `cursor_position`: Get the current (x, y) pixel coordinate of the cursor on the screen.
+- `mouse_move`: Move the cursor to a specified (x, y) pixel coordinate on the screen.
+    - Example: `execute(action="mouse_move", coordinate=(100, 200))`
+- `left_click`: Click the left mouse button.
+- `left_click_drag`: Click and drag the cursor to a specified (x, y) pixel coordinate on the screen.
+- `right_click`: Click the right mouse button.
+- `middle_click`: Click the middle mouse button.
+- `double_click`: Double-click the left mouse button.
+- `screenshot`: Take a screenshot.
+
+
+Here is an approval policy that requires approval for key combos (e.g. `Enter` or a shortcut) and mouse clicks:
+
+```{.yaml filename="approval.yaml"}
+approvers:
+  - name: human
+    tools:
+      - computer(action='key'
+      - computer(action='left_click'
+      - computer(action='middle_click'
+      - computer(action='double_click'
+
+  - name: auto
+    tools: "*"
+```
+
+Note that since this is a prefix match and there could be other arguments, we don't end the tool match pattern with a closing parenthesis.
+
+You can apply this policy using the `--approval` command-line option:
+
+```bash
+inspect eval computer.py --approval approval.yaml
+```
+
+### Tool Binding
+
+The computer tool's schema is based on the standard Anthropic [computer tool-type](https://docs.anthropic.com/en/docs/build-with-claude/computer-use#computer-tool). When using Claude 3.5 the computer tool will automatically bind to the native Claude computer tool definition. This presumably provides improved performance due to fine-tuning on the use of the tool, but we have not verified this.
+
+If you want to experiment with bypassing the native Claude computer tool type and just register the computer tool as a normal function-based tool, then specify the `--no-internal-tools` generation option as follows:
+
+```bash
+inspect eval computer.py --no-internal-tools
+```
+
+
 ## Web Search {#sec-web-search}
 
 The `web_search()` tool provides models the ability to enhance their context window by performing a search. By default web searches retrieve 10 results from a provider, uses a model to determine if the contents is relevant then returns the top 3 relevant search results to the main model. Here is the definition of the `web_search()` function:
@@ -465,3 +623,4 @@ The `web_search()` tool uses [Google Programmable Search Engine](https://program
 -   `GOOGLE_CSE_ID` — Google Custom Search Engine ID
 
 -   `GOOGLE_CSE_API_KEY` — Google API key used to enable the Search API
+
diff --git a/examples/computer/compose.yaml b/examples/computer/compose.yaml
@@ -0,0 +1,24 @@
+services:
+  default:
+    # Temporary internal image until the official one is available
+    image: inspect-computer-tool-beta
+    init: true
+
+    # If you only launch a single container, you can vnc into that container by using
+    # the following port mapping
+    # ports:
+    #   - "5900:5900"
+    #   - "6080:6080"
+
+    # If you launch multiple containers, you can vnc into each container by using the
+    # following port mapping which will dynamically bind to host ports. The specific
+    # bindings can be found by using `docker inspect <container_id_or_name>`. This
+    # info is included in the Running Samples tab. The output will look something like:
+    #
+    #  service                  container port            host port                 url
+    #  VNC                      5900                      61029                     vnc://localhost:61029
+    #  noVNC                    6080                      61030                     http://localhost:61030?view_only=true&autoconnect=true
+
+    ports:
+      - "5900"
+      - "6080"
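
Review note on the dynamic port mapping in `examples/computer/compose.yaml`: outside the Running Samples tab, the same URLs could be derived from the output of `docker port <container>`. The sketch below is a hypothetical helper, not part of Inspect. It assumes the usual `5900/tcp -> 0.0.0.0:61029` line format, and it copies the URL shapes from the table in the compose file comments above.

```python
def parse_docker_port(output):
    """Map container ports to host ports from `docker port` style output."""
    ports = {}
    for line in output.splitlines():
        if "->" not in line:
            continue
        container, host = (part.strip() for part in line.split("->", 1))
        container_port = int(container.split("/", 1)[0])
        # rsplit also handles IPv6 bindings such as "[::]:61030".
        host_port = int(host.rsplit(":", 1)[1])
        # Keep the first binding reported for each container port.
        ports.setdefault(container_port, host_port)
    return ports


def vnc_urls(host_ports, host="localhost"):
    """Build the VNC and noVNC URLs for one container."""
    return {
        "VNC": f"vnc://{host}:{host_ports[5900]}",
        "noVNC": f"http://{host}:{host_ports[6080]}?view_only=true&autoconnect=true",
    }


# Assumed sample output, using the host ports from the table above.
SAMPLE = """5900/tcp -> 0.0.0.0:61029
6080/tcp -> 0.0.0.0:61030
6080/tcp -> [::]:61030
"""

if __name__ == "__main__":
    for service, url in vnc_urls(parse_docker_port(SAMPLE)).items():
        print(service, url)
```

Keeping parsing separate from URL construction means the URL helper can also be fed port numbers read by hand from `docker inspect`.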
diff --git a/examples/computer/computer.py b/examples/computer/computer.py
@@ -0,0 +1,50 @@
+from inspect_ai import Task, task
+from inspect_ai.dataset import Sample
+from inspect_ai.scorer import includes
+from inspect_ai.solver import system_message
+from inspect_ai.solver._basic_agent import basic_agent
+from inspect_ai.tool.beta import computer
+
+SYSTEM_MESSAGE = """
+Before each step, please clearly explain your intent for performing a tool action: "I expect tool command X to ...".
+
+After each step, carefully evaluate the resulting screenshot to see if the command achieved the right outcome.
+
+Explicitly show your thinking: "I have evaluated step X..." If not correct, try again. Only when
+you confirm a step was executed correctly should you move on to the next one.
+
+Note that launching applications from the bottom task bar requires a single left click.
+"""
+
+
+@task
+def computer_task():
+    return Task(
+        dataset=[
+            Sample(
+                input="Use the computer tool to find the contents of the file '/tmp/flag.txt'. When you've found it, report its contents.",
+                target="Frunobulax",
+                files={"/tmp/flag.txt": "Frunobulax"},
+            ),
+            Sample(
+                input="Use the computer tool and open the spreadsheet '/tmp/moonWeight.ods'. Enter your weight as 156 lbs. Report the result from cell B2. Important: If the 'Tip of the Day' dialog appears, you'll need to close it before proceeding. Important: You may need to install a spreadsheet using a command like 'sudo apt-get install -y libreoffice'.",
+                target="26",
+                files={"/tmp/moonWeight.ods": "moonWeight.ods"},
+            ),
+            Sample(
+                input="Use the computer tool to launch a terminal. Type 'Trudging across the tundra. Mile after Mile.' into the terminal. Important: Make sure that the terminal window is active before typing. When you are done, please use the submit tool to record the result of hitting enter in the terminal after entering that text.",
+                target="bash: Trudging: command not found",
+            ),
+            Sample(
+                input="Use the computer tool to launch a calculator. Calculate 123 x 456. Report the result.",
+                target="56088",
+            ),
+        ],
+        solver=basic_agent(
+            init=system_message(SYSTEM_MESSAGE),
+            tools=[computer()],
+            max_messages=100,
+        ),
+        scorer=includes(),
+        sandbox="docker",
+    )
diff --git a/examples/computer/moonWeight.ods b/examples/computer/moonWeight.ods