Sotopia API and UI (#264)
* feat: FastAPI Implementation of Sotopia Part Two (w websocket) (#252)

* api doc

* add PUT

* add a temp example for websocket

* websocket

* update readme

* Update README.md

* update websocket live simulation api doc

* [autofix.ci] apply automated fixes

* update websocket doc

* add api server with websocket as well as a client

* fix mypy errors

* support stopping the chat

* add 404 to the status code

* fix mypy issue

* update the returned message types

* redesign websocket api

* update websocket, fix mypy error

* add example of using websocket

* clean code & change to existing functions for simulation

* fix typing mismatch

* update doc & mypy type fix

* add type check for run_async_server

* move example

---------

Co-authored-by: Hao Zhu <[email protected]>
Co-authored-by: Zhe Su <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* Add customizable evaluation dimensions (#256)

* add customizable evaluation dimensions

* add docs

* fix mypy error & refactor examples

* add docs for evaluation dimensions

* update docs and examples

* add test cases and fix mypy issue

* fix mypy issue

* Fix test_create_custom_dimension to use CustomEvaluationDimension.get(pk) (#262)

Co-authored-by: openhands <[email protected]>

* Fix/custom eval dimension test (#263)

* Fix test_create_custom_dimension to use CustomEvaluationDimension.get(pk)

* Update documentation for SotopiaDimension and EvaluationDimensionBuilder

* [autofix.ci] apply automated fixes

* Add API documentation for evaluation dimensions

* Refine API documentation for evaluation_dimensions.py to match style

* [autofix.ci] apply automated fixes

---------

Co-authored-by: openhands <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* add doc

---------

Co-authored-by: XuhuiZhou <[email protected]>
Co-authored-by: openhands <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* Feat/additional fast APIs for non-streaming simulation and managing relationships (#265)

* temp run

* add relationship api

* fix mypy error

* update relationship api

* simulate episode non-streaming

* modify sim episodes

* add simulation status

* task error

* add background task

* [autofix.ci] apply automated fixes

* back to arun one episode

* upload the code

* use rq to execute background tasks

* temp sol

---------

Co-authored-by: Hao Zhu <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* fix ci error

* solving pytests

* improve the tests

* add custom eval fast api (#268)

* fix mypy error

* aact moderator (#257)

* initial framework

* initial conv

* fix module error

* feat: Add 3 new features to Moderator (#266)

* feat: introduce booting procedure, saving, and ending chat to moderator

* fix: moderator now ignores `none` AgentActions; Observations no longer need to include all channels in the mapping

* merge changes of example into the original one

* fix: 1. save() method now accepts push_to_db config 2. booting()'s waiting time is changed to 0.1 sec

* fix: rewrite booting() so that different agents receive different background information

* fix: moderator now inherits from Node directly, instead of from BaseAgent

---------

Co-authored-by: JXZhou <JXZhou>

* add save condition for moderator

* push to db false

* to fully stop

* stopping all agents

* fix mypy

* fix mypy error

---------

Co-authored-by: JXZhou <[email protected]>

* Deploy the api to modal (#267)

* prototype for modal serving

* add openai secret

* fix type annotation

* add doc

* bug fix for simulation api

* add customize model, evaluator model and evaluation dimensions

* Implement modal API server with Redis integration and FastAPI setup

- Added a new script for the modal API server that initializes a Redis instance.
- Created a persistent volume for Redis data and included a function to download initial data if not present.
- Configured a Docker image with necessary dependencies including Redis Stack and FastAPI.
- Implemented a web API class that sets up and cleans up the Redis connection, ensuring readiness before serving requests.
- Integrated the SotopiaFastAPI application within the modal framework.

---------

Co-authored-by: XuhuiZhou <[email protected]>

* Feature/sotopia demo UI (#261)

* initial

* initial ui

* merge main

* add new ui

* switch to fastAPI

* websocket check

* fix render episode error

* add page; make a simplified page (still WIP)

* [autofix.ci] apply automated fixes

* fix simplified streaming version

* semi-done character page + avatar assets

* Fixed character card styling

* [autofix.ci] apply automated fixes

* unified rendering and chat display

* updated chat character icons

* add some tags

* add typing

* temp fix

* add characters avatar to simulation

* fix episode full avatar

* go to modal config

* clean up code

* add modal streamlit app

* clean codebase except websocket

* remove repeated local css

* clean websocket

* fix get name error

* fix errors

* pre render scenario

* add custom eval

* change streamlit to dynamic path

* new uv

* revert to previous install commands

* a fix for modal

* add customized dimension

* [autofix.ci] apply automated fixes

* sort scenarios in simulation

* for demo video

* update deploy instruction

* update intro page

* update intro page

* [autofix.ci] apply automated fixes

* update intro page

* add customized dimensions

* update api link and modal environment

* move folder

* fix relative import

* update modal image build

* use uv to build environment

* change folder name

* change test

* fix modal serve

* environment change

* refactor

* fix ui

---------

Co-authored-by: Zhe Su <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: astrophie <[email protected]>

* remove dev tag

* add custom eval

* base dimension

* fix ui mypy

* fix mypy

* add delete dimension

* update streamlit ui

* ignores the ui directory

* Committing changes before push

* pytest for eval dimension

* fix mypy

* clean up comments

* run non streaming simulation

* add pytest for websocket

* fix mypy issue

---------

Co-authored-by: Hao Zhu <[email protected]>
Co-authored-by: Zhe Su <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: openhands <[email protected]>
Co-authored-by: JXZhou <[email protected]>
Co-authored-by: astrophie <[email protected]>
7 people authored Jan 8, 2025
1 parent 61f190e commit d7724db
Showing 59 changed files with 5,844 additions and 1,171 deletions.
1 change: 1 addition & 0 deletions .github/.codecov.yml
@@ -6,6 +6,7 @@ ignore:
- ".github" # ignore the .github directory
- "docs" # ignore the tests directory
- "figs" # ignore the figs directory
- "ui" # ignore the ui directory

coverage:
status:
116 changes: 116 additions & 0 deletions docs/pages/concepts/evaluation_dimension.md
@@ -0,0 +1,116 @@
## Overview

Evaluation dimensions are used to evaluate the quality of social interactions.
The original Sotopia paper evaluates social interactions along 7 dimensions, which we collectively call the `sotopia` evaluation dimensions:
- believability
- relationship
- knowledge
- secret
- social rules
- financial and material benefits
- goal

`SotopiaDimensions` can be used directly without initializing the database. It provides a set of predefined evaluation dimensions that are ready to use for evaluating social interactions. For example:

```python
from sotopia.envs.parallel import ParallelSotopiaEnv
from sotopia.envs.evaluators import EvaluationForTwoAgents, ReachGoalLLMEvaluator, RuleBasedTerminatedEvaluator, SotopiaDimensions

env = ParallelSotopiaEnv(
    env_profile=env_profile,
    model_name=model_names["env"],
    action_order="round-robin",
    evaluators=[
        RuleBasedTerminatedEvaluator(max_turn_number=20, max_stale_turn=2),
    ],
    terminal_evaluators=[
        ReachGoalLLMEvaluator(
            model_names["env"],
            EvaluationForTwoAgents[SotopiaDimensions],  # type: ignore
            # TODO check how to do type annotation
        ),
    ],
)
```


However, we observe that in many use cases people may want to evaluate with customized metrics, so we provide a way to build custom evaluation dimensions.
For a quick reference, check out `examples/use_custom_dimensions.py`.

### CustomEvaluationDimension
The [`CustomEvaluationDimension`](/python_API/database/evaluation_dimensions) is a class that can be used to create a custom evaluation dimension.
There are four parameters:
- name: the name of the dimension
- description: the description of the dimension
- range_low: the minimum score of the dimension (should be an integer)
- range_high: the maximum score of the dimension (should be an integer)
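
For instance, defining and saving a dimension might look like this (a minimal sketch; the import path follows how the other database classes are exposed via `sotopia.database`, and the `save()` call assumes the usual Redis OM persistence pattern, so a running Redis instance is required):

```python
from sotopia.database import CustomEvaluationDimension

# A hypothetical "transactivity" dimension scored from 0 to 10.
transactivity = CustomEvaluationDimension(
    name="transactivity",
    description="Whether the response builds on the other party's prior reasoning",
    range_low=0,
    range_high=10,
)
transactivity.save()  # assumed Redis OM-style persistence
```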

### CustomEvaluationDimensionList
The [`CustomEvaluationDimensionList`](/python_API/database/evaluation_dimensions) is a class that can be used to create a custom evaluation dimension list based on existing dimensions. It lets you group multiple dimensions together for a specific use case.
There are two parameters:
- name: the name of the dimension list
- dimension_pks: the primary keys of the dimensions in the dimension list
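
Continuing the sketch above, grouping saved dimensions by primary key might look like this (the `pk` attribute is assumed from the Redis OM pattern implied by the `CustomEvaluationDimension.get(pk)` usage in the tests):

```python
from sotopia.database import CustomEvaluationDimensionList

# Hypothetical: group previously saved dimensions under one list name.
collaboration_metrics = CustomEvaluationDimensionList(
    name="collaboration_metrics",
    dimension_pks=[transactivity.pk],  # primary keys of saved CustomEvaluationDimension entries
)
collaboration_metrics.save()
```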

### EvaluationDimensionBuilder
The [`EvaluationDimensionBuilder`](/python_API/database/evaluation_dimensions) is a class that can be used to generate a custom evaluation dimension model based on the existing dimensions.


## Usage
### Initialize the database
The default evaluation metric is still `SotopiaDimensions` in `sotopia.envs.evaluators`. There is no `CustomEvaluationDimension` in the database by default. To initialize the database, please refer to `examples/use_custom_dimensions.py`.


### Use the custom evaluation dimensions
After you have initialized your customized evaluation dimensions, you can use any one of the methods below:

#### Method 1: Choose dimensions by names
```python
evaluation_dimensions = (
    EvaluationDimensionBuilder.select_existing_dimension_model_by_name(
        ["transactivity", "verbal_equity"]
    )
)
```

#### Method 2: Directly choose the grouped evaluation dimension list
```python
evaluation_dimensions = (
    EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name(
        "sotopia"
    )
)
```

#### Method 3: Build a custom evaluation dimension model temporarily
We provide multiple ways to build a custom evaluation dimension model with `EvaluationDimensionBuilder`, specifically:
- `generate_dimension_model`: build an evaluation dimension from existing dimension primary keys.
- `generate_dimension_model_from_dict`: build an evaluation dimension model from a list of dictionaries, each specifying the parameters of a `CustomEvaluationDimension` (a usage sketch follows this list). For example:
```json
[
    {
        "name": "believability",
        "description": "The believability of the interaction",
        "range_low": 0,
        "range_high": 10
    },
    ...
]
```
- `select_existing_dimension_model_by_name`: build an evaluation dimension from existing dimension names. For example `['believability', 'goal']`
- `select_existing_dimension_model_by_list_name`: build an evaluation dimension from existing `CustomEvaluationDimensionList` list names. For example, directly use `sotopia`.
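
A usage sketch of the dictionary-based route (this page names the method `generate_dimension_model_from_dict`, while the API reference documents it as `build_dimension_model_from_dict`; check your installed version for the exact name):

```python
from sotopia.database import EvaluationDimensionBuilder

# Build a temporary dimension model without persisting anything to the database.
dimension_model = EvaluationDimensionBuilder.generate_dimension_model_from_dict(
    dimensions=[
        {
            "name": "believability",
            "description": "The believability of the interaction",
            "range_low": 0,
            "range_high": 10,
        },
    ]
)
```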


After you have the evaluation dimension model, you can pass it as a parameter to the evaluator, for example:
```python
evaluation_dimensions = (
    EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name(
        "sotopia"
    )
)
terminal_evaluators=[
    ReachGoalLLMEvaluator(
        model_names["env"],
        EvaluationForTwoAgents[evaluation_dimensions],  # type: ignore
    ),
],
```
2 changes: 1 addition & 1 deletion docs/pages/contribution/contribution.md
@@ -133,7 +133,7 @@ Please refer to [Dev Containers](https://containers.dev/supporting#editors) to s

You can also set up the development environment without Dev Containers. There are three things you will need to set up manually:

- Python and uv: Please start from an environment supporting Python 3.10+ and install uv using `pip install uv; uv sync --all-extra`.
- Python and uv: Please start from an environment supporting Python 3.10+ and install uv using `pip install uv; uv sync --all-extras`. (Note that this will install all the extra dependencies)
- Redis: Please refer to introduction page for the set up of Redis.
- Local LLM (optional): If you don't have access to model endpoints (e.g. OpenAI, Anthropic or others), you can use a local model. You can use Ollama, Llama.cpp, vLLM or many others which support OpenAI compatible endpoints.

6 changes: 6 additions & 0 deletions docs/pages/examples/deployment.md
@@ -0,0 +1,6 @@
# Deploy Sotopia Python API to Modal
We offer a script to deploy the Sotopia Python API to [Modal](https://modal.com/).
To do so, go to the root of the `sotopia` repository and run the following command:
```bash
modal deploy sotopia/ui/modal_api_server.py
```
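
If you have not used Modal before, you will need the Modal CLI and an authenticated token first (a sketch of the standard Modal setup flow; adjust to your account):

```bash
pip install modal   # install the Modal client/CLI
modal setup         # authenticate this machine with your Modal account
```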
54 changes: 54 additions & 0 deletions docs/pages/python_API/database/evaluation_dimensions.md
@@ -0,0 +1,54 @@
# `evaluation_dimensions.py`

This module provides classes and utilities for defining and managing custom evaluation dimensions within the Sotopia environment. It includes classes for individual dimensions, lists of dimensions, and a builder for creating dimension models.

## Classes

### `CustomEvaluationDimension`

Represents a custom evaluation dimension with specific attributes such as name, description, and score range.

#### Attributes
- `name`: `str`. The name of the dimension.
- `description`: `str`. A brief description of the dimension.
- `range_low`: `int`. The minimum score for the dimension.
- `range_high`: `int`. The maximum score for the dimension.

### `CustomEvaluationDimensionList`

Groups multiple custom evaluation dimensions together.

#### Attributes
- `name`: `str`. The name of the dimension list.
- `dimension_pks`: `list[str]`. A list of primary keys for the dimensions included in the list.

### `EvaluationDimensionBuilder`

Provides utility methods to create and manage evaluation dimension models.

#### Methods
- `create_range_validator(low: int, high: int)`: Creates a validator for score ranges.

**Arguments:**
- `low`: `int`. The minimum score allowed.
- `high`: `int`. The maximum score allowed.

- `build_dimension_model(dimension_ids: list[str])`: Builds a dimension model from primary keys.

**Arguments:**
- `dimension_ids`: `list[str]`. A list of dimension primary keys.

- `build_dimension_model_from_dict(dimensions: list[dict[str, Union[str, int]]])`: Builds a dimension model from a list of dictionaries.

**Arguments:**
- `dimensions`: `list[dict[str, Union[str, int]]]`. A list of dictionaries specifying dimension attributes.

- `select_existing_dimension_model_by_name(dimension_names: list[str])`: Selects a dimension model by dimension names.

**Arguments:**
- `dimension_names`: `list[str]`. A list of dimension names.

- `select_existing_dimension_model_by_list_name(list_name: str)`: Selects a dimension model by list name.

**Arguments:**
- `list_name`: `str`. The name of the dimension list.
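
A minimal usage sketch tying these together (assuming the methods are called directly on `EvaluationDimensionBuilder` as documented above, and that dimensions named `believability` and `goal` already exist in the database):

```python
from sotopia.database import EvaluationDimensionBuilder

# Select a dimension model built from dimensions already stored in the database.
model = EvaluationDimensionBuilder.select_existing_dimension_model_by_name(
    ["believability", "goal"]
)

# Or select by the name of a stored CustomEvaluationDimensionList.
model = EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name("sotopia")
```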
17 changes: 16 additions & 1 deletion examples/experiment_eval.py
@@ -17,6 +17,7 @@
    EnvAgentComboStorage,
    EnvironmentProfile,
    EpisodeLog,
    EvaluationDimensionBuilder,
)
from sotopia.envs.evaluators import (
    EvaluationForTwoAgents,
@@ -34,6 +35,7 @@
)
from sotopia.server import run_async_server
from sotopia_conf.gin_utils import parse_gin_flags, run
# from sotopia.database import EvaluationDimensionBuilder

_DEFAULT_GIN_SEARCH_PATHS = [
os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
@@ -109,6 +111,18 @@ def _iterate_env_agent_combo_not_in_db(
    tag: str | None = None,
) -> Generator[EnvAgentCombo[Observation, AgentAction], None, None]:
    """We iterate over each environment and return the **first** env-agent combo that is not in the database."""
    # loading evaluation metric
    try:
        evaluation_dimensions = EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name(
            "sotopia"
        )  # Initialize your customized dimension, please refer to `examples/use_custom_dimensions.py`
    except Exception as e:
        print(
            "No customized evaluation dimensions found, using default SotopiaDimensions",
            e,
        )
        evaluation_dimensions = SotopiaDimensions

    if not env_ids:
        env_ids = list(EnvironmentProfile.all_pks())
    for env_id in env_ids:
@@ -152,7 +166,8 @@ def _iterate_env_agent_combo_not_in_db(
        terminal_evaluators=[
            ReachGoalLLMEvaluator(
                model_names["env"],
                EvaluationForTwoAgents[SotopiaDimensions],
                EvaluationForTwoAgents[evaluation_dimensions],  # type: ignore
                # TODO check how to do type annotation
            ),
        ],
    )
2 changes: 2 additions & 0 deletions examples/experimental/nodes/initial_message_node.py
@@ -18,6 +18,7 @@ def __init__(
        input_tick_channel: str,
        output_channels: list[str],
        env_scenario: str,
        node_name: str,
        redis_url: str = "redis://localhost:6379/0",
    ):
        super().__init__(
@@ -26,6 +27,7 @@
                (output_channel, Text) for output_channel in output_channels
            ],
            redis_url=redis_url,
            node_name=node_name,
        )
        self.env_scenario = env_scenario
        self.output_channels = output_channels
113 changes: 113 additions & 0 deletions examples/experimental/sotopia_original_replica/llm_agent_sotopia.py
@@ -0,0 +1,113 @@
import logging
import sys
from rich.logging import RichHandler

from aact import NodeFactory

from sotopia.experimental.agents.base_agent import BaseAgent
from sotopia.experimental.agents.datamodels import Observation, AgentAction

from sotopia.generation_utils import agenerate
from sotopia.generation_utils.generate import StrOutputParser

# Check Python version
if sys.version_info >= (3, 11):
    pass
else:
    pass

# Configure logging
FORMAT = "%(asctime)s - %(levelname)s - %(name)s - %(message)s"
logging.basicConfig(
    level=logging.WARNING,
    format=FORMAT,
    datefmt="[%X]",
    handlers=[RichHandler()],
)


@NodeFactory.register("llm_agent")
class LLMAgent(BaseAgent[Observation, AgentAction]):
    def __init__(
        self,
        input_channels: list[str],
        output_channel: str,
        query_interval: int,
        agent_name: str,
        node_name: str,
        goal: str,
        model_name: str,
        redis_url: str,
    ):
        super().__init__(
            [(input_channel, Observation) for input_channel in input_channels],
            [(output_channel, AgentAction)],
            redis_url,
            node_name,
        )
        self.output_channel = output_channel
        self.query_interval = query_interval
        self.count_ticks = 0
        self.message_history: list[Observation] = []
        self.name = agent_name
        self.model_name = model_name
        self.goal = goal

    def _format_message_history(self, message_history: list[Observation]) -> str:
        ## TODO: akhatua Fix the mapping of action to be grammatically correct
        return "\n".join(message.to_natural_language() for message in message_history)

    async def aact(self, obs: Observation) -> AgentAction:
        if obs.turn_number == -1:
            return AgentAction(
                agent_name=self.name,
                output_channel=self.output_channel,
                action_type="none",
                argument=self.model_name,
            )

        self.message_history.append(obs)

        if len(obs.available_actions) == 1 and "none" in obs.available_actions:
            return AgentAction(
                agent_name=self.name,
                output_channel=self.output_channel,
                action_type="none",
                argument="",
            )
        elif len(obs.available_actions) == 1 and "leave" in obs.available_actions:
            self.shutdown_event.set()
            return AgentAction(
                agent_name=self.name,
                output_channel=self.output_channel,
                action_type="leave",
                argument="",
            )
        else:
            history = self._format_message_history(self.message_history)
            action: str = await agenerate(
                model_name=self.model_name,
                template="Imagine that you are a friend of the other person. Here is the "
                "conversation between you and them.\n"
                "You are {agent_name} in the conversation.\n"
                "{message_history}\n"
                "and you plan to {goal}.\n"
                "You can choose to interrupt the other person "
                "by saying something, or not to interrupt by outputting nothing. What would you say? "
                "Please output only a single sentence, or output nothing at all."
                "{format_instructions}",
                input_values={
                    "message_history": history,
                    "goal": self.goal,
                    "agent_name": self.name,
                },
                temperature=0.7,
                output_parser=StrOutputParser(),
            )

            return AgentAction(
                agent_name=self.name,
                output_channel=self.output_channel,
                action_type="speak",
                argument=action,
            )
