feat: anthropic livebook

thmsmlr · Mar 25, 2024 · 2ce3f3e
1 parent 4ef6dd8
commit 2ce3f3e
Showing 1 changed file with 134 additions and 0 deletions.
diff --git a/pages/llm-providers/anthropic.livemd b/pages/llm-providers/anthropic.livemd
@@ -0,0 +1,134 @@
+<!-- livebook:{"persist_outputs":true} -->
+
+# Text Classification - Anthropic
+
+```elixir
+Mix.install(
+  [
+    {:instructor,
+     path:
+       Path.expand("/Users/abhishek/Downloads/perps/experiments/open_source_elixir/instructor_ex")}
+  ],
+  config: [
+    instructor: [
+      adapter: Instructor.Adapters.Anthropic,
+      anthropic: [api_key: System.fetch_env!("LB_ANTHROPIC_API_KEY")]
+    ]
+  ]
+)
+```
+
+## Motivation
+
+Text classification is a common NLP task with broad applications across software. Spam detection and support ticket categorization are both classification problems at their core. Historically, this required training custom models on thousands of pre-labeled examples. With LLMs, much of that knowledge is already encoded in the model. With proper instructions, and by constraining the output to a known set of classes, you can have a text classifier built on an LLM such as Claude up and running in no time.
+
+Hell, you can even use Instructor to help generate the training set for your own, more efficient model. But let's not get ahead of ourselves; there's more on that later in the tutorials.
+
+## Binary Text Classification
+
+Spam detection is a classic example of binary text classification. The task boils down to deciding whether or not an example belongs to the class. This is straightforward to implement with Instructor.
+
+````elixir
+defmodule SpamPrediction do
+  use Ecto.Schema
+  use Instructor.Validator
+
+  @doc """
+  ## Field Descriptions:
+  - class: Whether or not the email is spam.
+  - reason: A short rationale, fewer than 10 words, for the classification.
+  - score: A confidence score between 0.0 and 1.0 for the classification.
+  """
+  @primary_key false
+  embedded_schema do
+    field(:class, Ecto.Enum, values: [:spam, :not_spam])
+    field(:reason, :string)
+    field(:score, :float)
+  end
+
+  @impl true
+  def validate_changeset(changeset) do
+    changeset
+    |> Ecto.Changeset.validate_number(:score,
+      greater_than_or_equal_to: 0.0,
+      less_than_or_equal_to: 1.0
+    )
+  end
+end
+
+is_spam? = fn text ->
+  Instructor.chat_completion(
+    model: "claude-3-haiku-20240307",
+    response_model: SpamPrediction,
+    max_retries: 3,
+    mode: :md_json,
+    messages: [
+      %{
+        role: "user",
+        content: """
+        Your purpose is to classify customer support emails as either spam or not.
+        This is for a clothing retail business.
+        They sell all types of clothing.
+
+        Classify the following email: 
+        ```
+        #{text}
+        ```
+        """
+      }
+    ]
+  )
+end
+
+is_spam?.("Hello I am a Nigerian prince and I would like to send you money")
+````
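+
+The call above returns a tagged tuple, so the caller still has to decide what to do with the prediction. The following is a minimal sketch of one way to consume it, assuming the success value exposes the `class` and `score` fields from `SpamPrediction`. The `SpamDecision` module, its `decide/2` function, the `0.8` threshold, and the `:needs_review` outcome are hypothetical names introduced for illustration, and the error value is deliberately treated as opaque.
+
+```elixir
+defmodule SpamDecision do
+  # Hypothetical helper: turns the result of `is_spam?.(text)` into an action.
+  # The default threshold of 0.8 is an assumption, not a recommended value.
+  def decide(result, threshold \\ 0.8)
+
+  # Confident spam prediction: block the email.
+  def decide({:ok, %{class: :spam, score: score}}, threshold) when score >= threshold,
+    do: :block
+
+  # Spam prediction with low confidence: hand it to a human.
+  def decide({:ok, %{class: :spam}}, _threshold), do: :needs_review
+
+  # Anything classified as not spam is delivered.
+  def decide({:ok, %{class: :not_spam}}, _threshold), do: :deliver
+
+  # The request failed or validation never passed: fall back to review.
+  def decide({:error, _reason}, _threshold), do: :needs_review
+end
+
+# A plain map stands in for the `%SpamPrediction{}` struct here.
+SpamDecision.decide({:ok, %{class: :spam, reason: "Advance-fee scam", score: 0.97}})
+```
+
+Matching on the map keys rather than on the struct name keeps the helper usable with any schema that has `class` and `score` fields.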
+
+## Multi-Label Text Classification
+
+We don't have to stop at a binary decision. The same idea extends easily to multiple categories, or classes, that a text can fall into. In this example, let's consider classifying support emails. We want to know whether an email is a `general_inquiry`, a `billing_issue`, or a `technical_issue`, and it may well fit in more than one class. This can be useful if we want to cc specialized support agents when a customer's issues overlap.
+
+We can leverage `Ecto.Enum` to define a schema that restricts the LLM output to a list of those values. We can also provide a `@doc` description to help the LLM understand what each classification is meant to represent.
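+
+To see what that restriction means independently of any LLM call, here is a small sketch that casts raw values through an `{:array, Ecto.Enum}` field. The `TagDemo` module and the `cast_tags` function are hypothetical and exist only for this demonstration; the sketch assumes Ecto is available, which it is here as a dependency of Instructor.
+
+```elixir
+defmodule TagDemo do
+  # Hypothetical schema used only to demonstrate casting.
+  use Ecto.Schema
+
+  @primary_key false
+  embedded_schema do
+    field(:tags, {:array, Ecto.Enum},
+      values: [:general_inquiry, :billing_issue, :technical_issue]
+    )
+  end
+end
+
+# Casts a list of raw string tags into a changeset for TagDemo.
+cast_tags = fn tags ->
+  Ecto.Changeset.cast(struct(TagDemo), %{"tags" => tags}, [:tags])
+end
+
+# Known values are cast from strings to atoms.
+known = cast_tags.(["billing_issue", "technical_issue"])
+IO.inspect(known.valid?, label: "known tags valid?")
+IO.inspect(Ecto.Changeset.apply_changes(known).tags, label: "cast tags")
+
+# A value outside the enum makes the field invalid.
+unknown = cast_tags.(["billing_issue", "refund_request"])
+IO.inspect(unknown.valid?, label: "unknown tag valid?")
+```
+
+The schema used for the email classifier follows the same pattern: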
+
+```elixir
+defmodule EmailClassifications do
+  use Ecto.Schema
+
+  @doc """
+  A classification of a customer support email.
+
+  technical_issue - whether the user is having trouble accessing their account.
+  billing_issue - whether the customer is having trouble managing their billing or credit card
+  general_inquiry - all other issues
+  """
+  @primary_key false
+  embedded_schema do
+    field(:tags, {:array, Ecto.Enum},
+      values: [:general_inquiry, :billing_issue, :technical_issue]
+    )
+  end
+end
+
+classify_email = fn text ->
+  {:ok, %{tags: result}} =
+    Instructor.chat_completion(
+      model: "claude-3-haiku-20240307",
+      response_model: EmailClassifications,
+      messages: [
+        %{
+          role: "user",
+          content: "Classify the following text: #{text}"
+        }
+      ]
+    )
+
+  result
+end
+
+classify_email.("My account is locked and I can't access my billing info.")
+```
+
+<!-- livebook:{"output":true} -->
+
+```
+[:technical_issue, :billing_issue]
+```
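+
+Once an email carries a list of tags, routing it to the right people is a plain lookup. The sketch below shows one way to build a cc list from the tags returned by `classify_email`. The `SupportRouting` module, its `cc_list/1` function, and the `example.com` addresses are hypothetical placeholders, not part of Instructor.
+
+```elixir
+defmodule SupportRouting do
+  # Hypothetical mapping from classification tag to a support mailbox.
+  # The addresses are placeholders, not real endpoints.
+  @recipients %{
+    technical_issue: "tech-support@example.com",
+    billing_issue: "billing@example.com",
+    general_inquiry: "support@example.com"
+  }
+
+  # Returns the unique, sorted list of mailboxes to cc for the given tags.
+  # Unknown tags are ignored rather than raising.
+  def cc_list(tags) do
+    tags
+    |> Enum.flat_map(fn tag ->
+      case Map.fetch(@recipients, tag) do
+        {:ok, address} -> [address]
+        :error -> []
+      end
+    end)
+    |> Enum.uniq()
+    |> Enum.sort()
+  end
+end
+
+SupportRouting.cc_list([:technical_issue, :billing_issue])
+```
+
+Ignoring unknown tags is a design choice for this sketch; a stricter version could raise instead, since the schema already limits the tags to known values.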
+
+<!-- livebook:{"offset":4335,"stamp":{"token":"XCP.z64pnpwAt12QVZZsOwZ6F4ZHjZz_AQ4EOrHm2Oty-LTs40gELuOk8NpLoz4A0d8TU_d_JsWFYjmtedcbLWMIT5fQxG9kUhv7g339Y6UB8ejsS0VXBeBShcuthFzN","version":2}} -->