Add format support for Ollama generators #1232

Open
marfago opened this issue Dec 9, 2024 · 2 comments · May be fixed by #1282
@marfago

marfago commented Dec 9, 2024

Is your feature request related to a problem? Please describe.
Ollama has introduced structured outputs, as reported here.
This feature lets developers define a JSON schema that describes the expected output format.

Describe the solution you'd like
Adding support should be straightforward, since it only requires adding a new 'format' parameter to the generator and propagating it. It could look like:

generator = OllamaGenerator(
    model="llama3.2",
    format=aPydanticModel.model_json_schema(),
)
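
For readers unfamiliar with JSON schemas: the value proposed for 'format' is a plain dictionary. The sketch below hand-writes one for a hypothetical `Country` object with two string fields; the name `country_schema` and the fields are assumptions for illustration, and the output of a Pydantic model's `model_json_schema()` would have roughly this shape, usually with extra metadata such as titles.

```python
import json

# Hand-written JSON schema for a hypothetical "Country" object (illustrative only).
country_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "capital": {"type": "string"},
    },
    "required": ["name", "capital"],
}

# The schema is plain data, so it survives a JSON round trip unchanged.
schema_json = json.dumps(country_schema)
```

A dictionary like this is what would be passed as the proposed 'format' argument.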

Describe alternatives you've considered
N/A

Additional context
The Ollama launch blog

@marfago marfago added the feature request Ideas to improve an integration label Dec 9, 2024
@julian-risch julian-risch added the P1 label Dec 9, 2024
@julian-risch
Member

Related issue: deepset-ai/haystack#8276

@SotirisKot

A simple hack for now is to override/extend the OllamaGenerator component. Example below:

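# Override the current OllamaGenerator to support structured outputs,
# since Haystack does not currently support this.
from typing import Any, Callable, Dict, List, Optional, Union

from haystack import component
from haystack.dataclasses import StreamingChunk
from haystack_integrations.components.generators.ollama import OllamaGenerator
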
@component
class StructuredOllamaGenerator(OllamaGenerator):
    def __init__(self,
            model: str = "orca-mini",
            url: str = "http://localhost:11434",
            generation_kwargs: Optional[Dict[str, Any]] = None,
            system_prompt: Optional[str] = None,
            template: Optional[str] = None,
            raw: bool = False,
            timeout: int = 120,
            keep_alive: Optional[Union[float, str]] = None,
            streaming_callback: Optional[Callable[[StreamingChunk], None]] = None,
            format: Optional[Dict[str, Any]] = None
        ):
        super(StructuredOllamaGenerator, self).__init__(
            model=model,
            url=url,
            generation_kwargs=generation_kwargs,
            system_prompt=system_prompt,
            template=template,
            raw=raw,
            timeout=timeout,
            keep_alive=keep_alive,
            streaming_callback=streaming_callback
        )
        # JSON schema (or None) that run() forwards to the Ollama client
        self.format = format

    @component.output_types(replies=List[str], meta=List[Dict[str, Any]])
    def run(
        self,
        prompt: str,
        generation_kwargs: Optional[Dict[str, Any]] = None,
    ):
        """
        Runs an Ollama Model on the given prompt.

        :param prompt:
            The prompt to generate a response for.
        :param generation_kwargs:
            Optional arguments to pass to the Ollama generation endpoint, such as temperature,
            top_p, and others. See the available arguments in
            [Ollama docs](https://github.com/jmorganca/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values).
        :returns: A dictionary with the following keys:
            - `replies`: The responses from the model
            - `meta`: The metadata collected during the run
        """
        generation_kwargs = {**self.generation_kwargs, **(generation_kwargs or {})}

        stream = self.streaming_callback is not None

        response = self._client.generate(
            model=self.model,
            prompt=prompt,
            stream=stream,
            keep_alive=self.keep_alive,
            options=generation_kwargs,
            format=self.format,  # constrain the output to the given JSON schema
        )

        if stream:
            chunks: List[StreamingChunk] = self._handle_streaming_response(response)
            return self._convert_to_streaming_response(chunks)

        return self._convert_to_response(response)

Note that this requires an Ollama version recent enough to support structured outputs.
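
Since the generator returns `replies` as a list of strings, a structured reply still has to be parsed by the caller. A minimal sketch of that step, using only the standard library; the helper `parse_structured_reply` and the sample reply are hypothetical and stand in for `result["replies"][0]`:

```python
import json
from typing import Any, Dict, Iterable


def parse_structured_reply(reply: str, required_keys: Iterable[str]) -> Dict[str, Any]:
    """Parse one reply string as a JSON object and check that the expected keys exist."""
    data = json.loads(reply)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return data


# Hypothetical reply, standing in for result["replies"][0] from the generator.
sample_reply = '{"name": "Canada", "capital": "Ottawa"}'
country = parse_structured_reply(sample_reply, required_keys=["name", "capital"])
```

This only checks that the keys are present; full validation against the schema would need a schema validator or the original Pydantic model.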
