LlamaCppGenerator randomness not working as expected #1269

Open
erlebach opened this issue Dec 27, 2024 · 1 comment
erlebach commented Dec 27, 2024

Consider the code below, which runs a Llama-3.1 model with a non-zero temperature. When I execute it multiple times, I always get the same response, even though llama.cpp uses a non-deterministic seed by default. Is this expected behavior? What approach should I use to get a different result on every run? Setting seed=-1 in the generation_kwargs dictionary, as shown below, works around the problem. It is not clear why this should be necessary, though, because -1 is already the default seed in llama.cpp (#define LLAMA_DEFAULT_SEED 0xFFFFFFFF in spm-headers/llama.h of the https://github.com/ggerganov/llama.cpp.git repository). This suggests that there is an error somewhere.

      generation_kwargs={
          "max_tokens": 128,
          "temperature": 0.7,
          "top_k": 40,
          "top_p": 0.9,
          "seed": -1,
      },
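If an explicit seed is forwarded like that, a fresh value can also be derived per run instead of relying on the -1 sentinel. A minimal sketch, assuming the generator passes seed from generation_kwargs through to llama-cpp-python's completion call (the model path is just the one from my script):

import time

from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator

generator = LlamaCppGenerator(
    model="data/llm_models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
    generation_kwargs={
        "max_tokens": 128,
        "temperature": 0.7,
        "top_k": 40,
        "top_p": 0.9,
        # Fresh seed on every run instead of the -1 sentinel.
        "seed": int(time.time()),
    },
)

The full script that reproduces the behavior: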
import random
import time

from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator

# Set the seed to a random value based on the current time
# (this seeds Python's random module only; it does not affect llama.cpp's sampler)
random.seed(int(time.time()))

generator = LlamaCppGenerator(
    model="data/llm_models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
    n_ctx=512,
    n_batch=128,
    model_kwargs={
        "n_gpu_layers": -1,
        "verbose": False,
    },
    generation_kwargs={
        "max_tokens": 128,
        "temperature": 1.7,
        "top_k": 40,
        "top_p": 0.9,
    },
)
generator.warm_up()

simplified_schema = '{"content": "Your single sentence answer here"}'
system = "You are a helpful assistant. Respond to questions with a single sentence " \
         f"using clean JSON only, following the JSON schema {simplified_schema}. " \
         "Never use markdown formatting or code block indicators."
user_query = "What is artificial intelligence?"

prompt = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>" \
         f"{system}<|eot_id|>" \
         f"<|start_header_id|>user<|end_header_id|> {user_query}<|eot_id|>" \
         "<|start_header_id|>assistant<|end_header_id|>"
print(f"{prompt=}")

result = generator.run(prompt)
print("result= ", result["replies"][0])
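To check whether the sampler is actually re-seeded, it is enough to run the same prompt twice and compare the replies; with a non-deterministic seed and temperature > 0 they should usually differ (identical replies across many runs point at a fixed seed):

first = generator.run(prompt)["replies"][0]
second = generator.run(prompt)["replies"][0]
print("identical replies:", first == second)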
julian-risch (Member) commented:

@erlebach Thanks for reaching out about this issue. Have you seen this related issue? abetlen/llama-cpp-python#1809

julian-risch transferred this issue from deepset-ai/haystack on Jan 2, 2025