
Releases: ndif-team/nnsight

v0.4.1

04 Feb 22:01

What's Changed

Fixed a problem with multi-invoke LanguageModel calls (attention mask batching); see the sketch below.
Removed a bug that caused an already-loaded model to attempt loading again.
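
Here "multi-invoke" refers to batching several prompts in one trace via invoker contexts. A minimal sketch (the model and prompts are illustrative):

import nnsight

lm = nnsight.LanguageModel("openai-community/gpt2", device_map="auto")

# Two invokes are batched into a single forward pass; attention masks for
# prompts of different lengths are now batched correctly.
with lm.trace() as tracer:
    with tracer.invoke("Hello"):
        hs_a = lm.transformer.h[2].output.save()
    with tracer.invoke("Hello World, how are you?"):
        hs_b = lm.transformer.h[2].output.save()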

Full Changelog: v0.4.0...v0.4.1

v0.4.0

31 Jan 21:11

Changelog

0.4.0

released: 2025-01-31

Refer to the Colab notebook for an interactive walkthrough.

Breaking Changes

  • The InterventionGraph now follows a sequential execution order. Module envoys are expected to be referenced following the model’s architecture hierarchy. This means that out-of-order in-place operations will not take effect.
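
Ex (a hypothetical sketch; assumes a GPT-2-style LanguageModel lm as in the examples below):

with lm.trace("Hello World!"):
    # Layer 8 is referenced first...
    hs_8 = lm.transformer.h[8].output.save()

    # ...so this in-place edit to the earlier layer 2 is out of order
    # and will not take effect.
    lm.transformer.h[2].output[0][:] = 0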

  • Saved node values are automatically injected into their proxy references in the Python frames after graph execution. If you are calling .value in your code after tracing, this can now lead to incorrect behavior.
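
Ex (a sketch of the migration pitfall; foo and the prompt are illustrative):

with lm.trace("Hello World!"):
    foo = nnsight.list().save()

# foo has already been replaced by its value (a plain list), so .value is gone.
print(foo.value)

>>> AttributeError: 'list' object has no attribute 'value'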

New Features

  • NNsightError. nnsight now produces a more comprehensive and user-friendly explanation when an error occurs during execution as a result of operations defined during tracing. Error messages point directly to the original traceback where the operation was first defined. This feature can be toggled using Config.APP.DEBUG and defaults to true.
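
Ex (a hypothetical sketch; the failing operation is illustrative and the exact message format may differ):

with lm.trace("Hello World!"):
    hs = lm.transformer.h[2].output[0]
    # This reshape is invalid for the hidden states and only fails during
    # execution, but the NNsightError traceback points back to this line.
    bad = hs.reshape(-1, 999999)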

  • Traceable Python Control Flow. This feature adds Python’s conditional and iterator block statements as traceable operations on InterventionProxys. It can be toggled using Config.APP.CONTROL_FLOW_HACKS and defaults to true. Tracer.cond(…) and Tracer.iter(…) are still supported.

import nnsight

...

with lm.trace("Hello World!"):
    foo = nnsight.list([0, 1, 2, 3])
    for item in foo:
        if item % 2 == 0:
            nnsight.log(item)

>>> 0
>>> 2
  • Value Injection. References to saved proxies are now automatically replaced by their node value after the execution of the nnsight backend. This feature can be toggled using Config.APP.FRAME_INJECTION, defaults to true.
import nnsight

...

with lm.trace("Hello World!"):
    foo = nnsight.list().save()

print(type(foo))

>>> <class 'list'>
  • Module Renaming. Modules can now be easily renamed by passing a rename keyword argument into an NNsight model constructor. In the following example, we rename all '.attn' modules to '.attention':
import nnsight

rename = {
    "attn": "attention"
}

model = nnsight.LanguageModel("openai-community/gpt2", dispatch=False, device_map="auto", rename=rename)

with model.trace("Hello"):
    
    value = model.transformer.h[2].attention.output.save()

  • vLLM Support. nnsight now supports running models with the vLLM inference engine and intervening on their forward passes:
from nnsight.modeling.vllm import VLLM

vllm_gpt2 = VLLM("gpt2",
                 tensor_parallel_size=2,
                 gpu_memory_utilization=0.5,
                 dispatch=True)

with vllm_gpt2.trace("The Eiffel Tower is located in the city of", temperature=0.0, top_p=1.0, max_tokens=1):
    hs = vllm_gpt2.transformer.h[5].output.save()
    logit = vllm_gpt2.logits.output.save()

print(vllm_gpt2.tokenizer.decode(logit.argmax(dim=-1)))

>>> " Paris"

Arguments to the vllm.LLM engine can be passed directly to the nnsight VLLM constructor. Arguments of vllm.SamplingParams (typically passed during generation) can be passed to the trace call or to individual invoker calls; invoker-level arguments override any parameters specified in the trace call for that single batch, as shown below.

vLLM flattens the batch dimension; however, nnsight converts the indexing values so that interventions can still be carried out on individual batches, just as they would be on a non-vLLM model.
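
Ex (a minimal sketch reusing the vllm_gpt2 model above; prompts and parameter values are illustrative):

with vllm_gpt2.trace(temperature=0.0, top_p=1.0, max_tokens=1) as tracer:
    # This batch uses the trace-level (greedy) sampling parameters.
    with tracer.invoke("Madrid is located in the country of"):
        greedy_logit = vllm_gpt2.logits.output.save()
    # Per-invoke sampling parameters override the trace-level ones
    # for this batch only.
    with tracer.invoke("Madrid is located in the country of", temperature=0.8, top_p=0.95):
        sampled_logit = vllm_gpt2.logits.output.save()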

Tensor Parallelism > 1 is supported with the vLLM integration.

  • Trace Decorator. You can now decorate your external functions to make them traceable within your nnsight experiment. This is required for user-defined functionality to be traced.
import nnsight

...

@nnsight.trace
def my_func(value):
    print(value)

with lm.trace("Hello World!"):
    num = nnsight.int(5)
    my_func(num)

>>> 5
  • IteratorEnvoy context. It is now easier to define interventions for multiple iterations of token generation by opening a context with .all or .iter on an envoy. The .all context applies the envoy-specific interventions at every forward pass, while .iter can be indexed and sliced to apply the interventions only on chosen iterations.

Ex: Envoy.all()

import nnsight

...

with lm.generate("Hello", max_new_tokens=10):
    logits = nnsight.list().save()
    with lm.lm_head.all():
        logits.append(lm.lm_head.output)

print(len(logits))

>>> 10

Ex: Envoy.iter

import nnsight

...

with lm.generate("Hello", max_new_tokens=10):
    logits = nnsight.list().save()
    with lm.lm_head.iter[5:8]:
        logits.append(lm.lm_head.output)

print(len(logits))

>>> 3

Known Issues

  • Inline Control Flow is not supported.

Ex:

with lm.trace("Hello World!"):
    foo = nnsight.list([0, 1, 2, 3]).save()
    [nnsight.log(item) for item in foo]

>>> Error
  • Value Injection is not supported for proxies referenced within objects.
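
Ex (a hypothetical sketch; the dict container is illustrative):

with lm.trace("Hello World!"):
    container = {"foo": nnsight.list().save()}

# The proxy inside the dict is not replaced by its value.
print(type(container["foo"]) is list)

>>> False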

  • The vllm.LLM engine performs max_tokens + 1 forward passes, which can lead to undesired behavior if you are running interventions on all iterations of multi-token generation.

Ex:

with vllm_gpt2("Hello World!", max_tokens=10):
    logits = nnsight.list().save()
    with vllm_gpt2.logits.all():
        logits.append(vllm_gpt2.logits.output)

print(len(logits))

>>> 11 # expected: 10
  • IteratorEnvoy contexts can produce undesired behavior for subsequent operations defined below them that do not depend on InterventionProxys.

Ex:

with lm.generate("Hello World!", max_new_tokens=10):
    hs_4 = nnsight.list().save()

    with lm.transformer.h[4].all():
        hs_4.append(lm.transformer.h[4].output)

    hs_4.append(433)

print(len(hs_4))

>>> 20 # expected: 11

Important Considerations

  • Remote execution is currently not available in this version!

  • Tracer.cond(…) and Tracer.iter(…) are still supported.

  • vLLM does not come as a pre-installed dependency of nnsight.

  • nnsight supports vllm==0.6.4.post1 (see the install example after this list).

  • vLLM support only includes cuda and auto devices at the moment.

  • vLLM models do not support gradients.

  • The @nnsight.trace decorator does not yet enable user-defined operations to be executed remotely; support for that is coming soon.
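
Since vLLM is not bundled with nnsight, install the supported version separately, for example:

pip install nnsight "vllm==0.6.4.post1"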


Full Changelog: v0.3.7...v0.4.0

v0.4.0.dev

05 Dec 21:15
Pre-release

Changelog

0.4.0.dev

released: 2024-12-05

Refer to the Colab notebook for an interactive walkthrough of the new changes.


v0.3.7

20 Nov 18:56
7664e72


Full Changelog: v0.3.6...v0.3.7

v0.3.6

22 Sep 01:38
ad06cca

Full Changelog: v0.3.5...v0.3.6

v0.3.5

08 Sep 17:22
2f41edd

Full Changelog: v0.3.4...v0.3.5

v0.3.4

03 Sep 21:19
2f59fd0

Full Changelog: v0.3.3...v0.3.4

v0.3.3

03 Sep 15:42
60bca6b

Full Changelog: v0.3.2...v0.3.3

v0.3.2

02 Sep 18:26
0f3340d

Full Changelog: v0.3.1...v0.3.2

v0.3.1

01 Sep 23:11
5850fe9

Full Changelog: v0.3.0...v0.3.1