API Introduction

easyroutine.interpretability is the module that implements the code for extracting the hidden representations of HuggingFace LLMs and for intervening on the forward pass.
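
To make "extracting hidden representations" and "intervening on the forward pass" concrete, here is a toy sketch in plain Python. It is only a conceptual illustration and not how easyroutine is implemented: the `ToyHookedModel` class, its `add_hook` method, and the two lambda "layers" are all hypothetical names invented for this example.

```python
from typing import Callable, Dict, List, Optional

# A toy "layer" maps a list of floats to a list of floats.
Layer = Callable[[List[float]], List[float]]
# A hook sees a layer's output; it returns a replacement, or None to keep it.
Hook = Callable[[List[float]], Optional[List[float]]]


class ToyHookedModel:
    """Hypothetical stand-in for a hooked model (not easyroutine's HookedModel)."""

    def __init__(self, layers: List[Layer]) -> None:
        self.layers = layers
        self.hooks: Dict[int, Hook] = {}

    def add_hook(self, layer_index: int, hook: Hook) -> None:
        # One hook per layer keeps the sketch short.
        self.hooks[layer_index] = hook

    def forward(self, x: List[float]) -> List[float]:
        for i, layer in enumerate(self.layers):
            x = layer(x)
            hook = self.hooks.get(i)
            if hook is not None:
                replacement = hook(x)
                if replacement is not None:
                    x = replacement  # intervention: overwrite the activation
        return x


model = ToyHookedModel([
    lambda v: [2.0 * a for a in v],  # layer 0: scale by 2
    lambda v: [a + 1.0 for a in v],  # layer 1: shift by 1
])

# Extraction: store a copy of layer 0's output without changing it.
cache: Dict[str, List[float]] = {}

def save_layer0(out: List[float]) -> None:
    cache["layer0_out"] = list(out)

model.add_hook(0, save_layer0)
clean_output = model.forward([1.0, 2.0])

# Intervention: replace layer 0's output with zeros (a simple ablation).
def zero_layer0(out: List[float]) -> List[float]:
    return [0.0 for _ in out]

model.add_hook(0, zero_layer0)
ablated_output = model.forward([1.0, 2.0])
```

The same function-call pattern covers both use cases: a hook that returns nothing only observes the activation, while a hook that returns a value changes what the later layers receive.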

Simple Tutorial

# First we need to import the HookedModel and the config classes
from easyroutine.interpretability import HookedModel, ExtractionConfig

# Then we can create the hooked model
hooked_model = HookedModel.from_pretrained(model_name="mistral-community/pixtral-12b", device_map="auto")

# Now let's define a simple dataset
dataset = [
    "This is a test",
    "This is another test"
]

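# Get the tokenizer that belongs to the hooked model
tokenizer = hooked_model.get_tokenizer()

# Tokenize the sentences into a single padded batch of PyTorch tensors
dataset = tokenizer(dataset, padding=True, truncation=True, return_tensors="pt")

# Run the model and extract the residual stream output (resid_out)
# at the last token position
cache = hooked_model.extract_cache(
    dataset,
    target_token_positions=["last"],
    extraction_config=ExtractionConfig(
        extract_resid_out=True
    )
)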
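
Because the tutorial batches sentences with `padding=True`, the position of the "last" token can differ from row to row. The sketch below shows one way to find the last non-padding position from an attention mask. It is only an illustration under stated assumptions: the source does not say how easyroutine resolves `"last"`, and the `last_token_indices` helper and the example masks are hypothetical.

```python
from typing import List


def last_token_indices(attention_mask: List[List[int]]) -> List[int]:
    """Return, for each row, the index of the last position whose mask is 1."""
    indices = []
    for row in attention_mask:
        real_positions = [i for i, m in enumerate(row) if m == 1]
        if not real_positions:
            raise ValueError("row contains no real (non-padding) tokens")
        indices.append(real_positions[-1])
    return indices


# Hypothetical masks for a batch of two sequences of length 5,
# where the first sequence has only 4 real tokens.
right_padded_mask = [
    [1, 1, 1, 1, 0],  # padding at the end
    [1, 1, 1, 1, 1],
]
left_padded_mask = [
    [0, 1, 1, 1, 1],  # padding at the start
    [1, 1, 1, 1, 1],
]

right_padded_last = last_token_indices(right_padded_mask)
left_padded_last = last_token_indices(left_padded_mask)
```

With right padding the last real token sits at a different index in each row, while with left padding it is always the final position. It is worth checking which padding side the tokenizer uses before interpreting the extracted activations.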