Wanda (Pruning by Weights and Activations) is a technique for compressing Large Language Models without retraining. It scores each weight by the product of its magnitude and the norm of the corresponding input activation (measured on a small calibration set), then removes the lowest-scoring weights, largely preserving model performance while reducing compute.
Proposed in 'A Simple and Effective Pruning Approach for Large Language Models' (Sun et al., 2023).
This makes it practical to run very large models on consumer hardware with minimal accuracy loss.
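The scoring rule can be sketched in a few lines of NumPy. This is an illustrative implementation, not the authors' code: it scores each weight of a linear layer as |W_ij| times the L2 norm of input feature j over a calibration batch, then zeroes the lowest-scoring fraction within each output row (per-row comparison, as described in the paper). The function name and array shapes are assumptions for the example.

```python
import numpy as np

def wanda_prune(W, X, sparsity=0.5):
    """Illustrative Wanda-style pruning of one linear layer.

    W: (out_features, in_features) weight matrix.
    X: (n_samples, in_features) calibration input activations.
    sparsity: fraction of weights to zero within each output row.
    """
    # L2 norm of each input feature across the calibration samples.
    act_norm = np.linalg.norm(X, axis=0)            # shape: (in_features,)
    # Wanda score: weight magnitude times input activation norm.
    score = np.abs(W) * act_norm                    # shape: (out, in)
    # Indices of the lowest-scoring weights in each output row.
    k = int(W.shape[1] * sparsity)
    prune_idx = np.argsort(score, axis=1)[:, :k]
    # Zero those weights; the rest are kept unchanged (no retraining).
    W_pruned = W.copy()
    np.put_along_axis(W_pruned, prune_idx, 0.0, axis=1)
    return W_pruned
```

Comparing scores within each output row, rather than across the whole layer, is the per-output grouping the paper uses; the activation-norm factor is what distinguishes Wanda from plain magnitude pruning.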