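Wanda (Pruning by Weights and Activations) requires no retraining or weight updates. It scores each weight by the product of its magnitude and the norm of the corresponding input activation, measured on a small calibration set, and zeroes the lowest-scoring weights, leaving a sparse model.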
It is an efficient pruning method, published in 2024.
The resulting sparsity can allow larger models to run on smaller hardware, provided the runtime stores and executes the sparse weights efficiently.
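
The scoring rule described above can be sketched for a single linear layer as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `wanda_prune`, the `sparsity` parameter, and the toy arrays are all hypothetical, and it assumes the score is |W_ij| times the L2 norm of input feature j over the calibration tokens, with weights compared within each output row.

```python
import numpy as np

def wanda_prune(W, X, sparsity=0.5):
    """Zero the lowest-scoring weights in each output row.

    W: (out_features, in_features) weight matrix.
    X: (n_tokens, in_features) calibration activations fed into the layer.
    """
    # L2 norm of each input feature across all calibration tokens.
    act_norm = np.linalg.norm(X, axis=0)           # shape: (in_features,)
    # Score = |weight| * norm of the input feature it multiplies.
    score = np.abs(W) * act_norm                   # broadcasts over rows
    # How many weights to drop per output row.
    k = int(W.shape[1] * sparsity)
    # Column indices of the k lowest scores in each row.
    drop = np.argsort(score, axis=1)[:, :k]
    pruned = W.copy()                              # leave the input untouched
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned

# Toy example: feature 2 has large activations, feature 3 is always zero.
W_demo = np.array([[1.0, -2.0, 0.5, 3.0],
                   [0.1,  0.2, -4.0, 0.3]])
X_demo = np.array([[1.0, 0.1, 10.0, 0.0],
                   [1.0, 0.1, 10.0, 0.0]])
pruned_demo = wanda_prune(W_demo, X_demo, sparsity=0.5)
print(pruned_demo)
# [[ 1.   0.   0.5  0. ]
#  [ 0.1  0.  -4.   0. ]]
```

In the first row, plain magnitude pruning would have dropped 0.5 and 1.0. The activation-aware score instead drops 3.0 and -2.0, because their input features carry little or no signal in the calibration data. No gradient step or retraining is involved; the only extra cost is one pass over the calibration inputs.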