Pruning

Definition

Pruning removes redundant or low-impact weights (or neurons/heads) from a model. Unstructured pruning drops individual weights; structured pruning removes entire channels or layers for efficient execution.

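It is part of model compression and is often combined with quantization or knowledge distillation to produce smaller, faster models. Unstructured pruning reduces the number of non-zero parameters but may not speed up inference much on standard hardware, because the weight tensors keep their dense shape; structured pruning (e.g. of channels) shrinks the tensors themselves and yields real speedups.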
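The difference between the two styles can be sketched on a small weight matrix. This is a minimal illustration in plain Python, not a library API: the function names, the threshold, and the choice of the L1 norm to rank rows (treated here as output channels) are all assumptions made for the example.

```python
from typing import List

Matrix = List[List[float]]


def unstructured_prune(weights: Matrix, threshold: float) -> Matrix:
    """Zero every weight whose magnitude is below `threshold`.

    The matrix keeps its shape, so a dense kernel does the same work as before.
    """
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def structured_prune(weights: Matrix, keep: int) -> Matrix:
    """Keep the `keep` rows (output channels) with the largest L1 norm.

    Rows are removed outright, so the resulting matrix is genuinely smaller.
    Surviving rows stay in their original order.
    """
    norms = [sum(abs(w) for w in row) for row in weights]
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [list(weights[i]) for i in kept]


# Hypothetical 4x3 weight matrix (4 output channels, 3 inputs).
weights = [
    [0.9, -0.05, 0.4],
    [0.02, 0.01, -0.03],
    [-0.7, 0.6, 0.08],
    [0.1, -0.2, 0.05],
]

sparse = unstructured_prune(weights, threshold=0.1)  # still 4x3, with zeros
small = structured_prune(weights, keep=2)            # now 2x3
```

Here `sparse` has the same 4x3 shape with several entries set to zero, while `small` keeps only two rows, which is why structured pruning translates more directly into less compute.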

How it works

Start from a trained model. Score weights (or channels/heads) by importance, e.g. by magnitude, gradient, or a learned mask. Prune by zeroing out the lowest-scoring individual weights (unstructured) or by removing the lowest-scoring channels or layers entirely (structured). Fine-tune the pruned model to recover accuracy. Pruning can be one-shot (a single prune after training) or iterative (train → prune → fine-tune, repeated). Sparsity is often encouraged during training with L1 or other regularizers so the model adapts to pruning. The final model has fewer non-zero weights and, with structured pruning, faster inference.
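The iterative loop (train → prune → fine-tune) can be sketched on a toy problem. The example below is an illustrative assumption, not a reference implementation: it fits a six-weight linear model with plain gradient descent, scores weights by magnitude, and uses a binary mask to keep pruned weights at zero during fine-tuning. The helper names `train` and `prune_smallest`, the synthetic data, and the hyperparameters are all hypothetical.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], float]


def train(w: List[float], mask: List[int], data: List[Sample],
          steps: int = 200, lr: float = 0.3) -> List[float]:
    """Gradient descent on mean squared error; masked weights stay at zero."""
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / len(data)
        # Multiplying by the mask re-zeroes pruned weights after every update.
        w = [(wi - lr * g) * m for wi, g, m in zip(w, grad, mask)]
    return w


def prune_smallest(w: List[float], mask: List[int], n: int) -> List[int]:
    """Clear the mask for the `n` surviving weights of smallest magnitude."""
    alive = sorted((j for j, m in enumerate(mask) if m), key=lambda j: abs(w[j]))
    new_mask = list(mask)
    for j in alive[:n]:
        new_mask[j] = 0
    return new_mask


# Synthetic data: only three of the six true weights matter.
random.seed(0)
true_w = [2.0, 0.0, -1.5, 0.0, 0.0, 0.5]
data: List[Sample] = []
for _ in range(64):
    x = [random.uniform(-1, 1) for _ in true_w]
    data.append((x, sum(t * xi for t, xi in zip(true_w, x))))

mask = [1] * len(true_w)
w = train([0.0] * len(true_w), mask, data)   # 1. train the dense model

for _ in range(3):                           # 2. iterate: prune, then fine-tune
    mask = prune_smallest(w, mask, n=1)
    w = train(w, mask, data)
```

Pruning one weight per round and fine-tuning in between gives the remaining weights a chance to compensate; a one-shot variant would call `prune_smallest(w, mask, n=3)` once and fine-tune a single time.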

Use cases

Pruning helps when you want a smaller or faster model by removing low-importance weights or structures.

Shrinking models for edge or mobile deployment
Reducing compute and memory with structured pruning (e.g. of channels)
Combining with quantization for smaller, faster models
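To see how pruning and quantization compound, a back-of-the-envelope size estimate helps. The sketch below is only arithmetic under stated assumptions: the parameter count is made up, the helper `model_size_bytes` is hypothetical, and it assumes structured pruning, where removed weights are not stored at all. Unstructured sparsity would need extra index storage that this estimate ignores.

```python
def model_size_bytes(num_params: int, bits_per_weight: int,
                     keep_fraction: float = 1.0) -> int:
    """Approximate weight storage: kept parameters times bits per weight."""
    kept = int(num_params * keep_fraction)
    return kept * bits_per_weight // 8


# Hypothetical 10M-parameter model.
dense_fp32 = model_size_bytes(10_000_000, 32)                       # 32-bit floats
pruned_fp32 = model_size_bytes(10_000_000, 32, keep_fraction=0.5)   # half the channels
pruned_int8 = model_size_bytes(10_000_000, 8, keep_fraction=0.5)    # plus 8-bit weights
```

Under these assumptions the three estimates are 40 MB, 20 MB, and 5 MB, so halving the parameters and moving from 32-bit to 8-bit weights multiplies into an 8x reduction in weight storage.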
