Pruning

Definition

Pruning removes redundant or low-impact weights (or neurons/heads) from a model. Unstructured pruning drops individual weights; structured pruning removes entire channels or layers for efficient execution.

Pruning is a form of model compression and is often combined with quantization or knowledge distillation to produce smaller, faster models. Unstructured pruning reduces the parameter count but rarely speeds up inference on standard hardware, which cannot exploit arbitrary sparsity patterns; structured pruning (e.g. removing whole channels) yields real speedups.

How it works

Pruning starts from a trained model:

  • Score weights (or channels/heads) by importance, e.g. magnitude, gradient, or a learned mask.
  • Prune: zero out or remove the lowest-scoring parameters (unstructured) or entire channels/layers (structured).
  • Fine-tune the pruned model to recover accuracy.

Pruning can be one-shot (after training) or iterative (train → prune → fine-tune, repeat). Sparsity is often encouraged with L1 or other regularizers during training so the model adapts to pruning. The final model has fewer non-zero weights and, with structured pruning, faster inference.
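The score-then-prune step can be sketched as unstructured magnitude pruning: keep the largest-magnitude weights and zero the rest. This is a minimal illustration in plain Python (the function name and list-based weight representation are assumptions, not any library's API):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest
    absolute value (unstructured magnitude pruning)."""
    k = int(len(weights) * sparsity)  # number of weights to zero
    if k == 0:
        return list(weights)
    # Threshold = k-th smallest magnitude; ties at the threshold
    # may prune slightly more than k weights in this simple sketch.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

For example, pruning 40% of `[0.1, -0.5, 0.02, 0.9, -0.03]` zeros the two smallest-magnitude entries, giving `[0.1, -0.5, 0.0, 0.9, 0.0]`. In an iterative scheme, this step would alternate with fine-tuning at gradually increasing sparsity.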

Use cases

Pruning helps when you want a smaller or faster model by removing low-importance weights or structures.

  • Shrinking models for edge or mobile deployment
  • Reducing compute and memory with structured pruning (e.g. channels)
  • Combining with quantization for smaller, faster models
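Structured pruning, as in the second use case, removes whole filters rather than individual weights. A hedged sketch, assuming each convolutional filter is represented as a flat list of weights and scored by its L1 norm (function names and data layout are illustrative):

```python
def channel_l1_scores(filters):
    """Score each filter (channel) by the L1 norm of its weights."""
    return [sum(abs(w) for w in f) for f in filters]

def prune_channels(filters, keep):
    """Keep the `keep` highest-scoring filters; drop the rest entirely.
    Unlike unstructured pruning, the surviving model is genuinely
    smaller, so standard hardware runs it faster."""
    scores = channel_l1_scores(filters)
    order = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:keep])  # preserve original channel order
    return [filters[i] for i in kept]
```

Dropping a channel also shrinks the next layer's input, so in a real network the downstream weights must be sliced to match before fine-tuning.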
