Pruning

Definition

Pruning removes redundant or low-impact weights (or neurons/heads) from a model. Unstructured pruning drops individual weights; structured pruning removes entire channels or layers for efficient execution.
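The difference between the two styles can be sketched on a tiny weight matrix. This is a minimal illustration, not a library API: the matrix `W`, the threshold, and the helper names `prune_unstructured` and `prune_structured` are all hypothetical, and plain lists are used to keep it dependency-free.

```python
# Hypothetical 3x4 weight matrix: rows are output channels, columns are inputs.
W = [
    [0.9, -0.05, 0.4, 0.01],
    [0.02, -0.03, 0.01, 0.04],
    [-0.7, 0.6, -0.02, 0.3],
]

def prune_unstructured(weights, threshold):
    """Zero individual weights whose magnitude is below threshold; shape is unchanged."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]

def prune_structured(weights, keep):
    """Keep the `keep` rows (channels) with the largest L1 norm; the matrix shrinks."""
    ranked = sorted(
        range(len(weights)),
        key=lambda i: sum(abs(w) for w in weights[i]),
        reverse=True,
    )
    kept = sorted(ranked[:keep])  # preserve the original channel order
    return [weights[i] for i in kept]

unstructured = prune_unstructured(W, threshold=0.1)  # still 3x4, with zeros
structured = prune_structured(W, keep=2)             # now 2x4, dense
```

The unstructured result has the same shape as `W`, so a dense matrix multiply does the same amount of work; the structured result is a genuinely smaller dense matrix.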

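Pruning is a model compression technique, often combined with quantization or knowledge distillation to produce smaller, faster models. Unstructured pruning reduces the number of non-zero parameters but may yield little speedup on standard hardware, because dense kernels still process the zeroed weights; structured pruning (e.g. of channels) shrinks the actual tensors and yields real speedups.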

How it works

Start from a trained model. Score weights (or channels/heads) by importance (e.g. magnitude, gradient, or a learned mask). Prune: zero out or remove the lowest-scoring parameters (unstructured) or entire channels/layers (structured). Fine-tune the pruned model to recover accuracy. Pruning can be one-shot (after training) or iterative (train → prune → fine-tune, repeat). Sparsity is often encouraged with L1 or other regularizers during training so the model adapts to pruning. The final model has fewer non-zero weights and, with structured pruning, faster inference.
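The iterative loop (train → prune → fine-tune) can be sketched on a toy linear model. Everything here is an assumption made for illustration: the synthetic task, the magnitude-based score, the schedule of two weights pruned per round, and the helper names `train`, `prune_lowest`, and `mse`. A real model would use a framework's own pruning utilities.

```python
import random

random.seed(0)

# Hypothetical toy task: the target depends on only 2 of 6 input features.
TRUE_W = [2.0, 0.0, -3.0, 0.0, 0.0, 0.0]
X = [[random.uniform(-1, 1) for _ in TRUE_W] for _ in range(200)]
Y = [sum(w * x for w, x in zip(TRUE_W, row)) for row in X]

def train(weights, mask, epochs=50, lr=0.05):
    """Plain SGD on squared error; masked (pruned) weights are held at zero."""
    for _ in range(epochs):
        for row, target in zip(X, Y):
            err = sum(w * x for w, x in zip(weights, row)) - target
            for j in range(len(weights)):
                weights[j] = (weights[j] - lr * err * row[j]) * mask[j]
    return weights

def prune_lowest(weights, mask, n_prune):
    """Mask the n_prune surviving weights with the smallest magnitude."""
    alive = sorted(
        (j for j in range(len(weights)) if mask[j]),
        key=lambda j: abs(weights[j]),
    )
    for j in alive[:n_prune]:
        mask[j] = 0
        weights[j] = 0.0
    return mask

def mse(weights):
    """Mean squared error of the linear model on the toy data."""
    preds = (sum(w * x for w, x in zip(weights, row)) for row in X)
    return sum((p - t) ** 2 for p, t in zip(preds, Y)) / len(X)

weights = [random.uniform(-0.5, 0.5) for _ in TRUE_W]
mask = [1] * len(weights)

weights = train(weights, mask)             # initial training
for _ in range(2):                         # two prune / fine-tune rounds
    mask = prune_lowest(weights, mask, 2)  # score by magnitude, prune 2 per round
    weights = train(weights, mask)         # fine-tune the surviving weights

print(mask, [round(w, 2) for w in weights], mse(weights))
```

Pruning gradually and fine-tuning between rounds gives the remaining weights a chance to compensate, which is the usual motivation for the iterative variant over one-shot pruning.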

Use cases

Pruning helps when you want a smaller or faster model by removing low-importance weights or structures.

Shrinking models for edge or mobile deployment
Reducing compute and memory with structured pruning (e.g. channels)
Combining with quantization for smaller, faster models
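As a rough sketch of the last item, pruning and quantization can be applied one after the other. The example below assumes magnitude pruning followed by symmetric per-tensor int8 quantization; the weight values, the threshold, and the helper names are hypothetical, and real toolchains offer many other schemes.

```python
def prune(weights, threshold):
    """Zero weights whose magnitude is below threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def quantize_int8(weights):
    """Symmetric per-tensor quantization: map [-max_abs, max_abs] onto [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]

weights = [0.50, -0.02, 0.25, 0.01, -1.27, 0.03]  # hypothetical values
pruned = prune(weights, threshold=0.05)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

Pruned weights remain exactly zero after quantization, so the sparsity survives, while each remaining weight is stored as an 8-bit integer code plus one shared scale.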
