Pruning

Definition

Pruning removes redundant or low-impact weights (or whole neurons or attention heads) from a model. Unstructured pruning drops individual weights; structured pruning removes entire channels or layers so that the smaller model executes efficiently.
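To make the distinction concrete, here is a minimal sketch on a toy weight matrix. The helper names `prune_unstructured` and `prune_structured`, the threshold, and the use of the L1 norm to rank channels are illustrative assumptions, not part of any particular library.

```python
# Toy weight matrix: each row stands for one output channel (an assumption for illustration).
weights = [
    [0.9, -0.05, 0.4],
    [0.01, 0.02, -0.03],
    [-0.7, 0.6, 0.08],
]

def prune_unstructured(w, threshold):
    """Zero individual weights whose magnitude is below `threshold`; the shape is unchanged."""
    return [[x if abs(x) >= threshold else 0.0 for x in row] for row in w]

def prune_structured(w, keep):
    """Keep the `keep` rows (channels) with the largest L1 norm; the matrix gets smaller."""
    ranked = sorted(range(len(w)), key=lambda i: sum(abs(x) for x in w[i]), reverse=True)
    kept = sorted(ranked[:keep])  # preserve the original channel order
    return [w[i] for i in kept]

sparse = prune_unstructured(weights, threshold=0.1)   # still 3x3, with zeros scattered inside
smaller = prune_structured(weights, keep=2)           # now 2x3, a genuinely smaller dense matrix
```

The unstructured result keeps its original shape, so a dense matrix multiply does the same amount of work; the structured result is a smaller dense matrix, which is where the speedup comes from.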

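Pruning is a form of model compression and is often combined with quantization or knowledge distillation to produce smaller, faster models. Unstructured pruning reduces the number of non-zero parameters but may yield little speedup on standard hardware, because dense kernels still process the zeros; structured pruning (e.g. of channels) shrinks the dense computation itself and so yields real speedups.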

How it works

Start from a trained model. Score the weights (or channels or heads) by importance, e.g. by magnitude, by gradient, or with a learned mask. Then prune: zero out the lowest-scoring parameters (unstructured) or remove entire low-scoring channels or layers (structured). Finally, fine-tune the pruned model to recover accuracy. Pruning can be one-shot (applied once after training) or iterative (train → prune → fine-tune, repeated until the target sparsity is reached). Sparsity is often encouraged during training with L1 or other regularizers, so that the model adapts to pruning before any weights are removed. The final model has fewer non-zero weights and, with structured pruning, faster inference.
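The iterative loop described above can be sketched on a toy linear model. Everything here is an assumption made for illustration: the synthetic data, the hypothetical helpers `train` and `prune_lowest`, the learning rate, and the choice of weight magnitude as the importance score.

```python
import random

def train(w, mask, data, lr=0.05, epochs=200):
    """Per-sample gradient descent on squared error; masked (pruned) weights stay at zero."""
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [(wi - lr * err * xi) * m for wi, xi, m in zip(w, x, mask)]
    return w

def prune_lowest(w, mask, n):
    """Mask out the n surviving weights with the smallest magnitude."""
    alive = sorted((i for i, m in enumerate(mask) if m), key=lambda i: abs(w[i]))
    for i in alive[:n]:
        mask[i] = 0
    return [wi * m for wi, m in zip(w, mask)], mask

# Synthetic regression data: only weights 0 and 2 actually matter.
random.seed(0)
true_w = [2.0, 0.0, -1.5, 0.0]
data = []
for _ in range(50):
    x = [random.uniform(-1, 1) for _ in true_w]
    data.append((x, sum(t * xi for t, xi in zip(true_w, x))))

w, mask = [0.1] * 4, [1] * 4
w = train(w, mask, data)                  # start from a trained model
for _ in range(2):                        # two prune -> fine-tune rounds, one weight each
    w, mask = prune_lowest(w, mask, n=1)  # score by magnitude and prune
    w = train(w, mask, data)              # fine-tune the survivors
```

On this toy problem the two unimportant weights are the ones masked out, and the remaining weights stay close to their true values. Real models need a validation check after each round, since pruning too aggressively may not be recoverable by fine-tuning.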

Use cases

Pruning helps when you want a smaller or faster model by removing low-importance weights or structures.

Shrinking models for edge or mobile deployment
Reducing compute and memory with structured pruning (e.g. of channels)
Combining with quantization for smaller, faster models
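As a rough illustration of combining pruning with quantization, the sketch below prunes a weight vector by magnitude and then applies symmetric per-tensor int8 quantization. The function names, the threshold, and the quantization scheme are assumptions chosen for brevity, not a prescribed pipeline.

```python
def prune(w, threshold):
    """Zero weights whose magnitude is below `threshold`."""
    return [x if abs(x) >= threshold else 0.0 for x in w]

def quantize_int8(w):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(x) for x in w) / 127 or 1.0  # fall back to 1.0 if all weights are zero
    return [round(x / scale) for x in w], scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]

weights = [1.27, -0.02, 0.64, 0.01, -1.0]
q, scale = quantize_int8(prune(weights, threshold=0.05))
restored = dequantize(q, scale)
```

Pruned weights map to the integer 0, so the sparsity survives quantization, while the remaining weights are stored in 8 bits instead of 32.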
