Seguridad de la IA
Definición
La seguridad de la IA aborda los riesgos de la IA avanzada: mal uso, comportamiento no deseado y alineación (systems doing what we intend). It includes robustness, interpretability, and value alignment.
Se superpone con AI ethics (governance, fairness) and bias in AI (unfair outcomes). For LLMs and agents, alignment (por ej. RLHF, constitutional AI) and guardrails are the main levers; explainable AI supports auditing and debugging.
Cómo funciona
La entrada es procesada por el modelo para producir salida. Auditoría (pruebas, monitoreo, red-teaming) verifica que las salidas son seguras, alineadas y robust. Research and practice focus on: alignment (RLHF, constitutional AI, scalable oversight) so models follow intent; robustness (adversarial testing, distribution shift) so they behave under edge cases; monitoring in production to detect misuse or drift. Safety is considered across the lifecycle from diseño and data to training, evaluation, and deployment. Formal methods and interpretability (XAI) support the audit step.
Casos de uso
La seguridad de IA es relevante para cualquier sistema de alto riesgo o público: alineamiento, robustez y monitoreo desde el diseño hasta el despliegue.
- Auditing and red-teaming high-stakes or public-facing models
- Alignment and guardrails for LLMs and agents (por ej. RLHF, constitutional AI)
- Robustness testing and monitoring in production
Documentación externa
- Anthropic – Safety — Research on AI safety and alignment
- OpenAI – Safety and responsibility