Agent debugging and observability
Techniques and tools for tracing, logging, and diagnosing failures in AI agent systems.
Deep technical content for practitioners
How to measure, benchmark, and systematically test AI agent performance in production and development.
Threats, attack vectors, and defensive techniques for securing AI agent systems in production.
Agentless configuration management and automation tool that uses declarative YAML playbooks to configure servers, install software, and manage ML training environments at scale.
DAG-based workflow orchestration for ML and data pipelines — operators, sensors, hooks, XComs, and scheduler architecture.
Distributed event streaming with Apache Kafka — topics, partitions, producers, consumers, and real-time ML feature pipelines.
Distributed data processing with Apache Spark — RDDs, DataFrames, Spark SQL, MLlib, and driver/executor architecture.
Automatic Prompt Engineering (APE) uses LLMs to generate, score, and iteratively refine prompt instructions, replacing manual trial-and-error with a data-driven optimization loop that discovers high-performing prompts at scale.
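The generate-score-refine loop above can be sketched in a few lines. This is a toy illustration, not the APE paper's implementation: `propose` and `score` are stand-ins for the LLM calls (proposal sampling and eval-set accuracy), and the candidate pool and keyword-based scorer are invented for the example.

```python
import random

# Hypothetical candidate pool; a real loop would have an LLM propose these
# from task input/output examples.
CANDIDATE_POOL = [
    "Answer the question step by step.",
    "Give only the final answer.",
    "Explain your reasoning, then answer.",
    "Answer in one short sentence.",
]

def propose(n):
    """Stand-in for an LLM proposing n candidate instructions."""
    return random.sample(CANDIDATE_POOL, n)

def score(prompt, eval_set):
    """Stand-in for running the prompted model over an eval set and
    measuring accuracy. This toy scorer just rewards step-by-step phrasing."""
    return 1.0 if "step" in prompt else 0.5

def ape_search(eval_set, rounds=3):
    """Iteratively score candidates, keep the best, and resample."""
    best_score, best_prompt = -1.0, None
    candidates = propose(len(CANDIDATE_POOL))
    for _ in range(rounds):
        scored = sorted(((score(p, eval_set), p) for p in candidates), reverse=True)
        if scored[0][0] > best_score:
            best_score, best_prompt = scored[0]
        # Keep the top candidates and resample; a real loop would ask the
        # LLM to paraphrase/mutate the best prompts instead.
        candidates = [p for _, p in scored[:2]] + propose(2)
    return best_prompt, best_score
```

The key structural point is that the human writes the scorer, not the prompt: the loop discovers the instruction.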
How to build MCP clients that connect AI applications to MCP servers — covering client initialization, capability discovery, tool invocation, resource reading, and transport selection.
How to build MCP servers that expose tools, resources, and prompts to any MCP-compatible AI application — covering server setup, capability registration, transport configuration, and the full server lifecycle.
Directed acyclic graph workflows for agents — parallel execution, task dependencies, and dynamic graph construction.
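A minimal executor for such a DAG can be built on the standard library: `graphlib.TopologicalSorter` yields tasks whose dependencies are satisfied, and independent ready tasks run in parallel on a thread pool. The task/dependency shapes here are illustrative, not any particular framework's API.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from concurrent.futures import ThreadPoolExecutor

def run_dag(tasks, deps):
    """Run tasks respecting dependencies, parallelizing independent ones.

    tasks: name -> zero-arg callable returning a result
    deps:  name -> set of prerequisite task names
    """
    ts = TopologicalSorter(deps)
    ts.prepare()  # validates the graph is acyclic
    results = {}
    with ThreadPoolExecutor() as pool:
        while ts.is_active():
            ready = list(ts.get_ready())  # tasks whose deps have completed
            futures = {name: pool.submit(tasks[name]) for name in ready}
            for name, fut in futures.items():
                results[name] = fut.result()
                ts.done(name)  # unblocks downstream tasks
    return results
```

Dynamic graph construction falls out naturally: since `deps` is plain data, it can be built at runtime before `run_dag` is called.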
Prompt-level and evaluation-time strategies for identifying and reducing systematic bias in LLM outputs — covering social biases, sycophancy, positional effects, and evaluation distortions — to produce fairer and more reliable responses.
Generative models based on denoising diffusion.
Running lightweight reasoning and inference at the edge (devices, gateways).
Centralized repositories for computing, storing, and serving ML features consistently across training and production.
Training across decentralized data without centralizing it.
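The core aggregation step of federated averaging (FedAvg) is a size-weighted mean of per-client parameters; a minimal sketch, with parameters flattened to plain lists for clarity:

```python
def fed_avg(client_weights, client_sizes):
    """FedAvg aggregation: average client parameter vectors, weighted by
    each client's local dataset size. Raw data never leaves the clients;
    only these weight vectors are shared.

    client_weights: list of equal-length float lists (one per client)
    client_sizes:   number of local training samples per client
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]
```

A client with three times the data pulls the global model three times as hard, which is what makes the aggregate approximate training on the pooled dataset.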
Adversarial training for generative models.
Open-source analytics and visualization platform for building interactive dashboards over time-series and log data, essential for ML infrastructure and model performance monitoring.
Training a small student model to mimic a large teacher.
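The standard distillation objective matches the student's temperature-softened output distribution to the teacher's; a self-contained sketch in pure Python (real training code would compute this over batches in an autodiff framework):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T produces softer distributions."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2 so
    its gradient magnitude stays comparable to a hard-label loss."""
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (T ** 2) * kl
```

The soft targets carry more information than one-hot labels: the teacher's relative probabilities over wrong classes tell the student which mistakes are "close".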
Open-source ML toolkit for Kubernetes — pipelines, hyperparameter tuning, and model serving at scale.
Model Context Protocol (MCP) in Claude Code — what MCP servers are, how they extend Claude's capabilities, how to install and configure them, and how to build custom MCP servers.
Running machine learning workloads on Kubernetes — containerizing models, GPU scheduling, and scaling strategies.
Open-source monitoring and alerting toolkit built around a time-series database and a pull-based scraping model, widely used for ML infrastructure and model metrics.
How Claude Code uses prompt caching to reduce latency and token costs by reusing previously processed system prompts, tool definitions, and conversation prefixes across API calls.
Removing weights or structures to shrink models.
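The simplest variant, unstructured magnitude pruning, zeros the smallest-magnitude weights; a minimal sketch over a flat weight list:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    sparsity: fraction in [0, 1] of weights to remove. Note: ties at the
    threshold may prune slightly more than the requested fraction.
    """
    k = int(len(weights) * sparsity)  # number of weights to zero
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

Structured pruning instead removes whole rows, channels, or heads, which shrinks dense compute directly rather than relying on sparse kernels.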
Using lower precision (e.g. int8) for weights and activations.
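The arithmetic behind symmetric int8 quantization fits in a few lines: pick a scale so the largest-magnitude value maps to 127, round, and multiply back to recover approximate floats. A per-tensor sketch (production schemes add per-channel scales, zero points for asymmetric ranges, and calibration):

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: floats -> ints in [-127, 127].
    Falls back to scale 1.0 for an all-zero tensor."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; error per element is at most scale / 2."""
    return [qi * scale for qi in q]
```

Storage drops 4x versus float32, and integer matmuls are typically much faster on supported hardware, at the cost of that bounded rounding error.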
Agents that evaluate their own output and iteratively improve through reflection, critic agents, and the Reflexion framework.
Techniques that prompt an LLM to assess the quality and confidence of its own outputs — enabling iterative self-correction, uncertainty quantification, and more trustworthy responses without external supervision.
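One common realization of this idea is self-consistency: sample the model several times and treat the agreement rate as a confidence estimate. A minimal sketch, where `sample_fn` stands in for one sampled model call (the function name and shape are illustrative, not a specific library's API):

```python
from collections import Counter

def self_consistency(sample_fn, n=10):
    """Sample the model n times; return the majority answer and its vote
    share as a rough confidence score in (0, 1]."""
    answers = [sample_fn() for _ in range(n)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n
```

A low vote share is a useful trigger for the iterative self-correction described above: re-prompt, critique, or escalate rather than return the answer as-is.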
Declarative Infrastructure as Code tool by HashiCorp for provisioning and managing cloud resources, widely used to create reproducible ML infrastructure including GPU instances, storage buckets, and Kubernetes clusters.
Probabilistic autoencoders for generation and representation.