Infrastructure

Definition

AI infrastructure comprises hardware (GPUs, TPUs, custom accelerators) and software (distributed training, serving, orchestration) for training and deploying large models.

Scaling is driven by LLMs and large vision models; training can use thousands of GPUs, and deployment relies on model compression (e.g., quantization) and batching to meet latency and cost targets. Frameworks (PyTorch, JAX, TensorFlow) provide the programming model; clouds and on-prem clusters provide the hardware and orchestration.
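To make the compression idea concrete, here is a minimal sketch of symmetric int8 quantization in plain Python. The function names and the all-zero guard are illustrative assumptions, not any framework's API; real deployments use per-channel scales and optimized int8 kernels.

```python
def quantize_int8(values):
    # Symmetric int8 quantization: map floats onto integers in [-127, 127]
    # using a single scale derived from the largest magnitude.
    scale = max(abs(v) for v in values) / 127 or 1.0  # guard all-zero input
    return [round(v / scale) for v in values], scale

def dequantize(quants, scale):
    # Reconstruct approximate floats from the integers and the shared scale.
    return [q * scale for q in quants]

weights = [0.5, -1.27, 0.02, 1.0]
quants, scale = quantize_int8(weights)
approx = dequantize(quants, scale)
```

Storing 8-bit integers plus one scale instead of 32-bit floats cuts memory roughly 4x, which is why quantization is a standard serving-side optimization.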

How it works

Data and configuration (model, hyperparameters) flow into training: distributed training runs across many devices with data parallelism (replicate the model, split the data) and/or model parallelism (split the model across devices). Frameworks (PyTorch, JAX) and orchestrators (SLURM, Kubernetes, cloud jobs) manage scheduling and communication. The trained model is then served: loaded onto inference hardware, optionally quantized, and exposed via an API. Serving uses batching, replication, and load balancing to meet throughput and latency targets; monitoring and versioning are part of the pipeline.
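The data-parallel step described above can be sketched in miniature. All names here are illustrative, and the toy "model" is a single scalar weight with a squared-error loss; the point is the pattern: split the batch, compute per-shard gradients, average them (the all-reduce), and apply one shared update.

```python
def grad(w, batch):
    # Gradient of mean squared error for the toy model y = w * x.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_step(w, data, n_workers, lr=0.1):
    # Split the global batch across workers (each holds a model replica).
    shards = [data[i::n_workers] for i in range(n_workers)]
    # Each worker computes a gradient on its own shard.
    grads = [grad(w, shard) for shard in shards]
    # All-reduce: average the gradients so every replica applies
    # the same update. (Equals the full-batch gradient when shards
    # are equal-sized.)
    g = sum(grads) / len(grads)
    return w - lr * g

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = data_parallel_step(0.0, data, n_workers=2)
```

In a real framework the shards live on different GPUs and the averaging is a collective communication step (e.g., NCCL all-reduce under PyTorch's DistributedDataParallel), but the arithmetic is the same.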

Use cases

ML infrastructure covers training at scale and serving with the right latency, throughput, and reliability.

  • Distributed training of large models across GPU/TPU clusters
  • Serving models at scale with batching and replication
  • End-to-end ML pipelines from data to deployment
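The serving-side batching mentioned above amounts to grouping pending requests so one forward pass serves many users. A minimal sketch (the function name and fixed batch size are illustrative; production servers like dynamic-batching schedulers also enforce a latency deadline):

```python
def batch_requests(requests, max_batch_size):
    # Group pending requests into batches of at most max_batch_size;
    # larger batches raise GPU throughput at some cost in per-request latency.
    return [requests[i:i + max_batch_size]
            for i in range(0, len(requests), max_batch_size)]

queue = ["req0", "req1", "req2", "req3", "req4"]
batches = batch_requests(queue, max_batch_size=2)
```

The trade-off is the core design choice: waiting to fill a batch improves hardware utilization, while a deadline caps how long any single request waits.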
