Infrastructure
Definition
AI infrastructure covers hardware (GPUs, TPUs, custom accelerators) and software (distributed training, serving, orchestration) for training and deploying large models.
Scale is driven by LLMs and large vision models: training can use thousands of GPUs, while serving relies on model compression (e.g., quantization) and batching to meet latency and cost targets. Frameworks (PyTorch, JAX, TensorFlow) provide the programming model; clouds and on-prem clusters provide the hardware and orchestration.
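To make the quantization idea above concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. The function names and the toy weight values are illustrative, not from any particular library:

```python
def quantize(weights, num_bits=8):
    """Symmetric per-tensor quantization: map floats to signed ints."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for int8
    scale = max(abs(w) for w in weights) / qmax  # one scale for the whole tensor
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int representation."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize(weights)      # ints in [-127, 127] plus one float scale
approx = dequantize(q, scale)     # close to the originals, within scale/2
```

Storing 8-bit integers plus a single scale factor cuts memory roughly 4x versus float32, which is why serving stacks lean on it to fit large models on inference hardware.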
How it works
Data and configuration (model, hyperparameters) feed into training: distributed training runs across many devices using data parallelism (replicate the model, split the data) and/or model parallelism (split the model across devices). Frameworks (PyTorch, JAX) and orchestrators (SLURM, Kubernetes, cloud jobs) manage scheduling and communication. The trained model is then served: loaded on inference hardware, optionally quantized, and exposed via an API. Serving uses batching, replication, and load balancing to meet throughput and latency targets; monitoring and versioning are part of the pipeline.
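The data-parallel step described above can be sketched in a few lines of plain Python. This is a toy simulation, not real multi-device code: each "replica" computes a gradient on its shard of the global batch, and `all_reduce_mean` stands in for the all-reduce collective that frameworks like PyTorch run over NCCL. The 1-D linear model and all function names are illustrative:

```python
def local_gradient(weights, batch):
    # Gradient of mean squared error for a 1-D linear model y = w * x,
    # computed on one replica's shard of the global batch.
    w = weights[0]
    n = len(batch)
    return [sum(2 * (w * x - y) * x for x, y in batch) / n]

def all_reduce_mean(grads_per_replica):
    # Stand-in for the all-reduce collective: average gradients elementwise.
    r = len(grads_per_replica)
    return [sum(g[i] for g in grads_per_replica) / r
            for i in range(len(grads_per_replica[0]))]

def data_parallel_step(weights, global_batch, num_replicas, lr=0.1):
    # 1. Shard the global batch, one shard per replica.
    shards = [global_batch[i::num_replicas] for i in range(num_replicas)]
    # 2. Each replica computes a gradient on its own shard.
    grads = [local_gradient(weights, s) for s in shards]
    # 3. Average the gradients (the communication step), then update.
    avg = all_reduce_mean(grads)
    return [w - lr * g for w, g in zip(weights, avg)]
```

With equal shard sizes, the averaged gradient equals the gradient on the full batch, so 1 replica and 4 replicas produce the same update. That equivalence is the core contract of data parallelism.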
Use cases
ML infrastructure spans training at scale and serving models with the required latency, throughput, and reliability.
- Distributed training of large models across GPU/TPU clusters
- Serving models at scale with batching and replication
- End-to-end ML pipelines from data to deployment
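The serving-side batching mentioned above can be sketched with a simple dynamic batcher: collect incoming requests until either a batch-size cap or a wait deadline is hit, so the accelerator runs one large forward pass instead of many small ones. This is a minimal single-threaded sketch using the standard-library queue; the function name and parameters are illustrative:

```python
import time
from queue import Queue, Empty

def collect_batch(request_queue, max_batch_size, max_wait_s):
    """Drain requests into a batch, bounded by size and by a wait deadline."""
    batch = []
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        timeout = deadline - time.monotonic()
        if timeout <= 0:
            break  # deadline hit: ship a partial batch to bound latency
        try:
            batch.append(request_queue.get(timeout=timeout))
        except Empty:
            break  # queue stayed empty until the deadline
    return batch
```

The two knobs trade throughput against latency: a larger `max_batch_size` keeps the accelerator busier, while a smaller `max_wait_s` bounds how long any single request waits for the batch to fill.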