
Infrastructure

Definition

AI infrastructure covers hardware (GPUs, TPUs, custom accelerators) and software (distributed training, serving, orchestration) for training and deploying large models.

Scale is driven by LLMs and large vision models: training can use thousands of GPUs, and deployment relies on model compression (e.g., quantization) and batching to meet latency and cost targets. Frameworks (PyTorch, JAX, TensorFlow) provide the programming model; clouds and on-prem clusters provide the hardware and orchestration.
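To make the compression idea concrete, here is a minimal sketch of symmetric int8 weight quantization in numpy. The function names (`quantize_int8`, `dequantize_int8`) and the example weights are illustrative, not from any particular library; real serving stacks use framework-provided quantization, but the core idea is the same.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric int8 quantization: one scale maps floats into [-127, 127]."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([[0.42, -1.27], [0.05, 0.90]], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

# int8 stores 1 byte per weight vs 4 for float32 (4x smaller),
# at the cost of a rounding error of at most scale/2 per weight
assert q.dtype == np.int8
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

This trades a small, bounded accuracy loss for a 4x reduction in memory and bandwidth, which is what lets a quantized model fit on cheaper inference hardware.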

How it works

Data and configuration (model, hyperparameters) feed training: distributed training runs across many devices using data parallelism (replicate the model, split the data) and/or model parallelism (split the model across devices). Frameworks (PyTorch, JAX) and orchestrators (SLURM, Kubernetes, cloud jobs) manage scheduling and communication. The trained model is then served: loaded onto inference hardware, optionally quantized, and exposed via an API. Serving uses batching, replication, and load balancing to meet throughput and latency targets; monitoring and versioning are part of the pipeline.
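The key invariant behind data parallelism is that averaging per-shard gradients (the all-reduce step) reproduces the full-batch gradient, so replicas stay in sync. A minimal single-process sketch, assuming equal-sized shards and a toy linear model (`grad_linear` is a hypothetical helper, not a framework API):

```python
import numpy as np

def grad_linear(w, x, y):
    """Gradient of mean-squared-error loss for y_hat = x @ w on one shard."""
    err = x @ w - y
    return 2 * x.T @ err / len(x)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))          # full batch of 8 examples
y = x @ np.array([1.0, 2.0, 3.0])    # targets from a known linear map
w = np.zeros(3)                      # model replica (identical on each "device")

# data parallelism: split the batch into 4 equal shards, one per device;
# each device computes a local gradient on its shard
shards = np.array_split(np.arange(8), 4)
local_grads = [grad_linear(w, x[idx], y[idx]) for idx in shards]

# all-reduce: average the local gradients across devices
g_avg = np.mean(local_grads, axis=0)

# the averaged gradient matches the single-device full-batch gradient
assert np.allclose(g_avg, grad_linear(w, x, y))
```

Real systems (e.g., PyTorch DistributedDataParallel) perform the same averaging with collective communication (NCCL all-reduce) overlapped with the backward pass, but the math is exactly this.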

Use cases

ML infrastructure covers training at scale and serving with the right latency, throughput, and reliability.

  • Distributed training of large models across GPU/TPU clusters
  • Serving models at scale with batching and replication
  • End-to-end ML pipelines from data to deployment
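Serving-side batching amortizes fixed per-call costs (kernel launches, memory transfers) by grouping concurrent requests. A minimal micro-batching sketch using only the standard library; the function name `micro_batch` and the batch/timeout parameters are illustrative choices, not a real serving API:

```python
from queue import Queue, Empty

def micro_batch(queue: Queue, max_batch: int = 4, timeout: float = 0.01):
    """Drain up to max_batch requests; the first get waits briefly for traffic."""
    batch = []
    try:
        batch.append(queue.get(timeout=timeout))  # block until one request arrives
        while len(batch) < max_batch:
            batch.append(queue.get_nowait())      # grab whatever else is queued
    except Empty:
        pass
    return batch

q = Queue()
for i in range(6):
    q.put(f"req-{i}")

batches = []
while not q.empty():
    batches.append(micro_batch(q))

# 6 queued requests are grouped into batches of size 4 and 2
assert [len(b) for b in batches] == [4, 2]
```

The `max_batch` / `timeout` pair is the knob real servers expose: larger batches raise throughput, while the timeout caps how long the first request in a batch can wait, bounding added latency.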

External documentation

See also