The Qualcomm Cloud AI 100 Ultra is a high-performance, cost-efficient AI inference card designed specifically for generative AI applications and large language models (LLMs). Here are the detailed specifications:
• Form Factor: PCIe Full-Height 3/4 Length
• TDP (Thermal Design Power): 150W
• Machine Learning Capacity (INT8): 870 TOPS (Tera Operations Per Second)
• On-die SRAM: 576 MB
• On-card DRAM: 128 GB LPDDR4x
• Memory Bandwidth: 548 GB/s
• Host Interface: PCIe Gen 4 x16
• AI Cores: 64 on a single card
• Model Capacity: Supports generative AI models with more than 100 billion parameters on a single card (see the capacity sketch after this list)
• Programmability: Fully programmable, supporting the latest AI techniques and data formats
• Efficiency: Optimized for performance per total cost of ownership (TCO)
• Model Handling: Capable of handling models up to 8x larger within a single server
• Software Support: Ships with software tools for porting pre-trained models (see the porting sketch after this list)
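
As a rough, illustrative cross-check of the model-capacity and DRAM figures above (not a vendor sizing guide), the sketch below estimates the weight footprint of a 100-billion-parameter model at a few common precisions against the card's 128 GB of on-card DRAM. The bytes-per-parameter values are assumptions, and KV-cache and activation memory are ignored.

```python
# Back-of-the-envelope check (illustrative only): weight footprint of a
# 100B-parameter model versus the card's 128 GB of on-card DRAM.
# Assumes weights dominate memory; ignores KV cache and activations.

DRAM_GB = 128  # on-card LPDDR4x capacity from the spec above

BYTES_PER_PARAM = {  # assumed storage cost per parameter
    "fp16": 2,
    "int8": 1,
    "int4": 0.5,
}

def weight_footprint_gb(params_billions: float, precision: str) -> float:
    """Approximate weight storage in GB (1e9 params x bytes/param = GB)."""
    return params_billions * BYTES_PER_PARAM[precision]

for precision in BYTES_PER_PARAM:
    gb = weight_footprint_gb(100, precision)
    verdict = "fits in" if gb <= DRAM_GB else "exceeds"
    print(f"100B params @ {precision}: ~{gb:.0f} GB ({verdict} {DRAM_GB} GB DRAM)")
```

At FP16 the weights alone (~200 GB) exceed the on-card DRAM, while INT8 (~100 GB) and lower precisions fit, which is consistent with the 100B+ parameter claim relying on quantized inference.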
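
As a minimal sketch of what porting a pre-trained model typically looks like at the start of the flow, assuming an ONNX-based workflow (the vendor compiler step and its exact commands are not shown, and the model and file name here are placeholders), a PyTorch model can first be exported to ONNX before being handed to the card's toolchain:

```python
# Minimal porting sketch: export a pre-trained PyTorch model to ONNX.
# The subsequent compilation for the card (vendor SDK) is omitted here.
import torch
import torchvision

# Any pre-trained model works; ResNet-18 is used purely as a stand-in.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()

dummy_input = torch.randn(1, 3, 224, 224)  # example input shape for tracing
torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",                        # placeholder output file name
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},   # allow variable batch size
    opset_version=17,
)
print("Exported resnet18.onnx")
```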
This card is tailored for accelerating large AI workloads, especially those involving generative AI and LLMs, making it ideal for data centers and enterprise applications requiring high inference speeds and efficiency.