Support for multiple LLMs (currently LLAMA, BLOOM, OPT) at various model sizes (up to 170B) Support for a wide range of consumer-grade Nvidia GPUs Tiny and easy-to-use codebase mostly in Python (<500 ...
Beta: This SDK is supported for production use cases, but we do expect future releases to have some interface changes; see Interface stability. We are keen to hear feedback from you on these SDKs.