FastInference
High-performance inference engine for transformer models. Implements advanced optimization techniques including KV-cache optimization, attention fusion, and dynamic batching.
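To illustrate the idea behind KV-caching mentioned above, here is a minimal pure-Python sketch of single-head attention during autoregressive decoding. It is not FastInference's actual code or API: the names `KVCache`, `attend`, and `decode_step` are hypothetical, and a real engine would operate on batched GPU tensors rather than Python lists. The point is only that each step appends one new key/value pair and reuses the earlier ones instead of recomputing them.

```python
import math


class KVCache:
    """Hypothetical cache holding the keys and values of all previous tokens."""

    def __init__(self):
        self.keys = []    # one key vector per decoded token
        self.values = []  # one value vector per decoded token

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def attend(query, cache):
    """Scaled dot-product attention of one query over every cached key/value."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in cache.keys]

    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the cached value vectors.
    dim = len(cache.values[0])
    return [sum(w * v[i] for w, v in zip(weights, cache.values)) for i in range(dim)]


def decode_step(query, key, value, cache):
    """Process one new token: store its key/value, then attend over the cache."""
    cache.append(key, value)
    return attend(query, cache)
```

With this layout, each decoding step computes projections only for the newest token and does work linear in the current sequence length, at the cost of memory that grows with the number of cached tokens.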