FastInference

High-performance inference engine for transformer models. It implements KV-cache optimization, attention fusion, and dynamic batching.
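To make the KV-cache idea concrete, here is a minimal sketch in plain Python. It is not FastInference's API: the names `attend`, `KVCache`, `decode_with_cache`, and `decode_without_cache` are hypothetical, the example covers a single attention head, and it assumes the query, key, and value vectors have already been projected. It only illustrates the general technique: during autoregressive decoding, each step appends the new token's key and value to a cache instead of rebuilding them for the whole prefix.

```python
import math


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all cached keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


class KVCache:
    """Hypothetical single-head cache holding keys and values of past tokens."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, q, k, v):
        # Append only the new token's key/value, then attend over the cache.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)


def decode_with_cache(qs, ks, vs):
    """Incremental decoding: one cache append per generated token."""
    cache = KVCache()
    return [cache.step(q, k, v) for q, k, v in zip(qs, ks, vs)]


def decode_without_cache(qs, ks, vs):
    """Baseline: rebuild the key/value prefix from scratch at every step."""
    return [attend(qs[t], ks[: t + 1], vs[: t + 1]) for t in range(len(qs))]
```

Both functions return the same outputs. In a real model the baseline would also recompute the key and value projections for every prefix token at each step, which is the work a cache avoids.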