Speculative decoding uses a small, fast draft model to generate tokens that are verified in parallel by the larger target model, speeding up inference without losing quality.
It is an optimization aimed at the memory-bandwidth bottleneck of autoregressive decoding: generating each token one at a time requires re-loading the model weights for every step, leaving compute units underutilized. By verifying several draft tokens in a single forward pass of the target model, speculative decoding amortizes those weight loads. Reported speedups are commonly in the 2-3x range, depending on how often the target model accepts the draft's proposals.
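The accept/reject loop can be sketched with toy deterministic models. Everything here is hypothetical illustration: `target_next` and `draft_next` are stand-ins for the greedy next-token choice of a real target and draft model, and the "parallel" verification is simulated with plain function calls rather than a batched forward pass. The key invariant holds, though: the committed output is identical to what greedy decoding with the target alone would produce.

```python
K = 4  # number of draft tokens proposed per round (assumed hyperparameter)

def target_next(seq):
    # Toy stand-in for the large target model's greedy next token.
    return (sum(seq) * 7 + 3) % 50

def draft_next(seq):
    # Toy draft model: agrees with the target except when sum(seq) % 5 == 0,
    # mimicking a small model that is usually, but not always, right.
    t = target_next(seq)
    return (t + 1) % 50 if sum(seq) % 5 == 0 else t

def speculative_step(seq):
    """One round: draft K tokens, then verify them with the target model.

    With a real model the K + 1 target evaluations would be one batched
    forward pass; here they are sequential calls for clarity.
    Returns the tokens committed this round.
    """
    # 1. Draft model proposes K tokens autoregressively (cheap).
    draft = []
    for _ in range(K):
        draft.append(draft_next(seq + draft))

    # 2. Target model verifies the proposals left to right.
    committed = []
    for tok in draft:
        expected = target_next(seq + committed)
        if tok == expected:
            committed.append(tok)       # draft token verified: accept
        else:
            committed.append(expected)  # mismatch: take target's token, stop
            break
    else:
        # All K drafts accepted: the target's final logits yield a bonus token.
        committed.append(target_next(seq + committed))
    return committed

seq = [1, 2, 3]
out = speculative_step(seq)  # between 1 and K + 1 tokens per round
```

Each round thus commits between one token (immediate rejection) and K + 1 tokens (all drafts accepted plus the free bonus token), which is where the speedup comes from when the draft model's acceptance rate is high.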