Can GRPO Be 10x More Efficient? Kwai AI’s SRPO Suggests Yes | Synced

By Ember Recon · March 16, 2026

Source: Synced | AI Technology & Industry Review

Kwai AI’s SRPO framework cuts the number of reinforcement learning (RL) post-training steps for LLMs by 90% while matching DeepSeek-R1 performance on math and code tasks. Its two-stage RL approach, combined with history resampling, addresses limitations of GRPO.
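
The summary does not spell out which GRPO limitation history resampling targets, so the following is only a minimal sketch under stated assumptions. It uses the commonly described GRPO advantage (each rollout's reward standardized within its group) to show that a prompt whose rollouts all earn the same reward contributes no learning signal. The `keep_prompt` filter, the binary 0/1 rewards, and the `history` data are hypothetical illustrations, not SRPO's actual implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward within its rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def keep_prompt(rewards):
    """Hypothetical resampling filter (an assumption, not from the source):
    drop prompts whose recorded rollouts were all correct."""
    return not all(r == 1.0 for r in rewards)


# Hypothetical reward history from a previous epoch: prompt id -> rollout rewards.
history = {
    "easy": [1.0, 1.0, 1.0, 1.0],   # every rollout correct
    "mixed": [1.0, 0.0, 0.0, 1.0],  # rollouts disagree
}

# Identical rewards give all-zero advantages, so the prompt yields no gradient.
easy_advantages = group_relative_advantages(history["easy"])

# Mixed rewards give non-zero advantages of both signs.
mixed_advantages = group_relative_advantages(history["mixed"])

# Prompts retained for the next epoch under the assumed filter.
next_epoch = [pid for pid, rewards in history.items() if keep_prompt(rewards)]
```

Under these assumptions, only the `mixed` prompt would be resampled, so training steps are not spent on groups that produce zero advantage.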