Can GRPO Be 10x More Efficient? Kwai AI’s SRPO Suggests Yes | Synced

Source: Synced | AI Technology & Industry Review

Kwai AI’s SRPO framework cuts the number of reinforcement learning (RL) post-training steps for LLMs by 90% while matching DeepSeek-R1 performance in math and code. Its two-stage RL approach, combined with history resampling, overcomes limitations of GRPO.
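
The summary does not say how history resampling works, so the following is only a minimal sketch under stated assumptions, not Kwai AI's implementation. It relies on one well-known property of GRPO: advantages are computed by normalizing each rollout's reward against the other rollouts for the same prompt, so a prompt whose rollouts all earn the same reward contributes no learning signal. The filtering rule shown here (drop prompts that every rollout already solved in the previous epoch) is an assumption, and the names `group_advantages`, `resample_history`, and `max_reward` are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward by its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every reward in the group is identical.
    return [(r - mu) / (sigma + eps) for r in rewards]


def resample_history(history, max_reward=1.0):
    """Assumed rule: drop prompts whose rollouts all reached max_reward last epoch.

    `history` maps a prompt id to the list of rewards its rollouts received.
    """
    return {
        prompt: rewards
        for prompt, rewards in history.items()
        if not all(r == max_reward for r in rewards)
    }


# Hypothetical reward history for three prompts, four rollouts each.
history = {
    "easy_prompt": [1.0, 1.0, 1.0, 1.0],   # always solved: zero advantage everywhere
    "mixed_prompt": [1.0, 0.0, 0.0, 1.0],  # disagreement: non-zero advantages
    "hard_prompt": [0.0, 0.0, 0.0, 0.0],   # never solved: kept, may become learnable
}

kept = resample_history(history)
easy_adv = group_advantages(history["easy_prompt"])
mixed_adv = group_advantages(history["mixed_prompt"])
```

In this sketch, `easy_prompt` yields all-zero advantages and is filtered out, so later epochs spend their rollouts on prompts that can still produce a gradient. Whether the real method also drops never-solved prompts is not stated in the summary.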