Discussion about this post

JP

Coming to this a bit late, but the RAG-to-RL shift you describe maps closely onto what Jeff Dean laid out on Latent Space recently. He singles out code as the domain where RL works best because the output is verifiable: tests pass or fail, and compilation succeeds or throws errors. That tight feedback loop is what makes coding agents the clearest proof point for the whole reasoning + RL thesis. I dug into the specifics here: https://reading.sh/jeff-dean-on-what-actually-makes-ai-agents-work-dced5bb50206?sk=d8b9e7faac0da6011382834459ca4808

Giridhar Vishwanath

So this means that, to avoid being just another LLM wrapper, startups will have to fine-tune models on their proprietary data in order to survive and thrive. What are your views?
