Rejecting “reward is enough” / RL is the bottleneck:
> It is this pursuit of novelty that separates us from the rest of the animal kingdom, equally as much as it is our intelligence. Intelligence and the pursuit of novelty go hand in hand: it's impossible to pursue novelty without intelligence, and having intelligence without pursuing novelty is pointless -- it's just walking into the corner of a maze... humans follow their own interests and creativity in a way that's unexplained by the direct maximization of the utilitarian reward that evolution hard-coded into us.
> How can we formalize this? ... Informally speaking, the novelty reward is the rate of change of our ability to compress our past observations at a particular time. Understanding is compression, so it is the rate of change of our understanding of the world, with respect to time. The total amount of novelty reward will then correspond to the total increase of our understanding of the world over our initial predictions.
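The informal definition in the quote can be written as a short derivation. This is a sketch in our own notation, not the author's. $C(t)$ is an assumed measure of how well past observations can be compressed at time $t$, and $r(t)$ is the novelty reward. Neither symbol appears in the quote.

$$
r(t) = \frac{dC(t)}{dt}, \qquad R = \int_0^T r(t)\,dt = C(T) - C(0)
$$

The second equality is the fundamental theorem of calculus. Accumulated over time, the novelty reward reduces to the net gain in compression ability since $t = 0$. This matches the quote's "total increase of our understanding of the world over our initial predictions."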
I knew Suchir for 8 years. He was one of the sharpest people I’d ever met, and a rare example of a true first-principles, independent thinker. From the start, he was entirely self-taught... seeing how incredible he had become purely from self-teaching inspired me to work on open-sourcing courses at top universities, specifically for helping young and self-motivated people like him... It's a rare privilege that I treasure, to have had the chance to know Suchir, and to be inspired by someone who stands up for what they believe in, no matter what the status quo is.
Memorial: https://x.com/yush_g/status/1885121418209484850