“reinforcement learning”
Hear how "reinforcement learning" is said in real conversations: 316 examples from 5 videos, films, and TV series. Channels: Lex Fridman, The Diary Of A CEO.
316 clips
5 videos
Examples from videos
YouTube · Lex Fridman · 1.3M views · 2024-03-07
Yann Lecun: Meta AI, Open Source, Limits of LLMs, AGI & the Future of AI | Lex Fridman Podcast #416
“reinforcement learning
with human feedback.”
- So you've mentioned RLHF, … Why do you still hate
reinforcement learning?
Play 90:00
YouTube · The Diary Of A CEO · 2.4M views · 2025-08-18
Brain Experts WARNING: Watch This Before Using ChatGPT Again! (Shocking New Discovery)
“reinforcement learning. and and if we”
have basal ganglia. They they don't use … want to make them uh to be adopt a
Play 24:55
YouTube · Lex Fridman · 4.3M views · 2022-02-26
Mark Zuckerberg: Meta, Facebook, Instagram, and the Metaverse | Lex Fridman Podcast #267
“reinforcement learning on it i was gonna”
around and i've just been doing … release it
Play 36:20
YouTube · Lex Fridman · 2.1M views · 2023-03-30
Eliezer Yudkowsky: Dangers of AI and the End of Human Civilization | Lex Fridman Podcast #368
“reinforcement learning by human feedback”
showing that … has made the GPT series worse in some
Play 9:40
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
“reinforcement learning with human feedback.
So it's more on the algorithmic side than the”
What was new was adding
supervised fine-tuning and … architecture.
Play 45:40
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
“reinforcement learning with verifiable
rewards training just kind of let the models”
when we say enabled, is almost entirely
downstream of the fact that this … pick up these skills very
easily. So let the models learn,
Play 50:21
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
“Reinforcement learning is all about
optimizing reward. In practice,”
completions, and these completions
are what you're going to grade. … you can have a lot of different actors
in different parts of the world
Play 59:57
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
“reinforcement learning with verifiable
rewards. You can scale up the training”
- The biggest one from 2025 is learning this … there, which means doing a lot of
this kind of iterative generate-grade
Play 97:35
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
“reinforcement learning from human
feedback, where in that era the score”
RL gradient updates. The
infrastructure evolved from … they were trying to optimize was
a learned reward model of human
Play 100:05
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
“Reinforcement Learning with Verifiable
Rewards—in real scientific domains,”
- There are interesting bets. A lot
of people are trying to do RLVR— … where startups with hundreds of millions
of funding have wet labs where they're
Play 193:42