“reinforcement learning”
Hear how "reinforcement learning" sounds in real speech: 316 examples from 5 videos, films, and series. Channels: Lex Fridman, The Diary Of A CEO.
316 clips found · 5 videos
Examples in video
YouTube · Lex Fridman · 1.3M views · 2024-03-07
Yann Lecun: Meta AI, Open Source, Limits of LLMs, AGI & the Future of AI | Lex Fridman Podcast #416
“reinforcement learning
with human feedback.”
- So you've mentioned RLHF, … Why do you still hate
reinforcement learning?
Play at 90:00
YouTube · The Diary Of A CEO · 2.4M views · 2025-08-18
Brain Experts WARNING: Watch This Before Using ChatGPT Again! (Shocking New Discovery)
“reinforcement learning. and and if we”
have basal ganglia. They they don't use … want to make them uh to be adopt a
Play at 24:55
YouTube · Lex Fridman · 4.3M views · 2022-02-26
Mark Zuckerberg: Meta, Facebook, Instagram, and the Metaverse | Lex Fridman Podcast #267
“reinforcement learning on it i was gonna”
around and i've just been doing … release it
Play at 36:20
YouTube · Lex Fridman · 2.1M views · 2023-03-30
Eliezer Yudkowsky: Dangers of AI and the End of Human Civilization | Lex Fridman Podcast #368
“reinforcement learning by human feedback”
showing that … has made the GPT series worse in some
Play at 9:40
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
“reinforcement learning with human feedback.
So it's more on the algorithmic side than the”
What was new was adding
supervised fine-tuning and … architecture.
Play at 45:40
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
“reinforcement learning with verifiable
rewards training just kind of let the models”
when we say enabled, is almost entirely
downstream of the fact that this … pick up these skills very
easily. So let the models learn,
Play at 50:21
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
“Reinforcement learning is all about
optimizing reward. In practice,”
completions, and these completions
are what you're going to grade. … you can have a lot of different actors
in different parts of the world
Play at 59:57
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
“reinforcement learning with verifiable
rewards. You can scale up the training”
- The biggest one from 2025 is learning this … there, which means doing a lot of
this kind of iterative generate-grade
Play at 97:35
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
“reinforcement learning from human
feedback, where in that era the score”
RL gradient updates. The
infrastructure evolved from … they were trying to optimize was
a learned reward model of human
Play at 100:05
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
“Reinforcement Learning with Verifiable
Rewards—in real scientific domains,”
- There are interesting bets. A lot
of people are trying to do RLVR— … where startups with hundreds of millions
of funding have wet labs where they're
Play at 193:42