ClipPhrase

reinforcement learning

Hear how "reinforcement learning" sounds in real speech: 316 examples from 5 videos, films, and series. Channels: Lex Fridman, The Diary Of A CEO.

316 clips found
5 videos

Examples in videos

YouTube · Lex Fridman · 1.3M views · 2024-03-07
Yann Lecun: Meta AI, Open Source, Limits of LLMs, AGI & the Future of AI | Lex Fridman Podcast #416
- So you've mentioned RLHF, reinforcement learning with human feedback. Why do you still hate reinforcement learning?
Play from 90:00
YouTube · The Diary Of A CEO · 2.4M views · 2025-08-18
Brain Experts WARNING: Watch This Before Using ChatGPT Again! (Shocking New Discovery)
have basal ganglia. They they don't use reinforcement learning. and and if we want to make them uh to be adopt a
Play from 24:55
YouTube · Lex Fridman · 4.3M views · 2022-02-26
Mark Zuckerberg: Meta, Facebook, Instagram, and the Metaverse | Lex Fridman Podcast #267
around and i've just been doing reinforcement learning on it i was gonna release it
Play from 36:20
YouTube · Lex Fridman · 2.1M views · 2023-03-30
Eliezer Yudkowsky: Dangers of AI and the End of Human Civilization | Lex Fridman Podcast #368
showing that reinforcement learning by human feedback has made the GPT series worse in some
Play from 9:40
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
What was new was adding supervised fine-tuning and reinforcement learning with human feedback. So it's more on the algorithmic side than the architecture.
Play from 45:40
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
when we say enabled, is almost entirely downstream of the fact that this reinforcement learning with verifiable rewards training just kind of let the models pick up these skills very easily. So let the models learn,
Play from 50:21
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
completions, and these completions are what you're going to grade. Reinforcement learning is all about optimizing reward. In practice, you can have a lot of different actors in different parts of the world
Play from 59:57
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
- The biggest one from 2025 is learning this reinforcement learning with verifiable rewards. You can scale up the training there, which means doing a lot of this kind of iterative generate-grade
Play from 97:35
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
RL gradient updates. The infrastructure evolved from reinforcement learning from human feedback, where in that era the score they were trying to optimize was a learned reward model of human
Play from 100:05
YouTube · Lex Fridman · 787K views · 2026-01-31
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
- There are interesting bets. A lot of people are trying to do RLVR (Reinforcement Learning with Verifiable Rewards) in real scientific domains, where startups with hundreds of millions of funding have wet labs where they're
Play from 193:42