# AI Industry Highlights

## 🎙️ Podcast Picks

### OpenAI’s Dan Roberts: Why AI Can Now Make Discoveries

The MAD Podcast with Matt Turck · 2026-06-04

Speaker 1 | 00:00 - 00:23
One of the things that ChatGPT was able to do was assume it was false. When you go against the grain and do something contrarian like that, you really have to have strong conviction in what you’re doing in order to persevere down a really long calculation path. I feel really excited that we will get to answer a lot of fundamental questions in the field of science that we care about, with the aid of the models or with the models being the driving force. And so that’s just really thrilling.

Speaker 2 | 00:23 - 01:03
Hi, I’m Matt Turck. Welcome to the MAD Podcast. It’s been ye…

🎧 Listen to the full episode

## 🐦 X/Twitter Highlights

### Swyx (@swyx)

- ibid [5 ❤️]
- About time the leading database company born in Singapore actually had Singapore investors take it seriously! [60 ❤️ 3 🔄]
- Finally! The first eval ship from Cog!!! 👼🏼

To contextualize: @METR_Evals caps out at ~16 hours.

Cog has private enterprise evals up to 100 hours, and is confident enough to put a financial guarantee on it 🤯

METR dataset: ML eng, GPU kernels, cybersecurity

“METR (2026) used a combination of GPT-4o and GPT-5 to estimate the human-equivalent times from compressed Claude Code transcripts. These transcripts were collected from 7 METR technical staff on 34 sessions labeled on human ground truth”. rlog of 0.83

Cog dataset: real life java/typescript/python/c# feature dev, bugfixes, migrations

“We collected a ground-truth dataset by asking Devin users to review recent representative sessions, and estimate how long each completed session would have taken without Devin. Our dataset consists of 258 sessions from 126 users across a diverse set of enterprise customers.” rlog of 0.74 on a held-out set

This is pioneering real-world evals work and part 1 of a broader frontier code evals drop that I’m really looking forward to writing up. Huge kudos to @annarmitchell and @ryanbai1412 for leading the unglamorous last-mile data collection!! [179 ❤️ 12 🔄]

### Josh Woodward (@joshwoodward)

Love this Gemini feature on my macOS app! [140 ❤️ 4 🔄]

### Thibault Sottiaux (@thsottiaux)

- You can use Codex within your own programs using the Python SDK. It’s awesome. Built by @ah20im and friends

```
pip install openai-codex
```
[1164 ❤️ 64 🔄]
- We're fixing a Codex bug today that was causing us to undercount tokens being served to some Pro and Plus accounts by a small amount. This impacted < 15% of accounts.

Not the kind of bug you want us to fix, but didn't want to do this silently and thought you should know. [3945 ❤️ 90 🔄]

### Peter Yang (@petergyang)

- lol it's on now [6 ❤️]
- Also, as great as Codex is (and I'm really starting to love it), the frontend design still leaves a lot to be desired.

I have a /slides skill and you can guess which one Codex made vs. Claude.

Yes, I know I can make an image with ChatGPT first and then tell Codex to build it, but Claude can one-shot great-looking HTML slides.

Codex needs to fix this, IMO. It's often a novice's first impression of a coding or knowledge work tool. [48 ❤️ 1 🔄]
- I spent the whole day today setting up integrations and skills in Codex for my top creator workflows.

Now I'm convinced that you can save at least 50% of your time on any type of knowledge work if you just set up the system upfront.

Note that all my workflows have human checkpoints along the way so I can apply my "taste."

Anyway, it feels extremely liberating to have all this set up. If you want to do the same, just follow these three steps:

1. Reflect on your past week:
   - What work did you spend the most time on?
   - What work was the most repetitive?

   Pick your most painful, manual workflow to start.

2. List out every single step of that manual workflow. Be very detail-oriented.

3. Open Codex (or Claude Code), paste the list of steps from step 2, and ask it: "What integrations and skills can I build to streamline this with your help?"

AI will guide you the rest of the way. [98 ❤️ 4 🔄]

### Cat Wu (@_catwu)

- I'm hiring a PM for Claude Code, focused on model performance.

If you have experience writing agentic evals and want to integrate research ideas into our core products, I'd love to hear from you here: [993 ❤️ 44 🔄]

### Thariq (@trq212)

- An app can be a home-cooked meal (2020) 

Personal software was a bit early in 2020, but in 2026 it really can be as personal as a home-cooked meal or a handwritten letter [511 ❤️ 20 🔄]
- How dynamic workflows allow Claude Code to handle whole new types of tasks [30 ❤️ 2 🔄]

### Amjad Masad (@amasad)

- Prompt to shop: [159 ❤️ 10 🔄]

### Guillermo Rauch (@rauchg)

- Congrats Void team!

We @vercel reaffirm our collaboration on an open platform for the web, with our investment in @nitrojsdev, open runtimes, and native support for Vite-based frameworks like Nuxt, Svelte, and TanStack Start 🫡 [843 ❤️ 25 🔄]

### Alex Albert (@alexalbert__)

- We just published internal data on how much of Claude's development is already being done by Claude:
  - Over 80% of all code merged into our codebase is now written by Claude
  - It's been months since many researchers at Anthropic hand-wrote code
  - The typical Anthropic engineer ships 8x as much code as they did in 2024
  - On the most open-ended engineering tasks, Claude's success rate jumped from ~26% to 76% in 6 months
  - When research sessions went off-track, Claude proposed a better next step than the human took 64% of the time

We're not at recursive self-improvement yet, but it could come sooner than most expect. I highly recommend reading the full blog post. [2294 ❤️ 149 🔄]

### Aaron Levie (@levie)

- Good thought provoking post from Anthropic. I think this paragraph points to the key element of the optimistic scenario of AI:

“There has been an explosion of new ideas, initiatives, tools, and simulations, as a result of Anthropic employees working with highly capable models—far more than we have the capacity to pursue. The rate at which organizations can spot and fix these bottlenecks may be a skill that improves over time, and it may become the most important skill for any organization.”

AI lowers the barrier dramatically to allowing us to do more. As a result of that, we have far more ideas than we can pursue, and for the ones that we want to pursue we’re ultimately limited by our ability to go take on the surrounding work to execute those ideas. There’s almost no amount of AI progress that can happen where that goes away.

AI is going to let us build much more software, launch more marketing campaigns, research more drugs, and so on. All of this work, even when augmented by agents, still ultimately requires people to manage. [195 ❤️ 18 🔄]

### Garry Tan (@garrytan)

- Two YC decacorns in one day, and one of them is building commercial fusion. Polaris hit 150 million °C, the first privately funded machine to do it. This is the abundance future, built by people who actually ship. [105 ❤️ 4 🔄]
- So close to product market fit is still not product market fit [93 ❤️ 3 🔄]

### Matt Turck (@mattturck)

- This great conversation with @danintheory of @OpenAI is also available on Spotify, Apple Podcasts and here on YouTube: [10 ❤️]
- Why AI Can Now Make Discoveries - my conversation with @danintheory, Lead of the Foundations of Reinforcement Learning team at @OpenAI 

00:00 Intro: AI's wild week in mathematics

01:21 What OpenAI's Foundations of RL team does

03:08 Dan's journey: from black holes and quantum gravity to frontier AI

07:04 Are AI systems becoming useful for real science?

08:21 The AI math moment: Erdős, OpenAI, DeepMind, and Anthropic

08:52 Why the OpenAI result was an act of exploration

10:25 OpenAI vs. DeepMind: informal reasoning vs. formal proof

12:13 RL 101: learning by doing, not just watching

15:10 Why reinforcement learning works

15:58 How RL breaks: sparse feedback and long-horizon tasks

17:03 RLHF: how human feedback shaped early language models

18:48 Move 37, self-play, and the search for novel strategies

22:16 Explore vs. exploit in scientific discovery

24:49 Why RL may now be "the cake," not the cherry on top

25:46 Why RL started working with large language models

27:29 Is RL "sucking supervision through a straw"?

28:47 Why language may be the grounding layer for intelligence

31:46 A contrarian take on the Bitter Lesson

32:41 What test-time compute actually is

34:50 How RL gives models the ability to think

35:40 Verifiable rewards, math, coding, and the messy real world

38:00 What physics can teach us about AI

42:08 Is there a thermodynamics of AI?

43:08 From Erdős problems to Einstein-level AI

45:16 Is AI already doing original science?

45:51 How far are we from AI automating AI research?

47:41 Why Dan is excited about the future of science [63 ❤️ 6 🔄]

### Nikunj Kothari (@nikunj)

- Introducing the Nock skill (powered by @meetgranola) 👀

Someone recently asked me if I could be replaced by AI, and my ego initially said NO WAY.

Then, I thought about what's the best way to have the most accurate "AI" representation of me for these conversations.

So, I used @claudeai Code to pull >200 1:1 founder pitch meeting notes that were captured by Granola in the last couple of years. 

Filtered the notes down to only the things I asked, then distilled them further to ~53 meetings where there was a lot of good discussion and debate.

Finally, I took a few of my essays which describe the kind of founders that I love.

Using those, I used Claude Code again to build this skill called "Nock" that has some principles and a question bank that are grounded in real conversations. 

To refine it, I ran the skill on 5-10 actual decks and conversations: first we had the skill output what it thinks, then we compared that against the real conversation. We kept improving it until it felt like an accurate representation of me.

So, if you are a founder raising and want to see what I would think about your deck, try this out: nikunjk dot com / nock

And, if you are a VC who wants to build your own Nock, we have that too: nikunjk dot com / buildnock [60 ❤️ 3 🔄]
- For a year now, I’ve been texting @toddsaunders every week asking him when he’s going to quit to build the future for the trades..

And finally when I got the text, I knew we @FPVventures had to invest. A lot of “businessmen” and technologists treat this area as a lucrative business. 

Very few people have the empathy, trust and sheer love for this space. And that’s what will enable him, Sean and @DaltonMillsAI to win. 

If you are in the NY area, and want to join this phenomenal group, please reach out! [31 ❤️ 1 🔄]

### Dan Shipper (@danshipper)

- @every @TrySpiral with your agent here: [7 ❤️]
- NEW:

Spiral 4.0: a writing partner for you and your agent by @every

-> Stylometry: we built a new Style Engine based on the principles of stylometry that extracts your voice and your brand's voice from examples of your past work, so it produces great writing every time

-> MCP and CLI: Spiral is now built to be used by agents like Codex, Claude Code, OpenClaw and more, so you can get great writing automatically

We use it every day internally to write landing pages, tweets, podcasts, marketing emails and more, and to make sure it's ALL on-brand across our entire 30-person team @every [222 ❤️ 17 🔄]
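The Spiral post mentions stylometry without describing its method. As a hedged illustration of the general idea only (not Spiral's Style Engine, which is not documented here), classic stylometry compares how often writers use common function words such as "the", "and", or "just", since those habits tend to be fairly stable across topics. The sketch below builds a function-word frequency profile per text and compares two profiles by cosine similarity; the word list and the names `style_profile` and `style_similarity` are hypothetical choices.

```python
import math
import re
from collections import Counter

# Illustrative function-word list (an assumption); stylometry studies
# usually take the few hundred most frequent words of a reference corpus.
FUNCTION_WORDS = [
    "the", "a", "an", "and", "but", "or", "of", "to", "in", "on",
    "that", "it", "is", "was", "i", "you", "we", "not", "so", "just",
]


def style_profile(text):
    """Relative frequency of each function word among all words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def style_similarity(reference, draft):
    """Compare a draft's function-word profile with a reference sample (0 to 1)."""
    return cosine(style_profile(reference), style_profile(draft))


if __name__ == "__main__":
    past_work = "I just think the point is simple. We ship it and we see what the users do."
    on_voice = "We ship it, and I think that is the point. It is just simple."
    off_voice = "Quarterly synergies accelerate stakeholder value creation across verticals."
    print(round(style_similarity(past_work, on_voice), 2))
    print(round(style_similarity(past_work, off_voice), 2))  # 0.0: no listed words
```

Because the profile uses relative frequencies and cosine similarity ignores vector length, a short tweet and a long essay can be compared directly. A real system would likely use a much larger feature set, and measures such as Burrows's Delta over z-scored word frequencies are common alternatives to plain cosine similarity.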

### Aditya Agarwal (@adityaag)

- A lot of roles will now have engineering infused into them.

Marketing Engineer is a great example of this. [16 ❤️ 1 🔄]

### Sam Altman (@sama)

- man the early days of the internet were so special [8911 ❤️ 513 🔄]
- build and publish web apps with chatgpt!

i really wish i had this when i was a kid, but i do miss hypercard. [1997 ❤️ 86 🔄]
- big upgrade to chatgpt memory rolling out today! [3765 ❤️ 175 🔄]

### Claude (@claudeai)

- From The Problem Solvers, our series featuring founders taking on hard problems with Claude: [76 ❤️ 9 🔄]
- Anton Osika (@antonosika) is the co-founder and CEO of @lovable, where anyone can build software through conversation.

His working thesis: the most underrated moat in AI is trust, and earning it takes craft, care, and obsession. [1987 ❤️ 134 🔄]

---

*Auto-generated by [Follow Builders](https://github.com/zarazhangrui/follow-builders) · 2026-06-05*

沉鱼's Blog