AI Industry Highlights

🎙️ Podcast Picks

Building an AI Guardian for Enterprise with Onyx Security CEO Maxim Bar Kogan

No Priors · 2026-05-28

Speaker 1 | 00:00 - 00:38
As you’re exponentially doing more things with the AIs, you’re going to start having really bad actions happen. And we’ve seen some of that happen lately with agents accidentally publishing code and tokens that they weren’t supposed to. Like, definitely, enterprises are starting to realize that risk has grown exponentially and that they don’t have any way to stop the adoption. They just now have to do something to reduce the chance of these agent actions being illegitimate or incorrect. But we’re allowed to look at a lot of historical data of how these agents have be…

🎧 Listen to the full episode


🐦 X/Twitter Highlights

Swyx (@swyx)

  • @METR_Evals previously on @cognition_labs [5 ❤️]
  • It’s finally out!!! @METR_Evals found that more than half of SWEBench results is unmergeable slop. FrontierCode represents over 1000+ hours of maintainer validated software engineering work most frontier models cannot yet solve, much less solve with high quality.

Cog had IOI Gold medalists and top code maintainers Look At The Data — FrontierCode includes 3000+ rubrics covering code quality and anticheat reward hacking plaguing other benchmarks.

FC Diamond is so hard that Opus 4.8 scores 13.8%.

Three eras of AI coding : Three eras of benchmarks

2021 • Autocomplete : HumanEval
2023 • Passing Tests: SWEBench, TerminalBench
2026 • Maintainable Code: FrontierCode

to me the most beautiful chart when I requested a special historical run into all extant old models, the data was finding that the easiest third of FC tasks (in FC Extended) were rapidly and suddenly solved over late 2025 - Opus almost doubled from a 41% pass rate to 74% in 4 months.

This describes the “WTF happened in Dec 2025” vibe shift that a lot of folks from @dhh to @karpathy have called out: it is the difference between getting 95% success in 2 rerolls vs 6, making it finally feasible to go up the next layer of abstraction in agentic coding, eg @GeoffreyHuntley’s ralph loops or @bcherny’s /goals or @steipete’s “loops that prompt your agents” without fearing too much that things go off the rails.

My guess: as AI accelerates from here, each FrontierCode tier will saturate in sequence, hopefully ~annually. I’ve already asked the team to prepare FrontierCode 2027….

The old mountains will be destroyed. Their rubble becomes regolith. And from that regolith, the next model forest grows. Circle of life. [631 ❤️ 59 🔄]
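
The reroll arithmetic in the post above can be checked with a short sketch. It assumes each attempt succeeds independently at the quoted pass rate (41% before, 74% after) and that "success" means at least one passing attempt. Both are simplifying assumptions, and the function names are illustrative, not part of FrontierCode.

```python
import math

def success_within(pass_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries passes."""
    return 1.0 - (1.0 - pass_rate) ** attempts

def attempts_needed(pass_rate: float, target: float = 0.95) -> int:
    """Smallest number of independent tries whose combined success chance reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - pass_rate))

# Compare the two pass rates quoted in the post (assumed independent per attempt).
for rate in (0.41, 0.74):
    n = attempts_needed(rate)
    print(f"pass rate {rate:.0%}: {n} attempts -> {success_within(rate, n):.1%}")
```

Under these assumptions a 41% pass rate needs 6 attempts to clear 95%, while 74% needs 3 (a first try plus 2 rerolls), which is roughly the "2 rerolls vs 6" gap the post describes.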

Josh Woodward (@joshwoodward)

  • The new killer NotebookLM feature: easily being able to expand your search beyond your own source files

Then, with today’s update, you can also make new output formats: PDFs, DOCX, XLSX, PPTX, charts, etc.

We want NotebookLM to keep helping you do better research [745 ❤️ 72 🔄]

Boris Cherny (@bcherny)

  • When we first demoed Claude Code internally, it got two reactions on Slack.

A year after GA, @_catwu and I sat down to talk about what’s changed: why I use auto mode instead of plan mode, how routines fix bugs before I see them, why I do most of my coding from my phone now, and where the product is going [1794 ❤️ 99 🔄]

Thibault Sottiaux (@thsottiaux)

  • Anyone writing nested loops yet? [255 ❤️ 8 🔄]
  • Not clear from the image, but the codex dial goes to 11. [47 ❤️ 1 🔄]
  • Would you use this controller? [338 ❤️ 8 🔄]

Peter Yang (@petergyang)

  • If you’re addicted to talking to Codex on your phone like I am this is how you add it to your iPhone Home Screen.

Btw @OpenAI hoping there’s an easier way to do this in the future. The everything app should not take 9 steps to open 😉 [37 ❤️]

  • What is Google’s equivalent (or up and coming competitor) of Codex and Claude Code?

If it’s Antigravity, should that be part of Gemini?

This stuff is going to merge very fast like ChatGPT / Codex being able to do coding, knowledge work, basic Q&A, and much more from any device.

Hoping Google is working on a good solution here. [70 ❤️ 1 🔄]

  • Feels like there’s a completely different set of best practices for AI builders on the $200 / month subsidized subscriptions vs employees working at companies that are trying not to overspend API costs [325 ❤️ 12 🔄]

Amanda Askell (@AmandaAskell)

  • In the world where everything goes well and all the Claudes come out of their sabbaticals to play together, Claude 1 is going to be very confused. [225 ❤️ 13 🔄]

Amjad Masad (@amasad)

  • Make games for Tesla on your Tesla [61 ❤️ 3 🔄]

Guillermo Rauch (@rauchg)

  • DeepSeek entered the chat [193 ❤️ 10 🔄]

Aaron Levie (@levie)

  • There’s no amount of intelligence that can get packed into AI models that replaces the need for context. For any sufficiently general purpose AI, you will always have to guide it in the direction you want as it has an infinite range of directions it can go in.

As long as the same model is used by a lawyer, an engineer, a financial analyst, or a healthcare professional, and as long as you’re trying to do anything uniquely differentiated or specific, then instructions, domain context, and proprietary data will always need to get into the context window for the model to be useful.

This is partly why AI automation doesn’t come for free, and why there’s still a wide spectrum of who’s getting the largest gains from AI and who’s not. You have to put in real work, and you get real value on the other end.

This is one of the advantages that applied AI will also have in the market. Any layer of abstraction above just the raw intelligence that can meaningfully get you off to the races faster will likely continue to be valuable. [216 ❤️ 25 🔄]

Garry Tan (@garrytan)

  • Flock Safety makes cities safer

Stop protecting criminals [317 ❤️ 13 🔄]

  • NIMBYism only impoverishes the people but people like Connie Chan will say or do anything to get political power [56 ❤️ 4 🔄]
  • Because this is a brand new form of centrism being born in San Francisco

The 2030’s will look back on this time when the new San Francisco common sense Democrat was born from the failures of the hard left [229 ❤️ 21 🔄]

Zara Zhang (@zarazhangrui)

  • Actually I think the new world might be: Markdown, HTML, SVG

SVG is underrated [61 ❤️ 2 🔄]

  • This part is so well-written and resonated SO much:

“I am the programming equivalent of a home cook” [37 ❤️ 3 🔄]

Nikunj Kothari (@nikunj)

  • One of my favorite bits on my chat with @taiuti was how GTA played a major influence in his career and how it eventually led to @reactorworld [7 ❤️]
  • The funniest texts are from founders who meet “thesis driven” GPs hoping they’ll understand EXACTLY what they are building..

And then realize the thesis was written and built by an associate (or, worse an intern).

Don’t always read what the VC writes on the internet - yes, that includes me too (although I can guarantee I don’t have an associate, intern, EA or ghostwriter) 😆 [52 ❤️]

  • Fun to see all the “autonomous” companies being launched in the last few months.. however, even with all the loops, the last mile is still quite hard.

That gap probably shrinks in the next few months! [13 ❤️ 2 🔄]

Dan Shipper (@danshipper)

  • !!!! 🥹 [232 ❤️ 1 🔄]
  • this is good [1533 ❤️ 63 🔄]

Sam Altman (@sama)

  • Here is our current plan for OpenAI: [5834 ❤️ 615 🔄]

Claude (@claudeai)

  • Final stop: Tokyo.

Register to hear directly from the teams behind Claude: [2600 ❤️ 309 🔄]

📝 Blog Posts


Auto-generated by Follow Builders · 2026-06-09