Discussion about this post

Elliott Thornley

Really great post. I'd guess the thing that's got people worried is not so much the benchmark scores as the stories about Mythos finding exploits in Firefox, OpenBSD, Linux, etc. I don't know if GPT-5.5 could've found those.

Alexander Barry

Drive-by comment on an old post, but I think this shares the (common) confusion between the early pre-release checkpoint of Mythos from Feb/March and the April 7th release version, which is notably stronger.

Both the CTI-REALM and the original AISI results used the early pre-release checkpoint, which seems to be notably less capable than the April 7th version. AISI's recent updates show the April 7th version performing notably higher (and above GPT-5.5).
