OpenAI’s open-weight gpt-oss models set a new bar for health performance relative to their size. On HealthBench, the 120b model outperforms every other OpenAI frontier model (GPT-4o, o1, and o4-mini) except o3, despite being much smaller.