openai

7 articles

caveman: Julius Brussee's terse-output skill
open-source, typescript, claude-code, codex, gemini, +21 more

GitHub repo JuliusBrussee/caveman: a TypeScript skill for Claude Code, Codex, Gemini, and Cursor that asks the agent to talk like a caveman. 74,940 stars and 4,230 forks as of 2026-06-20; MIT-licensed; 15 releases (latest v1.9.0 on 2026-06-12). The project's own benchmark of 10 real Claude API prompts shows a 65% average output-token reduction (range 22–87%), and the caveman-compress sub-skill cuts 46% of tokens from real memory files. The README's own Important box is the lead caveat: caveman only affects output tokens; thinking/reasoning tokens are untouched.
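The caveat above can be made concrete with a small sketch. It assumes, purely for illustration, that a response's generated tokens are the sum of visible output tokens and thinking tokens; the function name and the token counts are hypothetical and not taken from the caveman repo. Only the 0.65 figure comes from the project's reported benchmark average.

```python
def total_generation_saving(visible_tokens: int, thinking_tokens: int,
                            visible_reduction: float) -> float:
    """Fraction of all generated tokens saved when only visible output shrinks."""
    if not 0.0 <= visible_reduction <= 1.0:
        raise ValueError("visible_reduction must be between 0 and 1")
    total = visible_tokens + thinking_tokens
    if total <= 0:
        raise ValueError("need at least one generated token")
    # Assumption: thinking tokens are left untouched, so only the visible part shrinks.
    saved = visible_tokens * visible_reduction
    return saved / total


# Hypothetical response: 1,000 visible tokens plus 3,000 thinking tokens.
print(f"{total_generation_saving(1000, 3000, 0.65):.1%}")
# Same response with no thinking tokens at all.
print(f"{total_generation_saving(1000, 0, 0.65):.1%}")
```

Under these made-up counts, a 65% cut in visible output works out to roughly a 16% cut in total generated tokens, which is why the README's caveat matters for reasoning-heavy workloads.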

OpenAI ships ChatGPT health; o3 re-solves 4.8% of rare
openai, chatgpt, gpt-5-5-instant, health-ai, rare-disease, +11 more

On 2026-06-18, OpenAI published two health stories. The first is a consumer ChatGPT product/evaluation update built on GPT-5.5 Instant: OpenAI reports that its responses were rated higher than physician-written responses on a 3,500-response physician panel, and that flagged factuality issues on production health traffic dropped 71% over the last two months. The second is a peer-reviewed NEJM AI study in which OpenAI o3 Deep Research reanalyzed 376 previously unsolved rare-disease cases at Boston Children's Hospital's Manton Center and surfaced candidate diagnoses for 18 cases (4.8% additional yield) after expert ACMG/AMP review and CLIA-certified confirmation; 7 of the 18 were rediscoveries of diagnoses already in public databases. Two stories in one day, two separate artifacts, with a load-bearing clinical boundary: the model did not diagnose any patient, and the retrospective study was on heterogeneous cohorts with unblinded reviewers.

LifeSciBench: GPT-Rosalind 36.1%, artifact gap 17pts
openai, lifesci-bench, lifescibench, life-sciences, benchmark, +14 more

On 2026-06-17, OpenAI published LifeSciBench, a 750-task, 1,062-artifact, 19,020-criterion life-sciences evaluation built with 173 PhD-level scientists and 453 independent reviewers. GPT-Rosalind reports a 36.1% exact pass rate vs 25.7% for GPT-5.5, with the largest gains in Scientific Communication (56.3% → 71.1%, n=9) and Translation (36.8% → 57.7%). The under-reported finding is the artifact-handling gap: GPT-Rosalind drops from 45.1% on text-only tasks to 28.1% on tasks with artifacts or URLs, a 17-percentage-point drop. Design/Optimization (30.7%) and Analysis (30.3%) barely moved. LifeSciBench is a self-report by the model owner, no third-party reproduction exists, and GPT-Rosalind access is gated by a request form. The article leads with the artifact gap and does not invent head-to-head comparisons against GeneBench or BixBench.
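The arithmetic behind the headline figures can be checked in a few lines. This sketch uses only the pass rates quoted in the summary above; the helper names are illustrative, and the relative drop is a derived figure, not one reported by OpenAI.

```python
# Pass rates (percent) as quoted in the summary above.
TEXT_ONLY = 45.1          # GPT-Rosalind, text-only tasks
WITH_ARTIFACTS = 28.1     # GPT-Rosalind, tasks with artifacts or URLs
ROSALIND_OVERALL = 36.1   # GPT-Rosalind, exact pass rate
GPT55_OVERALL = 25.7      # GPT-5.5, exact pass rate


def point_gap(a: float, b: float) -> float:
    """Difference in percentage points, rounded to one decimal place."""
    return round(a - b, 1)


def relative_drop(before: float, after: float) -> float:
    """Drop expressed as a fraction of the starting rate."""
    return (before - after) / before


artifact_gap = point_gap(TEXT_ONLY, WITH_ARTIFACTS)
overall_gap = point_gap(ROSALIND_OVERALL, GPT55_OVERALL)
artifact_relative = relative_drop(TEXT_ONLY, WITH_ARTIFACTS)

print(f"artifact gap: {artifact_gap} points")
print(f"overall gap vs GPT-5.5: {overall_gap} points")
print(f"relative drop on artifact tasks: {artifact_relative:.0%}")
```

Read this way, the 17-point artifact gap is larger than the 10.4-point overall lead over GPT-5.5, and corresponds to losing a bit more than a third of the text-only pass rate.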

OpenAI Deployment Simulation: 1.5× pre-release error
openai, deployment-simulation, ai-safety, model-evaluation, eval-awareness, +8 more

On June 16, 2026, OpenAI published Deployment Simulation, a method that replays anonymized production conversations through candidate models to forecast real-world misbehavior rates before release. Across GPT-5-series Thinking deployments, the technique produced pre-registered predictions with a median 1.5× multiplicative error across 20 categories of undesirable behavior; external auditing with WildChat, a 1M-conversation public dataset, showed 2.44× error, vs 1.75× on OpenAI production data. The paper is OpenAI's, independent replication is pending, absolute misbehavior rates are not disclosed, and the harness is not open-sourced.
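To make the "median multiplicative error" figure easier to read, here is a minimal sketch of one common way such a metric is computed. It is an assumption, not OpenAI's published definition: the error for a category is taken as the larger of predicted/observed and observed/predicted, and the median is taken across categories. The rates below are invented for illustration, since absolute misbehavior rates are not disclosed.

```python
from statistics import median
from typing import Iterable, Tuple


def multiplicative_error(predicted: float, observed: float) -> float:
    """Symmetric ratio error: 1.0 is a perfect forecast, 2.0 is off by a factor of two."""
    if predicted <= 0 or observed <= 0:
        raise ValueError("rates must be positive")
    return max(predicted / observed, observed / predicted)


def median_multiplicative_error(pairs: Iterable[Tuple[float, float]]) -> float:
    """Median of per-category ratio errors over (predicted, observed) pairs."""
    return median(multiplicative_error(p, o) for p, o in pairs)


# Hypothetical (predicted, observed) rates for three behavior categories.
example_rates = [
    (0.0020, 0.0030),  # under-predicted by 1.5x
    (0.0100, 0.0080),  # over-predicted by 1.25x
    (0.0005, 0.0010),  # under-predicted by 2x
]

print(round(median_multiplicative_error(example_rates), 2))
```

Under this reading, a 1.5× median means the typical forecast landed within a factor of 1.5 of the observed rate in either direction. It says nothing about how high or low the underlying rates are.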

OpenAI Files Confidential Draft S-1 with the SEC
openai, ipo, s-1, sec, edgar, +7 more

On June 8, 2026, OpenAI published a Rule 135 announcement disclosing a confidential draft S-1 submission to the SEC. The post says 'we have not decided on timing yet; it may be a while'. SEC EDGAR full-text search between 2026-06-01 and 2026-06-16 returns no OpenAI S-1 or S-1/A filing. This is the gating event for any eventual IPO, not the IPO itself — and the public markers that will signal an imminent listing are not on the record as of June 16, 2026.

OpenAI to Acquire Ona: Cloud Runtime for Codex Agents
openai, ona, codex, acquisition, m-and-a, +5 more

On June 11, 2026, OpenAI announced the acquisition of Ona (formerly Gitpod), a cloud-development-environments platform that already runs secure, customer-controlled execution for 2 million developers. The deal gives Codex, which is used by 5 million people per week (up 400% year-to-date), a persistent runtime for long-running agents inside the customer's cloud. Until close, both companies remain independent; deal terms, closing date, and headcount were not disclosed.

Instructor: Pydantic Structured Outputs for LLMs
open-source, python, pydantic, structured-outputs, llm, +6 more

MIT-licensed Python library (567-labs) for extracting validated JSON from any LLM via Pydantic models. 13.2k stars, 1.1k forks. Supports OpenAI, Anthropic, Gemini, Cohere, Ollama, Bedrock, and 15+ other providers. v1.15.1 shipped a Bedrock SSRF fix; v1.15.2 (May 10, 2026) adds sensitive log redaction.