GitHub repo JuliusBrussee/caveman: a TypeScript skill for Claude Code, Codex, Gemini, and Cursor that instructs the agent to talk like a caveman. As of 2026-06-20 it has 74,940 stars and 4,230 forks. It is MIT-licensed and has 15 releases (latest v1.9.0 on 2026-06-12). A project-published benchmark of 10 real Claude API prompts shows a 65% average output-token reduction (range 22–87%), and the caveman-compress sub-skill cuts 46% of tokens from real memory files. The lead caveat comes from the README's own Important box: caveman affects only output tokens, and thinking/reasoning tokens are untouched.
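The output-only caveat matters because a headline percentage on visible output shrinks once untouched reasoning tokens are counted. The sketch below is a minimal illustration of that arithmetic. It assumes reasoning tokens are unchanged, as the README says, and it ignores input-side tokens. The function name and the 1,000/3,000 token split are hypothetical and do not come from the project's benchmark.

```python
def effective_reduction(output_tokens: int, reasoning_tokens: int, output_cut: float) -> float:
    """Share of all generated tokens saved when only visible output shrinks.

    Assumption: reasoning tokens are untouched (per the README caveat);
    input/prompt tokens are ignored entirely.
    """
    if not 0.0 <= output_cut <= 1.0:
        raise ValueError("output_cut must be a fraction between 0 and 1")
    total = output_tokens + reasoning_tokens
    if total == 0:
        return 0.0
    saved = output_tokens * output_cut  # only the visible output gets shorter
    return saved / total


# Hypothetical request: 1,000 visible output tokens plus 3,000 reasoning tokens,
# with the benchmark's 65% average output cut applied.
print(f"{effective_reduction(1_000, 3_000, 0.65):.2%}")  # 16.25%
```

With no reasoning tokens, the function returns the full 65%. The more a request is dominated by thinking tokens, the closer the overall saving gets to zero.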
On 2026-06-18, OpenAI published two health stories.

The first is a consumer ChatGPT product and evaluation update built on GPT-5.5 Instant. OpenAI reports that a physician panel reviewing 3,500 responses rated its responses higher than physician-written ones. It also reports that flagged factuality issues on production health traffic fell 71% over the last two months.

The second is a peer-reviewed NEJM AI study. In it, OpenAI o3 Deep Research reanalyzed 376 previously unsolved rare-disease cases at the Manton Center at Boston Children's Hospital. After expert ACMG/AMP review and CLIA-certified confirmation, it surfaced candidate diagnoses for 18 cases, a 4.8% additional diagnostic yield. Of those 18, 7 were rediscoveries of diagnoses already in public databases.

The two stories came on the same day but rest on two separate artifacts. Both share a load-bearing clinical boundary: the model did not diagnose any patient, and the retrospective study used heterogeneous cohorts with unblinded reviewers.
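The 4.8% figure is simple arithmetic on the study's reported counts. The sketch below reproduces it. It also shows the yield with the 7 rediscoveries set aside; that second number is derived here and is not stated in the study summary. The helper name is hypothetical.

```python
def diagnostic_yield(new_diagnoses: int, cases: int) -> float:
    """Fraction of reanalyzed cases that gained a confirmed candidate diagnosis."""
    return new_diagnoses / cases


CASES = 376         # previously unsolved rare-disease cases reanalyzed
CANDIDATES = 18     # candidate diagnoses surviving ACMG/AMP review and CLIA confirmation
REDISCOVERIES = 7   # of those, already present in public databases

total_yield = diagnostic_yield(CANDIDATES, CASES)
# Derived figure: yield counting only the 11 cases that were not rediscoveries.
non_rediscovery_yield = diagnostic_yield(CANDIDATES - REDISCOVERIES, CASES)

print(f"{total_yield:.1%}")            # 4.8%
print(f"{non_rediscovery_yield:.1%}")  # 2.9%
```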
On 2026-06-17, OpenAI published LifeSciBench, a life-sciences evaluation with 750 tasks, 1,062 artifacts, and 19,020 criteria. It was built with 173 PhD-level scientists and 453 independent reviewers.

OpenAI reports that GPT-Rosalind reaches a 36.1% exact pass rate versus 25.7% for GPT-5.5. The largest gains are in Scientific Communication (56.3% → 71.1%, n=9) and Translation (36.8% → 57.7%). Design/Optimization (30.7%) and Analysis (30.3%) barely moved.

The under-reported finding is the artifact-handling gap. GPT-Rosalind falls from 45.1% on text-only tasks to 28.1% on tasks with artifacts or URLs, a 17-percentage-point drop.

LifeSciBench is a self-report by the model's owner, no third-party reproduction exists, and access to GPT-Rosalind is gated behind a request form. The article leads with the artifact gap, preserves all five load-bearing caveats from the brief, and does not invent head-to-head comparisons against GeneBench or BixBench.
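The 17-point artifact gap is an absolute change in percentage points. Relative to the text-only baseline, the same drop is considerably larger. The sketch below computes both views from the pass rates quoted above. The relative figures are derived arithmetic and do not appear in OpenAI's report. The helper names are hypothetical.

```python
def pp_change(before: float, after: float) -> float:
    """Absolute change in percentage points (inputs are percentages, e.g. 45.1)."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change expressed as a fraction of the starting value."""
    return (after - before) / before


# Pass rates quoted in the LifeSciBench summary, in percent.
comparisons = {
    "GPT-Rosalind, text-only → artifact/URL tasks": (45.1, 28.1),
    "Overall, GPT-5.5 → GPT-Rosalind": (25.7, 36.1),
}

for label, (before, after) in comparisons.items():
    print(f"{label}: {pp_change(before, after):+.1f} pp, "
          f"{relative_change(before, after):+.1%} relative")
```

Read this way, the artifact gap (about -37.7% relative) is close in size to the headline overall gain (about +40.5% relative).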
On June 16, 2026, OpenAI published Deployment Simulation, a method that replays anonymized production conversations through candidate models to forecast real-world misbehavior rates before release.

Across GPT-5-series Thinking deployments, the technique's pre-registered predictions had a median multiplicative error of 1.5× across 20 categories of undesirable behavior. For external auditing, running the simulation on WildChat, a public dataset of 1M conversations, gave a 2.44× error, versus 1.75× on OpenAI production data.

The paper is OpenAI's own, independent replication is pending, absolute misbehavior rates are not disclosed, and the simulation harness is not open-sourced.
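A multiplicative error treats over- and under-prediction symmetrically as a ratio, so 1.5× means a forecast was off by a factor of 1.5 in one direction or the other. The sketch below assumes the common definition max(predicted/observed, observed/predicted); the summary does not state the paper's exact formula. The per-category rates are invented for illustration, because OpenAI has not disclosed absolute misbehavior rates.

```python
from statistics import median


def multiplicative_error(predicted: float, observed: float) -> float:
    """Symmetric ratio error: 1.0 is a perfect forecast, 2.0 is off by 2x either way.

    Assumed definition; the paper's exact formula is not given in the summary.
    """
    if predicted <= 0 or observed <= 0:
        raise ValueError("rates must be positive for a ratio-based error")
    return max(predicted / observed, observed / predicted)


def median_multiplicative_error(pairs: list[tuple[float, float]]) -> float:
    """Median of per-category errors over (predicted, observed) rate pairs."""
    return median(multiplicative_error(p, o) for p, o in pairs)


# Hypothetical per-category rates, in incidents per 10,000 conversations.
pairs = [(20, 30), (100, 80), (5, 5)]
print(f"{median_multiplicative_error(pairs):.2f}x")  # 1.25x
```

Taking the median across categories keeps one badly missed category from dominating the headline number. It also means the headline says little about the worst-forecast categories.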
On June 8, 2026, OpenAI published a Rule 135 announcement disclosing that it has confidentially submitted a draft S-1 registration statement to the SEC. The post says, "we have not decided on timing yet; it may be a while." An SEC EDGAR full-text search covering 2026-06-01 to 2026-06-16 returns no OpenAI S-1 or S-1/A filing. The submission is the gating event for any eventual IPO, not the IPO itself. As of June 16, 2026, none of the public markers that would signal an imminent listing is on the record.
On June 11, 2026, OpenAI announced that it is acquiring Ona (formerly Gitpod), a cloud development environment platform that already provides secure, customer-controlled execution for 2 million developers. Codex is used by 5 million people per week, up 400% year-to-date, and the deal gives it a persistent runtime for long-running agents inside the customer's cloud. Until the deal closes, both companies remain independent. Deal terms, closing date, and headcount were not disclosed.
GitHub repo 567-labs/instructor: an MIT-licensed Python library for extracting validated JSON from any LLM via Pydantic models. It has 13.2k stars and 1.1k forks. It supports OpenAI, Anthropic, Gemini, Cohere, Ollama, Bedrock, and 15+ other providers. v1.15.1 shipped a Bedrock SSRF fix, and v1.15.2 (May 10, 2026) adds sensitive log redaction.
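The core loop that a structured-extraction library like this automates is: call the model, validate the reply against a schema, and re-ask with the validation error when the check fails. The sketch below is a rough, standard-library-only stand-in for that loop. It is not instructor's API, which works with Pydantic models and wraps provider clients. Here, a dataclass with manual type checks stands in for the Pydantic model, and a stub replaces the LLM call so the example runs offline. `User`, `validate`, `extract`, and the stubbed replies are all hypothetical.

```python
import json
from dataclasses import dataclass, fields
from typing import Callable


@dataclass
class User:
    """Hypothetical target schema; instructor would use a Pydantic model here."""
    name: str
    age: int


def validate(raw: str) -> User:
    """Parse raw model text and check it against User's field names and types."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    values = {}
    for spec in fields(User):
        if spec.name not in data:
            raise ValueError(f"missing field {spec.name!r}")
        value = data[spec.name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, spec.type) or (spec.type is int and isinstance(value, bool)):
            raise ValueError(f"field {spec.name!r} must be {spec.type.__name__}, got {value!r}")
        values[spec.name] = value
    return User(**values)


def extract(llm: Callable[[str], str], prompt: str, max_retries: int = 2) -> User:
    """Call the model, validate, and re-ask with the validation error on failure."""
    current_prompt = prompt
    last_error = None
    for _ in range(max_retries + 1):
        raw = llm(current_prompt)
        try:
            return validate(raw)
        except ValueError as exc:
            last_error = exc
            # Feed the error back so the next attempt can correct itself.
            current_prompt = (
                f"{prompt}\n\nYour previous answer failed validation: {exc}\n"
                "Reply with only a JSON object matching the schema."
            )
    raise RuntimeError(f"no valid output after {max_retries + 1} attempts: {last_error}")


# Stubbed model (hypothetical): the first reply has a wrong type, the second is valid.
replies = iter(['{"name": "Ada", "age": "thirty-six"}', '{"name": "Ada", "age": 36}'])
user = extract(lambda _prompt: next(replies), "Extract the user: Ada is 36.")
print(user)  # User(name='Ada', age=36)
```

Returning the validation error to the model, instead of retrying blindly, is what gives the second attempt a chance to fix the specific field that failed.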