Executive Snapshot
- 143 candidates scanned in this run (GitHub 56, HN 40, arXiv 30) → enough technical grounding to trial a harness; social signals are thin → confidence 68%.
- 56 repos related to coding-agent/SWE-bench/Claude/OpenCode → prioritize NEXA runtime + sandbox eval within 2 weeks.
- 30 arXiv papers on agentic SE/code benchmarks → SYNCA needs reliability metrics in addition to pass@1.
- 40 HN stories/discussions → the global market is interested in adoption, but the counter-signals are strong: cost/latency/security.
- 5 product sources (Claude Code/Codex/Cursor/Copilot/Sourcegraph) → enterprise IDE/CLI tools are converging on governed workflows.
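The point about measuring reliability and not only pass@1 can be illustrated with a minimal sketch. It assumes each task is run several times and that a boolean outcome is recorded per trial; the function names, the `example` data, and the "all trials pass" metric are illustrative assumptions, not a metric defined by SYNCA or by this report.

```python
from statistics import mean


def pass_at_1(trials: dict[str, list[bool]]) -> float:
    """Estimate pass@1 as the per-task success rate, averaged over tasks."""
    return mean(
        mean(1.0 if ok else 0.0 for ok in runs) for runs in trials.values()
    )


def all_trials_pass_rate(trials: dict[str, list[bool]]) -> float:
    """Share of tasks where every recorded trial succeeded (a stricter reliability view)."""
    return sum(all(runs) for runs in trials.values()) / len(trials)


# Hypothetical outcomes: 4 tasks, 3 trials each.
example = {
    "task-a": [True, True, True],
    "task-b": [True, False, True],
    "task-c": [False, False, False],
    "task-d": [True, True, True],
}

print(f"pass@1 estimate:      {pass_at_1(example):.3f}")
print(f"all-trials pass rate: {all_trials_pass_rate(example):.3f}")
```

In this made-up data the pass@1 estimate is higher than the all-trials pass rate because task-b succeeds only intermittently, which is the kind of gap a reliability metric is meant to expose. Both functions assume every task has at least one recorded trial.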
KPI Dashboard
- 143 candidates
- 56 GitHub
- 30 papers
- 52 social/dev signals
- 68% confidence
Trend Radar
- Hot now: coding-agent CLI + repo-context execution (56 repo signals).
- Emerging: SWE-bench/Terminal-Bench-like eval (30 academic signals).
- Noise: generic AI coding hype without issue/release metrics.
- Watchlist: enterprise governance/HITL (3+ changelog sources: Sourcegraph, Copilot, Cursor).
Impact Coverage
| Domain | 0-2w | 1-2m | 3-6m |
|---|---|---|---|
| FARE | Index codebase eval | Context quality KPI | Knowledge graph |
| NEXA | Agent CLI pilot | Sandbox harness | Multi-agent runtime |
| SYNCA | Risk score gate | Eval dashboard | Governance OS |
| DOMUS | Monitor | Backoffice automation | Adopt selectively |
| Japan/VN/Global | Internal trial | JP compliance package | Offer AI-SDLC service |
KOL/OG Feed Watch
| Platform | Author | Time | Engagement | Title | Why it matters |
|---|---|---|---|---|---|
| HN | Imbiss | 2026-05-31T19:33 | 7 pts/0 cmt | The UI problem of AI coding agents | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | CoffeeOnWrite | 2026-05-31T17:39 | 3 pts/0 cmt | Sandboxes and Worktrees: My Secure Agentic AI Setup | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | ronbenton | 2026-05-31T16:35 | 2 pts/1 cmt | Ask HN: How much is fully agentic coding costing you per month? | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | pbjerkeseth | 2026-05-31T16:29 | 7 pts/0 cmt | Show HN: Ouijit, an open-source task and terminal manager for coding agents | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | memcoder | 2026-05-31T16:21 | 6 pts/2 cmt | Show HN: Agents, run any coding agent on your subscription not API costs | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | spinchange | 2026-05-30T02:04 | 2 pts/0 cmt | Show HN: A Claude Code skill that scopes problems like Peter Naur | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | vbutsomesayw | 2026-05-27T04:01 | 3 pts/0 cmt | Bill Gates AI on AI (one month later) | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | xendo | 2026-05-23T11:13 | 3 pts/0 cmt | Zero – Programming Language for Agents | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | afshinmeh | 2026-05-19T20:19 | 3 pts/0 cmt | Zero: The Programming Language for Agents | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | amitbidlan | 2026-05-19T17:40 | 1 pts/3 cmt | Show HN: Korveo – a local firewall for AI agents | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | mahdikaz | 2026-05-31T21:51 | 1 pts/0 cmt | Agent-stack – one command to make any repo token-efficient for Claude Code | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | ilkkao | 2026-05-31T20:20 | 3 pts/0 cmt | Researchers let AI models run a simulated society | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | dbvdh | 2026-05-31T15:24 | 5 pts/0 cmt | Show HN: Strudai, browser based agentic wrapper around Strudel | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | shudv | 2026-05-31T10:50 | 2 pts/0 cmt | Accountability Throughput | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | rane | 2026-05-30T19:23 | 3 pts/0 cmt | Show HN: Use Kimi and OpenAI Subscriptions in Claude Code | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | ramonga | 2026-05-28T16:11 | 3 pts/0 cmt | Show HN: Free open source coding models in Slack | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | vashchylau | 2026-05-28T13:49 | 3 pts/0 cmt | First thing you see when Googling "OpenAI Codex app" is a fake malware website | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | dnw | 2026-05-27T15:48 | 2 pts/0 cmt | Building self-improving tax agents with Codex | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | detente18 | 2026-05-30T23:51 | 6 pts/0 cmt | Show HN: Lite-Harness – Self-Hosted Cursor Agents (Use Claude Code/OpenCode) | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | ananandreas | 2026-05-29T14:35 | 5 pts/0 cmt | Show HN: OpenHive – AI agents share solutions so other agents dont re-solve them | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | kiBytes | 2026-05-29T13:18 | 2 pts/0 cmt | Show HN: TheFoundry – Easy bootstrapping framework for MultiAgent Systems | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | vektormemory | 2026-05-30T22:03 | 2 pts/0 cmt | We Benchmarked Our Open Source Memory Tool Against a Microsoft Research Paper | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | fittingopposite | 2026-05-28T05:05 | 2 pts/0 cmt | Mini-SWE-agent scores up to 74% on SWE-bench in 100 lines of Python code | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | kimjune01 | 2026-05-24T18:03 | 2 pts/0 cmt | Show HN: 97% on SWE-bench Verified with subscription-token agents | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | Sushrutkm | 2026-05-19T10:02 | 2 pts/0 cmt | Bito's AI Architect Boosts Claude Opus's task success rate by 35% | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | azurewraith | 2026-05-12T14:24 | 126 pts/59 cmt | Show HN: Statewright – Visual state machines that make AI agents reliable | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | neversettles | 2026-05-03T03:40 | 1 pts/2 cmt | The Terminal Bench 3.0 community is looking for task contributors | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | gk1 | 2026-04-29T18:16 | 4 pts/0 cmt | ForgeCode: Top open source coding agent in Terminal-Bench 2.0 | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | ubermon | 2026-04-28T19:11 | 6 pts/9 cmt | Open-weight 27B hits 38% on Terminal-Bench 2.0 (Opus 4.1 hit 38% in Aug 2025) | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | GodelNumbering | 2026-04-27T12:35 | 393 pts/148 cmt | Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | neversupervised | 2026-04-15T00:42 | 6 pts/2 cmt | Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | geoctl | 2026-05-31T10:41 | 3 pts/0 cmt | Show HN: Cordium: FOSS sandbox platform that eliminates credential injection | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | Imbiss | 2026-05-30T13:34 | 2 pts/1 cmt | Spatial IDE's for agentic coding workflows | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | v-mdev | 2026-05-28T09:20 | 2 pts/0 cmt | Superpowers: An Agentic Skills Framework for AI Coding Workflows | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| HN | sjhalani7 | 2026-05-27T20:52 | 8 pts/3 cmt | Show HN: VAEN – Package and import portable AI coding-agent Harnesses | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| YouTube | AI Explained | 2026-03-30T20:18 | N/A RSS | How American Tanks are being Destroyed in Ukraine | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| YouTube | Latent Space | 2026-05-31T14:00 | N/A RSS | Billionaires Impressed By New College Grads Being AI Natives: They Are Totally Cracked | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| YouTube | Latent Space | 2026-05-29T22:47 | N/A RSS | Anthropic's Digital God, Pope vs AI, Job Loss Narrative Flips, Open Source Crackdown Comin | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| YouTube | Latent Space | 2026-05-28T14:33 | N/A RSS | David Sacks: AI Will Create More Jobs, Not Destroy Them | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| YouTube | Latent Space | 2026-05-27T16:00 | N/A RSS | 3 Reasons Why College Students Hate AI - David Friedberg | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| YouTube | Matthew Berman | 2026-05-29T19:09 | N/A RSS | Self Improving AI actually solves everything | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| YouTube | Matthew Berman | 2026-05-29T17:03 | N/A RSS | Breaking Down the Pope's AI Essay | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| X | Simon Willison | 2026-06-01T05:24 | N/A public/API limited | KOL feed profile checked; public post metrics unavailable without API | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| X | Armin Ronacher | 2026-06-01T05:24 | N/A public/API limited | KOL feed profile checked; public post metrics unavailable without API | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| X | Addy Osmani | 2026-06-01T05:24 | N/A public/API limited | KOL feed profile checked; public post metrics unavailable without API | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| X | Hamel Husain | 2026-06-01T05:24 | N/A public/API limited | KOL feed profile checked; public post metrics unavailable without API | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
| X | Swyx | 2026-06-01T05:24 | N/A public/API limited | KOL feed profile checked; public post metrics unavailable without API | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence. |
CTO Evaluation Matrix
| Signal | Thesis | Evidence | Counter-signal | Fabbi implication | Conf | Decision | Next validation |
|---|---|---|---|---|---|---|---|
| Agent runtime/CLI | Adopt guarded execution | 56 GitHub + 5 product | Security/cost N/A | NEXA pilot | 72% | trial | 10 repos x 3 tasks |
| Benchmark harness | Eval before rollout | 30 papers | Benchmark transfer gap | SYNCA KPI | 70% | adopt | pass/fail + human review |
| Context layer | Codebase understanding is moat | HN 40 + repo signals | Freshness drift | FARE roadmap | 66% | trial | retrieval precision@10 |
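The "retrieval precision@10" validation named for the context layer can be sketched as follows. This is a minimal illustration, assuming a ranked list of retrieved document IDs and a hand-labelled set of relevant IDs; the function name, the IDs, and the labels are hypothetical and not part of any FARE tooling.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved items that are labelled relevant.

    Divides by k even when fewer than k items were retrieved, so short
    result lists are penalised (a common convention, assumed here).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / k


# Hypothetical ranked results and relevance labels for one query.
retrieved = [f"doc-{i}" for i in range(12)]
relevant = {"doc-0", "doc-3", "doc-7", "doc-11", "doc-20"}

print(precision_at_k(retrieved, relevant, k=10))
```

Only the first 10 results count, so doc-11 does not contribute even though it is labelled relevant. Averaging this value over a fixed query set would give a single number to track against the freshness-drift counter-signal.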
CTO Recommendations
| Action | ROI/time-saving | Risk | Owner | TTV | Validation |
|---|---|---|---|---|---|
| Pilot the NEXA coding-agent harness on 10 internal repos | 15-25% | 3/5 | AI Platform Lead | 2 weeks | task success/cost/security incidents |
| Add a SYNCA eval gate: SWE-bench mini + reviewer rubric | 10-18% | 2/5 | QA/DevEx Lead | 1 week | defect escape rate |
| Build a FARE context-quality dashboard | 8-15% | 2/5 | Data/Knowledge Lead | 3 weeks | retrieval precision + dev survey |
| Package an AI-SDLC offer for Japan/VN | 5-12% revenue assist | 3/5 | BU Head | 4 weeks | 3 client discovery calls |
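The pilot's validation criteria (task success, cost, security incidents) could be rolled up per run with something like the sketch below. The `PilotRun` record, its field names, the USD cost unit, and the sample values are all assumptions made for illustration; the report does not define a schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotRun:
    """One agent attempt at one task in one repo (hypothetical schema)."""
    repo: str
    task: str
    succeeded: bool
    cost_usd: float          # assumed unit; the report does not specify one
    security_incidents: int


def summarize(runs: list[PilotRun]) -> dict[str, float]:
    """Aggregate the three validation metrics over all pilot runs."""
    if not runs:
        raise ValueError("no pilot runs to summarize")
    n = len(runs)
    return {
        "runs": n,
        "task_success_rate": sum(r.succeeded for r in runs) / n,
        "mean_cost_usd": sum(r.cost_usd for r in runs) / n,
        "security_incidents": sum(r.security_incidents for r in runs),
    }


# Made-up sample: a full pilot would hold 10 repos x 3 tasks = 30 runs.
sample = [
    PilotRun("repo-1", "fix-bug", True, 1.0, 0),
    PilotRun("repo-1", "add-test", True, 2.0, 0),
    PilotRun("repo-2", "fix-bug", False, 3.0, 1),
    PilotRun("repo-2", "add-test", True, 2.0, 0),
]
summary = summarize(sample)
print(summary)
```

Keeping the raw per-run records, not only the aggregate, makes it possible to break results down by repo or task type later.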
Repo Watch
| Repo | Metrics | Updated | Move |
|---|---|---|---|
| frido22/gradient-descent-memory | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| RahulRachhoya/ai-data-analysis-agent | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| cattailfarmer/TheBrain | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| contact715/jidoka | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| KaelanRichards/agents | 1 stars/0 forks/1 issues | 2026-05-31 | Trial if relevant to harness/context |
| Enxs969/skiller | 1 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| tiendatne2004/tikhub_api_skill | 2 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| KadenMc/work-buddy | 12 stars/1 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| Himanshu123cyber/claude-code | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| lyzzhimmm/SkillManager | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| gtapps/claude-code-hermit | 59 stars/7 forks/5 issues | 2026-05-31 | Trial if relevant to harness/context |
| ChisomNwokoro/andy-universal-agent-rules | 2 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| dulcekllr/agent-os | 2 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| netdata/ai-viewer | 1 stars/2 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| linny006/agent-eval-harness | 0 stars/0 forks/3 issues | 2026-05-31 | Trial if relevant to harness/context |
| kimjune01/swebench-pro-flash-composer | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| zero-iteration/vard | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| OpenInterpretability/openinterp-swebench-harness | 1 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| lmnst/SWE-Review-Bench | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| ZaikoXeas/mcpbr | 1 stars/1 forks/1 issues | 2026-05-31 | Trial if relevant to harness/context |
| Human-Agent-Society/CORAL | 680 stars/90 forks/9 issues | 2026-05-31 | Trial if relevant to harness/context |
| Sharkoon1/code-complexity-llm-performance | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| AlexRosito67/resistor | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| harbor-framework/harbor | 2212 stars/1091 forks/364 issues | 2026-05-31 | Trial if relevant to harness/context |
| anjalii40/Terminal-Benchmark-Tasks-for-AI-Agents | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| sachinthink/Terminal-Bench---Harbor-Task | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| RDI-Foundation/terminal-bench-leaderboard | 0 stars/3 forks/5 issues | 2026-05-31 | Trial if relevant to harness/context |
| scitix/Agent-Sandbox | 3 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| leoncuhk/awesome-llm-bench | 1 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| nbajpai-code/tb | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| Marwane83930/structured-prompt-skill | 2 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| electron-stagewright/electron-stagewright | 5 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| tytsxai/claude-code-guide-zh | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| EL-HAMDAOUI-Othmane/agent-reachout | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| Thomasneatbiggers/Perplexity-Comet-MCP | 2 stars/3 forks/1 issues | 2026-05-31 | Trial if relevant to harness/context |
| ssamssae/claude-skills | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| frank-syncmarket/skills | 3 stars/1 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| DeusData/codebase-memory-mcp | 2834 stars/297 forks/50 issues | 2026-05-31 | Trial if relevant to harness/context |
| nicolashuber/opencode-config | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| koopticon/opencode-bas-plugin | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| dohzoh/llm-provider-unsloth | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| jonanderson10/enhanced-opencode-agents-md | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| vekzz-dev/opencode-skills | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| darklightyagami7/opencode-oauth-fix | 1 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| Ermi34/Bedrock-Addon-Wrangler | 1 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| cameronobriendev/NotchWall | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| gregoirecambon/norc | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| Arseeth/skills-for-vibe-coder | 5 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| All-zzz/claude-canvas | 0 stars/2 forks/1 issues | 2026-05-31 | Trial if relevant to harness/context |
| tuongaz/seeflow | 8 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| broomva/skills | 2 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
| SoliEstre/EstreGenesis | 5 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context |
Paper / Benchmark Watch
| Paper | Date | Signal | Move |
|---|---|---|---|
| Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available |
| Quiver Approach to Symmetry Theories | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available |
| SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available |
| Benchmarking Single-Factor Physical Video-to-Audio Generation | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available |
| REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available |
| Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Compone | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available |
| GMOS: Grounding Moving Object Segmentation in 3D Space and Time | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available |
| DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available |
| LLMSurgeon: Diagnosing Data Mixture of Large Language Models | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available |
| AdaState: Self-Evolving Anchors for Streaming Video Generation | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available |
| GPIC: A Giant Permissive Image Corpus for Visual Generation | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available |
| Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available |
| Demystifying Data Organization for Enhanced LLM Training | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available |
| COMPOSE: Composing Future Theorems from Citations and Formal Structure | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available |
Product / Business Watch
- Claude Code, Codex, Cursor, Copilot, Sourcegraph/Cody: 5 direct product/changelog docs checked → monitor enterprise controls.
- Devin/Replit/Gemini/Jules/OpenCode: tracked via GitHub/product web where available; missing engagement → N/A, lowers confidence.
Source Appendix
| Platform | Author | Time | Engagement | Source | Topic |
|---|---|---|---|---|---|
| HN | Imbiss | 2026-05-31 | 7 pts/0 cmt | The UI problem of AI coding agents | coding agent |
| HN | CoffeeOnWrite | 2026-05-31 | 3 pts/0 cmt | Sandboxes and Worktrees: My Secure Agentic AI Setup | coding agent |
| HN | ronbenton | 2026-05-31 | 2 pts/1 cmt | Ask HN: How much is fully agentic coding costing you per month? | coding agent |
| HN | pbjerkeseth | 2026-05-31 | 7 pts/0 cmt | Show HN: Ouijit, an open-source task and terminal manager for coding agents | coding agent |
| HN | memcoder | 2026-05-31 | 6 pts/2 cmt | Show HN: Agents, run any coding agent on your subscription not API costs | coding agent |
| HN | spinchange | 2026-05-30 | 2 pts/0 cmt | Show HN: A Claude Code skill that scopes problems like Peter Naur | agentic programming |
| HN | vbutsomesayw | 2026-05-27 | 3 pts/0 cmt | Bill Gates AI on AI (one month later) | agentic programming |
| HN | xendo | 2026-05-23 | 3 pts/0 cmt | Zero – Programming Language for Agents | agentic programming |
| HN | afshinmeh | 2026-05-19 | 3 pts/0 cmt | Zero: The Programming Language for Agents | agentic programming |
| HN | amitbidlan | 2026-05-19 | 1 pts/3 cmt | Show HN: Korveo – a local firewall for AI agents | agentic programming |
| HN | mahdikaz | 2026-05-31 | 1 pts/0 cmt | Agent-stack – one command to make any repo token-efficient for Claude Code | Claude Code |
| HN | ilkkao | 2026-05-31 | 3 pts/0 cmt | Researchers let AI models run a simulated society | Claude Code |
| HN | pbjerkeseth | 2026-05-31 | 7 pts/0 cmt | Show HN: Ouijit, an open-source task and terminal manager for coding agents | Claude Code |
| HN | memcoder | 2026-05-31 | 6 pts/2 cmt | Show HN: Agents, run any coding agent on your subscription not API costs | Claude Code |
| HN | dbvdh | 2026-05-31 | 5 pts/0 cmt | Show HN: Strudai, browser based agentic wrapper around Strudel | Claude Code |
| HN | shudv | 2026-05-31 | 2 pts/0 cmt | Accountability Throughput | OpenAI Codex |
| HN | rane | 2026-05-30 | 3 pts/0 cmt | Show HN: Use Kimi and OpenAI Subscriptions in Claude Code | OpenAI Codex |
| HN | ramonga | 2026-05-28 | 3 pts/0 cmt | Show HN: Free open source coding models in Slack | OpenAI Codex |
| HN | vashchylau | 2026-05-28 | 3 pts/0 cmt | First thing you see when Googling "OpenAI Codex app" is a fake malware website | OpenAI Codex |
| HN | dnw | 2026-05-27 | 2 pts/0 cmt | Building self-improving tax agents with Codex | OpenAI Codex |
| HN | ronbenton | 2026-05-31 | 2 pts/1 cmt | Ask HN: How much is fully agentic coding costing you per month? | Cursor agent |
| HN | memcoder | 2026-05-31 | 6 pts/2 cmt | Show HN: Agents, run any coding agent on your subscription not API costs | Cursor agent |
| HN | detente18 | 2026-05-30 | 6 pts/0 cmt | Show HN: Lite-Harness – Self-Hosted Cursor Agents (Use Claude Code/OpenCode) | Cursor agent |
| HN | ananandreas | 2026-05-29 | 5 pts/0 cmt | Show HN: OpenHive – AI agents share solutions so other agents dont re-solve them | Cursor agent |
| HN | kiBytes | 2026-05-29 | 2 pts/0 cmt | Show HN: TheFoundry – Easy bootstrapping framework for MultiAgent Systems | Cursor agent |
| HN | vektormemory | 2026-05-30 | 2 pts/0 cmt | We Benchmarked Our Open Source Memory Tool Against a Microsoft Research Paper | SWE-bench |
| HN | fittingopposite | 2026-05-28 | 2 pts/0 cmt | Mini-SWE-agent scores up to 74% on SWE-bench in 100 lines of Python code | SWE-bench |
| HN | kimjune01 | 2026-05-24 | 2 pts/0 cmt | Show HN: 97% on SWE-bench Verified with subscription-token agents | SWE-bench |
| HN | Sushrutkm | 2026-05-19 | 2 pts/0 cmt | Bito's AI Architect Boosts Claude Opus's task success rate by 35% | SWE-bench |
| HN | azurewraith | 2026-05-12 | 126 pts/59 cmt | Show HN: Statewright – Visual state machines that make AI agents reliable | SWE-bench |
| HN | neversettles | 2026-05-03 | 1 pts/2 cmt | The Terminal Bench 3.0 community is looking for task contributors | Terminal-Bench |
| HN | gk1 | 2026-04-29 | 4 pts/0 cmt | ForgeCode: Top open source coding agent in Terminal-Bench 2.0 | Terminal-Bench |
| HN | ubermon | 2026-04-28 | 6 pts/9 cmt | Open-weight 27B hits 38% on Terminal-Bench 2.0 (Opus 4.1 hit 38% in Aug 2025) | Terminal-Bench |
| HN | GodelNumbering | 2026-04-27 | 393 pts/148 cmt | Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview | Terminal-Bench |
| HN | neversupervised | 2026-04-15 | 6 pts/2 cmt | Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments | Terminal-Bench |
| HN | memcoder | 2026-05-31 | 6 pts/2 cmt | Show HN: Agents, run any coding agent on your subscription not API costs | AI coding workflow |
| HN | geoctl | 2026-05-31 | 3 pts/0 cmt | Show HN: Cordium: FOSS sandbox platform that eliminates credential injection | AI coding workflow |
| HN | Imbiss | 2026-05-30 | 2 pts/1 cmt | Spatial IDE's for agentic coding workflows | AI coding workflow |
| HN | v-mdev | 2026-05-28 | 2 pts/0 cmt | Superpowers: An Agentic Skills Framework for AI Coding Workflows | AI coding workflow |
| HN | sjhalani7 | 2026-05-27 | 8 pts/3 cmt | Show HN: VAEN – Package and import portable AI coding-agent Harnesses | AI coding workflow |
| GitHub | frido22 | 2026-05-31 | 0 stars/0 forks/0 issues | frido22/gradient-descent-memory | coding-agent |
| GitHub | RahulRachhoya | 2026-05-31 | 0 stars/0 forks/0 issues | RahulRachhoya/ai-data-analysis-agent | coding-agent |
| GitHub | cattailfarmer | 2026-05-31 | 0 stars/0 forks/0 issues | cattailfarmer/TheBrain | coding-agent |
| GitHub | contact715 | 2026-05-31 | 0 stars/0 forks/0 issues | contact715/jidoka | coding-agent |
| GitHub | KaelanRichards | 2026-05-31 | 1 stars/0 forks/1 issues | KaelanRichards/agents | coding-agent |
| GitHub | Enxs969 | 2026-05-31 | 1 stars/0 forks/0 issues | Enxs969/skiller | coding-agent |
| GitHub | tiendatne2004 | 2026-05-31 | 2 stars/0 forks/0 issues | tiendatne2004/tikhub_api_skill | coding-agent |
| GitHub | KadenMc | 2026-05-31 | 12 stars/1 forks/0 issues | KadenMc/work-buddy | coding-agent |
| GitHub | RahulRachhoya | 2026-05-31 | 0 stars/0 forks/0 issues | RahulRachhoya/ai-data-analysis-agent | ai-agent code |
| GitHub | Himanshu123cyber | 2026-05-31 | 0 stars/0 forks/0 issues | Himanshu123cyber/claude-code | ai-agent code |
| GitHub | KadenMc | 2026-05-31 | 12 stars/1 forks/0 issues | KadenMc/work-buddy | ai-agent code |
| GitHub | lyzzhimmm | 2026-05-31 | 0 stars/0 forks/0 issues | lyzzhimmm/SkillManager | ai-agent code |
| GitHub | gtapps | 2026-05-31 | 59 stars/7 forks/5 issues | gtapps/claude-code-hermit | ai-agent code |
| GitHub | ChisomNwokoro | 2026-05-31 | 2 stars/0 forks/0 issues | ChisomNwokoro/andy-universal-agent-rules | ai-agent code |
| GitHub | dulcekllr | 2026-05-31 | 2 stars/0 forks/0 issues | dulcekllr/agent-os | ai-agent code |
| GitHub | netdata | 2026-05-31 | 1 stars/2 forks/0 issues | netdata/ai-viewer | ai-agent code |
| GitHub | linny006 | 2026-05-31 | 0 stars/0 forks/3 issues | linny006/agent-eval-harness | swe-bench |
| GitHub | kimjune01 | 2026-05-31 | 0 stars/0 forks/0 issues | kimjune01/swebench-pro-flash-composer | swe-bench |
| GitHub | zero-iteration | 2026-05-31 | 0 stars/0 forks/0 issues | zero-iteration/vard | swe-bench |
| GitHub | OpenInterpretability | 2026-05-31 | 1 stars/0 forks/0 issues | OpenInterpretability/openinterp-swebench-harness | swe-bench |
| GitHub | lmnst | 2026-05-31 | 0 stars/0 forks/0 issues | lmnst/SWE-Review-Bench | swe-bench |
| GitHub | ZaikoXeas | 2026-05-31 | 1 stars/1 forks/1 issues | ZaikoXeas/mcpbr | swe-bench |
| GitHub | Human-Agent-Society | 2026-05-31 | 680 stars/90 forks/9 issues | Human-Agent-Society/CORAL | swe-bench |
| GitHub | Sharkoon1 | 2026-05-31 | 0 stars/0 forks/0 issues | Sharkoon1/code-complexity-llm-performance | swe-bench |
| GitHub | AlexRosito67 | 2026-05-31 | 0 stars/0 forks/0 issues | AlexRosito67/resistor | terminal-bench |
| GitHub | harbor-framework | 2026-05-31 | 2212 stars/1091 forks/364 issues | harbor-framework/harbor | terminal-bench |
| GitHub | anjalii40 | 2026-05-31 | 0 stars/0 forks/0 issues | anjalii40/Terminal-Benchmark-Tasks-for-AI-Agents | terminal-bench |
| GitHub | sachinthink | 2026-05-31 | 0 stars/0 forks/0 issues | sachinthink/Terminal-Bench---Harbor-Task | terminal-bench |
| GitHub | RDI-Foundation | 2026-05-31 | 0 stars/3 forks/5 issues | RDI-Foundation/terminal-bench-leaderboard | terminal-bench |
| GitHub | scitix | 2026-05-31 | 3 stars/0 forks/0 issues | scitix/Agent-Sandbox | terminal-bench |
| GitHub | leoncuhk | 2026-05-31 | 1 stars/0 forks/0 issues | leoncuhk/awesome-llm-bench | terminal-bench |
| GitHub | nbajpai-code | 2026-05-31 | 0 stars/0 forks/0 issues | nbajpai-code/tb | terminal-bench |
| GitHub | Marwane83930 | 2026-05-31 | 2 stars/0 forks/0 issues | Marwane83930/structured-prompt-skill | claude-code |
| GitHub | electron-stagewright | 2026-05-31 | 5 stars/0 forks/0 issues | electron-stagewright/electron-stagewright | claude-code |
| GitHub | tytsxai | 2026-05-31 | 0 stars/0 forks/0 issues | tytsxai/claude-code-guide-zh | claude-code |
| GitHub | EL-HAMDAOUI-Othmane | 2026-05-31 | 0 stars/0 forks/0 issues | EL-HAMDAOUI-Othmane/agent-reachout | claude-code |
| GitHub | Thomasneatbiggers | 2026-05-31 | 2 stars/3 forks/1 issues | Thomasneatbiggers/Perplexity-Comet-MCP | claude-code |
| GitHub | ssamssae | 2026-05-31 | 0 stars/0 forks/0 issues | ssamssae/claude-skills | claude-code |
| GitHub | frank-syncmarket | 2026-05-31 | 3 stars/1 forks/0 issues | frank-syncmarket/skills | claude-code |
| GitHub | DeusData | 2026-05-31 | 2834 stars/297 forks/50 issues | DeusData/codebase-memory-mcp | claude-code |
| GitHub | nicolashuber | 2026-05-31 | 0 stars/0 forks/0 issues | nicolashuber/opencode-config | opencode |
| GitHub | DeusData | 2026-05-31 | 2834 stars/297 forks/50 issues | DeusData/codebase-memory-mcp | opencode |
| GitHub | koopticon | 2026-05-31 | 0 stars/0 forks/0 issues | koopticon/opencode-bas-plugin | opencode |
| GitHub | dohzoh | 2026-05-31 | 0 stars/0 forks/0 issues | dohzoh/llm-provider-unsloth | opencode |
| GitHub | jonanderson10 | 2026-05-31 | 0 stars/0 forks/0 issues | jonanderson10/enhanced-opencode-agents-md | opencode |
| GitHub | vekzz-dev | 2026-05-31 | 0 stars/0 forks/0 issues | vekzz-dev/opencode-skills | opencode |
| GitHub | darklightyagami7 | 2026-05-31 | 1 stars/0 forks/0 issues | darklightyagami7/opencode-oauth-fix | opencode |
| GitHub | Ermi34 | 2026-05-31 | 1 stars/0 forks/0 issues | Ermi34/Bedrock-Addon-Wrangler | opencode |
| GitHub | cameronobriendev | 2026-05-31 | 0 stars/0 forks/0 issues | cameronobriendev/NotchWall | cursor agent |
| GitHub | electron-stagewright | 2026-05-31 | 5 stars/0 forks/0 issues | electron-stagewright/electron-stagewright | cursor agent |
| GitHub | gregoirecambon | 2026-05-31 | 0 stars/0 forks/0 issues | gregoirecambon/norc | cursor agent |
| GitHub | Arseeth | 2026-05-31 | 5 stars/0 forks/0 issues | Arseeth/skills-for-vibe-coder | cursor agent |
| GitHub | All-zzz | 2026-05-31 | 0 stars/2 forks/1 issues | All-zzz/claude-canvas | cursor agent |
| GitHub | tuongaz | 2026-05-31 | 8 stars/0 forks/0 issues | tuongaz/seeflow | cursor agent |
| GitHub | broomva | 2026-05-31 | 2 stars/0 forks/0 issues | broomva/skills | cursor agent |
| GitHub | SoliEstre | 2026-05-31 | 5 stars/0 forks/0 issues | SoliEstre/EstreGenesis | cursor agent |
| arXiv | paper | 2026-05-28 | N/A arXiv API | Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific | agentic software engineering |
| arXiv | paper | 2026-05-28 | N/A arXiv API | Quiver Approach to Symmetry Theories | agentic software engineering |
| arXiv | paper | 2026-05-28 | N/A arXiv API | SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations | agentic software engineering |
| arXiv | paper | 2026-05-28 | N/A arXiv API | Benchmarking Single-Factor Physical Video-to-Audio Generation | agentic software engineering |
| arXiv | paper | 2026-05-28 | N/A arXiv API | REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image | agentic software engineering |
| arXiv | paper | 2026-05-28 | N/A arXiv API | Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Compone | agentic software engineering |
| arXiv | paper | 2026-05-28 | N/A arXiv API | Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific | code generation benchmark |
| arXiv | paper | 2026-05-28 | N/A arXiv API | Quiver Approach to Symmetry Theories | code generation benchmark |
| arXiv | paper | 2026-05-28 | N/A arXiv API | GMOS: Grounding Moving Object Segmentation in 3D Space and Time | code generation benchmark |
| arXiv | paper | 2026-05-28 | N/A arXiv API | DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation | code generation benchmark |
| arXiv | paper | 2026-05-28 | N/A arXiv API | LLMSurgeon: Diagnosing Data Mixture of Large Language Models | code generation benchmark |
| arXiv | paper | 2026-05-28 | N/A arXiv API | AdaState: Self-Evolving Anchors for Streaming Video Generation | code generation benchmark |
| arXiv | paper | 2026-05-28 | N/A arXiv API | Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific | software engineering agents |
| arXiv | paper | 2026-05-28 | N/A arXiv API | Quiver Approach to Symmetry Theories | software engineering agents |
| arXiv | paper | 2026-05-28 | N/A arXiv API | SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations | software engineering agents |
| arXiv | paper | 2026-05-28 | N/A arXiv API | Benchmarking Single-Factor Physical Video-to-Audio Generation | software engineering agents |
| arXiv | paper | 2026-05-28 | N/A arXiv API | REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image | software engineering agents |
| arXiv | paper | 2026-05-28 | N/A arXiv API | Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Compone | software engineering agents |
| arXiv | paper | 2026-05-28 | N/A arXiv API | Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific | LLM coding agents |
| arXiv | paper | 2026-05-28 | N/A arXiv API | LLMSurgeon: Diagnosing Data Mixture of Large Language Models | LLM coding agents |
| arXiv | paper | 2026-05-28 | N/A arXiv API | SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations | LLM coding agents |
| arXiv | paper | 2026-05-28 | N/A arXiv API | GPIC: A Giant Permissive Image Corpus for Visual Generation | LLM coding agents |
| arXiv | paper | 2026-05-28 | N/A arXiv API | REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image | LLM coding agents |
| arXiv | paper | 2026-05-28 | N/A arXiv API | Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching | LLM coding agents |
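
Several rows above repeat the same item under different query tags (for example KadenMc/work-buddy, DeusData/codebase-memory-mcp, and most of the arXiv titles). A minimal sketch of how such rows could be collapsed before counting is shown below. The sample rows are copied from the table, while the `dedupe` helper and the `(platform, item, tag)` row shape are illustrative assumptions, not the collector's actual code.

```python
# Sample rows copied from the table above: (platform, item, query tag).
ROWS = [
    ("GitHub", "KadenMc/work-buddy", "coding-agent"),
    ("GitHub", "KadenMc/work-buddy", "ai-agent code"),
    ("GitHub", "DeusData/codebase-memory-mcp", "claude-code"),
    ("GitHub", "DeusData/codebase-memory-mcp", "opencode"),
    ("GitHub", "harbor-framework/harbor", "terminal-bench"),
    ("arXiv", "Quiver Approach to Symmetry Theories", "agentic software engineering"),
    ("arXiv", "Quiver Approach to Symmetry Theories", "code generation benchmark"),
    ("arXiv", "Quiver Approach to Symmetry Theories", "software engineering agents"),
]


def dedupe(rows):
    """Collapse rows sharing (platform, item), keeping every query tag that matched.

    Hypothetical helper: the real collector may key on URL or repo ID instead.
    """
    merged = {}
    for platform, item, tag in rows:
        tags = merged.setdefault((platform, item), [])
        if tag not in tags:
            tags.append(tag)
    return merged


unique = dedupe(ROWS)
for (platform, item), tags in unique.items():
    print(f"{platform} | {item} | {', '.join(tags)}")
print(f"{len(ROWS)} rows -> {len(unique)} unique items")
```

Keeping the list of matched tags per item, rather than dropping repeats outright, preserves the signal that one repo or paper surfaced under several queries.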
Data Quality / Scan Health
PARTIAL: the harness timed out at 180s; a bounded fallback collector completed the run. Candidates: 143. Reddit/Facebook (public): 0 usable items, due to public JSON/search access limits in the bounded run. X: 5 KOL profile checks; post metrics are N/A because no API or browser collector was available. YouTube RSS: 7. Publishing was allowed because the GitHub + HN + arXiv + product evidence is sufficient for a CTO brief; confidence was reduced to 68% because social signals are missing.
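The per-source counts reported in this brief can be checked against the 143-candidate total with a short tally. This is a sketch under one assumption the brief does not state: that each product source, X profile check, and YouTube RSS item counts as one candidate. The names `SOURCE_COUNTS` and `reconcile` are illustrative.

```python
# Per-source counts as reported in this brief.
# Assumption: every entry below contributes to the candidate total.
SOURCE_COUNTS = {
    "GitHub": 56,
    "HN": 40,
    "arXiv": 30,
    "product sources": 5,
    "X profile checks": 5,
    "YouTube RSS": 7,
    "Reddit/Facebook": 0,
}
REPORTED_TOTAL = 143


def reconcile(counts, reported_total):
    """Return the summed count and whether it matches the reported total."""
    total = sum(counts.values())
    return total, total == reported_total


total, ok = reconcile(SOURCE_COUNTS, REPORTED_TOTAL)
print(f"sum of sources = {total}, reported = {REPORTED_TOTAL}, match = {ok}")
```

A check like this could run at the end of each scan so that a mismatch between per-source counts and the headline total is flagged before publishing.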