Technical Intelligence Brief 260601-0524

Executive Snapshot

143 candidates scanned in this run (GitHub 56, HN 40, arXiv 30) → enough technical grounding to trial a harness; social signals are thin → confidence 68%.
56 repos related to coding-agent/SWE-bench/Claude/OpenCode → prioritize NEXA runtime + sandbox eval over the next 2 weeks.
30 arXiv papers on agentic SE/code benchmarks → SYNCA needs additional reliability measures, not just pass@1.
40 HN stories/discussions → the global market is interested in adoption, but the counter-signals are large: cost/latency/security.
5 product sources (Claude Code/Codex/Cursor/Copilot/Sourcegraph) → enterprise IDE/CLI tools are converging on governed workflows.
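
The point about SYNCA needing reliability measures beyond pass@1 can be illustrated with a minimal sketch. It assumes each task is attempted several times and logged as a (task_id, passed) pair; the function name and the metric names `all_pass_rate` and `flaky_rate` are hypothetical, not part of any existing SYNCA tooling.

```python
from collections import defaultdict


def reliability_report(runs):
    """Summarize repeated attempts; runs is an iterable of (task_id, passed)."""
    by_task = defaultdict(list)
    for task_id, passed in runs:
        by_task[task_id].append(bool(passed))
    if not by_task:
        raise ValueError("no runs supplied")

    n_tasks = len(by_task)
    attempts = list(by_task.values())

    # pass@1 estimated as the mean per-task success rate across attempts.
    pass_at_1 = sum(sum(a) / len(a) for a in attempts) / n_tasks
    # Share of tasks solved on every attempt (a stricter consistency measure).
    all_pass_rate = sum(all(a) for a in attempts) / n_tasks
    # Share of tasks that pass at least once but not always.
    flaky_rate = sum(any(a) and not all(a) for a in attempts) / n_tasks

    return {
        "pass_at_1": pass_at_1,
        "all_pass_rate": all_pass_rate,
        "flaky_rate": flaky_rate,
    }
```

A gap between `pass_at_1` and `all_pass_rate` suggests the agent is succeeding inconsistently, which a single pass@1 number would hide.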

KPI Dashboard

143 candidates
56 GitHub
30 papers
52 social/dev signals
68% confidence

Trend Radar

Hot now: coding-agent CLI + repo-context execution (56 repo signals).
Emerging: SWE-bench/Terminal-Bench-like eval (30 academic signals).
Noise: generic AI coding hype without issue/release metrics.
Watchlist: enterprise governance/HITL (3+ changelog sources: Sourcegraph, Copilot, Cursor).

Impact Coverage

Domain	0-2w	1-2m	3-6m
FARE	Index codebase eval	Context quality KPI	Knowledge graph
NEXA	Agent CLI pilot	Sandbox harness	Multi-agent runtime
SYNCA	Risk score gate	Eval dashboard	Governance OS
DOMUS	Monitor	Backoffice automation	Adopt selective
Japan/VN/Global	Trial internal	JP compliance package	Offer AI-SDLC service

KOL/OG Feed Watch

Platform	Author	Time	Engagement	Title	Why it matters
HN	Imbiss	2026-05-31T19:33	7 pts/0 cmt	The UI problem of AI coding agents	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	CoffeeOnWrite	2026-05-31T17:39	3 pts/0 cmt	Sandboxes and Worktrees: My Secure Agentic AI Setup	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	ronbenton	2026-05-31T16:35	2 pts/1 cmt	Ask HN: How much is fully agentic coding costing you per month?	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	pbjerkeseth	2026-05-31T16:29	7 pts/0 cmt	Show HN: Ouijit, an open-source task and terminal manager for coding agents	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	memcoder	2026-05-31T16:21	6 pts/2 cmt	Show HN: Agents, run any coding agent on your subscription not API costs	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	spinchange	2026-05-30T02:04	2 pts/0 cmt	Show HN: A Claude Code skill that scopes problems like Peter Naur	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	vbutsomesayw	2026-05-27T04:01	3 pts/0 cmt	Bill Gates AI on AI (one month later)	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	xendo	2026-05-23T11:13	3 pts/0 cmt	Zero – Programming Language for Agents	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	afshinmeh	2026-05-19T20:19	3 pts/0 cmt	Zero: The Programming Language for Agents	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	amitbidlan	2026-05-19T17:40	1 pts/3 cmt	Show HN: Korveo – a local firewall for AI agents	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	mahdikaz	2026-05-31T21:51	1 pts/0 cmt	Agent-stack – one command to make any repo token-efficient for Claude Code	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	ilkkao	2026-05-31T20:20	3 pts/0 cmt	Researchers let AI models run a simulated society	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	pbjerkeseth	2026-05-31T16:29	7 pts/0 cmt	Show HN: Ouijit, an open-source task and terminal manager for coding agents	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	memcoder	2026-05-31T16:21	6 pts/2 cmt	Show HN: Agents, run any coding agent on your subscription not API costs	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	dbvdh	2026-05-31T15:24	5 pts/0 cmt	Show HN: Strudai, browser based agentic wrapper around Strudel	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	shudv	2026-05-31T10:50	2 pts/0 cmt	Accountability Throughput	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	rane	2026-05-30T19:23	3 pts/0 cmt	Show HN: Use Kimi and OpenAI Subscriptions in Claude Code	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	ramonga	2026-05-28T16:11	3 pts/0 cmt	Show HN: Free open source coding models in Slack	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	vashchylau	2026-05-28T13:49	3 pts/0 cmt	First thing you see when Googling "OpenAI Codex app" is a fake malware website	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	dnw	2026-05-27T15:48	2 pts/0 cmt	Building self-improving tax agents with Codex	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	ronbenton	2026-05-31T16:35	2 pts/1 cmt	Ask HN: How much is fully agentic coding costing you per month?	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	memcoder	2026-05-31T16:21	6 pts/2 cmt	Show HN: Agents, run any coding agent on your subscription not API costs	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	detente18	2026-05-30T23:51	6 pts/0 cmt	Show HN: Lite-Harness – Self-Hosted Cursor Agents (Use Claude Code/OpenCode)	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	ananandreas	2026-05-29T14:35	5 pts/0 cmt	Show HN: OpenHive – AI agents share solutions so other agents dont re-solve them	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	kiBytes	2026-05-29T13:18	2 pts/0 cmt	Show HN: TheFoundry – Easy bootstrapping framework for MultiAgent Systems	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	vektormemory	2026-05-30T22:03	2 pts/0 cmt	We Benchmarked Our Open Source Memory Tool Against a Microsoft Research Paper	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	fittingopposite	2026-05-28T05:05	2 pts/0 cmt	Mini-SWE-agent scores up to 74% on SWE-bench in 100 lines of Python code	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	kimjune01	2026-05-24T18:03	2 pts/0 cmt	Show HN: 97% on SWE-bench Verified with subscription-token agents	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	Sushrutkm	2026-05-19T10:02	2 pts/0 cmt	Bito's AI Architect Boosts Claude Opus's task success rate by 35%	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	azurewraith	2026-05-12T14:24	126 pts/59 cmt	Show HN: Statewright – Visual state machines that make AI agents reliable	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	neversettles	2026-05-03T03:40	1 pts/2 cmt	The Terminal Bench 3.0 community is looking for task contributors	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	gk1	2026-04-29T18:16	4 pts/0 cmt	ForgeCode: Top open source coding agent in Terminal-Bench 2.0	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	ubermon	2026-04-28T19:11	6 pts/9 cmt	Open-weight 27B hits 38% on Terminal-Bench 2.0 (Opus 4.1 hit 38% in Aug 2025)	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	GodelNumbering	2026-04-27T12:35	393 pts/148 cmt	Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	neversupervised	2026-04-15T00:42	6 pts/2 cmt	Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	memcoder	2026-05-31T16:21	6 pts/2 cmt	Show HN: Agents, run any coding agent on your subscription not API costs	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	geoctl	2026-05-31T10:41	3 pts/0 cmt	Show HN: Cordium: FOSS sandbox platform that eliminates credential injection	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	Imbiss	2026-05-30T13:34	2 pts/1 cmt	Spatial IDE's for agentic coding workflows	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	v-mdev	2026-05-28T09:20	2 pts/0 cmt	Superpowers: An Agentic Skills Framework for AI Coding Workflows	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN	sjhalani7	2026-05-27T20:52	8 pts/3 cmt	Show HN: VAEN – Package and import portable AI coding-agent Harnesses	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
YouTube	AI Explained	2026-03-30T20:18	N/A RSS	How American Tanks are being Destroyed in Ukraine	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
YouTube	Latent Space	2026-05-31T14:00	N/A RSS	Billionaires Impressed By New College Grads Being AI Natives: They Are Totally Cracked	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
YouTube	Latent Space	2026-05-29T22:47	N/A RSS	Anthropic's Digital God, Pope vs AI, Job Loss Narrative Flips, Open Source Crackdown Comin	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
YouTube	Latent Space	2026-05-28T14:33	N/A RSS	David Sacks: AI Will Create More Jobs, Not Destroy Them	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
YouTube	Latent Space	2026-05-27T16:00	N/A RSS	3 Reasons Why College Students Hate AI - David Friedberg	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
YouTube	Matthew Berman	2026-05-29T19:09	N/A RSS	Self Improving AI actually solves everything	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
YouTube	Matthew Berman	2026-05-29T17:03	N/A RSS	Breaking Down the Pope's AI Essay	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
X	Simon Willison	2026-06-01T05:24	N/A public/API limited	KOL feed profile checked; public post metrics unavailable without API	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
X	Armin Ronacher	2026-06-01T05:24	N/A public/API limited	KOL feed profile checked; public post metrics unavailable without API	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
X	Addy Osmani	2026-06-01T05:24	N/A public/API limited	KOL feed profile checked; public post metrics unavailable without API	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
X	Hamel Husain	2026-06-01T05:24	N/A public/API limited	KOL feed profile checked; public post metrics unavailable without API	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
X	Swyx	2026-06-01T05:24	N/A public/API limited	KOL feed profile checked; public post metrics unavailable without API	CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.

CTO Evaluation Matrix

Signal	Thesis	Evidence	Counter-signal	Fabbi implication	Conf	Decision	Next validation
Agent runtime/CLI	Adopt guarded execution	56 GitHub + 5 product	Security/cost N/A	NEXA pilot	72%	trial	10 repos x 3 tasks
Benchmark harness	Eval before rollout	30 papers	Benchmark transfer gap	SYNCA KPI	70%	adopt	pass/fail + human review
Context layer	Codebase understanding is moat	HN 40 + repo signals	Freshness drift	FARE roadmap	66%	trial	retrieval precision@10
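
The "retrieval precision@10" validation named for the context layer can be sketched as follows. This is an illustrative helper, assuming retrieved items are ranked identifiers (for example file paths) and that a human-labeled relevant set exists; the function name and the sample paths in the usage note are hypothetical.

```python
def precision_at_k(retrieved, relevant, k=10):
    """Fraction of the top-k retrieved items that are labeled relevant.

    Divides by k, so a retriever that returns fewer than k items is
    penalized for the missing slots.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Drop repeated identifiers while preserving rank order.
    ranked = list(dict.fromkeys(retrieved))
    relevant_set = set(relevant)
    hits = sum(1 for item in ranked[:k] if item in relevant_set)
    return hits / k
```

For example, `precision_at_k(["a.py", "b.py", "c.py", "d.py"], {"a.py", "d.py"}, k=4)` evaluates to 0.5. Tracking this per repo over time is one way to surface the freshness drift listed as the counter-signal.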

CTO Recommendations

Action	ROI/time-saving	Risk	Owner	TTV	Validation
Pilot the NEXA coding-agent harness on 10 internal repos	15-25%	3/5	AI Platform Lead	2 weeks	task success/cost/security incidents
Add a SYNCA eval gate: SWE-bench mini + reviewer rubric	10-18%	2/5	QA/DevEx Lead	1 week	defect escape rate
Build a FARE context-quality dashboard	8-15%	2/5	Data/Knowledge Lead	3 weeks	retrieval precision + dev survey
Package an AI-SDLC offer for Japan/VN	5-12% revenue assist	3/5	BU Head	4 weeks	3 client discovery calls
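
The pilot validation (10 repos x 3 tasks, scored on task success, cost, and security incidents) could be tallied along these lines. This is a sketch under assumptions: the `TaskResult` fields, the USD cost unit, and both function names are hypothetical and not taken from any NEXA component.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class TaskResult:
    repo: str
    task: str
    succeeded: bool
    cost_usd: float          # assumed unit; the brief does not specify one
    security_incidents: int


def plan_trials(repos, tasks):
    """Enumerate every (repo, task) pair, e.g. 10 repos x 3 tasks = 30 trials."""
    return list(product(repos, tasks))


def summarize(results):
    """Aggregate the three validation metrics named in the recommendation."""
    if not results:
        raise ValueError("no results to summarize")
    successes = sum(1 for r in results if r.succeeded)
    total_cost = sum(r.cost_usd for r in results)
    return {
        "trials": len(results),
        "success_rate": successes / len(results),
        # Undefined when nothing succeeded, so report None instead of dividing.
        "cost_per_success": total_cost / successes if successes else None,
        "security_incidents": sum(r.security_incidents for r in results),
    }
```

Reporting cost per successful task rather than raw spend keeps the cost and success signals comparable across repos of different sizes.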

Repo Watch

Repo	Metrics	Updated	Move
frido22/gradient-descent-memory	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
RahulRachhoya/ai-data-analysis-agent	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
cattailfarmer/TheBrain	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
contact715/jidoka	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
KaelanRichards/agents	1 stars/0 forks/1 issues	2026-05-31	Trial if relevant to harness/context
Enxs969/skiller	1 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
tiendatne2004/tikhub_api_skill	2 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
KadenMc/work-buddy	12 stars/1 forks/0 issues	2026-05-31	Trial if relevant to harness/context
RahulRachhoya/ai-data-analysis-agent	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
Himanshu123cyber/claude-code	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
KadenMc/work-buddy	12 stars/1 forks/0 issues	2026-05-31	Trial if relevant to harness/context
lyzzhimmm/SkillManager	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
gtapps/claude-code-hermit	59 stars/7 forks/5 issues	2026-05-31	Trial if relevant to harness/context
ChisomNwokoro/andy-universal-agent-rules	2 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
dulcekllr/agent-os	2 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
netdata/ai-viewer	1 stars/2 forks/0 issues	2026-05-31	Trial if relevant to harness/context
linny006/agent-eval-harness	0 stars/0 forks/3 issues	2026-05-31	Trial if relevant to harness/context
kimjune01/swebench-pro-flash-composer	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
zero-iteration/vard	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
OpenInterpretability/openinterp-swebench-harness	1 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
lmnst/SWE-Review-Bench	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
ZaikoXeas/mcpbr	1 stars/1 forks/1 issues	2026-05-31	Trial if relevant to harness/context
Human-Agent-Society/CORAL	680 stars/90 forks/9 issues	2026-05-31	Trial if relevant to harness/context
Sharkoon1/code-complexity-llm-performance	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
AlexRosito67/resistor	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
harbor-framework/harbor	2212 stars/1091 forks/364 issues	2026-05-31	Trial if relevant to harness/context
anjalii40/Terminal-Benchmark-Tasks-for-AI-Agents	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
sachinthink/Terminal-Bench---Harbor-Task	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
RDI-Foundation/terminal-bench-leaderboard	0 stars/3 forks/5 issues	2026-05-31	Trial if relevant to harness/context
scitix/Agent-Sandbox	3 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
leoncuhk/awesome-llm-bench	1 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
nbajpai-code/tb	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
Marwane83930/structured-prompt-skill	2 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
electron-stagewright/electron-stagewright	5 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
tytsxai/claude-code-guide-zh	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
EL-HAMDAOUI-Othmane/agent-reachout	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
Thomasneatbiggers/Perplexity-Comet-MCP	2 stars/3 forks/1 issues	2026-05-31	Trial if relevant to harness/context
ssamssae/claude-skills	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
frank-syncmarket/skills	3 stars/1 forks/0 issues	2026-05-31	Trial if relevant to harness/context
DeusData/codebase-memory-mcp	2834 stars/297 forks/50 issues	2026-05-31	Trial if relevant to harness/context
nicolashuber/opencode-config	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
DeusData/codebase-memory-mcp	2834 stars/297 forks/50 issues	2026-05-31	Trial if relevant to harness/context
koopticon/opencode-bas-plugin	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
dohzoh/llm-provider-unsloth	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
jonanderson10/enhanced-opencode-agents-md	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
vekzz-dev/opencode-skills	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
darklightyagami7/opencode-oauth-fix	1 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
Ermi34/Bedrock-Addon-Wrangler	1 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
cameronobriendev/NotchWall	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
electron-stagewright/electron-stagewright	5 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
gregoirecambon/norc	0 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
Arseeth/skills-for-vibe-coder	5 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
All-zzz/claude-canvas	0 stars/2 forks/1 issues	2026-05-31	Trial if relevant to harness/context
tuongaz/seeflow	8 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
broomva/skills	2 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
SoliEstre/EstreGenesis	5 stars/0 forks/0 issues	2026-05-31	Trial if relevant to harness/context
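
Because the Repo Watch table repeats some repos across queries and most rows carry near-zero engagement, a small triage pass can help decide which entries merit a trial. The sketch below assumes tab-separated rows in the four-column layout above; the `min_stars` threshold of 50 is an arbitrary illustrative cutoff, not a value from this brief.

```python
import re

# Matches the Metrics column, e.g. "2834 stars/297 forks/50 issues".
METRICS_RE = re.compile(r"(\d+) stars?/(\d+) forks?/(\d+) issues?")


def parse_row(line):
    """Parse one tab-separated Repo Watch row into a dict."""
    repo, metrics, updated, _move = line.split("\t")
    match = METRICS_RE.fullmatch(metrics.strip())
    if match is None:
        raise ValueError(f"unrecognized metrics field: {metrics!r}")
    stars, forks, issues = (int(g) for g in match.groups())
    return {
        "repo": repo.strip(),
        "stars": stars,
        "forks": forks,
        "issues": issues,
        "updated": updated.strip(),
    }


def triage(lines, min_stars=50):
    """Deduplicate by repo name, keep repos at or above min_stars, sort by stars."""
    unique = {}
    for line in lines:
        row = parse_row(line)
        unique.setdefault(row["repo"], row)  # first occurrence wins
    kept = [row for row in unique.values() if row["stars"] >= min_stars]
    return sorted(kept, key=lambda row: row["stars"], reverse=True)
```

Star count is a weak proxy for relevance, so the output is best treated as a shortlist for manual review rather than a ranking of quality.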

Paper / Benchmark Watch

Paper	Date	Signal	Move
Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
Quiver Approach to Symmetry Theories	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
Benchmarking Single-Factor Physical Video-to-Audio Generation	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Compone	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
Quiver Approach to Symmetry Theories	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
GMOS: Grounding Moving Object Segmentation in 3D Space and Time	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
LLMSurgeon: Diagnosing Data Mixture of Large Language Models	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
AdaState: Self-Evolving Anchors for Streaming Video Generation	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
Quiver Approach to Symmetry Theories	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
Benchmarking Single-Factor Physical Video-to-Audio Generation	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Compone	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
LLMSurgeon: Diagnosing Data Mixture of Large Language Models	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
GPIC: A Giant Permissive Image Corpus for Visual Generation	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
LLMSurgeon: Diagnosing Data Mixture of Large Language Models	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Compone	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
Demystifying Data Organization for Enhanced LLM Training	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available
COMPOSE: Composing Future Theorems from Citations and Formal Structure	2026-05-28	Benchmark/eval/code-agent reliability	Watch → pilot if an artifact is available

Product / Business Watch

Claude Code, Codex, Cursor, Copilot, Sourcegraph/Cody: 5 direct product/changelog docs checked → monitor enterprise controls.
Devin/Replit/Gemini/Jules/OpenCode: tracked via GitHub/product web where available; missing engagement → N/A, lowers confidence.

Source Appendix

Platform	Author	Time	Engagement	Source	Topic
HN	Imbiss	2026-05-31	7 pts/0 cmt	The UI problem of AI coding agents	coding agent
HN	CoffeeOnWrite	2026-05-31	3 pts/0 cmt	Sandboxes and Worktrees: My Secure Agentic AI Setup	coding agent
HN	ronbenton	2026-05-31	2 pts/1 cmt	Ask HN: How much is fully agentic coding costing you per month?	coding agent
HN	pbjerkeseth	2026-05-31	7 pts/0 cmt	Show HN: Ouijit, an open-source task and terminal manager for coding agents	coding agent
HN	memcoder	2026-05-31	6 pts/2 cmt	Show HN: Agents, run any coding agent on your subscription not API costs	coding agent
HN	spinchange	2026-05-30	2 pts/0 cmt	Show HN: A Claude Code skill that scopes problems like Peter Naur	agentic programming
HN	vbutsomesayw	2026-05-27	3 pts/0 cmt	Bill Gates AI on AI (one month later)	agentic programming
HN	xendo	2026-05-23	3 pts/0 cmt	Zero – Programming Language for Agents	agentic programming
HN	afshinmeh	2026-05-19	3 pts/0 cmt	Zero: The Programming Language for Agents	agentic programming
HN	amitbidlan	2026-05-19	1 pts/3 cmt	Show HN: Korveo – a local firewall for AI agents	agentic programming
HN	mahdikaz	2026-05-31	1 pts/0 cmt	Agent-stack – one command to make any repo token-efficient for Claude Code	Claude Code
HN	ilkkao	2026-05-31	3 pts/0 cmt	Researchers let AI models run a simulated society	Claude Code
HN	pbjerkeseth	2026-05-31	7 pts/0 cmt	Show HN: Ouijit, an open-source task and terminal manager for coding agents	Claude Code
HN	memcoder	2026-05-31	6 pts/2 cmt	Show HN: Agents, run any coding agent on your subscription not API costs	Claude Code
HN	dbvdh	2026-05-31	5 pts/0 cmt	Show HN: Strudai, browser based agentic wrapper around Strudel	Claude Code
HN	shudv	2026-05-31	2 pts/0 cmt	Accountability Throughput	OpenAI Codex
HN	rane	2026-05-30	3 pts/0 cmt	Show HN: Use Kimi and OpenAI Subscriptions in Claude Code	OpenAI Codex
HN	ramonga	2026-05-28	3 pts/0 cmt	Show HN: Free open source coding models in Slack	OpenAI Codex
HN	vashchylau	2026-05-28	3 pts/0 cmt	First thing you see when Googling "OpenAI Codex app" is a fake malware website	OpenAI Codex
HN	dnw	2026-05-27	2 pts/0 cmt	Building self-improving tax agents with Codex	OpenAI Codex
HN	ronbenton	2026-05-31	2 pts/1 cmt	Ask HN: How much is fully agentic coding costing you per month?	Cursor agent
HN	memcoder	2026-05-31	6 pts/2 cmt	Show HN: Agents, run any coding agent on your subscription not API costs	Cursor agent
HN	detente18	2026-05-30	6 pts/0 cmt	Show HN: Lite-Harness – Self-Hosted Cursor Agents (Use Claude Code/OpenCode)	Cursor agent
HN	ananandreas	2026-05-29	5 pts/0 cmt	Show HN: OpenHive – AI agents share solutions so other agents dont re-solve them	Cursor agent
HN	kiBytes	2026-05-29	2 pts/0 cmt	Show HN: TheFoundry – Easy bootstrapping framework for MultiAgent Systems	Cursor agent
HN	vektormemory	2026-05-30	2 pts/0 cmt	We Benchmarked Our Open Source Memory Tool Against a Microsoft Research Paper	SWE-bench
HN	fittingopposite	2026-05-28	2 pts/0 cmt	Mini-SWE-agent scores up to 74% on SWE-bench in 100 lines of Python code	SWE-bench
HN	kimjune01	2026-05-24	2 pts/0 cmt	Show HN: 97% on SWE-bench Verified with subscription-token agents	SWE-bench
HN	Sushrutkm	2026-05-19	2 pts/0 cmt	Bito's AI Architect Boosts Claude Opus's task success rate by 35%	SWE-bench
HN	azurewraith	2026-05-12	126 pts/59 cmt	Show HN: Statewright – Visual state machines that make AI agents reliable	SWE-bench
HN	neversettles	2026-05-03	1 pts/2 cmt	The Terminal Bench 3.0 community is looking for task contributors	Terminal-Bench
HN	gk1	2026-04-29	4 pts/0 cmt	ForgeCode: Top open source coding agent in Terminal-Bench 2.0	Terminal-Bench
HN	ubermon	2026-04-28	6 pts/9 cmt	Open-weight 27B hits 38% on Terminal-Bench 2.0 (Opus 4.1 hit 38% in Aug 2025)	Terminal-Bench
HN	GodelNumbering	2026-04-27	393 pts/148 cmt	Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview	Terminal-Bench
HN	neversupervised	2026-04-15	6 pts/2 cmt	Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments	Terminal-Bench
HN	memcoder	2026-05-31	6 pts/2 cmt	Show HN: Agents, run any coding agent on your subscription not API costs	AI coding workflow
HN	geoctl	2026-05-31	3 pts/0 cmt	Show HN: Cordium: FOSS sandbox platform that eliminates credential injection	AI coding workflow
HN	Imbiss	2026-05-30	2 pts/1 cmt	Spatial IDE's for agentic coding workflows	AI coding workflow
HN	v-mdev	2026-05-28	2 pts/0 cmt	Superpowers: An Agentic Skills Framework for AI Coding Workflows	AI coding workflow
HN	sjhalani7	2026-05-27	8 pts/3 cmt	Show HN: VAEN – Package and import portable AI coding-agent Harnesses	AI coding workflow
GitHub	frido22	2026-05-31	0 stars/0 forks/0 issues	frido22/gradient-descent-memory	coding-agent
GitHub	RahulRachhoya	2026-05-31	0 stars/0 forks/0 issues	RahulRachhoya/ai-data-analysis-agent	coding-agent
GitHub	cattailfarmer	2026-05-31	0 stars/0 forks/0 issues	cattailfarmer/TheBrain	coding-agent
GitHub	contact715	2026-05-31	0 stars/0 forks/0 issues	contact715/jidoka	coding-agent
GitHub	KaelanRichards	2026-05-31	1 stars/0 forks/1 issues	KaelanRichards/agents	coding-agent
GitHub	Enxs969	2026-05-31	1 stars/0 forks/0 issues	Enxs969/skiller	coding-agent
GitHub	tiendatne2004	2026-05-31	2 stars/0 forks/0 issues	tiendatne2004/tikhub_api_skill	coding-agent
GitHub	KadenMc	2026-05-31	12 stars/1 forks/0 issues	KadenMc/work-buddy	coding-agent
GitHub	RahulRachhoya	2026-05-31	0 stars/0 forks/0 issues	RahulRachhoya/ai-data-analysis-agent	ai-agent code
GitHub	Himanshu123cyber	2026-05-31	0 stars/0 forks/0 issues	Himanshu123cyber/claude-code	ai-agent code
GitHub	KadenMc	2026-05-31	12 stars/1 forks/0 issues	KadenMc/work-buddy	ai-agent code
GitHub	lyzzhimmm	2026-05-31	0 stars/0 forks/0 issues	lyzzhimmm/SkillManager	ai-agent code
GitHub	gtapps	2026-05-31	59 stars/7 forks/5 issues	gtapps/claude-code-hermit	ai-agent code
GitHub	ChisomNwokoro	2026-05-31	2 stars/0 forks/0 issues	ChisomNwokoro/andy-universal-agent-rules	ai-agent code
GitHub	dulcekllr	2026-05-31	2 stars/0 forks/0 issues	dulcekllr/agent-os	ai-agent code
GitHub	netdata	2026-05-31	1 stars/2 forks/0 issues	netdata/ai-viewer	ai-agent code
GitHub	linny006	2026-05-31	0 stars/0 forks/3 issues	linny006/agent-eval-harness	swe-bench
GitHub	kimjune01	2026-05-31	0 stars/0 forks/0 issues	kimjune01/swebench-pro-flash-composer	swe-bench
GitHub	zero-iteration	2026-05-31	0 stars/0 forks/0 issues	zero-iteration/vard	swe-bench
GitHub	OpenInterpretability	2026-05-31	1 stars/0 forks/0 issues	OpenInterpretability/openinterp-swebench-harness	swe-bench
GitHub	lmnst	2026-05-31	0 stars/0 forks/0 issues	lmnst/SWE-Review-Bench	swe-bench
GitHub	ZaikoXeas	2026-05-31	1 stars/1 forks/1 issues	ZaikoXeas/mcpbr	swe-bench
GitHub	Human-Agent-Society	2026-05-31	680 stars/90 forks/9 issues	Human-Agent-Society/CORAL	swe-bench
GitHub	Sharkoon1	2026-05-31	0 stars/0 forks/0 issues	Sharkoon1/code-complexity-llm-performance	swe-bench
GitHub	AlexRosito67	2026-05-31	0 stars/0 forks/0 issues	AlexRosito67/resistor	terminal-bench
GitHub	harbor-framework	2026-05-31	2212 stars/1091 forks/364 issues	harbor-framework/harbor	terminal-bench
GitHub	anjalii40	2026-05-31	0 stars/0 forks/0 issues	anjalii40/Terminal-Benchmark-Tasks-for-AI-Agents	terminal-bench
GitHub	sachinthink	2026-05-31	0 stars/0 forks/0 issues	sachinthink/Terminal-Bench---Harbor-Task	terminal-bench
GitHub	RDI-Foundation	2026-05-31	0 stars/3 forks/5 issues	RDI-Foundation/terminal-bench-leaderboard	terminal-bench
GitHub	scitix	2026-05-31	3 stars/0 forks/0 issues	scitix/Agent-Sandbox	terminal-bench
GitHub	leoncuhk	2026-05-31	1 stars/0 forks/0 issues	leoncuhk/awesome-llm-bench	terminal-bench
GitHub	nbajpai-code	2026-05-31	0 stars/0 forks/0 issues	nbajpai-code/tb	terminal-bench
GitHub	Marwane83930	2026-05-31	2 stars/0 forks/0 issues	Marwane83930/structured-prompt-skill	claude-code
GitHub	electron-stagewright	2026-05-31	5 stars/0 forks/0 issues	electron-stagewright/electron-stagewright	claude-code
GitHub	tytsxai	2026-05-31	0 stars/0 forks/0 issues	tytsxai/claude-code-guide-zh	claude-code
GitHub	EL-HAMDAOUI-Othmane	2026-05-31	0 stars/0 forks/0 issues	EL-HAMDAOUI-Othmane/agent-reachout	claude-code
GitHub	Thomasneatbiggers	2026-05-31	2 stars/3 forks/1 issues	Thomasneatbiggers/Perplexity-Comet-MCP	claude-code
GitHub	ssamssae	2026-05-31	0 stars/0 forks/0 issues	ssamssae/claude-skills	claude-code
GitHub	frank-syncmarket	2026-05-31	3 stars/1 forks/0 issues	frank-syncmarket/skills	claude-code
GitHub	DeusData	2026-05-31	2834 stars/297 forks/50 issues	DeusData/codebase-memory-mcp	claude-code
GitHub	nicolashuber	2026-05-31	0 stars/0 forks/0 issues	nicolashuber/opencode-config	opencode
GitHub	DeusData	2026-05-31	2834 stars/297 forks/50 issues	DeusData/codebase-memory-mcp	opencode
GitHub	koopticon	2026-05-31	0 stars/0 forks/0 issues	koopticon/opencode-bas-plugin	opencode
GitHub	dohzoh	2026-05-31	0 stars/0 forks/0 issues	dohzoh/llm-provider-unsloth	opencode
GitHub	jonanderson10	2026-05-31	0 stars/0 forks/0 issues	jonanderson10/enhanced-opencode-agents-md	opencode
GitHub	vekzz-dev	2026-05-31	0 stars/0 forks/0 issues	vekzz-dev/opencode-skills	opencode
GitHub	darklightyagami7	2026-05-31	1 stars/0 forks/0 issues	darklightyagami7/opencode-oauth-fix	opencode
GitHub	Ermi34	2026-05-31	1 stars/0 forks/0 issues	Ermi34/Bedrock-Addon-Wrangler	opencode
GitHub	cameronobriendev	2026-05-31	0 stars/0 forks/0 issues	cameronobriendev/NotchWall	cursor agent
GitHub	electron-stagewright	2026-05-31	5 stars/0 forks/0 issues	electron-stagewright/electron-stagewright	cursor agent
GitHub	gregoirecambon	2026-05-31	0 stars/0 forks/0 issues	gregoirecambon/norc	cursor agent
GitHub	Arseeth	2026-05-31	5 stars/0 forks/0 issues	Arseeth/skills-for-vibe-coder	cursor agent
GitHub	All-zzz	2026-05-31	0 stars/2 forks/1 issues	All-zzz/claude-canvas	cursor agent
GitHub	tuongaz	2026-05-31	8 stars/0 forks/0 issues	tuongaz/seeflow	cursor agent
GitHub	broomva	2026-05-31	2 stars/0 forks/0 issues	broomva/skills	cursor agent
GitHub	SoliEstre	2026-05-31	5 stars/0 forks/0 issues	SoliEstre/EstreGenesis	cursor agent
arXiv	paper	2026-05-28	N/A arXiv API	Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific	agentic software engineering
arXiv	paper	2026-05-28	N/A arXiv API	Quiver Approach to Symmetry Theories	agentic software engineering
arXiv	paper	2026-05-28	N/A arXiv API	SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations	agentic software engineering
arXiv	paper	2026-05-28	N/A arXiv API	Benchmarking Single-Factor Physical Video-to-Audio Generation	agentic software engineering
arXiv	paper	2026-05-28	N/A arXiv API	REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image	agentic software engineering
arXiv	paper	2026-05-28	N/A arXiv API	Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Compone	agentic software engineering
arXiv	paper	2026-05-28	N/A arXiv API	Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific	code generation benchmark
arXiv	paper	2026-05-28	N/A arXiv API	Quiver Approach to Symmetry Theories	code generation benchmark
arXiv	paper	2026-05-28	N/A arXiv API	GMOS: Grounding Moving Object Segmentation in 3D Space and Time	code generation benchmark
arXiv	paper	2026-05-28	N/A arXiv API	DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation	code generation benchmark
arXiv	paper	2026-05-28	N/A arXiv API	LLMSurgeon: Diagnosing Data Mixture of Large Language Models	code generation benchmark
arXiv	paper	2026-05-28	N/A arXiv API	AdaState: Self-Evolving Anchors for Streaming Video Generation	code generation benchmark
arXiv	paper	2026-05-28	N/A arXiv API	Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific	software engineering agents
arXiv	paper	2026-05-28	N/A arXiv API	Quiver Approach to Symmetry Theories	software engineering agents
arXiv	paper	2026-05-28	N/A arXiv API	SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations	software engineering agents
arXiv	paper	2026-05-28	N/A arXiv API	Benchmarking Single-Factor Physical Video-to-Audio Generation	software engineering agents
arXiv	paper	2026-05-28	N/A arXiv API	REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image	software engineering agents
arXiv	paper	2026-05-28	N/A arXiv API	Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Compone	software engineering agents
arXiv	paper	2026-05-28	N/A arXiv API	Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific	LLM coding agents
arXiv	paper	2026-05-28	N/A arXiv API	LLMSurgeon: Diagnosing Data Mixture of Large Language Models	LLM coding agents
arXiv	paper	2026-05-28	N/A arXiv API	SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations	LLM coding agents
arXiv	paper	2026-05-28	N/A arXiv API	GPIC: A Giant Permissive Image Corpus for Visual Generation	LLM coding agents
arXiv	paper	2026-05-28	N/A arXiv API	REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image	LLM coding agents
arXiv	paper	2026-05-28	N/A arXiv API	Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching	LLM coding agents
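
Several rows above repeat the same repository or paper under different query tags (for example DeusData/codebase-memory-mcp under both claude-code and opencode), so the raw row count overstates the number of distinct items. The sketch below shows one way to collapse such rows. It assumes each row has six tab-separated fields in the order shown in the table (platform, author, date, engagement, item, query tag); the function name `group_by_item` is hypothetical and not part of the collector.

```python
from collections import defaultdict


def group_by_item(rows):
    """Map (platform, item) to the sorted list of query tags that surfaced it."""
    grouped = defaultdict(set)
    for line in rows:
        fields = line.split("\t")
        if len(fields) != 6:
            # Skip rows that do not match the assumed six-column layout.
            continue
        platform, _author, _date, _engagement, item, query = fields
        grouped[(platform, item)].add(query)
    return {key: sorted(tags) for key, tags in grouped.items()}


# Four rows copied from the table above, rebuilt with explicit tab separators.
sample_rows = [
    "\t".join(["GitHub", "DeusData", "2026-05-31",
               "2834 stars/297 forks/50 issues",
               "DeusData/codebase-memory-mcp", "claude-code"]),
    "\t".join(["GitHub", "DeusData", "2026-05-31",
               "2834 stars/297 forks/50 issues",
               "DeusData/codebase-memory-mcp", "opencode"]),
    "\t".join(["arXiv", "paper", "2026-05-28", "N/A arXiv API",
               "Quiver Approach to Symmetry Theories",
               "agentic software engineering"]),
    "\t".join(["arXiv", "paper", "2026-05-28", "N/A arXiv API",
               "Quiver Approach to Symmetry Theories",
               "code generation benchmark"]),
]

unique_items = group_by_item(sample_rows)
for (platform, item), tags in unique_items.items():
    print(platform, item, tags)
```

Keeping the query tags per item, rather than dropping repeats outright, preserves the information about which searches surfaced each item.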

Data Quality / Scan Health

PARTIAL: harness timeout (180s); bounded fallback collector completed. Candidates: 143. Reddit/Facebook public: 0 usable, due to public JSON/search access limits in the bounded run. X: 5 KOL profile checks; post metrics N/A because no API/browser collector was available. YouTube RSS: 7. Publish allowed because GitHub + HN + arXiv + product evidence is sufficient for a CTO brief; confidence reduced to 68% because social coverage is thin.
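
The per-source counts in this brief can be checked against the reported totals with a short tally. This is a sketch under two assumptions that the brief does not state explicitly: that the 5 X profile checks and 7 YouTube RSS items are counted as candidates, and that the KPI "social/dev signals" figure is HN + X + YouTube + Reddit/Facebook. The dictionary keys and function names are hypothetical.

```python
# Counts as reported in the Executive Snapshot, KPI Dashboard, and scan notes.
SOURCE_COUNTS = {
    "github": 56,
    "hn": 40,
    "arxiv": 30,
    "product": 5,
    "x_profiles": 5,       # assumption: profile checks count as candidates
    "youtube_rss": 7,      # assumption: RSS items count as candidates
    "reddit_facebook": 0,  # 0 usable in the bounded run
}

REPORTED_TOTAL = 143
REPORTED_SOCIAL_DEV = 52


def total_candidates(counts):
    """Sum candidates across every source."""
    return sum(counts.values())


def social_dev_signals(counts):
    """Sum the sources assumed to make up the social/dev KPI."""
    keys = ("hn", "x_profiles", "youtube_rss", "reddit_facebook")
    return sum(counts[key] for key in keys)


print(total_candidates(SOURCE_COUNTS) == REPORTED_TOTAL)
print(social_dev_signals(SOURCE_COUNTS) == REPORTED_SOCIAL_DEV)
```

Under these assumptions both reported figures reconcile with the per-source counts, which makes a mismatch in a future run a quick signal that a collector dropped out.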