Technical Intelligence Brief

LLM/Coding Agents/Harness Engineering — 2026-06-01 05:26
Fabbi CTO/CDXO
QUALITY_GATE_PARTIAL

Executive Snapshot

KPI Dashboard

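143 candidates
56 GitHub
30 papers
52 social/dev signals
68% confidence
Counts are raw query hits before deduplication; the same item can surface under several search topics (see Source Appendix).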
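The headline counts are raw query hits. The same item can surface under several search topics, and the Source Appendix lists memcoder's "Agents" Show HN post under four of them (coding agent, Claude Code, Cursor agent, AI coding workflow). Below is a minimal Python sketch of deduplicating on a (platform, author, title) key while keeping every matching topic. The `Hit` schema and the choice of key are assumptions, not the collector's actual implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One raw search hit, shaped like a Source Appendix row (hypothetical schema)."""

    platform: str
    author: str
    title: str
    topic: str  # the search topic whose query surfaced this hit


def dedupe(hits: list[Hit]) -> dict[tuple[str, str, str], list[str]]:
    """Collapse hits sharing (platform, author, title), keeping each matching topic once."""
    topics_by_item: dict[tuple[str, str, str], list[str]] = defaultdict(list)
    for h in hits:
        key = (h.platform, h.author, h.title)
        if h.topic not in topics_by_item[key]:
            topics_by_item[key].append(h.topic)
    return dict(topics_by_item)


AGENTS_POST = "Show HN: Agents, run any coding agent on your subscription not API costs"

raw = [
    Hit("HN", "memcoder", AGENTS_POST, "coding agent"),
    Hit("HN", "memcoder", AGENTS_POST, "Claude Code"),
    Hit("HN", "memcoder", AGENTS_POST, "Cursor agent"),
    Hit("HN", "memcoder", AGENTS_POST, "AI coding workflow"),
    Hit("HN", "Imbiss", "The UI problem of AI coding agents", "coding agent"),
]

unique = dedupe(raw)
print(f"raw hits: {len(raw)}, unique items: {len(unique)}")  # raw hits: 5, unique items: 2
```

The appendix carries no URLs, so this sketch keys on the title. A real collector would more likely key on the canonical URL or the HN item ID.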

Trend Radar

  • Hot now: coding-agent CLI + repo-context execution (56 repo signals).
  • Emerging: SWE-bench/Terminal-Bench-style evals (30 academic signals).
  • Noise: generic AI-coding hype without issue/release metrics.
  • Watchlist: enterprise governance/HITL (3+ changelog sources: Sourcegraph, Copilot, Cursor).

Impact Coverage

Domain | 0-2 weeks | 1-2 months | 3-6 months
FARE | Index codebase eval | Context quality KPI | Knowledge graph
NEXA | Agent CLI pilot | Sandbox harness | Multi-agent runtime
SYNCA | Risk score gate | Eval dashboard | Governance OS
DOMUS | Monitor | Backoffice automation | Adopt selectively
Japan/VN/Global | Internal trial | JP compliance package | Offer AI-SDLC service

KOL/OG Feed Watch

Platform | Author | Time | Engagement | Title | Why it matters
HN | Imbiss | 2026-05-31T19:33 | 7 pts/0 cmt | The UI problem of AI coding agents | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | CoffeeOnWrite | 2026-05-31T17:39 | 3 pts/0 cmt | Sandboxes and Worktrees: My Secure Agentic AI Setup | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | ronbenton | 2026-05-31T16:35 | 2 pts/1 cmt | Ask HN: How much is fully agentic coding costing you per month? | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | pbjerkeseth | 2026-05-31T16:29 | 7 pts/0 cmt | Show HN: Ouijit, an open-source task and terminal manager for coding agents | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | memcoder | 2026-05-31T16:21 | 6 pts/2 cmt | Show HN: Agents, run any coding agent on your subscription not API costs | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | spinchange | 2026-05-30T02:04 | 2 pts/0 cmt | Show HN: A Claude Code skill that scopes problems like Peter Naur | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | vbutsomesayw | 2026-05-27T04:01 | 3 pts/0 cmt | Bill Gates AI on AI (one month later) | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | xendo | 2026-05-23T11:13 | 3 pts/0 cmt | Zero – Programming Language for Agents | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | afshinmeh | 2026-05-19T20:19 | 3 pts/0 cmt | Zero: The Programming Language for Agents | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | amitbidlan | 2026-05-19T17:40 | 1 pt/3 cmt | Show HN: Korveo – a local firewall for AI agents | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | mahdikaz | 2026-05-31T21:51 | 1 pt/0 cmt | Agent-stack – one command to make any repo token-efficient for Claude Code | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | ilkkao | 2026-05-31T20:20 | 3 pts/0 cmt | Researchers let AI models run a simulated society | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | dbvdh | 2026-05-31T15:24 | 5 pts/0 cmt | Show HN: Strudai, browser based agentic wrapper around Strudel | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | shudv | 2026-05-31T10:50 | 2 pts/0 cmt | Accountability Throughput | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | rane | 2026-05-30T19:23 | 3 pts/0 cmt | Show HN: Use Kimi and OpenAI Subscriptions in Claude Code | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | ramonga | 2026-05-28T16:11 | 3 pts/0 cmt | Show HN: Free open source coding models in Slack | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | vashchylau | 2026-05-28T13:49 | 3 pts/0 cmt | First thing you see when Googling "OpenAI Codex app" is a fake malware website | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | dnw | 2026-05-27T15:48 | 2 pts/0 cmt | Building self-improving tax agents with Codex | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | detente18 | 2026-05-30T23:51 | 6 pts/0 cmt | Show HN: Lite-Harness – Self-Hosted Cursor Agents (Use Claude Code/OpenCode) | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | ananandreas | 2026-05-29T14:35 | 5 pts/0 cmt | Show HN: OpenHive – AI agents share solutions so other agents dont re-solve them | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | kiBytes | 2026-05-29T13:18 | 2 pts/0 cmt | Show HN: TheFoundry – Easy bootstrapping framework for MultiAgent Systems | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | vektormemory | 2026-05-30T22:03 | 2 pts/0 cmt | We Benchmarked Our Open Source Memory Tool Against a Microsoft Research Paper | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | fittingopposite | 2026-05-28T05:05 | 2 pts/0 cmt | Mini-SWE-agent scores up to 74% on SWE-bench in 100 lines of Python code | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | kimjune01 | 2026-05-24T18:03 | 2 pts/0 cmt | Show HN: 97% on SWE-bench Verified with subscription-token agents | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | Sushrutkm | 2026-05-19T10:02 | 2 pts/0 cmt | Bito's AI Architect Boosts Claude Opus's task success rate by 35% | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | azurewraith | 2026-05-12T14:24 | 126 pts/59 cmt | Show HN: Statewright – Visual state machines that make AI agents reliable | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | neversettles | 2026-05-03T03:40 | 1 pt/2 cmt | The Terminal Bench 3.0 community is looking for task contributors | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | gk1 | 2026-04-29T18:16 | 4 pts/0 cmt | ForgeCode: Top open source coding agent in Terminal-Bench 2.0 | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | ubermon | 2026-04-28T19:11 | 6 pts/9 cmt | Open-weight 27B hits 38% on Terminal-Bench 2.0 (Opus 4.1 hit 38% in Aug 2025) | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | GodelNumbering | 2026-04-27T12:35 | 393 pts/148 cmt | Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | neversupervised | 2026-04-15T00:42 | 6 pts/2 cmt | Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | geoctl | 2026-05-31T10:41 | 3 pts/0 cmt | Show HN: Cordium: FOSS sandbox platform that eliminates credential injection | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | Imbiss | 2026-05-30T13:34 | 2 pts/1 cmt | Spatial IDE's for agentic coding workflows | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | v-mdev | 2026-05-28T09:20 | 2 pts/0 cmt | Superpowers: An Agentic Skills Framework for AI Coding Workflows | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
HN | sjhalani7 | 2026-05-27T20:52 | 8 pts/3 cmt | Show HN: VAEN – Package and import portable AI coding-agent Harnesses | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
YouTube | AI Explained | 2026-03-30T20:18 | N/A (RSS) | How American Tanks are being Destroyed in Ukraine | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
YouTube | Latent Space | 2026-05-31T14:00 | N/A (RSS) | Billionaires Impressed By New College Grads Being AI Natives: They Are Totally Cracked | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
YouTube | Latent Space | 2026-05-29T22:47 | N/A (RSS) | Anthropic's Digital God, Pope vs AI, Job Loss Narrative Flips, Open Source Crackdown Comin… | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
YouTube | Latent Space | 2026-05-28T14:33 | N/A (RSS) | David Sacks: AI Will Create More Jobs, Not Destroy Them | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
YouTube | Latent Space | 2026-05-27T16:00 | N/A (RSS) | 3 Reasons Why College Students Hate AI - David Friedberg | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
YouTube | Matthew Berman | 2026-05-29T19:09 | N/A (RSS) | Self Improving AI actually solves everything | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
YouTube | Matthew Berman | 2026-05-29T17:03 | N/A (RSS) | Breaking Down the Pope's AI Essay | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
X | Simon Willison | 2026-06-01T05:24 | N/A (public/API limited) | KOL feed profile checked; public post metrics unavailable without API | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
X | Armin Ronacher | 2026-06-01T05:24 | N/A (public/API limited) | KOL feed profile checked; public post metrics unavailable without API | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
X | Addy Osmani | 2026-06-01T05:24 | N/A (public/API limited) | KOL feed profile checked; public post metrics unavailable without API | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
X | Hamel Husain | 2026-06-01T05:24 | N/A (public/API limited) | KOL feed profile checked; public post metrics unavailable without API | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.
X | Swyx | 2026-06-01T05:24 | N/A (public/API limited) | KOL feed profile checked; public post metrics unavailable without API | CTO signal: track runtime/eval/adoption; missing metrics → lower confidence.

CTO Evaluation Matrix

Signal | Thesis | Evidence | Counter-signal | Fabbi implication | Confidence | Decision | Next validation
Agent runtime/CLI | Adopt guarded execution | 56 GitHub + 5 product | Security/cost data N/A | NEXA pilot | 72% | trial | 10 repos × 3 tasks
Benchmark harness | Eval before rollout | 30 papers | Benchmark transfer gap | SYNCA KPI | 70% | adopt | pass/fail + human review
Context layer | Codebase understanding is a moat | HN 40 + repo signals | Freshness drift | FARE roadmap | 66% | trial | retrieval precision@10
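The next validation for the context layer is retrieval precision@10. For each query, take the top 10 items the retriever returns and measure the fraction a reviewer judges relevant. The sketch below assumes relevance labels are a set of file paths per query. The query and paths are hypothetical.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved items that are relevant.

    Divides by k even when fewer than k items come back, so an under-retrieving
    system is penalized. Duplicates in the ranking are counted once.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    hits = len(set(top_k) & relevant)
    return hits / k


# Hypothetical query: "where is the retry policy configured?"
retrieved = [
    "src/http/retry.py",
    "src/http/client.py",
    "README.md",
    "src/config/defaults.yaml",
    "tests/test_retry.py",
    "docs/architecture.md",
    "src/http/session.py",
    "setup.cfg",
    "src/cli/main.py",
    "CHANGELOG.md",
]
relevant = {
    "src/http/retry.py",
    "src/config/defaults.yaml",
    "tests/test_retry.py",
    "src/http/session.py",
}
print(precision_at_k(retrieved, relevant))  # 0.4 (4 of the top 10 are relevant)
```

Averaging the per-query values over a fixed, reviewer-labeled query set would give a single dashboard number that can be tracked as the index refreshes, which is how freshness drift would show up.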

CTO Recommendations

Action | ROI/time saving | Risk | Owner | TTV (time to value) | Validation
Pilot the NEXA coding-agent harness on 10 internal repos | 15-25% | 3/5 | AI Platform Lead | 2 weeks | task success/cost/security incidents
Add a SYNCA eval gate: SWE-bench mini + reviewer rubric | 10-18% | 2/5 | QA/DevEx Lead | 1 week | defect escape rate
Build a FARE context-quality dashboard | 8-15% | 2/5 | Data/Knowledge Lead | 3 weeks | retrieval precision + dev survey
Package an AI-SDLC offer for Japan/VN | 5-12% revenue assist | 3/5 | BU Head | 4 weeks | 3 client discovery calls
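The pilot and eval-gate rows above use five validation measures: task success, cost, security incidents, a reviewer rubric, and defect escape rate. Below is a minimal sketch of a pilot scorecard and a pass/fail gate over a grid of agent runs; the evaluation matrix proposes 10 repos × 3 tasks. The brief does not define a schema or thresholds, so the `TaskRun` fields, the 1-5 rubric scale, and every threshold here are hypothetical. Defect escape rate uses the common definition: defects found after release divided by all defects found.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    """One agent run in the pilot grid (hypothetical schema)."""

    repo: str
    task: str
    passed: bool             # automated pass/fail, e.g. tests green on a SWE-bench-mini-style task
    cost_usd: float          # model/API spend for the run
    security_incident: bool  # any sandbox or policy violation flagged during the run
    reviewer_score: int      # human rubric score, assumed 1-5


def scorecard(runs: list[TaskRun]) -> dict[str, float]:
    """Aggregate the pilot's validation metrics over all runs."""
    n = len(runs)
    if n == 0:
        raise ValueError("no runs to score")
    return {
        "task_success_rate": sum(r.passed for r in runs) / n,
        "cost_per_task_usd": sum(r.cost_usd for r in runs) / n,
        "security_incidents": float(sum(r.security_incident for r in runs)),
        "mean_reviewer_score": sum(r.reviewer_score for r in runs) / n,
    }


# Hypothetical thresholds; the brief does not specify any.
MIN_SUCCESS_RATE = 0.6
MIN_REVIEWER_SCORE = 3.5
MAX_SECURITY_INCIDENTS = 0


def gate(card: dict[str, float]) -> bool:
    """Pass only if automated results, human review, and security all clear."""
    return (
        card["task_success_rate"] >= MIN_SUCCESS_RATE
        and card["mean_reviewer_score"] >= MIN_REVIEWER_SCORE
        and card["security_incidents"] <= MAX_SECURITY_INCIDENTS
    )


def defect_escape_rate(found_after_release: int, found_before_release: int) -> float:
    """Share of all defects that escaped the gate and were found after release."""
    total = found_after_release + found_before_release
    return found_after_release / total if total else 0.0


runs = [
    TaskRun("repo-a", "bugfix", True, 0.42, False, 4),
    TaskRun("repo-a", "refactor", True, 0.55, False, 4),
    TaskRun("repo-b", "bugfix", False, 0.80, False, 2),
    TaskRun("repo-b", "feature", True, 0.61, False, 4),
]
card = scorecard(runs)
print(card["task_success_rate"], gate(card))  # 0.75 True
print(defect_escape_rate(found_after_release=3, found_before_release=27))  # 0.1
```

The gate requires all three conditions instead of combining them into a weighted score. That way, a high success rate cannot average away a single security incident.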

Repo Watch

Repo | Metrics | Updated | Move
frido22/gradient-descent-memory | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
RahulRachhoya/ai-data-analysis-agent | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
cattailfarmer/TheBrain | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
contact715/jidoka | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
KaelanRichards/agents | 1 star/0 forks/1 issue | 2026-05-31 | Trial if relevant to harness/context
Enxs969/skiller | 1 star/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
tiendatne2004/tikhub_api_skill | 2 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
KadenMc/work-buddy | 12 stars/1 fork/0 issues | 2026-05-31 | Trial if relevant to harness/context
Himanshu123cyber/claude-code | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
lyzzhimmm/SkillManager | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
gtapps/claude-code-hermit | 59 stars/7 forks/5 issues | 2026-05-31 | Trial if relevant to harness/context
ChisomNwokoro/andy-universal-agent-rules | 2 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
dulcekllr/agent-os | 2 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
netdata/ai-viewer | 1 star/2 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
linny006/agent-eval-harness | 0 stars/0 forks/3 issues | 2026-05-31 | Trial if relevant to harness/context
kimjune01/swebench-pro-flash-composer | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
zero-iteration/vard | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
OpenInterpretability/openinterp-swebench-harness | 1 star/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
lmnst/SWE-Review-Bench | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
ZaikoXeas/mcpbr | 1 star/1 fork/1 issue | 2026-05-31 | Trial if relevant to harness/context
Human-Agent-Society/CORAL | 680 stars/90 forks/9 issues | 2026-05-31 | Trial if relevant to harness/context
Sharkoon1/code-complexity-llm-performance | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
AlexRosito67/resistor | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
harbor-framework/harbor | 2212 stars/1091 forks/364 issues | 2026-05-31 | Trial if relevant to harness/context
anjalii40/Terminal-Benchmark-Tasks-for-AI-Agents | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
sachinthink/Terminal-Bench---Harbor-Task | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
RDI-Foundation/terminal-bench-leaderboard | 0 stars/3 forks/5 issues | 2026-05-31 | Trial if relevant to harness/context
scitix/Agent-Sandbox | 3 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
leoncuhk/awesome-llm-bench | 1 star/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
nbajpai-code/tb | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
Marwane83930/structured-prompt-skill | 2 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
electron-stagewright/electron-stagewright | 5 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
tytsxai/claude-code-guide-zh | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
EL-HAMDAOUI-Othmane/agent-reachout | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
Thomasneatbiggers/Perplexity-Comet-MCP | 2 stars/3 forks/1 issue | 2026-05-31 | Trial if relevant to harness/context
ssamssae/claude-skills | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
frank-syncmarket/skills | 3 stars/1 fork/0 issues | 2026-05-31 | Trial if relevant to harness/context
DeusData/codebase-memory-mcp | 2834 stars/297 forks/50 issues | 2026-05-31 | Trial if relevant to harness/context
nicolashuber/opencode-config | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
koopticon/opencode-bas-plugin | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
dohzoh/llm-provider-unsloth | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
jonanderson10/enhanced-opencode-agents-md | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
vekzz-dev/opencode-skills | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
darklightyagami7/opencode-oauth-fix | 1 star/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
Ermi34/Bedrock-Addon-Wrangler | 1 star/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
cameronobriendev/NotchWall | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
gregoirecambon/norc | 0 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
Arseeth/skills-for-vibe-coder | 5 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
All-zzz/claude-canvas | 0 stars/2 forks/1 issue | 2026-05-31 | Trial if relevant to harness/context
tuongaz/seeflow | 8 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
broomva/skills | 2 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context
SoliEstre/EstreGenesis | 5 stars/0 forks/0 issues | 2026-05-31 | Trial if relevant to harness/context

Paper / Benchmark Watch

Paper | Date | Signal | Move
Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific… | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available
Quiver Approach to Symmetry Theories | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available
SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available
Benchmarking Single-Factor Physical Video-to-Audio Generation | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available
REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available
Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Compone… | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available
GMOS: Grounding Moving Object Segmentation in 3D Space and Time | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available
DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available
LLMSurgeon: Diagnosing Data Mixture of Large Language Models | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available
AdaState: Self-Evolving Anchors for Streaming Video Generation | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available
GPIC: A Giant Permissive Image Corpus for Visual Generation | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available
Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available
Demystifying Data Organization for Enhanced LLM Training | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available
COMPOSE: Composing Future Theorems from Citations and Formal Structure | 2026-05-28 | Benchmark/eval/code-agent reliability | Watch → pilot if an artifact is available

Product / Business Watch

Source Appendix

Platform | Author | Time | Engagement | Title/repo | Search topic
HN | Imbiss | 2026-05-31 | 7 pts/0 cmt | The UI problem of AI coding agents | coding agent
HN | CoffeeOnWrite | 2026-05-31 | 3 pts/0 cmt | Sandboxes and Worktrees: My Secure Agentic AI Setup | coding agent
HN | ronbenton | 2026-05-31 | 2 pts/1 cmt | Ask HN: How much is fully agentic coding costing you per month? | coding agent
HN | pbjerkeseth | 2026-05-31 | 7 pts/0 cmt | Show HN: Ouijit, an open-source task and terminal manager for coding agents | coding agent
HN | memcoder | 2026-05-31 | 6 pts/2 cmt | Show HN: Agents, run any coding agent on your subscription not API costs | coding agent
HN | spinchange | 2026-05-30 | 2 pts/0 cmt | Show HN: A Claude Code skill that scopes problems like Peter Naur | agentic programming
HN | vbutsomesayw | 2026-05-27 | 3 pts/0 cmt | Bill Gates AI on AI (one month later) | agentic programming
HN | xendo | 2026-05-23 | 3 pts/0 cmt | Zero – Programming Language for Agents | agentic programming
HN | afshinmeh | 2026-05-19 | 3 pts/0 cmt | Zero: The Programming Language for Agents | agentic programming
HN | amitbidlan | 2026-05-19 | 1 pt/3 cmt | Show HN: Korveo – a local firewall for AI agents | agentic programming
HN | mahdikaz | 2026-05-31 | 1 pt/0 cmt | Agent-stack – one command to make any repo token-efficient for Claude Code | Claude Code
HN | ilkkao | 2026-05-31 | 3 pts/0 cmt | Researchers let AI models run a simulated society | Claude Code
HN | pbjerkeseth | 2026-05-31 | 7 pts/0 cmt | Show HN: Ouijit, an open-source task and terminal manager for coding agents | Claude Code
HN | memcoder | 2026-05-31 | 6 pts/2 cmt | Show HN: Agents, run any coding agent on your subscription not API costs | Claude Code
HN | dbvdh | 2026-05-31 | 5 pts/0 cmt | Show HN: Strudai, browser based agentic wrapper around Strudel | Claude Code
HN | shudv | 2026-05-31 | 2 pts/0 cmt | Accountability Throughput | OpenAI Codex
HN | rane | 2026-05-30 | 3 pts/0 cmt | Show HN: Use Kimi and OpenAI Subscriptions in Claude Code | OpenAI Codex
HN | ramonga | 2026-05-28 | 3 pts/0 cmt | Show HN: Free open source coding models in Slack | OpenAI Codex
HN | vashchylau | 2026-05-28 | 3 pts/0 cmt | First thing you see when Googling "OpenAI Codex app" is a fake malware website | OpenAI Codex
HN | dnw | 2026-05-27 | 2 pts/0 cmt | Building self-improving tax agents with Codex | OpenAI Codex
HN | ronbenton | 2026-05-31 | 2 pts/1 cmt | Ask HN: How much is fully agentic coding costing you per month? | Cursor agent
HN | memcoder | 2026-05-31 | 6 pts/2 cmt | Show HN: Agents, run any coding agent on your subscription not API costs | Cursor agent
HN | detente18 | 2026-05-30 | 6 pts/0 cmt | Show HN: Lite-Harness – Self-Hosted Cursor Agents (Use Claude Code/OpenCode) | Cursor agent
HN | ananandreas | 2026-05-29 | 5 pts/0 cmt | Show HN: OpenHive – AI agents share solutions so other agents dont re-solve them | Cursor agent
HN | kiBytes | 2026-05-29 | 2 pts/0 cmt | Show HN: TheFoundry – Easy bootstrapping framework for MultiAgent Systems | Cursor agent
HN | vektormemory | 2026-05-30 | 2 pts/0 cmt | We Benchmarked Our Open Source Memory Tool Against a Microsoft Research Paper | SWE-bench
HN | fittingopposite | 2026-05-28 | 2 pts/0 cmt | Mini-SWE-agent scores up to 74% on SWE-bench in 100 lines of Python code | SWE-bench
HN | kimjune01 | 2026-05-24 | 2 pts/0 cmt | Show HN: 97% on SWE-bench Verified with subscription-token agents | SWE-bench
HN | Sushrutkm | 2026-05-19 | 2 pts/0 cmt | Bito's AI Architect Boosts Claude Opus's task success rate by 35% | SWE-bench
HN | azurewraith | 2026-05-12 | 126 pts/59 cmt | Show HN: Statewright – Visual state machines that make AI agents reliable | SWE-bench
HN | neversettles | 2026-05-03 | 1 pt/2 cmt | The Terminal Bench 3.0 community is looking for task contributors | Terminal-Bench
HN | gk1 | 2026-04-29 | 4 pts/0 cmt | ForgeCode: Top open source coding agent in Terminal-Bench 2.0 | Terminal-Bench
HN | ubermon | 2026-04-28 | 6 pts/9 cmt | Open-weight 27B hits 38% on Terminal-Bench 2.0 (Opus 4.1 hit 38% in Aug 2025) | Terminal-Bench
HN | GodelNumbering | 2026-04-27 | 393 pts/148 cmt | Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview | Terminal-Bench
HN | neversupervised | 2026-04-15 | 6 pts/2 cmt | Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments | Terminal-Bench
HN | memcoder | 2026-05-31 | 6 pts/2 cmt | Show HN: Agents, run any coding agent on your subscription not API costs | AI coding workflow
HN | geoctl | 2026-05-31 | 3 pts/0 cmt | Show HN: Cordium: FOSS sandbox platform that eliminates credential injection | AI coding workflow
HN | Imbiss | 2026-05-30 | 2 pts/1 cmt | Spatial IDE's for agentic coding workflows | AI coding workflow
HN | v-mdev | 2026-05-28 | 2 pts/0 cmt | Superpowers: An Agentic Skills Framework for AI Coding Workflows | AI coding workflow
HN | sjhalani7 | 2026-05-27 | 8 pts/3 cmt | Show HN: VAEN – Package and import portable AI coding-agent Harnesses | AI coding workflow
GitHub | frido22 | 2026-05-31 | 0 stars/0 forks/0 issues | frido22/gradient-descent-memory | coding-agent
GitHub | RahulRachhoya | 2026-05-31 | 0 stars/0 forks/0 issues | RahulRachhoya/ai-data-analysis-agent | coding-agent
GitHub | cattailfarmer | 2026-05-31 | 0 stars/0 forks/0 issues | cattailfarmer/TheBrain | coding-agent
GitHub | contact715 | 2026-05-31 | 0 stars/0 forks/0 issues | contact715/jidoka | coding-agent
GitHub | KaelanRichards | 2026-05-31 | 1 star/0 forks/1 issue | KaelanRichards/agents | coding-agent
GitHub | Enxs969 | 2026-05-31 | 1 star/0 forks/0 issues | Enxs969/skiller | coding-agent
GitHub | tiendatne2004 | 2026-05-31 | 2 stars/0 forks/0 issues | tiendatne2004/tikhub_api_skill | coding-agent
GitHub | KadenMc | 2026-05-31 | 12 stars/1 fork/0 issues | KadenMc/work-buddy | coding-agent
GitHub | RahulRachhoya | 2026-05-31 | 0 stars/0 forks/0 issues | RahulRachhoya/ai-data-analysis-agent | ai-agent code
GitHub | Himanshu123cyber | 2026-05-31 | 0 stars/0 forks/0 issues | Himanshu123cyber/claude-code | ai-agent code
GitHub | KadenMc | 2026-05-31 | 12 stars/1 fork/0 issues | KadenMc/work-buddy | ai-agent code
GitHub | lyzzhimmm | 2026-05-31 | 0 stars/0 forks/0 issues | lyzzhimmm/SkillManager | ai-agent code
GitHub | gtapps | 2026-05-31 | 59 stars/7 forks/5 issues | gtapps/claude-code-hermit | ai-agent code
GitHub | ChisomNwokoro | 2026-05-31 | 2 stars/0 forks/0 issues | ChisomNwokoro/andy-universal-agent-rules | ai-agent code
GitHub | dulcekllr | 2026-05-31 | 2 stars/0 forks/0 issues | dulcekllr/agent-os | ai-agent code
GitHub | netdata | 2026-05-31 | 1 star/2 forks/0 issues | netdata/ai-viewer | ai-agent code
GitHub | linny006 | 2026-05-31 | 0 stars/0 forks/3 issues | linny006/agent-eval-harness | swe-bench
GitHub | kimjune01 | 2026-05-31 | 0 stars/0 forks/0 issues | kimjune01/swebench-pro-flash-composer | swe-bench
GitHub | zero-iteration | 2026-05-31 | 0 stars/0 forks/0 issues | zero-iteration/vard | swe-bench
GitHub | OpenInterpretability | 2026-05-31 | 1 star/0 forks/0 issues | OpenInterpretability/openinterp-swebench-harness | swe-bench
GitHub | lmnst | 2026-05-31 | 0 stars/0 forks/0 issues | lmnst/SWE-Review-Bench | swe-bench
GitHub | ZaikoXeas | 2026-05-31 | 1 star/1 fork/1 issue | ZaikoXeas/mcpbr | swe-bench
GitHub | Human-Agent-Society | 2026-05-31 | 680 stars/90 forks/9 issues | Human-Agent-Society/CORAL | swe-bench
GitHub | Sharkoon1 | 2026-05-31 | 0 stars/0 forks/0 issues | Sharkoon1/code-complexity-llm-performance | swe-bench
GitHub | AlexRosito67 | 2026-05-31 | 0 stars/0 forks/0 issues | AlexRosito67/resistor | terminal-bench
GitHub | harbor-framework | 2026-05-31 | 2212 stars/1091 forks/364 issues | harbor-framework/harbor | terminal-bench
GitHub | anjalii40 | 2026-05-31 | 0 stars/0 forks/0 issues | anjalii40/Terminal-Benchmark-Tasks-for-AI-Agents | terminal-bench
GitHub | sachinthink | 2026-05-31 | 0 stars/0 forks/0 issues | sachinthink/Terminal-Bench---Harbor-Task | terminal-bench
GitHub | RDI-Foundation | 2026-05-31 | 0 stars/3 forks/5 issues | RDI-Foundation/terminal-bench-leaderboard | terminal-bench
GitHub | scitix | 2026-05-31 | 3 stars/0 forks/0 issues | scitix/Agent-Sandbox | terminal-bench
GitHub | leoncuhk | 2026-05-31 | 1 star/0 forks/0 issues | leoncuhk/awesome-llm-bench | terminal-bench
GitHub | nbajpai-code | 2026-05-31 | 0 stars/0 forks/0 issues | nbajpai-code/tb | terminal-bench
GitHub | Marwane83930 | 2026-05-31 | 2 stars/0 forks/0 issues | Marwane83930/structured-prompt-skill | claude-code
GitHub | electron-stagewright | 2026-05-31 | 5 stars/0 forks/0 issues | electron-stagewright/electron-stagewright | claude-code
GitHub | tytsxai | 2026-05-31 | 0 stars/0 forks/0 issues | tytsxai/claude-code-guide-zh | claude-code
GitHub | EL-HAMDAOUI-Othmane | 2026-05-31 | 0 stars/0 forks/0 issues | EL-HAMDAOUI-Othmane/agent-reachout | claude-code
GitHub | Thomasneatbiggers | 2026-05-31 | 2 stars/3 forks/1 issue | Thomasneatbiggers/Perplexity-Comet-MCP | claude-code
GitHub | ssamssae | 2026-05-31 | 0 stars/0 forks/0 issues | ssamssae/claude-skills | claude-code
GitHub | frank-syncmarket | 2026-05-31 | 3 stars/1 fork/0 issues | frank-syncmarket/skills | claude-code
GitHub | DeusData | 2026-05-31 | 2834 stars/297 forks/50 issues | DeusData/codebase-memory-mcp | claude-code
GitHub | nicolashuber | 2026-05-31 | 0 stars/0 forks/0 issues | nicolashuber/opencode-config | opencode
GitHub | DeusData | 2026-05-31 | 2834 stars/297 forks/50 issues | DeusData/codebase-memory-mcp | opencode
GitHub | koopticon | 2026-05-31 | 0 stars/0 forks/0 issues | koopticon/opencode-bas-plugin | opencode
GitHub | dohzoh | 2026-05-31 | 0 stars/0 forks/0 issues | dohzoh/llm-provider-unsloth | opencode
GitHub | jonanderson10 | 2026-05-31 | 0 stars/0 forks/0 issues | jonanderson10/enhanced-opencode-agents-md | opencode
GitHub | vekzz-dev | 2026-05-31 | 0 stars/0 forks/0 issues | vekzz-dev/opencode-skills | opencode
GitHub | darklightyagami7 | 2026-05-31 | 1 star/0 forks/0 issues | darklightyagami7/opencode-oauth-fix | opencode
GitHub | Ermi34 | 2026-05-31 | 1 star/0 forks/0 issues | Ermi34/Bedrock-Addon-Wrangler | opencode
GitHub | cameronobriendev | 2026-05-31 | 0 stars/0 forks/0 issues | cameronobriendev/NotchWall | cursor agent
GitHub | electron-stagewright | 2026-05-31 | 5 stars/0 forks/0 issues | electron-stagewright/electron-stagewright | cursor agent
GitHub | gregoirecambon | 2026-05-31 | 0 stars/0 forks/0 issues | gregoirecambon/norc | cursor agent
GitHub | Arseeth | 2026-05-31 | 5 stars/0 forks/0 issues | Arseeth/skills-for-vibe-coder | cursor agent
GitHub | All-zzz | 2026-05-31 | 0 stars/2 forks/1 issue | All-zzz/claude-canvas | cursor agent
GitHub | tuongaz | 2026-05-31 | 8 stars/0 forks/0 issues | tuongaz/seeflow | cursor agent
GitHub | broomva | 2026-05-31 | 2 stars/0 forks/0 issues | broomva/skills | cursor agent
GitHub | SoliEstre | 2026-05-31 | 5 stars/0 forks/0 issues | SoliEstre/EstreGenesis | cursor agent
arXiv | paper | 2026-05-28 | N/A (arXiv API) | Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific… | agentic software engineering
arXiv | paper | 2026-05-28 | N/A (arXiv API) | Quiver Approach to Symmetry Theories | agentic software engineering
arXiv | paper | 2026-05-28 | N/A (arXiv API) | SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations | agentic software engineering
arXiv | paper | 2026-05-28 | N/A (arXiv API) | Benchmarking Single-Factor Physical Video-to-Audio Generation | agentic software engineering
arXiv | paper | 2026-05-28 | N/A (arXiv API) | REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image | agentic software engineering
arXiv | paper | 2026-05-28 | N/A (arXiv API) | Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Compone… | agentic software engineering
arXiv | paper | 2026-05-28 | N/A (arXiv API) | Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific… | code generation benchmark
arXiv | paper | 2026-05-28 | N/A (arXiv API) | Quiver Approach to Symmetry Theories | code generation benchmark
arXiv | paper | 2026-05-28 | N/A (arXiv API) | GMOS: Grounding Moving Object Segmentation in 3D Space and Time | code generation benchmark
arXiv | paper | 2026-05-28 | N/A (arXiv API) | DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation | code generation benchmark
arXiv | paper | 2026-05-28 | N/A (arXiv API) | LLMSurgeon: Diagnosing Data Mixture of Large Language Models | code generation benchmark
arXiv | paper | 2026-05-28 | N/A (arXiv API) | AdaState: Self-Evolving Anchors for Streaming Video Generation | code generation benchmark
arXiv | paper | 2026-05-28 | N/A (arXiv API) | Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific… | software engineering agents
arXiv | paper | 2026-05-28 | N/A (arXiv API) | Quiver Approach to Symmetry Theories | software engineering agents
arXiv | paper | 2026-05-28 | N/A (arXiv API) | SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations | software engineering agents
arXiv | paper | 2026-05-28 | N/A (arXiv API) | Benchmarking Single-Factor Physical Video-to-Audio Generation | software engineering agents
arXiv | paper | 2026-05-28 | N/A (arXiv API) | REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image | software engineering agents
arXiv | paper | 2026-05-28 | N/A (arXiv API) | Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Compone… | software engineering agents
arXiv | paper | 2026-05-28 | N/A (arXiv API) | Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific… | LLM coding agents
arXiv | paper | 2026-05-28 | N/A (arXiv API) | LLMSurgeon: Diagnosing Data Mixture of Large Language Models | LLM coding agents
arXiv | paper | 2026-05-28 | N/A (arXiv API) | SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations | LLM coding agents
arXiv | paper | 2026-05-28 | N/A (arXiv API) | GPIC: A Giant Permissive Image Corpus for Visual Generation | LLM coding agents
arXiv | paper | 2026-05-28 | N/A (arXiv API) | REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image | LLM coding agents
arXiv | paper | 2026-05-28 | N/A (arXiv API) | Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching | LLM coding agents

Data Quality / Scan Health

PARTIAL: the harness timed out at 180 s; the bounded fallback collector completed. Candidates: 143. Reddit/Facebook (public): 0 usable items, because of public JSON/search access limits in the bounded run. X: 5 KOL profile checks; post metrics N/A because no API or browser collector was available. YouTube RSS: 7 items. Publishing was allowed because GitHub, HN, arXiv, and product evidence was sufficient for a CTO brief; social confidence was reduced to 68%.
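This scan-health line describes a harness that hit its 180 s budget and fell back to a bounded collector, with the run marked PARTIAL. Below is a minimal sketch of that pattern using Python's `concurrent.futures`. The collector functions and source lists are hypothetical stand-ins. The demo uses a 0.1 s budget instead of 180 s so it finishes quickly.

```python
import concurrent.futures
import time
from typing import Callable

Collector = Callable[[], list[str]]


def collect_with_fallback(
    primary: Collector,
    fallback: Collector,
    timeout_s: float,
) -> tuple[str, list[str]]:
    """Run the primary collector under a wall-clock budget; on timeout, use the fallback.

    Returns (status, items), where status is "OK" or "PARTIAL".
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(primary)
    try:
        return "OK", future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return "PARTIAL", fallback()
    finally:
        # Do not block on the straggler; cancel_futures requires Python 3.9+.
        pool.shutdown(wait=False, cancel_futures=True)


def slow_full_scan() -> list[str]:
    """Hypothetical full scan that overruns its budget."""
    time.sleep(0.5)
    return ["github", "hn", "arxiv", "reddit", "x", "youtube"]


def bounded_scan(max_sources: int = 3) -> list[str]:
    """Hypothetical bounded fallback: only the cheapest, most reliable sources."""
    return ["github", "hn", "arxiv"][:max_sources]


status, items = collect_with_fallback(slow_full_scan, bounded_scan, timeout_s=0.1)
print(status, items)  # PARTIAL ['github', 'hn', 'arxiv']
```

Python threads cannot be forcibly stopped, so the timed-out primary keeps running until it returns. A production collector would also check a deadline internally, or run in a separate process that can be terminated.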