Lab Notes / V.1

Applied research,
documented in public.

Field observations from the studio. Studio Futuro publishes hypotheses, experiments and measurable results drawn from enterprise client work. The research function is integral to positioning, and the documentation remains open.

Note / 01·Apr 2026·Studio Futuro Lab·Agentic

Superpowers and the plugin ecosystem as operating standard

Six weeks of adopting Jesse Vincent's Superpowers plugin, combined with twelve proprietary skills (brainstorming, TDD, systematic debugging, code review, documentation), have reshaped the studio's operating workflow. Skills activate contextually, replacing static long-form prompts. Measurements on recurring tasks: −38% token consumption, −22% iterations before merge.

Takeaway

Skills are to coding agents what pure functions are to code: composable, testable, versioned. A pattern that repeats three times must be extracted. Monolithic prompt engineering gives way to behavior libraries.

Note / 02·Apr 2026·Studio Futuro Lab·Agentic

Semantic router for specialized agents

The single-generalist-agent architecture has been replaced by seven specialized agents (frontend, backend, SQL, testing, security, documentation, review) orchestrated through a Haiku classifier. Across 300 real client tasks, routing selected the correct agent 91% of the time; output quality in blind review rose 23% and total cost fell 31%. Classifier cost remains below one-tenth of the dispatched workload.

Takeaway

A compact, fast model that routes to a specialist outperforms an extended model operating as a soloist. Model tiering — Haiku for routing, Sonnet for execution — is the reference pattern for production workloads.
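The routing pattern can be sketched as follows. This is a minimal sketch, not the studio's implementation: `classify` is a hypothetical keyword stub standing in for the call to the small classifier model, and `make_agent` produces placeholder handlers in place of the real specialized agents. Only the seven agent labels come from the note.

```python
from typing import Callable, Dict, Tuple

# The seven specialist labels named in the note.
AGENTS = ("frontend", "backend", "sql", "testing", "security", "documentation", "review")

# Hypothetical keyword table; a real router would ask a small, fast model instead.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "frontend": ("css", "react", "layout"),
    "backend": ("endpoint", "api", "service"),
    "sql": ("query", "index", "migration"),
    "testing": ("test", "coverage", "fixture"),
    "security": ("auth", "xss", "secret"),
    "documentation": ("readme", "docs", "changelog"),
}

def classify(task: str) -> str:
    """Return the label of the agent that should handle the task (stub classifier)."""
    words = task.lower().split()
    for agent, keys in KEYWORDS.items():
        if any(k in words for k in keys):
            return agent
    return "review"  # fallback when no keyword matches

def make_agent(name: str) -> Callable[[str], str]:
    """Build a placeholder handler; a real one would run the specialist agent."""
    def run(task: str) -> str:
        return f"[{name}] handled: {task}"
    return run

HANDLERS = {name: make_agent(name) for name in AGENTS}

def route(task: str) -> str:
    """Classify first (cheap), then dispatch to the specialist (expensive)."""
    return HANDLERS[classify(task)](task)
```

The design point is the separation of concerns: the classifier only returns a label, so it can be swapped or re-evaluated without touching the specialists.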

Note / 03·Apr 2026·Studio Futuro·DevOps

DevOps skills for business process automation

A client's tenant onboarding consisted of seven manual steps: provisioning, DNS, CRM, billing, team notification, first deploy, welcome email. A composite skill now orchestrates the full chain in four minutes. Across 80 real onboardings: zero errors, −94% operational time, full git traceability.

Takeaway

Skills are not a developer-only artifact. They define a new format for corporate procedures: readable by an operator, executable by an agent, versioned as code. Every repetitive internal process is a skill awaiting formalization.
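A composite procedure of this kind can be sketched as an ordered chain of named steps with an audit line per step. The step names follow the note; everything else is an assumption for illustration: `noop` is a placeholder body, and `run_chain` is a hypothetical runner, not the client's skill.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict[str, str]], None]]

def noop(ctx: Dict[str, str]) -> None:
    """Placeholder step body; a real step would call the relevant system."""

# Step names follow the note; the bodies are hypothetical placeholders.
ONBOARDING: List[Step] = [(name, noop) for name in (
    "provisioning", "dns", "crm", "billing",
    "team_notification", "first_deploy", "welcome_email",
)]

def run_chain(steps: List[Step], ctx: Dict[str, str]) -> List[str]:
    """Run steps in order, recording one audit line per step; stop at the first failure."""
    log: List[str] = []
    for name, fn in steps:
        try:
            fn(ctx)
        except Exception as exc:
            log.append(f"{name}: FAILED ({exc})")
            break
        log.append(f"{name}: ok")
    return log
```

Stopping at the first failure keeps later steps (billing, welcome email) from running against a half-provisioned tenant; the returned log is what would be committed for traceability.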

Note / 04·Apr 2026·Studio Futuro·Product

Quack 2: the agent router as a product

Quack 2 was released publicly after five months of internal use. The release's distinguishing element: the agent router is no longer hidden but constitutes the core of the interface. The user observes routing decisions in real time — to Claude, to local Mistral, to a specialized agent. Explicit transparency on orchestrator choices.

Takeaway

Concealing the agent's decision process no longer constitutes a product advantage: it generates trust debt. The enterprise user requires understanding of decisions, not mere execution. Exposed routing defines a new UX paradigm for AI tools.

Note / 05·Mar 2026·Studio Futuro Lab·Agentic

Quack Brain: the knowledge graph fed by agents

Agents operating within Quack read and write the shared Second Brain in every session: gotchas, patterns, decisions, project diary. Three months after release, the graph counts 1,847 interconnected nodes. Observed metrics: −58% recurring errors, −41% repeated questions to the user. Memorization is selective, not exhaustive.

Takeaway

Agent memory does not coincide with a passively populated vector database. It consists of deliberate writing: rules, gotchas, decisions, code breadcrumbs. An agent that identifies what not to remember outperforms one that memorizes everything.

Note / 06·Mar 2026·Studio Futuro·Research

Local LLMs for repetitive enterprise workloads

Production deployment of Mistral 7B and Qwen 2.5 7B on on-prem A10G GPUs for document classification, structured data extraction, and internal draft generation. Observed quality at 92% relative to Sonnet 4.6 on these specific workloads, latency −65%, cost per million tokens reduced from €3.10 to €0.12. Complex reasoning tasks remain routed to the frontier API.

Takeaway

The open vs closed debate is superseded by the hybrid routing paradigm. High-volume repetitive workloads reside on-prem or self-hosted; deep reasoning remains on frontier APIs. The relevant question is where to draw the line, not which model dominates.
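Where the line is drawn can be made explicit in code. The sketch below assumes a hypothetical set of workload labels derived from the three workloads named in the note and uses the two per-million-token prices reported there; the function names are illustrative.

```python
# Workload labels are hypothetical names for the three workloads listed in the note.
LOCAL_WORKLOADS = {"document_classification", "structured_extraction", "draft_generation"}

# Prices per million tokens in EUR, as reported in the note.
PRICE_EUR_PER_MTOK = {"frontier": 3.10, "local": 0.12}

def pick_backend(task_kind: str) -> str:
    """High-volume repetitive workloads go local; everything else goes to the frontier API."""
    return "local" if task_kind in LOCAL_WORKLOADS else "frontier"

def cost_eur(task_kind: str, million_tokens: float) -> float:
    """Token cost of a workload at the price of the backend it is routed to."""
    return PRICE_EUR_PER_MTOK[pick_backend(task_kind)] * million_tokens

def saving_ratio() -> float:
    """Fraction of per-token cost saved by running a workload locally."""
    prices = PRICE_EUR_PER_MTOK
    return 1 - prices["local"] / prices["frontier"]
```

At the reported prices the per-token saving works out to about 96%; this covers token pricing only, as the note gives no figures for hardware or operations.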

Note / 07·Mar 2026·Studio Futuro·Agentic

Orchestrating Claude Code agents in parallel

Experimentation with five concurrent Claude Code sessions on independent tasks within the same repository. Measured result: 3.7× faster than a single session, with 12% rework attributable to state conflicts. The bottleneck does not reside in the model but in task-split design.

Takeaway

Agent parallelization produces value only in the presence of genuinely independent tasks. Naive file-level locking proves inadequate; semantic orchestration based on the functional domains involved is required.
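One way to approximate domain-based orchestration is to tag each task with the functional domains it touches and only run tasks concurrently when their domains do not overlap. This is a minimal sketch under that assumption; the tagging itself, the task names, and the greedy strategy are illustrative and not taken from the experiment.

```python
from typing import Dict, List, Set

def plan_batches(tasks: Dict[str, Set[str]]) -> List[List[str]]:
    """Greedily group tasks into batches whose functional domains do not overlap."""
    batches: List[List[str]] = []
    used: List[Set[str]] = []  # domains already claimed by each batch
    for name, domains in tasks.items():
        for i, claimed in enumerate(used):
            if claimed.isdisjoint(domains):
                batches[i].append(name)
                claimed |= domains  # in-place update of the batch's claimed domains
                break
        else:
            # No existing batch is free of conflicts: open a new one.
            batches.append([name])
            used.append(set(domains))
    return batches
```

Tasks inside one batch can be handed to parallel sessions; batches run one after another. Two tasks that edit different files but share a domain (for example, billing) land in different batches, which is the case file-level locking misses.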

Note / 08·Feb 2026·Studio Futuro·Philosophy

From vibe coding to agentic coding: the end of an era

The vibe-coding mode — fast chat, quick iterations, immediate shipping — does not meet the requirements of production workloads. Serious work demands structured discipline: formalized brief, plan, tests, review, diary. The transition from magic to engineering separates teams using AI from teams producing value with AI.

Takeaway

AI does not eliminate the craft; it relocates it. The next-generation professional defines problems better, structures context better, verifies output better. AI amplifies design thinking, not improvisation.

Note / 09·Feb 2026·Studio Futuro·Agentic

Agent teams: the shift from writing to shipping

Adoption of Opus 4.6 multi-agent team capabilities to rewrite three features of an enterprise ERP. The team comprises four agents: planner, backend, frontend, reviewer. Speed 2.8× higher than a single agent; more significantly, merge-ready code on the first pass. The write-review cycle is endogenous to the team.

Takeaway

An agent writing without peer review produces mediocre code. A team of agents in mutual peer review produces merge-ready code. Peer review has become a system function, no longer an exclusively human role.
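The write-review cycle inside a team can be sketched as a bounded loop between two roles. This is an illustration of the pattern only: `Writer` and `Reviewer` are hypothetical callables standing in for agents, and nothing here reflects how the Opus 4.6 team feature is actually implemented.

```python
from typing import Callable, List, Tuple

Writer = Callable[[str, List[str]], str]  # (task, review comments) -> draft
Reviewer = Callable[[str], List[str]]     # draft -> list of issues (empty means approved)

def write_review_loop(
    task: str, write: Writer, review: Reviewer, max_rounds: int = 3
) -> Tuple[str, bool]:
    """Alternate writing and reviewing until the reviewer approves or rounds run out."""
    comments: List[str] = []
    draft = ""
    for _ in range(max_rounds):
        draft = write(task, comments)
        comments = review(draft)
        if not comments:
            return draft, True
    return draft, False
```

The boolean in the result distinguishes an approved draft from one that merely exhausted its rounds, so unapproved output can be escalated to a human reviewer.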

Note / 10·Feb 2026·Studio Futuro Lab·Agentic

Second Brain for AI agents: reducing recurring errors

Formalization of the Second Brain for agents: each session begins by reading project gotchas and patterns, each session ends by writing discoveries to the diary. Across six client projects in production: 63% fewer known bugs resurfacing, −40% onboarding time for new agents on the project. This component forms the core of Quack Brain.

Takeaway

Without persistent shared memory, every AI session restarts from zero. With it, the project acquires historical memory and agents cease repeating already-solved errors. The Second Brain defines the threshold between prototyping and production.
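The session lifecycle described in this note (read at the start, write at the end) can be sketched with plain markdown files. The file names `gotchas.md`, `patterns.md` and `diary.md` and both function names are assumptions for illustration; the note does not specify the actual layout.

```python
from datetime import date
from pathlib import Path
from typing import List

def start_session(project_dir: Path) -> str:
    """Collect gotchas and patterns to prepend to the agent's context."""
    parts = []
    for name in ("gotchas.md", "patterns.md"):  # hypothetical file names
        path = project_dir / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)

def end_session(project_dir: Path, discoveries: List[str], today: date) -> None:
    """Append the session's discoveries to the project diary."""
    if not discoveries:
        return  # nothing worth remembering; the diary stays selective
    project_dir.mkdir(parents=True, exist_ok=True)
    entry = f"\n### {today.isoformat()}\n" + "".join(f"- {d}\n" for d in discoveries)
    with (project_dir / "diary.md").open("a", encoding="utf-8") as fh:
        fh.write(entry)
```

Appending rather than rewriting keeps the diary a chronological record that is easy to diff and review in git.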

Note / 11·Feb 2026·Studio Futuro·Engineering

Automated refactoring on 40k LOC

Internal experiment: Claude was tasked with reducing a legacy ERP from 40k to ~25k LOC while keeping the test suite intact. Seven days of wall-time, 38% reduction, zero failing tests. The model did not rewrite the code: it removed with confidence.

Takeaway

AI proves ineffective at rewriting from scratch. It excels at removing dead code in the presence of a solid test harness. The lever is test coverage, not the model.

Note / 12·Feb 2026·Studio Futuro·Workflow

Project Groups: underused configuration, immediate return

A rarely leveraged Claude Code feature: grouping of related projects with shared context. On a client monorepo composed of four services, enabling cross-session context sharing eliminated architectural duplication in prompts. The agent operates with simultaneous awareness of all services. Productivity doubled within a week.

Takeaway

Many AI optimizations reside not in the model but in tool configuration. Project Groups have existed for months but adoption remains limited. Systematic reading of changelogs, including secondary ones, represents an undervalued competitive advantage.

Note / 13·Jan 2026·Studio Futuro·Product

Quack: public beta launch

After two months of private use, Quack was opened to forty selected beta testers. Objective: to validate the Visual IDE + multi-agent + local routing model outside the studio's boundaries. Positive outcome on the proposition, with a need for UX simplification. New users require an immediate entry point before accessing underlying routers.

Takeaway

A product validated internally is not necessarily validated for market. What appears obvious to the team is the first thing beta testers ask about. The beta serves to unlearn one's own implicit competence.

Note / 14·Jan 2026·Studio Futuro·Engineering

Claude Code plan limits and cost-aware fallback strategy

The studio's agentic workflow had begun exceeding the weekly Claude Code limit. Systematic evaluation of alternative providers for non-critical tasks: DeepSeek, Kimi, Qwen through a unified gateway. On analysis and documentation tasks, quality parity above 85% relative to Sonnet at 1/12 the cost. Cost-aware routing is now a stable component of the pipeline.

Takeaway

Loyalty to a single provider is a luxury; cost-aware routing is an operational necessity. Selecting the appropriate model for each individual task matters more than selecting the best model overall.
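A cost-aware fallback rule can be expressed as "cheapest model above a quality floor, unless the task is critical". The sketch below assumes quality and cost are expressed relative to the primary model; the `Model` type, the `choose` function and the default 0.85 floor (taken from the parity figure in the note) are illustrative, and real values would come from the studio's own evaluations.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Model:
    name: str
    relative_quality: float  # 1.0 = primary model on this task type
    relative_cost: float     # 1.0 = primary model

def choose(models: List[Model], critical: bool, min_quality: float = 0.85) -> Model:
    """Critical tasks get the highest-quality model; others get the cheapest one above the floor."""
    if critical:
        return max(models, key=lambda m: m.relative_quality)
    eligible = [m for m in models if m.relative_quality >= min_quality]
    if not eligible:
        # No model clears the floor: fall back to the best available.
        return max(models, key=lambda m: m.relative_quality)
    return min(eligible, key=lambda m: m.relative_cost)
```

Keeping the floor as a parameter lets each task type carry its own threshold instead of a single global one.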

Note / 15·Jan 2026·Studio Futuro Lab·Research

LLM benchmark on Italian enterprise technical language

A dataset of 240 technical Italian prompts was built — fiscal, logistics, regulatory domains — and evaluated across six models. Claude Sonnet and the GPT-4 class lead; open-source models show difficulty with technical terminology and legal nuance.

Takeaway

For Italian enterprise technical language, the open vs closed gap remains significant. On sensitive projects, a commercial LLM with a clear DPA outperforms immature self-hosting.

Note / 16·Dec 2025·Studio Futuro Lab·Research

Eval pipeline: from visual review to structured assessment

The volume of agentic output exceeded manual review capacity. Implementation of an eval pipeline: 120 real tasks, 8-criterion rubric, second agent in judge role. Every new skill or model transits through the pipeline prior to deployment. Evaluation time reduced from two hours to twelve minutes.

Takeaway

As volume increases, visual review fails. Evals constitute the thermometer of agentic quality. They are not an optional lab activity, but a standard production practice.
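A rubric-based gate of this kind can be sketched in a few lines. The note states only that the rubric has eight criteria and that a second agent acts as judge; the criterion names, the 1 to 5 scale, the 3.5 floor and the `Judge` callable are all assumptions made for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical criterion names; the note specifies only that there are eight.
RUBRIC = ("correctness", "completeness", "style", "tests",
          "security", "performance", "docs", "scope")

# A judge maps (task, output) to a score per criterion, assumed here to be 1..5.
Judge = Callable[[str, str], Dict[str, int]]

def evaluate(outputs: Dict[str, str], judge: Judge) -> Dict[str, float]:
    """Average each criterion across all evaluated tasks."""
    per_task: List[Dict[str, int]] = [judge(task, out) for task, out in outputs.items()]
    return {c: mean(s[c] for s in per_task) for c in RUBRIC}

def passes(report: Dict[str, float], floor: float = 3.5) -> bool:
    """Gate deployment: every criterion must average at or above the floor."""
    return all(score >= floor for score in report.values())
```

Gating on every criterion rather than on an overall average prevents a strong score in one area from masking a regression in another.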

Note / 17·Nov 2025·Studio Futuro·Strategy

Agentic analytics reading and business model review

Six months of studio site analytics, support sessions and closed contracts were supplied to Claude for analysis. Query: where does actual value creation reside? The output highlighted a service line with high time absorption and low revenue contribution. Two weeks after the analysis, a strategic pivot was formalized.

Takeaway

The data had been available for months, without analysis. AI did not generate new information: it rendered legible what was already present. In some contexts the value of AI does not reside in the answer, but in the formulation of the question.

Note / 18·Oct 2025·Studio Futuro·Product

Quack: genesis of a multi-agent Visual IDE

Launch of Quack as an internal prototype: a visual interface for the simultaneous orchestration of multiple Claude Code agents, with a distinct visual representation per agent. Five client projects managed in parallel within the first week, with no conflicts between sessions.

Takeaway

The interface conditions the workflow to a greater extent than expected. Assigning each agent a distinctive visual representation modifies the mode of work delegation. Usability elements significantly affect real productivity.

Note / 19·Oct 2025·Studio Futuro·Product

Five key patterns for agentic coding

During Quack development, five recurring patterns were identified: parallel sessions, agent visibility, skill sharing, automatic diary, integrated task kanban. Taken individually they appear marginal; integrated, they transform work ergonomics from linear conversation to a multi-thread work surface.

Takeaway

The chat interface was designed for conversational turn-taking. Agentic coding is not conversation: it is multi-thread flow. The emergence of alternative interfaces is a distinguishing trait of 2026.

Note / 20·Sep 2025·Studio Futuro Lab·Research

Claude Code + Obsidian: origin of the Second Brain for agents

The first implementation of the Second Brain consisted of an Obsidian vault read and written by Claude Code in every session. Minimal architecture: markdown, wiki-links, one folder per project. It worked. From this foundation, Quack Brain and the studio's entire approach to agentic memory were developed.

Takeaway

Simple tools surpass sophisticated tools when they support a real workflow. Markdown, folders, an agent that reads and writes: a solution less complex than it appears, still underutilized across many enterprise contexts.
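The markdown-plus-wiki-links structure is simple enough that the link graph can be recovered with a regular expression. This sketch handles the `[[Note]]`, `[[Note|alias]]` and `[[Note#Heading]]` forms; the function names and the in-memory `notes` mapping are illustrative, and reading files from an actual vault is left out.

```python
import re
from typing import Dict, List

# Captures the target note name; optional '#heading' and '|alias' parts are skipped.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def links(markdown: str) -> List[str]:
    """Return link targets, dropping '#heading' and '|alias' suffixes."""
    return [m.group(1).strip() for m in WIKILINK.finditer(markdown)]

def build_graph(notes: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each note name to the notes it links to."""
    return {name: links(body) for name, body in notes.items()}
```

An agent can use such a graph to decide which neighbouring notes to load alongside the one it was asked about, instead of reading the whole vault.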

Long-form deep dives are published progressively on Medium. Preview requests on specific notes may be submitted via direct contact.
