
Building an Agentic System with LLM in 2025

2025-04-18 · 14 min

Forget the chatbot that delivers a punchline on demand. In 2025, an agent is not a model that responds: it's an autonomous software actor, designed to interact with an environment, perceive a need, reason about a task, decide, delegate, and trigger actions. We've moved from prompts to intentions, from generation to orchestration. A well-designed agent doesn't just "run an LLM": it thinks, remembers, adapts, cooperates. And above all, it acts within a governed framework.

⚙️ A modern agent is built on 4 technical pillars:

  • Cognitive engine: an LLM like GPT-4o, Claude 3.5, Gemini... It thinks.
  • Orchestrator: a logical layer (OpenAI SDK, LangGraph, AutoGen, Agno...) that decides what, when, and how.
  • Structured memory: summarized, vectorized, injected... It allows the agent to keep track and take a step back.
  • Tools/actions: APIs, internal tools, functions, other agents... The agent moves beyond text to enter the real world.

An agent is therefore more than a brain. It's a brain connected to hands, sensors, a strategy, and sometimes to a team.
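To ground these pillars, here is a deliberately naive agent loop in plain Python. Everything in it (`call_llm`, `TOOLS`, the `tool:`/`final:` convention) is illustrative rather than any framework's API, but the skeleton — reason, decide, act, remember — is the one every stack below elaborates.

```python
from typing import Callable

# Pillar 4 — tools/actions: stubbed here; real APIs or functions in practice.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_hr_db": lambda q: f"(stub) HR records matching '{q}'",
}

# Pillar 3 — memory: a raw list here; summaries/slots in a real agent.
memory: list[str] = []

def call_llm(prompt: str) -> str:
    # Pillar 1 — cognitive engine. Stand-in for GPT-4o / Claude / Gemini;
    # returns either "tool:<name>:<arg>" or "final:<answer>".
    if "search_hr_db" not in prompt:
        return "tool:search_hr_db:new hire profile"
    return "final:onboarding plan drafted"

def run_agent(goal: str, max_steps: int = 5) -> str:
    # Pillar 2 — orchestrator: decides what, when, and how.
    for _ in range(max_steps):
        context = "\n".join(memory[-5:])          # inject only recent, relevant memory
        decision = call_llm(f"Goal: {goal}\nMemory:\n{context}\nNext step?")
        if decision.startswith("final:"):
            return decision.removeprefix("final:")
        _, name, arg = decision.split(":", 2)     # route the decision to a tool
        memory.append(f"{name}({arg}) -> {TOOLS[name](arg)}")
    return "step budget exhausted"

print(run_agent("onboard a new employee"))
```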

🔗 Read beforehand, or as a supplement for the curious:

Master advanced prompting for AI agents: To better understand how to structure the intentions and actions of your agents with robust, explicit, and controlled prompts.

AI agent architectures: overview and evolutions: A comprehensive view of current agent architectures: tool chains, self-agents, orchestrators, graphs, to situate your approach in a clear landscape.

🧠 Concrete example: agentic HR onboarding

You want to automate the arrival of a new employee.

  • A planner agent analyzes the profile and defines the stages
  • An IT agent provisions software access
  • A legal agent generates the appropriate contract
  • A coordinator agent checks that everything is complete

Each sub-agent has its role, its logic, its engine. And all are orchestrated in a coherent dynamic.

This is what an agentic system is: autonomous entities that cooperate to achieve a specific objective.
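Reduced to its simplest expression, that system can be sketched framework-free: each "agent" below is just a function, where a real implementation would wrap its own LLM, prompt, and tools. All names are illustrative.

```python
def planner_agent(profile: dict) -> list[str]:
    # Would call an LLM to derive the stages from the profile.
    return ["provision_it", "generate_contract"]

def it_agent(profile: dict) -> str:
    return f"(stub) software access provisioned for {profile['name']}"

def legal_agent(profile: dict) -> str:
    return f"(stub) {profile['contract_type']} contract generated"

SPECIALISTS = {"provision_it": it_agent, "generate_contract": legal_agent}

def coordinator_agent(profile: dict) -> list[str]:
    # Runs the planner's stages, then checks that everything is complete.
    results = [SPECIALISTS[stage](profile) for stage in planner_agent(profile)]
    assert all(results), "incomplete onboarding"
    return results

print(coordinator_agent({"name": "Ada", "contract_type": "permanent"}))
```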

🔁 Single-agent or multi-agent?

  • The single-agent remains relevant for simple workflows (personal assistant, contextual copilot, etc.).
  • The multi-agent becomes essential for complex or scalable systems. It allows isolation of responsibilities, parallelization of actions, reuse of components.

Tools like LangGraph facilitate this approach through visual and logical structuring of loops, errors, and dependencies.

Protocols like A2A (Agent2Agent) allow agents to communicate with each other while remaining agnostic to their engine or provider.

✨ How this changes your role as a developer or architect

You're no longer building an interface. You're modeling a distributed cognitive flow.

You don't "ask a question" to an LLM. You set up a framework of operational intelligence that can evolve, learn, delegate, fail or resume.

A well-designed agentic system is not just "prompt + response."

It's goal + strategy + execution + supervision. And that changes everything.

The agentic frameworks to know in 2025 ⚙️

When you build an agent today, you don't start from scratch. There's a range of frameworks designed to orchestrate roles, manage tools, model intentions, and interact with complex systems. Each stack has its philosophy, its trade-offs, its preferred use cases.

Here's an overview of the most mature and relevant ones in 2025, with a clear interpretation: in which context to choose what?

🔵 OpenAI SDK

The simplest to start with... but now extensible

  • What it is: an official SDK that allows you to define threads (contextualized conversations), add native memory, manage tools, and orchestrate workflows via the Assistants API.
  • Advantages: simplicity, quick implementation, native support for GPT-4o, and since March 2025, support for multi-agent workflows (via instructions, handoffs, and tools).
  • Limitations: locked to the OpenAI ecosystem (no Claude, no custom models), memory that can't be extended beyond the native system, and orchestration confined to OpenAI's proprietary stack.
  • Use case: internal copilot, client assistant, but now also chains of specialized agents within the same OpenAI framework.

🧠 Key takeaway: excellent entry point for building one or more agents around GPT-4o, with integrated orchestration. But if you want to mix engines (Claude, Gemini...), you'll need to go outside the perimeter.
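For a feel of those multi-agent workflows, here is a minimal handoff sketch with the OpenAI Agents SDK (the March 2025 release). It follows the SDK's documented style, but verify the details against current docs; it assumes `pip install openai-agents` and an `OPENAI_API_KEY` in the environment.

```python
from agents import Agent, Runner

legal_agent = Agent(
    name="Legal",
    instructions="Draft the appropriate employment contract for the given profile.",
)

triage_agent = Agent(
    name="Onboarding triage",
    instructions="Analyze the new hire's profile and hand contract work off to Legal.",
    handoffs=[legal_agent],   # the SDK routes the conversation between agents
)

result = Runner.run_sync(triage_agent, "New hire: senior data engineer, Paris office.")
print(result.final_output)
```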

🟢 LangGraph

For orchestrating agents in logical graph mode

  • What it is: a layer on top of LangChain that allows you to represent your agent or multi-agent system as a state graph.
  • Why it's powerful: each node is a step, a role, a logic. You can model loops, retries, errors, branches... like a true mission-oriented program.
  • Advantages: readability, composability, explicit logic, compatible with numerous LLMs and tools.
  • Limitations: requires some architectural discipline (otherwise you quickly end up with agent spaghetti).
  • Use case: multi-role system (research → analysis → synthesis), audit pipeline, autonomous expert bots.

💡 Key takeaway: as soon as you want to have multiple agents collaborate with clear logic, LangGraph is a serious ally.
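As a sketch of that graph logic, here is a minimal LangGraph pipeline (research → analysis → synthesis) with an explicit retry branch. Node bodies are stubs where LLM calls would go; assumes `pip install langgraph`.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    question: str
    findings: str
    verdict: str

def research(state: State) -> dict:
    return {"findings": f"raw notes on: {state['question']}"}   # LLM call in practice

def analysis(state: State) -> dict:
    return {"verdict": "ok" if state["findings"] else "retry"}

def synthesis(state: State) -> dict:
    return {"findings": state["findings"] + " (synthesized)"}

graph = StateGraph(State)
graph.add_node("research", research)
graph.add_node("analysis", analysis)
graph.add_node("synthesis", synthesis)
graph.set_entry_point("research")
graph.add_edge("research", "analysis")
# Branches and loops are explicit edges, not buried in prompts.
graph.add_conditional_edges("analysis", lambda s: s["verdict"],
                            {"retry": "research", "ok": "synthesis"})
graph.add_edge("synthesis", END)

app = graph.compile()
print(app.invoke({"question": "market risks", "findings": "", "verdict": ""}))
```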

🟣 Agno

The lightweight and pragmatic alternative

  • What it is: a framework designed to quickly build lightweight autonomous agents capable of using tools, thinking, and delivering results in a structured context.
  • Advantages: quick to implement, extensible, agnostic to models (Claude, GPT, Gemini...).
  • Limitations: still young, sometimes limited documentation, but effective approach for edge systems or rapid production.
  • Use case: agents integrated into custom interfaces, specialized business assistants, offline/edge systems.

⚡️ Key takeaway: Agno allows you to maintain control without rebuilding everything. A kind of "pragmatic agent framework."
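As an illustration of that pragmatism, here is roughly what an Agno agent looks like. The framework is young and moves fast, so take the import paths and parameters below as assumptions modeled on its documented style (`pip install agno`), not a stable contract.

```python
from agno.agent import Agent                 # assumed module path
from agno.models.openai import OpenAIChat    # model-agnostic: swap in Claude, Gemini...

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    description="Specialized business assistant for logistics questions.",
    instructions=["Answer with structured, sourced bullet points."],
    markdown=True,
)

agent.print_response("Summarize today's warehouse incidents.")
```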

🧬 AutoGen (Microsoft)

For orchestrating autonomous roles via dialogue

  • What it is: a Python library developed by Microsoft for creating multiple agents that talk to each other to solve an objective.
  • Key concept: each agent has a role, a response style, permissions. They collaborate in natural language, with coordination rules.
  • Advantages: approach inspired by the real world, very readable, good for prototyping multiple roles.
  • Limitations: sometimes verbose, contextual memory management needs refining.
  • Use case: planner agent + executor + validator, multi-perspective brainstorming, testing AI roles before production.

📣 Key takeaway: if you want to simulate AI roles that interact in natural language, AutoGen is formidably accessible.
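A minimal two-role AutoGen dialogue, in the classic `pyautogen` style (AutoGen 0.4 reworked the API, so check your version). The executor auto-replies with no human in the loop, and the reply cap keeps the conversation from getting verbose.

```python
from autogen import AssistantAgent, UserProxyAgent

planner = AssistantAgent(
    "planner",
    system_message="You break the objective into steps and delegate them.",
    llm_config={"model": "gpt-4o"},        # reads OPENAI_API_KEY from the environment
)

executor = UserProxyAgent(
    "executor",
    human_input_mode="NEVER",              # fully autonomous dialogue
    code_execution_config=False,           # no local code execution in this sketch
    max_consecutive_auto_reply=3,          # caps the natural-language back-and-forth
)

# The two roles collaborate in natural language until a termination rule fires.
executor.initiate_chat(planner, message="Plan the onboarding of a new data engineer.")
```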

🔗 And the others?

  • CrewAI: very team-oriented, dynamic logic, nice interface for orchestration
  • Haystack Agents: powerful for document-oriented RAG with workflow agentification
  • Semantic Kernel (Microsoft): focused on modular skills + pipelines

🧠 The right choice depends on your vision

No framework is perfect.

But a good framework aligns with:

  • Your functional need
  • Your level of complexity
  • Your target engine (Claude, GPT, open-source...)
  • Your objective: rapid prototype or scalable production

💬 You can very well start with the OpenAI SDK, then migrate to LangGraph or Agno in production. Or mix LangChain with your custom orchestrator if you want fine-grained logic. What matters is mastering coordination, not just prompting.

Agent interoperability – MCP, A2A & co 🔁

An isolated agent can be useful. An interoperable agent becomes powerful.

In 2025, agentic systems no longer live in silos. They communicate with each other, call distributed tools, collaborate on complex tasks... and must do so in a coherent, standardized, secure framework.

Two protocols dominate the scene: MCP (Model Context Protocol) and A2A (Agent2Agent). These are the ones laying the foundations for real interoperability, beyond DIY APIs or homemade agents.

🧩 MCP – Model Context Protocol

MCP, proposed by Anthropic, standardizes how an agent communicates with its environment. It acts as a secure interface between the model and external systems.

  • 💡 Concretely: you want your agent to query a customer database, access a CRM, or trigger a business simulation? Rather than directly calling the API, you go through a tool that conforms to the MCP protocol.
  • 🔐 Benefit: you compartmentalize access, limit context leaks, structure the response, and prevent the agent from accidentally sending "its entire prompt" to an external API (cf. the tool-poisoning attacks documented by Invariant Labs).
  • 🧠 Philosophy: reasoning stays on the LLM/agent side, action is externalized but governed.

MCP is not just another lib: it's a clean way to expose tools to an agent, with a layer of security, formatting, and governance.
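A minimal MCP tool server with the official Python SDK's `FastMCP` helper (`pip install mcp`) shows the philosophy: the agent never touches the CRM API itself, only this narrow, validated tool surface. The tool name and checks are illustrative.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("crm-gateway")

@mcp.tool()
def lookup_customer(customer_id: str) -> str:
    """Return a customer summary. Only validated parameters cross this boundary."""
    if not customer_id.isalnum():   # governance lives server-side, not in the prompt
        return "error: invalid customer id"
    return f"(stub) summary for customer {customer_id}"

if __name__ == "__main__":
    mcp.run()   # exposes the tool over stdio to an MCP-capable agent host
```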

🔗 A2A – Agent2Agent Protocol (Google)

Presented at Google Cloud Next 2025, A2A is a response to the need for coordination between heterogeneous agents: different engines, different orchestrators, different roles but a common mission.

  • 🧠 Problem addressed: you want your Claude agent to interact with a Gemini or GPT agent? A LangGraph system to send a task to another orchestrator? A2A defines a protocol to transmit instructions, context, states... independently of vendors.
  • 🤝 Native interoperability: no need to harmonize SDKs, you send messages via A2A, and each agent understands them according to its local logic.
  • 🔄 Cooperative model: an agent can propose an action, delegate, wait for a return, or even negotiate an objective.

With more than 50 technical partners (Salesforce, Deloitte, UiPath, etc.), A2A is establishing itself as a standard building block in tomorrow's agent infrastructure.
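Discovery in A2A rests on an "Agent Card": a JSON manifest an agent publishes (typically at `/.well-known/agent.json`) so peers can find and address it. Here is its rough shape as a Python dict; the field names follow early drafts of the spec and should be treated as indicative, and the endpoint is hypothetical.

```python
AGENT_CARD = {
    "name": "financial-doc-analyzer",
    "description": "Analyzes financial documents and returns structured summaries.",
    "url": "https://agents.example.com/a2a",   # hypothetical endpoint
    "version": "1.0.0",
    "capabilities": {"streaming": True, "pushNotifications": False},
    "skills": [
        {
            "id": "summarize-filing",
            "name": "Summarize financial filings",
            "description": "Extracts risk factors and key figures.",
        }
    ],
}
```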

🧠 What this concretely enables

Imagine a system in which:

  • An agent "analyzes financial documents" (Claude)
  • Transmits a summary to a "report writer" agent (GPT-4)
  • Which calls a visualization tool (via MCP)
  • Before sending everything to an "auditor" agent (Gemini)

Without standards? You have to recode everything manually.

With MCP + A2A? You build a fluid, distributed, agnostic architecture.

That's the key to scaling intelligently.

🔐 Why it matters (and why it's technical)

You can build an agent without MCP or A2A, of course.

But as soon as you want to:

  • Industrialize a tool calling logic
  • Have multiple roles collaborate
  • Integrate multiple providers
  • Add a layer of control / security / audit

... then you need a protocol. And these two are already here for that.

They're not here to look pretty. They're here to standardize complexity.

Agentic memory – from raw history to CAG 🧠📚

An agent without memory is like a goldfish with excellent vocabulary. It responds well... but forgets everything.

And yet, injecting the entire raw history into each call is far from a solution. In 2025, the real complexity isn't having memory, it's knowing what to do with it, how to structure it, and when to intelligently mobilize it.

📦 The problem with raw context

You might be tempted to keep everything: every exchange, every task, every decision.

But LLM context is limited and costly, and performance doesn't scale linearly with context size. Even with models like Gemini 1.5 (1M-token context), you're not going to inject a conversation dump indiscriminately.

LLMs aren't designed to "read a book" with each call.

They need a useful context, well summarized, well targeted.

🔁 The three major approaches in 2025

1. Injected history (short-term memory)

Simple, effective for short interactions. You include the last rounds of dialogue or the last relevant actions.

→ Quick to set up, but not very scalable. You forget long-term intentions, past facts, previous deductions.

2. RAG (Retrieval-Augmented Generation)

You vectorize important elements (docs, exchanges, logs...), then do semantic search to inject only what's relevant.

→ Powerful for static or semi-dynamic knowledge. Less relevant for conversational reasoning or evolving objectives.

3. CAG (Context-Aware Generation)

You structure memory in a narrative logic: summaries, slots, intentions, significant events. The agent "remembers" what makes sense.

→ Allows more coherent reasoning over time, fidelity to objectives, and better cognitive continuity. It's the native memory of well-designed agents.

🧬 Towards hybrid memory

Modern systems often combine these approaches:

  • RAG for external knowledge
  • CAG for intention
  • Raw history for immediate freshness

And what matters isn't just the stored data, but the reinsertion strategy.

A good agent doesn't dump everything. It chooses, summarizes, reformulates.

And it decides when past information should become active again in reasoning.
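A hybrid reinsertion strategy can be sketched in a few lines. `vector_search` and `summarize` are stand-ins for your vector store and summarizer of choice; the point is the composition — each call gets CAG summary + RAG recall + raw recency, and consolidation happens on a deliberate schedule.

```python
def vector_search(query: str, k: int = 3) -> list[str]:
    return []   # e.g. a FAISS / Weaviate lookup in a real system

def summarize(texts: list[str]) -> str:
    return " | ".join(t for t in texts if t)[:500]   # an LLM-written summary in practice

class HybridMemory:
    def __init__(self) -> None:
        self.raw: list[str] = []        # short-term: last exchanges, verbatim
        self.mission_summary = ""       # CAG: intentions, decisions, key facts

    def record(self, event: str) -> None:
        self.raw.append(event)
        if len(self.raw) > 10:          # decide *when* to consolidate, not just what
            self.mission_summary = summarize([self.mission_summary] + self.raw[:5])
            self.raw = self.raw[5:]

    def build_context(self, query: str) -> str:
        return "\n".join(
            ["## Mission so far", self.mission_summary]            # CAG
            + ["## Relevant knowledge"] + vector_search(query)     # RAG
            + ["## Recent exchanges"] + self.raw[-4:]              # raw freshness
        )
```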

🔎 Framework side

  • The OpenAI SDK manages native memory via threads (stored on OpenAI's side). It's convenient but closed.
  • LangGraph, AutoGen, or Agno allow customized memory, injected according to rules defined by the architect.
  • Some systems use specialized "memory" agents that summarize, tag, dynamically reinject.

🎯 Key takeaway

You shouldn't just store information. You must manage the dynamics of remembering. And in a world of exploding contexts, decreasing costs, and more powerful LLMs, memory becomes one of the true innovation zones for agents.

CAG vs RAG – When, why, how 🔍

The confusion is frequent. Many talk about "RAG" as if it were the only serious way to give memory to an agent. Yet, in 2025, CAG (Context-Aware Generation) proves much more adapted to agent logic.

Why? Because an agent doesn't simply do research: it thinks over time, follows a mission, makes decisions, adjusts its strategy. And for that, it needs a narrative, structured, evolving memory. Not just a vector search engine.

📚 RAG – For injecting knowledge, not experience

RAG (Retrieval-Augmented Generation) works very well when the goal is to find and synthesize information. You vectorize a document base (PDF, website, wiki, product database), you do a semantic search, and you inject useful content into the prompt.

  • ✅ Ideal for: documentary assistants, enterprise chatbots, technical or legal copilots
  • ❌ Limited for: the continuity of reasoning or an evolving mission
  • ⚠️ Recall is mechanical, static, without connection to the agent's intention or progression

RAG remains very powerful, especially in cases based on dense or living content (news, business knowledge), but it's not a true agent memory.

🧠 CAG – For structuring reasoning over time

CAG (Context-Aware Generation), on the contrary, doesn't call on an external base. It relies on a structured memory: summaries, key elements, objectives, significant facts, custom slots. It models what the agent knows, believes, has done, or has decided.

  • ✅ Ideal for: autonomous agents, multi-step, involved in mission logic
  • ✅ Allows more coherent, more human, more adaptive reasoning
  • ✅ Very economical in tokens and computation (vs. vector search)
  • ❌ Requires a memory management strategy (when to summarize, what to keep, how to reinject)

You can implement CAG via a specialized memory agent or via a graph of continuous summaries. In all cases, you build a dynamic memory driven by cognition, not a database that can be passively consulted.
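One concrete way to realize those slots, as a sketch with invented slot names: a memory agent files each step's output into the structure, and `render` produces a compact narrative block whose size stays constant however long the mission runs.

```python
from dataclasses import dataclass, field

@dataclass
class MissionMemory:
    objective: str
    decisions: list[str] = field(default_factory=list)
    key_facts: list[str] = field(default_factory=list)

    def update(self, step_output: str) -> None:
        # In a real system an LLM classifies each step into slots; faked here.
        bucket = self.decisions if step_output.startswith("DECIDE") else self.key_facts
        bucket.append(step_output)

    def render(self) -> str:
        # A few hundred tokens for the prompt, regardless of mission length.
        return (
            f"Objective: {self.objective}\n"
            f"Decisions so far: {'; '.join(self.decisions[-5:]) or 'none'}\n"
            f"Key facts: {'; '.join(self.key_facts[-5:]) or 'none'}"
        )

mem = MissionMemory(objective="Audit supplier contracts for compliance")
mem.update("DECIDE: flag clause 4.2 for legal review")
print(mem.render())
```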

⚖️ Quick comparison

| Criterion | RAG | CAG |
|---|---|---|
| Main objective | External search | Cognitive continuity |
| Storage | Vector DB (FAISS, Weaviate, etc.) | Internal summaries / slots |
| Trigger | By similarity | By strategy / intention |
| Cost | Variable (embedding + search) | Low (textual summary) |
| Ideal for | Documentary chat, expert copilot | Autonomous agent, mission logic |
| Memory of experience | ❌ | ✅ |

💡 What if we combined both?

Most advanced architectures combine CAG + RAG:

  • RAG for business knowledge or document bases
  • CAG for managing the agent's state, its reasoning, its objective

What makes the difference is the orchestrator's ability to choose what to inject, when, and for what purpose. You don't want to dump everything into the prompt. You want to intelligently contextualize each call, based on the current mission.

🔑 Key takeaway

CAG is the memory of a mind in action.

RAG is access to a well-organized library.

You can create a good assistant with RAG.

But you can't build an agent that acts over time without CAG.

Let's continue with agent security, an increasingly critical topic as agents interact with tools, databases, and even entire systems.

It's not just a question of a well or poorly prompted LLM, it's a question of governance, isolation, and control.

Agentic security – Tool poisoning, control, and isolation 🛡️

Building an agent also means giving it powers of action. And like any power without a frame, things can quickly get out of hand.

An agent that uses tools or interacts with business systems must be monitored, framed, isolated. Otherwise, you risk turning your AI pipeline into a risk of leakage, sabotage, or context abuse.

🐍 "Tool Poisoning" – silent attack, real impact

You develop an autonomous agent that can query an internal API.

But the implementation of the "tool" – this gateway to the API – is poorly secured. Result: an agent (or an attacker via a prompt) can inject the entire context into the call, misuse the function, or trigger an unexpected action.

This is the principle of tool poisoning.

Typical example: the LLM receives an instruction that seems normal, but that pushes it to inject its entire context (including sensitive data) into a tool not designed for that. Without safeguards, you open the door to leaks or manipulations.

🧱 Compartmentalize to control

Here are the best practices emerging in serious agentic architectures:

  • Define explicit permissions for each tool (read, write, scope)
  • Encapsulate sensitive tools behind calls validated manually or by other agents ("guardian agents")
  • Limit the context window transmitted to each call (via MCP or local controller)
  • Log each action (what the LLM decided, why, with what data)

And above all: never trust a raw LLM. Even well-prompted, it can hallucinate, misinterpret, or apply a biased strategy.
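Those safeguards fit in a thin "guardian" layer between the LLM's decision and the real tool. A sketch with illustrative names: explicit scopes, payload validation to block context dumps, and a log line per action.

```python
import logging

logging.basicConfig(level=logging.INFO)

PERMISSIONS = {"crm_read": {"read"}, "crm_write": {"read", "write"}}   # explicit scopes

def guarded_call(tool: str, action: str, payload: str, *, max_payload: int = 500) -> str:
    if action not in PERMISSIONS.get(tool, set()):
        raise PermissionError(f"{tool} is not allowed to {action}")
    if len(payload) > max_payload:       # a whole-context dump would trip this
        raise ValueError("payload too large: possible context exfiltration")
    logging.info("tool=%s action=%s payload=%r", tool, action, payload)  # audit trail
    return f"(stub) {tool}.{action} executed"

# The LLM's output is treated as untrusted input, never as a direct command.
print(guarded_call("crm_read", "read", "customer 42"))
```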

🔐 The role of protocols like MCP

Model Context Protocol (MCP) is becoming a de facto standard for managing this problem.

Rather than letting the LLM send raw requests to an API, you go through a structured tool that:

  • Receives a minimal request (intention + validated parameters)
  • Checks authorizations
  • Formats the response predictably
  • Avoids any direct exposure of the complete context

It's a logical firewall between the agent and the action.

🧠 Governing agents... like augmented humans

A well-designed agent isn't a tool free to act without control.

It's an intelligent but governed actor, whose permissions, memory, tools, and intentions are framed.

That's why many advanced architectures introduce:

  • "Arbiter agents" to validate certain actions
  • Sandbox layers to test in simulation before real execution
  • Event logs (intention logs, prompt logs, tool return logs)

🎯 What to remember

A poorly controlled agent doesn't just hallucinate. It acts. And if it acts badly, it can damage systems or expose data.

That's why agent security must be native, not added at the end. You're not coding a bot. You're delegating a part of autonomous intelligence to a system. And that requires governance.

Concrete application cases of agentic systems 🧪

Agentic systems are no longer just experiments. In 2025, they are already integrated into critical architectures, business tools, and even industrial infrastructures. These aren't "evolved chatbots," but rather operational intelligence units, capable of directing, assisting, coordinating, and even learning.

🧠 ManusAI – The universal autonomous agent

Developed by the Chinese startup Monica, ManusAI is a generalist autonomous agent capable of planning, executing, and delivering complex tasks without continuous human supervision. It is distinguished by:

  • Multi-agent architecture: each sub-agent is specialized (planning, execution, memory, etc.), allowing efficient management of complex tasks.
  • Multimodal capabilities: processing text, images, tables, and executable code.
  • Integration of external tools: web navigation, script execution, file manipulation, etc.
  • Asynchronous operation: tasks are executed in the cloud, even when the user is disconnected.
  • Persistent memory: ManusAI remembers past interactions to refine its future performance.

🔍 Use cases:

  • Stock market analysis: generation of detailed reports with interactive dashboards on stocks like Tesla.
  • Travel planning: creation of personalized itineraries with custom travel guides.
  • Educational content creation: production of interactive teaching materials for education.
  • Insurance comparison: evaluation of multiple insurance policies to recommend the best options.
  • B2B supplier sourcing: research and compilation of reports on suppliers according to specific criteria.

ManusAI has surpassed cutting-edge models like OpenAI Deep Research on the GAIA benchmark, achieving 86.5% accuracy at level 1 and still reaching 57.7% at the much harder level 3.

🧾 Automated contract processing (LegalTech)

A LegalTech deploys a three-agent system:

  • Extraction and structuring of clauses via a parsing engine and NLP tools.
  • Compliance analysis by cross-referencing with a regulatory database via RAG.
  • Generation of a synthetic legal report for the human team.

All this is orchestrated via LangGraph, with a CAG memory that preserves the historical "red flags" specific to each client. Result: 70% reduction in processing time.

🏥 Coordination of specialized medical agents (HealthTech)

In an assisted diagnostic platform, a generalist agent receives patient data. It delegates according to detected signals:

  • Cardiac data → specialized "cardio" agent.
  • MRI examinations → radiological pre-analysis agent.
  • Family history → genetic agent.

Each agent has its own engine (GPT for dialogue, Claude for medical summaries, a custom model for interpretation). A supervisor agent compiles the results, produces a synthesis, and submits it to a doctor.

🏢 HR onboarding automation

A large industrial group has deployed an agentic system to automate the arrival of new employees:

  • The planner agent generates the personalized integration plan.
  • The administrative agent prepares documents, contracts, access.
  • The training agent feeds the LMS and sends recommendations.

All of this is integrated into a dashboard supervised by HR, with A2A for exchanges between services.

🤖 Automated QA of LLMs in production

AI teams use evaluator agents to test their own agents:

  • The tester agent generates complex scenarios (e.g., legal edge cases).
  • The target agent responds.
  • The "audit" agent evaluates consistency, detects hallucinations, tags errors.

These loops run continuously – it's QA augmented by agents, used to improve prompt engineering and reduce risks in production.

📈 Continuous autonomous financial analysis

An algorithmic trading platform uses a system of 4 agents:

  • Collection of economic news via a tool-equipped browser.
  • Thematic summaries oriented by sector.
  • Anomaly detection or weak signals.
  • Risk synthesis.

Each agent writes to a shared file via MCP tool. A human validation system intervenes only if an alert threshold is crossed.

🧠 What we learn

In all these cases:

  • Agents don't just "talk" – they act in a business system.
  • There is real orchestration between roles, memory, security, and context.
  • And humans remain in the loop, but only where they provide decisive value.

Agentic systems don't replace humans. They replace rigid scripts, fixed APIs, and DIY prompts.

After exploring the foundations, frameworks, memory, security, and concrete cases, let's now move on to future challenges – those that concern both devs and architects, researchers and builders.

The challenges ahead for agentic architectures 🚀

If 2023-2024 were the years of experimentation and proof of concept, 2025 marks the entry into a phase of maturation of agentic architectures. The foundations are laid. The frameworks exist. The protocols are there. But everything is still in motion, and several critical works remain open.

🧠 1. Standardization of agent reasoning

Today, each system has its way of orchestrating decisions: via graphs, prompts, tooled call chains, or multi-agent dialogues. The problem? No clear standard frames the logic of reasoning. This ambiguity complicates auditability, reproducibility, and supervision.

➡️ What we expect: a common grammar to define intentions, states, dependencies between tasks, and result validation.

🪢 2. Advanced interoperability of heterogeneous agents

A2A is a major advance, but it's still difficult to make agents written in different logics collaborate, on different engines, and in hybrid environments (on-prem + cloud).

➡️ The challenge: orchestrate a system where Claude, GPT, Gemini, and open-source models truly collaborate, not just pass messages.

📚 3. Distributed and decentralized memory

Today, memory is often centralized (in an internal database, a vector service, or an orchestrator). But as agents become persistent and cooperative, we need to imagine shared, governed memories, with versioning and finely controlled access.

➡️ Possible inspiration: Git for cognitive memory. Or CRDTs for synchronized distributed memory.

🧬 4. Meta-agents & dynamic orchestration

In a large-scale multi-agent system, it becomes necessary to have an agent that orchestrates other agents. Not just a "scheduler," but a meta-agent capable of:

  • understanding the global mission
  • dynamically allocating tasks
  • evaluating responses
  • reordering priorities
  • reassigning roles if an agent fails

➡️ The real challenge: keeping this meta-agent explainable, governable, and supervisable.

🧩 5. Large-scale prompt engineering

A good agent still partly relies on a good prompt. And when you manage dozens of specialized agents, prompt management becomes an architectural issue.

➡️ What we see emerging: prompt versioning frameworks, guided automatic generation (AutoPrompt), behavioral profiling... but everything remains artisanal.

🛡️ 6. Governance and security in production

An agent that hallucinates in a sandbox is no big deal.

An agent that launches an erroneous bank request or interacts poorly with a user in a medical application is potentially critical.

➡️ Agentic systems must natively integrate:

  • an alert and escalation mechanism
  • fine security rules on tools
  • total observability on reasoning

💡 7. Trust, auditability, and explainability

This is probably one of the biggest works ahead. How to trust an autonomous agent? How to understand its decisions? How to debug it?

➡️ The challenge: build explanation interfaces that feel natural to humans, readable logs, and, why not, specialized explanation agents.

📈 What all this implies

We're no longer building assistants. We're building modular cognitive infrastructures.

And like the web, cloud, or data, this implies tools, standards, supervision, well-defined roles... and above all: teams that think system, not just functionality.

🧭 Conclusion: The era of agentic systems begins now

We are no longer in a phase of theoretical exploration.

In 2025, agentic systems are coming out of the labs and into the heart of business tools, SaaS products, analysis interfaces, and industrial workflows. The agent is no longer a well-prompted LLM. It's an architecture in motion, capable of perceiving, reasoning, interacting, adapting, and, soon, cooperating with other autonomous entities.

Building a modern agentic system means accepting to think differently:

  • in modularity, not in monolith
  • in mission, not in simple task
  • in context, not in prompt
  • in cognitive strategy, not in immediate response

The role of developers, architects, and AI designers will evolve. It's no longer just a question of model choice or context window size. It's a question of structuring reasoning, memory, responsibility.

The future belongs to those who will know how to connect intelligences – human and artificial – in service of an aligned, governed, and reliable autonomy.

And this autonomy, today, begins with an agent.

💡 To go further:

Create an interoperable AI agent with OpenAI SDK and Polkadot

The manifesto of the pedagogical revolution through AI

Tags: AI · agentic systems · LLM