From the Kaggle paper on Agents.

According to the paper, an agent is composed of:

Orchestration Layer

It includes things like:

  1. Instructions
  2. Agent Profiles
  3. Agent goals and objectives
  4. Memory (short- and long-term)
  5. Model-based reasoning and planning, etc.

Tools

Types of tools:

  1. Extensions
  2. Functions
  3. Data Stores (Vector DBs)

Functions are executed on the client side, while extensions (what we’d call plugins) are executed agent-side.
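
To make the client-side vs. agent-side split concrete, here is a minimal sketch. Every name in it (`get_weather`, `ExtensionAgent`, `FunctionAgent`, `client_execute`) is hypothetical and made up for illustration; none of it is an API from the paper or from any library.

```python
from typing import Any, Callable, Dict


def get_weather(city: str) -> str:
    """Hypothetical tool backend, standing in for a real API call."""
    return f"Sunny in {city}"


class ExtensionAgent:
    """Extension style: the agent holds the callable and runs it itself."""

    def __init__(self, extensions: Dict[str, Callable[..., Any]]) -> None:
        self.extensions = extensions

    def handle(self, tool_name: str, **args: Any) -> Any:
        # Execution happens agent-side.
        return self.extensions[tool_name](**args)


class FunctionAgent:
    """Function style: the agent only emits a structured call description."""

    def handle(self, tool_name: str, **args: Any) -> Dict[str, Any]:
        # Nothing is executed here; the caller decides what to do with this.
        return {"name": tool_name, "args": args}


def client_execute(call: Dict[str, Any],
                   registry: Dict[str, Callable[..., Any]]) -> Any:
    """Client-side code that actually runs the call the agent asked for."""
    return registry[call["name"]](**call["args"])


if __name__ == "__main__":
    registry = {"get_weather": get_weather}
    print(ExtensionAgent(registry).handle("get_weather", city="Paris"))
    call = FunctionAgent().handle("get_weather", city="Paris")
    print(call)
    print(client_execute(call, registry))
```

With functions, the client keeps control over credentials, ordering, and whether the call runs at all, which is the main reason to pick them over extensions.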

Model

The model/LLM must be able to follow instruction-based reasoning and logic frameworks such as ReAct, Chain-of-Thought, or Tree-of-Thoughts. It can be general-purpose, multimodal, or fine-tuned as needed. Tip: it helps to fine-tune the model on examples that show the specific tools or reasoning steps being used in various contexts.

Reasoning Frameworks

ReAct

This is a prompt engineering framework that lets the model reason and take action on a user query, with or without in-context examples.
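
A rough sketch of the ReAct loop (thought, action, observation, repeat until a final answer). The model here is a scripted stand-in, not a real LLM. The `Thought:` / `Action:` / `Action Input:` / `Final Answer:` labels, the `calculator` tool, and all function names are assumptions chosen for illustration.

```python
from typing import Callable, Dict, Optional


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM: picks the next step from the transcript so far."""
    if "Observation:" not in transcript:
        return ("Thought: I need to compute this.\n"
                "Action: calculator\n"
                "Action Input: 6*7")
    return "Thought: I have the result.\nFinal Answer: 42"


def calculator(expression: str) -> str:
    """Toy tool: multiplies two integers written as 'a*b'."""
    left, right = expression.split("*")
    return str(int(left) * int(right))


def react(question: str,
          model: Callable[[str], str],
          tools: Dict[str, Callable[[str], str]],
          max_steps: int = 5) -> Optional[str]:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        # Parse which tool to call and with what input.
        action = step.split("Action:", 1)[1].split("\n", 1)[0].strip()
        action_input = step.split("Action Input:", 1)[1].strip()
        # Run the tool and feed the result back as an observation.
        observation = tools[action](action_input)
        transcript += f"Observation: {observation}\n"
    return None  # Gave up: no final answer within max_steps.


if __name__ == "__main__":
    print(react("What is 6*7?", scripted_model, {"calculator": calculator}))
```

The key point is that each observation is appended to the prompt, so the next reasoning step can depend on what the tool returned.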

Chain-of-Thought

Reasoning through intermediate steps. Flavors include:

  1. Self-consistency
  2. Active-prompt
  3. Multimodal CoT
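
As an illustration of the first flavor, self-consistency samples several reasoning chains and keeps the most common final answer. In this sketch the sampler returns canned chains instead of calling a model, and the `Answer:` marker and function names are assumptions made for the example.

```python
from collections import Counter
from typing import Callable

# Canned reasoning chains standing in for stochastic LLM samples.
CANNED_CHAINS = [
    "3 boxes of 6 is 18. Answer: 18",
    "6 + 6 + 6 = 18. Answer: 18",
    "3 * 6 = 20 (arithmetic slip). Answer: 20",
    "Six per box, three boxes, so 18. Answer: 18",
    "3 * 6 - 1 = 17. Answer: 17",
]


def canned_sampler(question: str, sample_index: int) -> str:
    """Hypothetical sampler: returns one reasoning chain per sample index."""
    return CANNED_CHAINS[sample_index % len(CANNED_CHAINS)]


def self_consistent_answer(sample_chain: Callable[[str, int], str],
                           question: str,
                           n_samples: int = 5) -> str:
    answers = []
    for i in range(n_samples):
        chain = sample_chain(question, i)
        # Keep only the final answer; the intermediate steps are discarded.
        answers.append(chain.rsplit("Answer:", 1)[1].strip())
    # Majority vote across the sampled chains.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    print(self_consistent_answer(canned_sampler, "How many eggs in 3 boxes of 6?"))
```

Individual chains can go wrong in different ways, but correct chains tend to agree, so the vote filters out one-off slips.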

Tree-of-Thoughts

Suited for exploration or strategic look-ahead. This generalizes over CoT prompting and lets the model explore various thought chains.
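
One way to picture this is a breadth-first search over partial "thoughts" with a beam, sketched below on a toy problem (pick numbers from 3, 4, 5 that sum to 10). In a real setup `propose` and `score` would be LLM calls; here they are plain functions, and all names and the beam-search strategy itself are assumptions for illustration rather than the paper's method.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]
TARGET = 10


def propose(state: State) -> List[State]:
    """Expand a partial solution with each possible next 'thought'."""
    return [state + (step,) for step in (3, 4, 5)]


def score(state: State) -> int:
    """Heuristic value: closer to the target sum is better."""
    return -abs(TARGET - sum(state))


def is_solution(state: State) -> bool:
    return sum(state) == TARGET


def tree_of_thoughts(root: State,
                     propose: Callable[[State], List[State]],
                     score: Callable[[State], int],
                     is_solution: Callable[[State], bool],
                     beam_width: int = 2,
                     max_depth: int = 3) -> Optional[State]:
    frontier = [root]
    for _ in range(max_depth):
        # Branch: every state on the frontier proposes several continuations.
        candidates = [child for state in frontier for child in propose(state)]
        for candidate in candidates:
            if is_solution(candidate):
                return candidate
        # Prune: keep only the most promising branches for the next level.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
        if not frontier:
            return None
    return None


if __name__ == "__main__":
    print(tree_of_thoughts((), propose, score, is_solution))
```

CoT corresponds to the special case where each step proposes a single continuation, so there is nothing to compare or prune.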