Designing Domain-Specific Agents via Hierarchical Task Abstraction Mechanism

1 Xi'an Jiaotong University
2 China Telecom Shaanxi Branch
*Equal Contribution
Corresponding Author
TL;DR: We propose HTAM, a framework for designing domain-specific agents, instantiate it in the remote sensing field as EarthAgent, and introduce GeoPlan-bench, a comprehensive evaluation platform for assessing complex remote sensing planning capabilities.


Abstract

LLM-driven agents, particularly those built on general frameworks such as ReAct or on human-inspired role-playing, often struggle in specialized domains that require rigorously structured workflows. Fields such as remote sensing, which demand specialized tools (e.g., correction, spectral index calculation) and multi-step procedures (e.g., numerous intermediate products and optional steps), pose significant challenges to general-purpose approaches. To address this gap, we introduce a novel agent design framework centered on a Hierarchical Task Abstraction Mechanism (HTAM). Rather than emulating social roles, HTAM structures multi-agent systems into a logical hierarchy that mirrors the intrinsic task-dependency graph of a given domain. This task-centric architecture enforces procedural correctness and decomposes complex problems into sequential layers, where each layer's sub-agents operate on the outputs of the preceding layers. We instantiate this framework as EarthAgent, a multi-agent system tailored for complex geospatial analysis. To evaluate such complex planning capabilities, we build GeoPlan-bench, a comprehensive benchmark of realistic, multi-step geospatial planning tasks, accompanied by a suite of carefully designed metrics that evaluate tool selection, path similarity, and logical completeness. Experiments show that EarthAgent substantially outperforms a range of established single- and multi-agent systems. Our work demonstrates that aligning agent architecture with a domain's intrinsic task structure is a critical step toward building robust and reliable specialized autonomous systems.


Methodology

We introduce HTAM, a new paradigm for agent architecture design. Rather than relying on social analogies, the framework derives its structure directly from the logical dependencies inherent to the problem domain.
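The page states that HTAM derives its hierarchy from the domain's task-dependency graph but does not say exactly how tasks are assigned to layers. One plausible rule, assumed here purely for illustration, places each task one layer above its deepest prerequisite, so that every layer consumes only outputs produced by earlier layers. The function name `assign_layers` and all task names are hypothetical; the sketch uses the standard-library `graphlib.TopologicalSorter`.

```python
from graphlib import TopologicalSorter, CycleError  # CycleError is raised on cyclic dependencies


def assign_layers(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into layers by dependency depth (an assumed rule, not HTAM's spec).

    deps maps each task to the set of tasks whose outputs it needs.
    A task's layer is 1 + the highest layer among its prerequisites;
    tasks with no prerequisites land in layer 0.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    level: dict[str, int] = {}
    # static_order() yields every node after all of its predecessors,
    # including nodes that appear only as prerequisites.
    for task in TopologicalSorter(deps).static_order():
        prereqs = deps.get(task, set())
        level[task] = 1 + max((level[p] for p in prereqs), default=-1)

    layers: list[list[str]] = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for task, lvl in level.items():
        layers[lvl].append(task)
    return [sorted(layer) for layer in layers]


# Hypothetical remote sensing tasks: acquisition -> analysis -> synthesis.
deps = {
    "compute_ndvi": {"acquire_imagery"},
    "classify_land_cover": {"acquire_imagery"},
    "change_report": {"compute_ndvi", "classify_land_cover"},
}
layers = assign_layers(deps)
print(layers)  # [['acquire_imagery'], ['classify_land_cover', 'compute_ndvi'], ['change_report']]
```

Under this rule the layer count follows the depth of the dependency graph, whereas EarthAgent fixes three named layers; a real design would likely group tools by category as well as by depth.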

Methodology Comparison

Comparative analysis of agent architectures on a complex geospatial query. (1) ReAct's iterative, step-by-step process lacks a global strategy, leading to chaotic, redundant tool calls and incomplete solutions. (2) Plan&Execute adheres to a rigid, pre-determined plan, showing no capacity for correction even when encountering errors, ultimately failing the task. (3) HTAM demonstrates a structured, hierarchical decomposition, ensuring a logical progression from data acquisition (Layer 1) to analysis (Layer 2) and final synthesis (Layer 3), leading to a coherent and complete solution.

To validate the efficacy and practicality of HTAM, we developed EarthAgent, a multi-agent system that implements the framework for the remote sensing domain.

EarthAgent Architecture

Architecture of EarthAgent. It consists of three layers: (1) the Data Acquisition and Preprocessing Layer, (2) the Data Processing and Analysis Layer, and (3) the Synthesis and Application Layer. Each layer comprises a set of sub-agents, each responsible for a specific sub-task within that layer.
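A minimal sketch of the layered execution described above, under stated assumptions: sub-agents are modeled as hypothetical plain Python callables that read a shared context dictionary and return named intermediate products, and sub-agents within one layer see only the query and the products of earlier layers. The layer names follow the figure; `Layer`, `run_hierarchy`, and the lambda sub-agents are illustrative and not EarthAgent's implementation (which would invoke LLMs and remote sensing tools).

```python
from dataclasses import dataclass, field
from typing import Callable

Context = dict[str, object]  # named intermediate products, keyed by name


@dataclass
class Layer:
    name: str
    # Hypothetical sub-agents: each reads a context and returns new named products.
    sub_agents: dict[str, Callable[[Context], Context]] = field(default_factory=dict)


def run_hierarchy(layers: list[Layer], query: str) -> Context:
    """Run layers in order; each layer only sees outputs of preceding layers."""
    context: Context = {"query": query}
    for layer in layers:
        snapshot = dict(context)  # freeze inputs so same-layer agents stay independent
        produced: Context = {}
        for agent in layer.sub_agents.values():
            produced.update(agent(snapshot))
        context.update(produced)  # expose this layer's products to later layers
    return context


layers = [
    Layer("Data Acquisition and Preprocessing",
          {"acquirer": lambda c: {"scene": f"scene for {c['query']}"}}),
    Layer("Data Processing and Analysis",
          {"ndvi": lambda c: {"ndvi": f"NDVI({c['scene']})"}}),
    Layer("Synthesis and Application",
          {"reporter": lambda c: {"report": f"report on {c['ndvi']}"}}),
]
result = run_hierarchy(layers, "vegetation change")
print(result["report"])  # report on NDVI(scene for vegetation change)
```

Taking a snapshot per layer keeps sub-agents in the same layer independent of one another, so they could in principle run in parallel; whether EarthAgent does so is not stated on this page.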


GeoPlan-bench

We introduce GeoPlan-bench, a benchmark built around complex, long-horizon remote sensing planning tasks in which queries require agents to perform strategic decomposition, tool selection, and dependency management without explicit guidance. Tasks are constructed with a semi-automated pipeline divided into two stages: task generation and task validation. In addition, we design a suite of metrics that assess the quality of an agent's generated plan from three perspectives: correctness, structure, and holistic completeness.
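The page does not give the metric formulas. As a stand-in, the sketch below scores tool selection with a set-based F1 and path similarity with one minus the normalized Levenshtein distance between tool-call sequences. Both choices, the function names, and the tool names are assumptions for illustration, not GeoPlan-bench's definitions; logical completeness is not modeled here.

```python
def tool_f1(predicted: list[str], reference: list[str]) -> float:
    """Assumed tool-selection score: F1 over the sets of tools used."""
    pred, ref = set(predicted), set(reference)
    if not pred and not ref:
        return 1.0
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


def path_similarity(predicted: list[str], reference: list[str]) -> float:
    """Assumed structural score: 1 - Levenshtein(predicted, reference) / max length."""
    m, n = len(predicted), len(reference)
    if max(m, n) == 0:
        return 1.0
    prev = list(range(n + 1))  # distances from the empty prefix of `predicted`
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if predicted[i - 1] == reference[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a predicted step
                          curr[j - 1] + 1,     # insert a missing reference step
                          prev[j - 1] + cost)  # keep or substitute a step
        prev = curr
    return 1.0 - prev[n] / max(m, n)


# Hypothetical plans: the agent skipped the cloud-masking step.
reference = ["acquire_imagery", "cloud_masking", "compute_ndvi", "change_report"]
predicted = ["acquire_imagery", "compute_ndvi", "change_report"]
print(round(tool_f1(predicted, reference), 3))  # 0.857
print(path_similarity(predicted, reference))    # 0.75
```

The set-based F1 ignores ordering, so an order-sensitive score such as edit distance is needed alongside it to capture whether tools are called in a valid sequence.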

GeoPlan-bench Pipeline

The pipeline of GeoPlan-bench task construction and validation.


Main Results

We conducted a series of experiments on GeoPlan-bench to compare the task planning performance of EarthAgent against a representative set of established agent paradigms.

Main Results Table