Multi-agent deep reinforcement learning (MADRL) has shown remarkable advancements in the past decade. However, most current MADRL models focus on task-specific short-horizon problems involving a small number of agents, limiting their applicability to long-horizon planning in complex environments. Hierarchical multi-agent models offer a promising solution by organizing agents into different levels, effectively addressing tasks with varying planning horizons. However, these models often face constraints on the number of agents or the number of levels in the hierarchy. This paper introduces HiSOMA, a novel hierarchical multi-agent model designed to handle long-horizon, multi-agent, multi-task decision-making problems. The top-level controller, FALCON, is modeled as a class of self-organizing neural networks (SONN), designed to learn high-level decision rules as internal cognitive codes that modulate middle-level controllers in a fast and incremental manner. The middle-level controllers, MADRL models, in turn receive modulatory signals from the higher level and regulate bottom-level controllers, which learn individual action policies generating primitive actions and interacting directly with the environment. Extensive experiments across different levels of the hierarchical model demonstrate HiSOMA's efficiency in tackling challenging long-horizon problems, surpassing a number of non-hierarchical MADRL approaches. Moreover, its modular design allows for extension into deeper hierarchies and application to more complex tasks with heterogeneous controllers. Demonstration videos and code can be found on our project web page: https://smu-ncc.github.io.
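The three-level control flow described above (top-level decision rules modulating middle-level joint policies, which in turn regulate bottom-level action policies) can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: the class names, the discrete codes, and the rule logic are all hypothetical stand-ins for the learned FALCON, MADRL, and individual-policy components.

```python
# Illustrative sketch of HiSOMA-style three-level hierarchical control.
# All names and rules below are hypothetical placeholders, not the paper's code.

class TopController:
    """Stands in for FALCON: maps task context to a high-level cognitive code."""
    def select_code(self, context):
        # A learned decision rule would go here; we hard-code one for illustration.
        return "advance" if context["enemies_left"] > 0 else "regroup"

class MidController:
    """Stands in for a MADRL model modulated by the high-level code."""
    def select_goal(self, code, obs):
        # The modulatory signal switches which objective the joint policy pursues.
        return {"advance": "attack_nearest", "regroup": "move_to_base"}[code]

class LowController:
    """Stands in for an individual action policy emitting primitive actions."""
    def act(self, goal, obs):
        return f"{goal}:{obs['agent_id']}"

def hisoma_step(top, mid, low, context, obs):
    code = top.select_code(context)    # top level: high-level decision rule
    goal = mid.select_goal(code, obs)  # middle level: modulated joint policy
    return low.act(goal, obs)          # bottom level: primitive action
```

In the actual model each level is learned rather than hard-coded, but the modular interfaces shown here are what allow deeper hierarchies and heterogeneous controllers to be swapped in.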
Cooperative multi-agent reinforcement learning (MARL) methods aim to learn effective collaborative behaviours of multiple agents performing complex tasks. However, existing MARL methods are commonly proposed for fairly small-scale multi-agent benchmark problems, wherein both the number of agents and the length of the time horizon are typically restricted. My initial work investigates hierarchical control of multi-agent systems, where a unified overarching framework coordinates multiple smaller multi-agent subsystems to tackle complex, long-horizon tasks involving multiple objectives. Addressing another critical need in the field, my research introduces a comprehensive benchmark for evaluating MARL methods in long-horizon, multi-agent, and multi-objective scenarios. This benchmark aims to fill the current gap in the MARL community for assessing methodologies in more complex and realistic scenarios. My dissertation focuses on proposing and evaluating methods for scaling up multi-agent systems in two aspects: structurally, by increasing the number of reinforcement learning agents, and temporally, by extending the planning horizon and the complexity of the problem domains in which agents are deployed.
This paper introduces a method to explain MADRL agents' behaviors by abstracting their actions into high-level strategies. Particularly, a spatio-temporal neural network model is applied to encode the agents' sequences of actions as memory episodes, from which an aggregating memory retrieval procedure generalizes a concise abstract representation of collective strategies. To assess the effectiveness of our method, we applied it to explain the actions of QMIX MADRL agents playing the StarCraft Multi-Agent Challenge (SMAC) video game. A user study on the perceived explainability of the extracted strategies indicates that our method can provide comprehensible explanations at various levels of granularity.
Current MARL benchmarks fall short in simulating realistic scenarios, particularly those involving long action sequences with sequential tasks and multiple conflicting objectives. Addressing this gap, we introduce Multi-Objective SMAC (MOSMAC), a novel MARL benchmark tailored to assess MARL methods on tasks with varying time horizons and multiple objectives. Each MOSMAC task contains one or more sequential subtasks. Agents are required to simultaneously balance two objectives, combat and navigation, to successfully complete each subtask. Our evaluation of nine state-of-the-art MARL algorithms reveals that MOSMAC poses substantial challenges to many existing methods and effectively fills a critical gap in existing benchmarks for both single-objective and multi-objective MARL research.
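Balancing two objectives such as combat and navigation is commonly handled by scalarizing the reward vector so that single-objective MARL methods can be applied. The snippet below is a generic sketch of linear scalarization; the weights and reward values are illustrative, and MOSMAC's actual reward formulation may differ.

```python
def scalarize(rewards, weights):
    """Linear scalarization: collapse a multi-objective reward vector
    (e.g. combat and navigation rewards) into a single scalar."""
    return sum(w * r for w, r in zip(weights, rewards))

# Hypothetical example: an agent earns combat reward +1.0 for a kill
# while incurring a navigation penalty of -0.5 for straying off-route.
combat_reward, nav_reward = 1.0, -0.5
step_reward = scalarize((combat_reward, nav_reward), weights=(0.7, 0.3))
```

Different weight choices trade off the two objectives, which is precisely what makes such tasks a meaningful testbed: a fixed scalarization that works for one subtask may fail on the next.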
Although Multi-agent Deep Reinforcement Learning (MADRL) has shown promising results in solving complex real-world problems, the applicability and reliability of MADRL models are often limited by a lack of understanding of their inner workings for explaining the decisions made. To address this issue, this paper proposes a novel method for explaining MADRL by generalizing the sequences of action events performed by agents into high-level abstract strategies using a spatio-temporal neural network model. Specifically, an interval-based memory retrieval procedure is developed to generalize the encoded sequences of action events over time. In addition, two abstraction algorithms are introduced: one abstracts action events across multiple agents, and the other further abstracts episodes over time into short sequential patterns, which can then be translated into symbolic form for interpretation. We evaluate the proposed method using the StarCraft Multi-Agent Challenge (SMAC) benchmark task, which shows that the method is able to derive high-level explanations of MADRL models at various levels of granularity.
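The two abstraction steps (across agents, then over time) can be illustrated with a toy sketch. The paper's interval-based memory retrieval and abstraction algorithms are considerably more involved; the functions below are hypothetical simplifications that only convey the general idea of collapsing a joint action log into a short sequential pattern.

```python
# Toy illustration of two-stage abstraction of action events.
# Not the paper's algorithms; a deliberately simplified stand-in.

def abstract_across_agents(joint_actions):
    """Collapse each time step's per-agent actions into the set of
    distinct action events taken by the team at that step."""
    # joint_actions: list of {agent_id: action} dicts, one per time step.
    return [frozenset(step.values()) for step in joint_actions]

def abstract_over_time(step_patterns):
    """Merge consecutive identical step patterns (run-length style),
    yielding a short sequential pattern interpretable as a strategy."""
    compressed = []
    for pattern in step_patterns:
        if not compressed or compressed[-1] != pattern:
            compressed.append(pattern)
    return compressed
```

For example, an episode where both agents move for several steps before one switches to attacking would abstract to the two-element pattern "all move, then attack while moving", which is the kind of concise symbolic summary the method targets.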