Complex mission planning scenarios that arise in field robotics can involve multiple robotic agents, multiple human operators, and multiple tasks to accomplish. The high-level goal is to find an allocation of agents to tasks that is optimal in some sense (e.g., minimum time, maximum probability of success, or minimum risk). Once a human, automated, or hybrid solution has allocated agents to tasks, lower-level controllers readily solve the single-agent path planning and control problems. The end-to-end mission planning problem can be considered a hierarchical decision problem (DP) and is studied in the optimization and control communities. However, several limitations render solving all but the simplest mission planning problems infeasible with optimization and control techniques alone:
Real missions may involve a variety of task types (visit, patrol, rendezvous, search, track).
Tasks may have spatial and temporal components (e.g., rendezvous at a waypoint during a specific time window).
Tasks may have time-varying priority, which can change at mission runtime.
Agents have varying ability levels that make them more or less suited to particular tasks.
High-level human reasoning can readily trade off the marginal benefit of one configuration (an allocation of tasks to agents at a particular time instant) against another. However, such reasoning is difficult, if not impossible, to encode in an algorithm; humans themselves often cannot enumerate the thought processes behind their own complex decisions.
The last point is the primary motivation for taking a Reinforcement Learning (RL) approach to problems such as the one described.
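To make the allocation problem concrete, the sketch below brute-forces a one-to-one agent-to-task assignment that minimizes total completion time, one simple notion of "optimal." The cost matrix is entirely illustrative (not from the source), and the exhaustive search scales as n!, which hints at why richer missions with time windows, heterogeneous abilities, and changing priorities quickly overwhelm exact optimization.

```python
from itertools import permutations

# Hypothetical time-cost matrix (minutes): cost[i][j] is the time for
# agent i to complete task j. Values are illustrative only.
cost = [
    [4, 2, 8],  # agent 0
    [7, 3, 1],  # agent 1
    [6, 5, 2],  # agent 2
]

def best_allocation(cost):
    """Exhaustively search one-to-one agent-task assignments,
    minimizing total completion time."""
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):  # perm[i] = task assigned to agent i
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_perm, best_total

print(best_allocation(cost))
```

Exact methods such as the Hungarian algorithm solve this static, single-objective case in polynomial time, but the multi-task-type, time-varying settings described above lack such clean formulations.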