
RoboErgoSum Project

There is an intricate relationship between self-awareness and the ability to perform cognitive-level reasoning.

Models for action.

Action is the expression of robot decisions (and reflexes) in the real world. To interact physically with its environment, the robot needs representations adapted to both the environment and its actions. There are different models for action: evidence from neuroscience indicates, for example, that fine grasping uses different representations than gross manipulation or object pushing. Motion in the environment requires localization and mapping, using geometric and topological information. Action is also based on complex models of self, which express the geometric workspace, reachability, field of view, singularities, kinematic and dynamic models, etc.
Action is also anticipated before execution. This anticipation uses models of actions expressed in terms of their preconditions (context) and their expected outcomes. Classical action planning in robotics and AI uses logical representations and inference. Some systems use probabilistic reasoning to account for uncertainty in action execution or in environment models. This is mostly Bayesian inference, as in Markov models and processes, or Kalman filters, which can be used to combine learning of action values with estimation of reward uncertainty.
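As an illustration of an action model expressed through preconditions and expected outcome, here is a minimal sketch in the style of classical (STRIPS-like) planning. It is not the project's representation: the `Action` class, the `grasp_cup` action and the fact names are hypothetical, and states are simplified to sets of symbolic facts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical symbolic action model: preconditions plus expected outcome."""
    name: str
    preconditions: frozenset  # facts that must hold before execution (context)
    add_effects: frozenset    # facts expected to become true
    del_effects: frozenset    # facts expected to become false

    def applicable(self, state):
        # The action can be executed only if every precondition holds.
        return self.preconditions <= state

    def predict(self, state):
        # Anticipate the outcome without executing the action.
        if not self.applicable(state):
            raise ValueError(f"{self.name}: preconditions not satisfied")
        return (state - self.del_effects) | self.add_effects


# Illustrative action and state (assumed names, not from the project).
grasp = Action(
    name="grasp_cup",
    preconditions=frozenset({"hand_empty", "cup_reachable"}),
    add_effects=frozenset({"holding_cup"}),
    del_effects=frozenset({"hand_empty"}),
)

state = frozenset({"hand_empty", "cup_reachable"})
predicted = grasp.predict(state)
```

A planner can chain such predictions to simulate a sequence of actions before executing any of them. A probabilistic variant would replace the single predicted state with a distribution over outcomes.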

Approach. One of the most important questions we intend to address is the relationship between "low-level" action models (skills) and abstract action models (tasks). The questions to be addressed are the following: how are abstract models synthesized (learned) from the more basic actuation capacities, and how are those abstract actions decomposed in turn into lower-level skills? What information related to the body model is embedded in those representations? At which level is there already an uncertainty component in action representation? How is knowledge of the physical world encoded in robot actions and in action planning processes?
Since robot motion and action planning are in effect simulations of the actions to come, we will develop simulators that allow us to test different representations and their interactions. These simulators are based on action planners that take geometrical constraints into account.
In a navigational context, the existence of place cells in the hippocampus has been known for a long time [1]. These neurons discharge only when the animal is in a specific subpart of the environment, as if they were coding places. They are thought to be the substrate of the cognitive map that allows elaborate navigation strategies, such as planning. Numerous models have been proposed to explain how these conceptual representations can be created from sensory inputs (see [2] for a review), and how they can be used to plan locomotion [3][4]. In these models, the action that produces the transition from one place to another is coarsely encoded, while low-level systems take care of precise obstacle avoidance. The interactions between this class of models for locomotor actions and other models dedicated to object manipulation will be studied within this task.
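The idea of planning over places linked by coarsely encoded actions can be sketched as a search in a topological graph. This is only an illustration under simplifying assumptions, not one of the cited neural models: the place names, the action labels and the `plan_route` function are hypothetical, and a breadth-first search stands in for the planning mechanism.

```python
from collections import deque

# Hypothetical topological map: place -> {coarse action: neighbouring place}.
place_graph = {
    "A": {"east": "B"},
    "B": {"west": "A", "north": "C", "east": "D"},
    "C": {"south": "B"},
    "D": {"west": "B", "north": "E"},
    "E": {"south": "D"},
}


def plan_route(graph, start, goal):
    """Return the shortest list of coarse actions from start to goal, or None."""
    frontier = deque([start])
    came_from = {start: None}  # place -> (previous place, action taken)
    while frontier:
        place = frontier.popleft()
        if place == goal:
            break
        for action, neighbour in graph[place].items():
            if neighbour not in came_from:
                came_from[neighbour] = (place, action)
                frontier.append(neighbour)
    if goal not in came_from:
        return None
    # Walk back from the goal to recover the action sequence.
    actions = []
    node = goal
    while came_from[node] is not None:
        node, action = came_from[node]
        actions.append(action)
    actions.reverse()
    return actions


route = plan_route(place_graph, "A", "E")
```

Each returned action only names a transition between two places. Executing it, including precise obstacle avoidance, would be left to a lower-level controller.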
One solution we want to apply to robotics, to create abstractions of actions out of sequences of motor acts, is the use of temporal abstractions from machine learning, known as options [5]. Learning options consists of building a sequence of actions with reinforcement learning and chunking these actions into a unitary abstraction, treated as a single higher-level action or skill. These new representations are described as temporal abstractions because they abstract over temporally extended, and potentially variable, sequences of lower-level steps. The first advantage of options is to speed up exploration (by introducing structure into the exploration process) and to enable quicker learning (by adjusting learning only at the beginning and at the end of the action sequence instead of adjusting each elemental action of the sequence). In addition, temporally abstract options can be used as new actions by the model-based system, thus enabling high-level planning over higher-order behavioral elements. The combination of these different levels of learning is called hierarchical reinforcement learning [6], and can also be developed in close interaction with neurophysiological data [7]. However, the intermediate goals of options are usually defined in an ad hoc manner. The automatic identification of the relevant sequences to be considered as potential options, while avoiding a combinatorial explosion, is an unsolved problem we will address in this task.
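To make the notion of an option concrete, the sketch below follows the usual definition from [5]: an initiation set, an intra-option policy and a termination condition. It is a minimal illustration under stated assumptions. The corridor environment, the two hand-written options and all names (`Option`, `step`, `run_option`) are hypothetical, termination is deterministic rather than probabilistic, and the options are hand-defined rather than learned.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Option:
    """Hypothetical container for the three components of an option."""
    name: str
    can_start: Callable[[int], bool]    # initiation set
    policy: Callable[[int], int]        # intra-option policy: state -> action
    should_stop: Callable[[int], bool]  # termination condition (deterministic here)


def step(state, action):
    """Assumed toy corridor with cells 0..9; reward 1 when cell 9 is reached."""
    next_state = min(9, max(0, state + action))
    reward = 1.0 if next_state == 9 and state != 9 else 0.0
    return next_state, reward


def run_option(option, state, gamma=0.9, max_steps=100):
    """Execute an option as a single temporally extended action.

    Returns the final state, the discounted reward accumulated during
    execution, and the number of primitive steps taken.
    """
    if not option.can_start(state):
        raise ValueError(f"{option.name} cannot start in state {state}")
    total, discount, duration = 0.0, 1.0, 0
    while duration < max_steps:
        state, reward = step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        duration += 1
        if option.should_stop(state):
            break
    return state, total, duration


# Two hand-defined options with an ad hoc intermediate goal (cell 5).
go_to_door = Option("go_to_door", lambda s: s < 5, lambda s: 1, lambda s: s == 5)
go_to_goal = Option("go_to_goal", lambda s: s >= 5, lambda s: 1, lambda s: s == 9)

mid_state, r1, k1 = run_option(go_to_door, 0)
end_state, r2, k2 = run_option(go_to_goal, mid_state)
```

A higher-level learner or planner only needs the triple returned by `run_option` to treat each option as one action. The intermediate goal at cell 5 is fixed by hand here, which is the ad hoc choice that automatic option discovery would replace.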


[1] O'Keefe, J. & Nadel, L. The hippocampus as a cognitive map. Oxford University Press, 1978.
[2] Arleo, A. & Rondi-Reig, L. Multimodal sensory integration and concurrent navigation strategies for spatial cognition in real and artificial organisms. Journal of Integrative Neuroscience, 6(3):327-66, World Scientific Publishing, 2007.
[3] Hasselmo, M.E. A Model of Prefrontal Cortical Mechanisms for Goal-directed Behavior. J. Cogn. Neurosci., 17(7):1115-1129, 2005.
[4] Martinet, L.E., Sheynikhovich, D., Benchenane, K. & Arleo, A. Spatial learning and action planning in a prefrontal cortical network model. PLoS Computational Biology, 7(5):e1002045, Public Library of Science, 2011.
[5] Sutton, R.S., Precup, D. & Singh, S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2):181-211, 1999.
[6] Barto, A.G. & Mahadevan, S. Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems, 13(1-2):41-77, Kluwer Academic Publishers, 2003.
[7] Botvinick, M., Niv, Y. & Barto, A. Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition, 113(3):262-280, 2009.

The RoboErgoSum project is funded by an ANR grant under reference ANR-12-CORD-0030.