RoboErgoSum Project

There is an intricate relationship between self-awareness and the ability to perform cognitive-level reasoning.

Self-awareness and deliberation

Several issues must be considered in the decision-making process. We shall investigate how the notion of will can be developed and implemented. Will is defined here as the capacity to take initiatives and make decisions without a request and not as a reflex reaction to an external event. This capacity must be persistent and must be able to project into the future. The notion of causality between the agent's actions and events in the real world has to be represented, and this probably requires a representation of state and time.

Another issue in deliberation is the notion of motivation. The question of internal motivations has often been overlooked in the autonomous robotics literature: motivations are usually identified with simple drives, whose dynamics are entirely dictated by the metabolism (such as a decreasing energy level) and by occasional unconditional reward signals issued from the environment (such as increasing energy at reloading stations). The resulting systems are thus not purely reactive, but neither can they be considered truly motivationally autonomous.
Here, we want to investigate the potential advantage an artificial system could gain from developing its own preferences, i.e. from associating virtual rewards (to be distinguished from the reward predictions used in actor-critic models, for example) with specific states that seem to play a key role in obtaining long-term rewards and should thus become intrinsically rewarding. These virtual rewards would be created by the motivational system, while the learning systems would remain unaware of whether the rewards they manipulate are real or virtual. A possible advantage would be to set key points at which the reward discount mechanism is reset, thus avoiding the problem of the discounted reward vanishing when the system tries to learn to reach very long-term goals.
This could account, for example, for the behavior of rats in the task of [1], where the stimulus seems to become a reward in itself, even when the food is not consumed. It could also explain how getting more money, normally an intermediate step that can indirectly lead to unconditional rewards like food, can become a reward in itself.
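The vanishing of the discounted reward, and the effect of resetting the discount at key points, can be illustrated with a minimal sketch. It assumes standard exponential discounting on a deterministic chain of states; the function names, the discount factor of 0.9, the unit virtual reward and the regular spacing of key points are all hypothetical choices made for illustration, not part of the project's design.

```python
def discounted_value(steps_to_reward: int, reward: float = 1.0,
                     gamma: float = 0.9) -> float:
    """Value of a state located `steps_to_reward` steps before a reward,
    under exponential discounting with factor `gamma` (assumed model)."""
    return reward * gamma ** steps_to_reward


def value_with_key_points(distance_to_goal: int, key_point_spacing: int,
                          virtual_reward: float = 1.0,
                          gamma: float = 0.9) -> float:
    """Value of the start state when key points carry a virtual reward and
    discounting restarts at each of them (hypothetical scheme).

    The learner then only has to propagate value back from the nearest
    rewarding state: the next key point, or the goal if it is closer.
    """
    steps_to_next = min(distance_to_goal, key_point_spacing)
    return discounted_value(steps_to_next, virtual_reward, gamma)


if __name__ == "__main__":
    far_goal = 100  # steps between the start state and the real reward
    print("real reward only:     ", discounted_value(far_goal))
    print("key points every 10:  ", value_with_key_points(far_goal, 10))
```

With these assumed numbers, the value seen at the start state for a goal 100 steps away is about 2.7e-5, which is hard to distinguish from noise, whereas a key point every 10 steps keeps it near 0.35.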

As far as the architecture of the deliberation system is concerned, we hypothesize two levels of decision making. The upper level resolves multiple-goal situations, given the context and long-term objectives, and produces a "goal agenda". This agenda is the input to a lower-level, goal-oriented planning subsystem, which decides on the more precise course of action needed to achieve the goals. The planning system is associated with a supervisory control system, which controls action execution. Comparing the intended action with its actual results has two consequences. On the one hand, it provides an input for re-planning if action execution fails, as is classically the case in action planning systems. On the other hand, and more significantly for this project, this supervision consolidates the relationship between the intended goal and the achieved situation, which we think is central to self-awareness and to the agent's identification of itself as the source of action.
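The two-level organization described above can be sketched roughly as follows. This is only an illustrative skeleton under strong assumptions: goals are ranked by a hypothetical scalar priority, the planner returns a fixed two-step plan, an action counts as successful when the observed outcome equals the intended one, and the "agency log" is an invented stand-in for the link between intended goal and achieved situation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Goal:
    name: str
    priority: float  # hypothetical scalar used to order competing goals


def build_goal_agenda(goals: List[Goal]) -> List[Goal]:
    """Upper level: arbitrate between goals (here, by descending priority)."""
    return sorted(goals, key=lambda g: g.priority, reverse=True)


def plan(goal: Goal) -> List[str]:
    """Lower level: placeholder planner returning a fixed two-step plan."""
    return [f"approach:{goal.name}", f"achieve:{goal.name}"]


@dataclass
class Supervisor:
    # `execute` runs an action and returns the outcome actually observed.
    execute: Callable[[str], str]
    max_replans: int = 2
    agency_log: List[Tuple[str, str]] = field(default_factory=list)

    def pursue(self, goal: Goal) -> bool:
        """Execute a plan, comparing intended and observed outcomes."""
        for _ in range(self.max_replans + 1):
            succeeded = True
            for action in plan(goal):
                if self.execute(action) != action:
                    succeeded = False  # mismatch: abandon plan and re-plan
                    break
            if succeeded:
                # Intended goal and achieved situation match: record the
                # agent as the source of the outcome.
                self.agency_log.append((goal.name, "achieved"))
                return True
        return False


if __name__ == "__main__":
    agenda = build_goal_agenda([Goal("recharge", 0.2), Goal("explore", 0.9)])
    supervisor = Supervisor(execute=lambda action: action)  # ideal actuator
    for goal in agenda:
        supervisor.pursue(goal)
    print(supervisor.agency_log)
```

In this sketch the same comparison between intended and observed outcome serves both purposes named in the text: a mismatch triggers re-planning, and a match is recorded as evidence that the agent caused the achieved situation.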


[1] Flagel, S.B., Clark, J.J., Robinson, T.E., Mayo, L., Czuj, A., Willuhn, I., Akers, C.A., Clinton, S.M., Phillips, P.E. & Akil, H. A selective role for dopamine in stimulus-reward learning. Nature, 469(7328):53-57, Nature Publishing Group, 2011.

The RoboErgoSum project is funded by an ANR grant under reference ANR-12-CORD-0030.