The mechanisms of learning are a very active research area in neuroscience, and one of the key theories involves dopamine. Previous studies have found that midbrain dopamine neurons, which project to the striatum, fire when an unexpected reward is received; once the reward becomes entirely predictable, this response disappears. This pattern suggests that these cells signal a reward prediction error, the discrepancy between the reward expected and the reward received, and that this error drives learning. Beyond the dopamine system, neurons in the prefrontal cortex are also involved in goal-directed behavior and the processing of value information. How, then, is information from these different brain regions integrated?
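To make the prediction-error idea concrete, the sketch below is my own toy illustration (not taken from the cited papers): a simple Rescorla-Wagner-style update in which the error signal shrinks toward zero as the reward becomes fully predicted, mirroring the disappearing dopamine response.

```python
# Toy illustration (not from the cited papers): a Rescorla-Wagner-style update
# showing how a prediction-error signal vanishes once a reward is fully predicted.
value = 0.0          # current reward prediction for a cue
alpha = 0.2          # learning rate
reward = 1.0         # reward that always follows the cue

for trial in range(20):
    prediction_error = reward - value   # "dopamine-like" surprise signal
    value += alpha * prediction_error   # learning is driven by the error
    print(f"trial {trial:2d}: prediction error = {prediction_error:.3f}")

# Early trials: large error (unexpected reward).
# Later trials: error near 0 (reward fully predicted), so the signal disappears.
```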
Wang et al. (2018) proposed a framework that treats the prefrontal cortex as a recurrent neural network (RNN). The network receives inputs such as visual observations and reward signals, and from these it generates the actions to execute and estimates the value of each state for decision-making. A dopamine-like reward-prediction-error signal is then used to slowly adjust the synaptic weights of the RNN. Importantly, this mechanism is not limited to a single task: it can handle a family of related tasks in a dynamic environment, a capability known as “meta-learning.” Meta-learning does not merely allow an algorithm to learn how to complete a task; it also teaches the algorithm how to learn new tasks more quickly and efficiently. In other words, a meta-learning system extracts a “learning strategy” from experience with many different tasks, so that when it encounters a completely new task it can adapt rapidly without being retrained. In summary, meta-learning is a system that learns “how to learn,” allowing it to adapt quickly to different tasks in a changing environment.
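A minimal sketch of such an architecture might look as follows, assuming PyTorch; the class name, layer sizes, and interface are my own illustrative choices, not the authors’ published code. The key design point is that the previous action and reward are fed back into the recurrent core, so the network can infer the current task’s structure from its own experience while its weights change only slowly.

```python
# Hypothetical sketch of a meta-RL agent in the spirit of Wang et al. (2018).
# Names and dimensions are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class MetaRLAgent(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden_dim=48):
        super().__init__()
        # The recurrent core stands in for prefrontal cortex: its *activity*
        # carries fast, within-task learning, while its *weights* are adjusted
        # slowly by a dopamine-like reward-prediction-error signal.
        self.core = nn.LSTMCell(obs_dim + n_actions + 1, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, n_actions)  # action logits
        self.value_head = nn.Linear(hidden_dim, 1)            # state-value estimate

    def forward(self, obs, prev_action_onehot, prev_reward, state=None):
        # Feeding back the previous action and reward lets the network adapt
        # to a new task within an episode, without any weight updates.
        x = torch.cat([obs, prev_action_onehot, prev_reward], dim=-1)
        h, c = self.core(x, state)
        return self.policy_head(h), self.value_head(h), (h, c)

# Example usage with a single time step and batch size 1:
agent = MetaRLAgent(obs_dim=10, n_actions=4)
obs = torch.zeros(1, 10)
prev_action = torch.zeros(1, 4)
prev_reward = torch.zeros(1, 1)
logits, value, state = agent(obs, prev_action, prev_reward)
```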
In further research, Jensen et al. (2024) explored whether this mechanism could explain certain human learning behaviors as well as the “replay” phenomenon observed in hippocampal cells. They designed a spatial navigation task in which participants explored a walled maze and had to find a hidden treasure as many times as possible within a limited time. The study found that after locating the treasure just once, participants could quickly navigate back to it, and that they paused longer before their initial movements, suggesting a planning process before execution.
To test whether this behavior aligns with the earlier neural model, the research team built a meta-learning framework in which information about the task environment (the maze) was fed into an RNN. At each step the RNN had two options: output the action to be executed, or enter a “thinking” state. During thinking, the RNN would imagine a possible goal, simulate a sequence of actions toward that goal, and only then execute a real action, much as humans plan before moving; a toy sketch of this act-or-think loop is given below.
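The following sketch illustrates the idea under strong simplifications: a tiny open grid and random rollouts stand in for the learned network and its imagined goals. Everything here is my own hypothetical toy, not the authors’ model; it only shows how internal “thinking” steps can precede the single action that is actually executed.

```python
# Toy act-or-think loop (illustrative assumptions only, not the authors' model).
import random

GRID = 4                       # 4x4 grid without walls, for simplicity
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def simulate(state, actions):
    """Imagined rollout: apply actions to an internal copy of the state."""
    x, y = state
    for a in actions:
        dx, dy = MOVES[a]
        x = min(max(x + dx, 0), GRID - 1)
        y = min(max(y + dy, 0), GRID - 1)
    return (x, y)

def choose_action(state, goal, n_rollouts=20, horizon=6):
    """'Thinking': sample candidate plans internally, then act on the best one."""
    best_plan, best_dist = None, float("inf")
    for _ in range(n_rollouts):                      # each iteration ~ one thinking step
        plan = [random.choice(list(MOVES)) for _ in range(horizon)]
        end = simulate(state, plan)
        dist = abs(end[0] - goal[0]) + abs(end[1] - goal[1])
        if dist < best_dist:
            best_plan, best_dist = plan, dist
    return best_plan[0]                              # execute only the first move

state, goal = (0, 0), (3, 2)
for _ in range(50):                                  # safety bound on real steps
    if state == goal:
        print("reached goal")
        break
    action = choose_action(state, goal)
    state = simulate(state, [action])                # actually take one step
```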
The research team found that the neural model behaved much like the human participants: it entered the “thinking” state more often at the beginning of a task, and more frequent thinking improved its accuracy during task execution. Further analysis revealed that the internal activity generated during thinking closely resembled the replay observed in rat hippocampal cells, which can predict the upcoming path. Overall, the study suggests that the RNN functions like the prefrontal cortex, while its thinking mechanism resembles that of the hippocampus.
In my view, some issues in this study still need clarification. First, Jensen et al. examine only forward, sequential replay, whereas other studies have found that neural replay can also occur in reverse order. It is also unclear whether this planning-related replay is consistent with the replay that occurs during sleep in animals. Moreover, in animal experiments, animals typically learn and navigate from a first-person perspective, while the RNN receives the full maze layout from the start, making its learning more like a bird’s-eye view; this is a fundamental difference in experimental design. I therefore believe that fully extrapolating the RNN results to neural mechanisms still warrants further discussion.
References
Jensen, K. T., Hennequin, G., & Mattar, M. G. (2024). A recurrent network model of planning explains hippocampal replay and human behavior. Nature Neuroscience. https://doi.org/10.1038/s41593-024-01675-7
Wang, J. X., Kurth-Nelson, Z., Kumaran, D., Tirumala, D., Soyer, H., Leibo, J. Z., Hassabis, D., & Botvinick, M. (2018). Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience, 21(6), 860–868. https://doi.org/10.1038/s41593-018-0147-8