
Russian Researchers Improve Neural Networks' Spatial Navigation Performance

The missing element was the quality of attention

Researchers at HSE University, MISiS National University of Science and Technology, and the Artificial Intelligence Research Institute (AIRI) have developed an enhanced approach to reinforcement learning for neural networks tasked with navigation in three-dimensional environments. By using the attention mechanism, they managed to improve the performance of a graph neural network by 15%. The study results have been published in IEEE Access.

The study used resources provided by the HSE Basic Research Programme and the HSE HPC cluster.

Some of the most promising robotics applications lie within logistics, with robots autonomously transporting boxes in warehouses, operating self-driving trucks, and steering delivery drones around obstacles on city streets. For navigating three-dimensional spaces, such devices (agents) rely on neural networks, as they need to respond rapidly to changing conditions.

If we want to train an agent to operate autonomously, it is essential to assess its performance throughout the training process. We cannot merely present the problem and observe, because the agent will almost inevitably approach it incorrectly and the outcome will be unacceptable. Therefore, the neural network is given a side objective: to score as many points as possible while accomplishing the task. Points are awarded for advancing towards the optimal solution. This is what reinforcement learning is about. As the neural network performs the task multiple times during training, we evaluate the outcomes and either reward progress in the right direction or deduct points when the result is deemed incorrect.

Matvey Gerasyov
Co-author of the paper, doctoral student, HSE Faculty of Computer Science
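The scoring idea described above can be sketched as a minimal reinforcement-learning loop. The toy corridor environment, hyperparameters, and tabular Q-learning used here are illustrative stand-ins, not the authors' actual setup: points arrive only when the goal is reached, and repeated attempts propagate credit back to earlier moves.

```python
import random

# Toy 1-D corridor: the agent starts in cell 0, the goal is the last
# cell, and reward (points) is given only on reaching the goal.
N_CELLS = 5
ACTIONS = [-1, +1]          # step left / step right

def step(state, action):
    nxt = min(max(state + action, 0), N_CELLS - 1)
    reward = 1.0 if nxt == N_CELLS - 1 else 0.0   # sparse reward
    return nxt, reward, reward > 0

Q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.5
random.seed(0)

for _ in range(2000):                  # repeat the task many times
    s, done, steps = 0, False, 0
    while not done and steps < 100:
        if random.random() < eps:
            a = random.choice(ACTIONS)                  # explore
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])   # exploit
        s2, r, done = step(s, a)
        # Nudge the estimate towards reward + discounted future value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s, steps = s2, steps + 1

# The learned greedy policy now heads right (towards the goal) everywhere.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_CELLS - 1)}
```

After enough repetitions, the point estimates accumulated along successful attempts steer the agent towards the goal from every starting cell.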

Navigating through three-dimensional environments poses one of the greatest challenges for neural networks. The issue lies in the fact that a neural network is often not supplied with sufficient information regarding its current environment, such as the terrain depth or a map. Even more limited is the neural network's understanding of potential returns, as it only receives a reward upon completing the task rather than at various stages of the process.

Consider a scenario where you have to navigate through a forest towards a tower while enticing as many squirrels as possible along the way. A crucial aspect is that the squirrels are mainly positioned along the shortest route (the optimal solution) and will follow you once they catch sight of you. Yet, you're unaware of the tower's location and unable to see the squirrels—you will only discover the number of creatures you have enticed upon reaching your destination. This is the kind of task typically assigned to spatial neural networks.

Rewards are expressed mathematically through a reward function, which the neural network must learn to maximise. A well-designed reward function can significantly improve a network's learning and performance.

The study authors propose a novel approach to enhancing the reward function, while addressing the limitations of providing a single reward upon completion of the task. Their solution is based on augmenting the reward signal with supplementary rewards—a technique called reward shaping.
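A classic way to augment a sparse reward signal is potential-based reward shaping, which adds the discounted change in a potential function to the environment reward without altering the optimal policy. The sketch below uses an illustrative "negative distance to goal" potential; the paper's own shaping signal is learned and attention-based, not this hand-crafted heuristic.

```python
# Potential-based reward shaping: add F(s, s') = gamma * phi(s') - phi(s)
# to the sparse environment reward, giving denser feedback along the way.
GOAL = (4, 4)
GAMMA = 0.99

def phi(state):
    # Illustrative potential: higher (less negative) closer to the goal,
    # measured by Manhattan distance. Not the paper's learned signal.
    x, y = state
    return -(abs(GOAL[0] - x) + abs(GOAL[1] - y))

def shaped_reward(env_reward, state, next_state):
    return env_reward + GAMMA * phi(next_state) - phi(state)

# A step towards the goal earns a small positive bonus even while the
# sparse environment reward is still zero:
bonus = shaped_reward(0.0, (0, 0), (1, 0))   # ≈ 0.99*(-7) - (-8) = 1.07
```

A step away from the goal is correspondingly penalised, so the agent receives guidance at every step rather than only at the end of the episode.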

The authors have applied two modifications of a reward shaping method proposed in 2020 by researchers at McGill University, Canada. The first modification involves advanced aggregating functions, and the second utilises the attention mechanism. Aggregation functions govern how the neural network combines what it observes and in what sequence. The paper highlights the importance of selecting an aggregation function that aligns with the neural network's architecture.

The attention mechanism enables the model to prioritise key inputs from the environment when making predictions. By comparing successive steps during problem-solving, the network picks up cues that point towards a favourable solution.
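The prioritising behaviour can be sketched with standard scaled dot-product attention: each observed feature receives a weight reflecting how well it matches a query, and the weighted sum emphasises the most relevant input. The vectors below are made-up toy values, not data from the study.

```python
import numpy as np

def attention(query, keys, values):
    # Scaled dot-product attention: score each key against the query,
    # normalise the scores with softmax, and weight the values.
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights, weights @ values

# Three observed features; the second key matches the query best,
# so the second observation dominates the weighted summary.
query = np.array([1.0, 0.0])
keys = np.array([[0.1, 0.9],
                 [1.0, 0.1],
                 [0.2, 0.2]])
values = np.array([[1.0], [2.0], [3.0]])
weights, summary = attention(query, keys, values)
```

In the paper's setting, such weights let the network concentrate on the transitions that matter most for reaching the goal.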

The researchers conducted a series of experiments with sparse (delayed) rewards. For this purpose, they employed navigation tasks set in two virtual environments: FourRooms and Maze.

In the FourRooms environment, the neural network was tasked with locating a red box randomly placed in one of the rooms. The neural network could only move forward, turn left, or turn right at each step. The box served as the focal point for the attention mechanism. The neural networks were trained on 16 parallel instances of the environment, with the total number of steps exceeding five million.
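The restricted action interface described above (move forward, turn left, turn right) can be sketched as follows. The grid, goal marker, and layout are illustrative stand-ins; the study's environments appear similar in spirit to open-source gridworld suites such as MiniGrid, but the exact setup may differ.

```python
# An agent with a position and a heading that can only turn or step
# forward, receiving a sparse reward on reaching the goal cell.
HEADINGS = [(0, -1), (1, 0), (0, 1), (-1, 0)]   # N, E, S, W

def act(pos, heading, action, grid):
    if action == "left":
        return pos, (heading - 1) % 4
    if action == "right":
        return pos, (heading + 1) % 4
    # "forward": move one cell unless blocked by a wall ('#').
    x, y = pos
    dx, dy = HEADINGS[heading]
    nx, ny = x + dx, y + dy
    if grid[ny][nx] != "#":
        return (nx, ny), heading
    return pos, heading

grid = ["#####",
        "#..B#",      # B marks the goal (the red box)
        "#####"]
pos, heading = (1, 1), 1  # start in the corner, facing East
pos, heading = act(pos, heading, "forward", grid)
pos, heading = act(pos, heading, "forward", grid)
reward = 1.0 if grid[pos[1]][pos[0]] == "B" else 0.0   # sparse reward
```

The reward arrives only on the final step into the goal cell, which is exactly the sparsity the attention-based shaping is designed to mitigate.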

In the Maze environment, the agent, positioned at a random point within a maze, needed to navigate its way out. The maze was randomly generated each time. The procedure was the same as in FourRooms, but the amount of training was extended to 20 million steps.

The study revealed that reward shaping using the attention mechanism trains the agent to focus on edges corresponding to important transitions in the 3D environment—those in which the goal enters the agent's field of view. This improves the performance of a graph neural network by up to 15%.

It was important for us to optimise the learning process specifically for graph neural networks. A graph cannot be directly observed in its entirety, but for effective training of a graph neural network, it is sufficient to consider its parts, which can be observed as distinct trajectories of the agent's movement. Therefore, it is unnecessary for training purposes to consider all possible trajectory options. Leveraging the attention mechanism is a promising solution, as it notably accelerates the learning process. This acceleration arises from capturing the structure of the underlying Markov decision process, a capability not available to non-graph neural networks.

Ilya Makarov
Associate Professor, Faculty of Computer Science; visiting lecturer, Laboratory of Algorithms and Technologies for Network Analysis, HSE Campus in Nizhny Novgorod; Head, AI in Industry Group, AIRI; Director, AI Centre, MISiS

IQ

May 06