Reinforcement learning allows underwater robots to locate and track marine objects

July 28, 2023

The Computer Vision and Robotics Research Group (UdG) participated in this study, led by the ICM-CSIC and published in the journal Science Robotics. The work demonstrates for the first time that an underwater robot can learn the optimal trajectory to monitor the seabed and track species. The tests were conducted in Sant Feliu de Guíxols using autonomous underwater vehicles from VICOROB, and in California using vehicles from the Bioinspiration Lab.

A team led by the Institute of Marine Sciences (ICM-CSIC) in Barcelona, in collaboration with the University of Girona (UdG), has demonstrated for the first time that deep reinforcement learning allows autonomous underwater vehicles and robots to locate and accurately track marine objects and animals. In deep reinforcement learning, a neural network learns the best action to take at each moment according to a series of rewards.
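The idea of learning actions from rewards can be illustrated with a deliberately small sketch. It is not the method used in the study: it replaces the neural network with a lookup table (tabular Q-learning), and the one-dimensional corridor, the reward values, and the function name `train_q_table` are all assumptions made for illustration. An agent moves left or right along a line of cells and is rewarded when it reaches the cell that holds the target.

```python
import numpy as np

def train_q_table(n_cells=7, goal=5, episodes=2000, alpha=0.5, gamma=0.9,
                  epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical 1D corridor (toy example)."""
    rng = np.random.default_rng(seed)
    q = np.zeros((n_cells, 2))  # one row per cell; actions: 0 = left, 1 = right
    for _ in range(episodes):
        state = int(rng.integers(n_cells))  # random starting cell
        for _ in range(50):                 # cap the episode length
            if state == goal:
                break
            # Epsilon-greedy: mostly exploit the table, sometimes explore.
            if rng.random() < epsilon:
                action = int(rng.integers(2))
            else:
                action = int(np.argmax(q[state]))
            step = 1 if action == 1 else -1
            next_state = min(max(state + step, 0), n_cells - 1)
            # Assumed rewards: +1 for reaching the target, small cost per move.
            reward = 1.0 if next_state == goal else -0.01
            future = 0.0 if next_state == goal else gamma * np.max(q[next_state])
            # Q-learning update toward reward plus discounted future value.
            q[state, action] += alpha * (reward + future - q[state, action])
            state = next_state
    return q

q = train_q_table()
# Greedy action per cell (the entry for the goal cell itself is never trained).
print(np.argmax(q, axis=1))
```

After training, the greedy action in each cell should point toward the target. Deep reinforcement learning follows the same principle, with a neural network in place of the table so that it can handle continuous states such as positions and range measurements.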

The details of this research are presented in a scientific article published in Science Robotics, the leading scientific journal in the field of robotics. The Monterey Bay Aquarium Research Institute (MBARI) in California and the Polytechnic University of Catalonia (UPC) have also participated in the study.

Underwater robotics is becoming a key tool for improving knowledge of the oceans, which pose numerous challenges to exploration, with vehicles now capable of diving to depths of up to 4,000 meters. The in-situ data they provide also complement other data sources, such as those obtained through satellites. This technology allows for the study of small-scale phenomena, such as the capture of CO2 by marine organisms, which helps regulate climate change.

Specifically, this new work reveals that reinforcement learning, widely used in control and robotics as well as in the development of natural language processing tools such as ChatGPT, enables underwater robots to learn the actions they should take at each moment to achieve a specific goal. Under certain circumstances, these action policies equal or even outperform traditional methods derived analytically.

“This type of learning allows us to train a neural network to optimize a specific task, which would be very difficult to achieve otherwise. For example, we have demonstrated that it is possible to optimize the trajectory of a vehicle to locate and track objects moving underwater,” explains the study’s lead author, Ivan Masmitjà.

This “will allow us to delve into the study of ecological phenomena, such as migration or movement at small and large scales of many marine species, using autonomous robots. Additionally, these advancements will make it possible to supervise other oceanographic instruments in real-time through a network of robots, where some can be on the surface monitoring and transmitting the actions of other robotic platforms at the seafloor via satellite,” comments ICM-CSIC researcher Joan Navarro.

To carry out the study, the authors used well-known acoustic range techniques, which estimate the position of an object from distance measurements taken at different points. As a result, the accuracy of the localization depends heavily on where the acoustic range measurements are taken. This is where artificial intelligence, specifically reinforcement learning, becomes crucial: it identifies the best measurement points and, consequently, the optimal trajectory for the robot to follow.
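A minimal sketch can show why the measurement locations matter. It assumes a static target in two dimensions and a simple linearised least-squares estimator, which is a textbook approach and not necessarily the estimator used in the study. The function names, coordinates, and noise level are hypothetical. The example compares ranges taken at well-spread points with ranges taken at points lying almost on a straight line.

```python
import numpy as np

def locate_from_ranges(points, ranges):
    """Estimate a static 2D target position from ranges measured at known points.

    Subtracting the first range equation from the others removes the
    quadratic term in the unknown position and leaves a linear system.
    """
    points = np.asarray(points, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    p0, r0 = points[0], ranges[0]
    A = 2.0 * (points[1:] - p0)
    b = (r0**2 - ranges[1:]**2) + np.sum(points[1:]**2, axis=1) - np.sum(p0**2)
    estimate, *_ = np.linalg.lstsq(A, b, rcond=None)
    return estimate

def mean_error(points, target, noise_std=1.0, trials=200, seed=0):
    """Average localisation error over repeated noisy range measurements."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    true_ranges = np.linalg.norm(points - target, axis=1)
    errors = []
    for _ in range(trials):
        noisy = true_ranges + rng.normal(0.0, noise_std, size=true_ranges.shape)
        errors.append(np.linalg.norm(locate_from_ranges(points, noisy) - target))
    return float(np.mean(errors))

target = np.array([30.0, 40.0])                        # assumed true position (m)
spread = [(0, 0), (100, 0), (0, 100), (100, 100)]      # points around the target
in_line = [(0, 0), (10, 0.5), (20, 0), (30, 0.5)]      # points almost on one line

print("mean error, spread points: ", mean_error(spread, target))
print("mean error, in-line points:", mean_error(in_line, target))
```

With the same range noise, the nearly collinear points should give a much larger average error, because the ranges then carry little information about the direction perpendicular to the line. Choosing where to measure is therefore part of the localization problem itself.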

The neural networks were trained, in part, on the computer cluster at the Barcelona Supercomputing Center (BSC), which houses the most powerful supercomputer in Spain and one of the most powerful in Europe. “This allowed us to adjust the parameters of different algorithms much faster than using conventional computers,” indicates Mario Martin, a co-author of the study and a professor in the Department of Computer Science at UPC.

Once trained, the algorithms were tested on different autonomous vehicles, including the AUV Sparus II developed by VICOROB, in a series of experimental missions conducted at the port of Sant Feliu de Guíxols and Monterey Bay (California), in collaboration with Kakani Katija, the principal investigator of the Bioinspiration Lab at MBARI.

“Our simulation environment incorporates the control architecture of real vehicles, which allowed us to effectively implement the algorithms before going to sea,” comments VICOROB researcher Narcís Palomeras.

For future research, the team will explore the possibility of applying the same algorithms to more complicated missions, such as using multiple vehicles to locate objects, detect fronts and thermoclines, or cooperatively track algal blooms through multi-platform reinforcement learning techniques.

This research was carried out thanks to the prestigious European Marie Curie Individual Fellowship won by researcher Ivan Masmitjà in 2020 and the BITER project, funded by the Ministry of Science and Innovation of the Spanish Government, which is currently underway.

Reference article: Ivan Masmitjà, Mario Martin, Tom O’Reilly, Brian Kieft, Narcís Palomeras, Joan Navarro, and Kakani Katija (2023). Dynamic robotic tracking of underwater targets using Reinforcement Learning. Science Robotics, ade7811. DOI: 10.1126/scirobotics.ade7811.