All Works

Twin Delayed Deep Deterministic Policy Gradient-Based Target Tracking for Unmanned Aerial Vehicle with Achievement Rewarding and Multistage Training

Najmaddin Abo Mosali, Tun Hussein Onn University of Malaysia
Syariful Syafiq Shamsudin, Tun Hussein Onn University of Malaysia
Omar Alfandi, Zayed University
Rosli Omar, Tun Hussein Onn University of Malaysia
Najib Al-fadhali, Tun Hussein Onn University of Malaysia

ORCID Identifiers

0000-0002-4466-4533

0000-0002-6255-432X

0000-0002-9581-401X

Document Type

Article

Source of Publication

IEEE Access

Publication Date

2-22-2022

Abstract

Target tracking using an unmanned aerial vehicle (UAV) is a challenging robotic problem that requires handling a high level of nonlinearity and dynamics. Model-free control effectively handles the uncertain nature of the problem, and reinforcement learning (RL)-based approaches are good candidates for solving it. In this article, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, a recent and composite RL architecture, was explored as a tracking agent for the UAV-based target tracking problem. Several improvements to the original TD3 were also made. First, a proportional-differential controller was used to boost the exploration of TD3 during training. Second, a novel reward formulation for UAV-based target tracking enabled a careful combination of the various dynamic variables in the reward function. This was accomplished by incorporating two exponential functions that limit the effect of velocity and acceleration, preventing deformation in the policy function approximation. In addition, multistage training based on the dynamic variables was proposed as an alternative to one-stage combinatory training. Third, the reward function was enhanced with a piecewise decomposition to enable more stable learning behaviour of the policy and to move from a linear reward to the achievement formula. Training was conducted on fixed target tracking followed by moving target tracking. Flight testing was conducted on three types of target trajectories: fixed, square, and blinking. For the fixed and square moving targets, multistage training achieved the best performance with both exponential and achievement rewarding for the fixed trained agent; for the blinking target, the best performance came from the combined agent with both exponential and achievement rewarding for a fixed trained agent. Relative to the traditional proportional-differential controller, the maximum error reduction rate is 86%. The developed achievement rewarding and multistage training open the door to various applications of RL in target tracking.
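
The abstract describes the reward shaping only in words, so the following is a minimal illustrative sketch rather than the authors' formulation. It assumes that the two exponential functions take the form exp(-k|x|), which bounds the velocity and acceleration contributions to the interval (0, 1], and that the achievement reward is a step function over distance bands. The function name `shaped_reward`, the gains `k_v` and `k_a`, and the threshold and bonus values are all hypothetical and do not come from the article.

```python
import math


def shaped_reward(distance, speed, accel,
                  k_v=1.0, k_a=1.0,
                  thresholds=(2.0, 1.0, 0.5),
                  bonuses=(1.0, 2.0, 4.0)):
    """Illustrative reward: bounded dynamic terms plus a piecewise bonus.

    All constants are hypothetical placeholders, not values from the paper.
    """
    # Assumed exponential form: each term stays in (0, 1] however large
    # the speed or acceleration becomes, so neither can dominate the reward.
    velocity_term = math.exp(-k_v * abs(speed))
    accel_term = math.exp(-k_a * abs(accel))

    # Assumed piecewise "achievement" bonus: thresholds are listed from the
    # loosest to the tightest band, so the last match is the tightest band
    # the current tracking distance satisfies.
    achievement = 0.0
    for threshold, bonus in zip(thresholds, bonuses):
        if distance <= threshold:
            achievement = bonus

    return achievement + velocity_term + accel_term
```

Under these assumptions, a multistage schedule could first train with only the achievement term active and enable the velocity and acceleration terms in later stages; the article itself should be consulted for the actual staging and reward definitions.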

DOI Link

10.1109/access.2022.3154388

ISSN

2169-3536

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Disciplines

Computer Sciences

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Recommended Citation

Mosali, Najmaddin Abo; Shamsudin, Syariful Syafiq; Alfandi, Omar; Omar, Rosli; and Al-fadhali, Najib, "Twin Delayed Deep Deterministic Policy Gradient-Based Target Tracking for Unmanned Aerial Vehicle with Achievement Rewarding and Multistage Training" (2022). All Works. 4911.
https://zuscholars.zu.ac.ae/works/4911

Indexed in Scopus

Open Access

yes

Open Access Type

Gold: This publication is openly available in an open access journal/series

Included in

Computer Sciences Commons
