Inference-Based Personalized Recommendation Via Uncertainty-Aware Dual Actor-Critic Using Reinforcement Learning

Date of Award


Publication Type


Degree Name



Computer Science


Inference, Actor-critic, Uncertainty, Recommendation systems, Decision-making


Creative Commons License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.


Ranking of items is at the core of an efficient, personalized recommendation system: it determines the overall performance and directly affects the consumer’s experience. Models trained on explicit feedback capture consumers’ satisfaction but usually ignore unrated interactions. A large fraction of users do not rate items after consuming them, so their satisfaction with those items remains unknown, and an implicit system that maximizes interaction is typically used instead. We aim to extend the applicability of an explicit system via inference so that items can be suggested to such users with their satisfaction treated as a pertinent element of the recommendation. Our goal is to use the interaction data of these users to suggest items that could affect them positively. This work trains the model on explicit data obtained in the same environment. Since recommendation can be framed as a sequential decision-making task, we obtain the model with a reinforcement learning technique whose aim remains long-term cumulative reward maximization through efficient transitions. The underlying framework is a twin-actor twin-delayed deep deterministic policy gradient. A significant feature of this work is that our approach treats uncertainty as a determining element: every user’s behavior is different and inherently uncertain, and this may be reflected in the data. Such uncertainty can enter the model both because the data are insufficient and because the model itself may fail to capture all the patterns. The final policy, which we refer to as the uncertainty-aware dual actor-critic, is acquired via policy aggregation, theoretically motivated by deep ensembles in reinforcement learning with multiple deep deterministic policy gradients.
Results of experiments on several benchmark datasets show that our aggregated-policy approach improves recommendation performance by strengthening the generalization capability of the agent.
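The policy-aggregation idea described above can be sketched in a few lines. The toy example below is illustrative only and is not the thesis implementation: the linear actors, the `aggregate_policy` helper, and the chosen dimensions are hypothetical stand-ins for the trained TD3 actor networks. The mean over the actors plays the role of the aggregated policy, and the spread across their proposed actions serves as a crude per-dimension uncertainty signal.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearActor:
    """Toy deterministic actor: maps a state vector to an action vector."""
    def __init__(self, state_dim, action_dim, rng):
        # Randomly initialized weights stand in for a trained actor network.
        self.w = rng.normal(scale=0.1, size=(state_dim, action_dim))

    def act(self, state):
        # tanh bounds the action, as in DDPG/TD3-style continuous control.
        return np.tanh(state @ self.w)

def aggregate_policy(actors, state):
    """Average the actions proposed by each actor (policy aggregation);
    the standard deviation across actors is a simple uncertainty estimate."""
    actions = np.stack([a.act(state) for a in actors])
    return actions.mean(axis=0), actions.std(axis=0)

# Illustrative sizes: a user-state vector and an item-embedding-sized action.
state_dim, action_dim = 8, 4
actors = [LinearActor(state_dim, action_dim, rng) for _ in range(2)]  # twin actors
state = rng.normal(size=state_dim)

mean_action, uncertainty = aggregate_policy(actors, state)
```

In the full method the aggregation weights could also be informed by the uncertainty estimate itself, down-weighting actors that disagree strongly; the uniform mean here is only the simplest instance of that idea.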