Autonomous PEV Charging Scheduling Using Deep-Q Network and Dyna-Q Reinforcement Learning
thesisposted on 24.05.2021, 07:54 by Fan Wang
This paper proposes a demand response method that aims to reduce the long-term charging cost of a plug-in electric vehicle (PEV) while overcoming obstacles such as the stochastic nature of the user’s driving be- haviour, traffic condition, energy usage, and energy price. The problem is formulated as a Markov Decision Process (MDP) with unknown transition probabilities and solved using deep reinforcement learning (RL) techniques. Existing methods using machine learning either requires initial user behaviour data, or converges far too slowly. This method does not require any initial data on the PEV owner’s driving behaviour and shows improvement on learning speed. A combination of both model-based and model-free learning called Dyna-Q algorithm is utilized. Every time a real experience is obtained, the model is updated and the RL agent will learn from both real data set and “imagined” experience from the model. Due to the vast amount of state space, a table-look up method is impractical and a value approximation method using deep neural networks is employed for estimating the long-term expected reward of all state-action pairs. An average of historical price is used to predict future price. Three different user behaviour without any initial PEV owner behaviour data are simulated. A purely model-free DQN method is shown to run out of battery during trips very often, and is impractical for real life charging scenarios. Simulation results demonstrate the effectiveness of the proposed approach and its ability to reach an optimal policy quicker while avoiding state of charge (SOC) depleting during trips when compared to existing PEV charging schemes for all three different users profiles.