Adaptive inventory replenishment using structured reinforcement learning by exploiting a policy structure

Hyungjun Park, Dong Gu Choi, Daiki Min

Research output: Contribution to journal, Article, peer-review



We consider an inventory replenishment problem with unknown and non-stationary demand. We design a structured reinforcement learning algorithm that efficiently adapts the replenishment policy to changing demand without any prior knowledge. Our proposed method integrates the known structural properties of a well-performing inventory replenishment policy with reinforcement learning. By exploiting the policy structure, we tailor reinforcement learning to characterize the inventory replenishment policy and approximate the value function. In particular, we propose two methods for stochastic approximation of the gradient of the objective function. These novel reinforcement learning algorithms ensure an efficient convergence rate and lower algorithmic complexity for solving practical problems. The numerical results demonstrate that the proposed algorithms adaptively update the policy in response to changing demand and achieve lower inventory costs than various benchmarks. We also conduct a numerical study for a South Korean retail shop to validate the practical feasibility of the proposed method. Understanding the policy structure is beneficial for designing reinforcement learning algorithms that can address the inventory replenishment problem. These well-designed reinforcement learning algorithms are particularly promising when policy updates must be based on observations, without precise knowledge of the non-stationary demand. These research findings could be extended to other inventory decisions for which policy structures are available.
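To give a feel for the general idea of adapting a structured policy by stochastic approximation, here is a minimal sketch. It is not the algorithm proposed in the article. It assumes a base-stock policy as the structure, a newsvendor-style one-period cost with holding and backorder terms, and a constant step size; the function names, cost parameters, and the synthetic demand with a single shift are all hypothetical choices made for illustration.

```python
import random


def sa_base_stock(demands, holding_cost=1.0, backorder_cost=4.0, s0=0.0, step=1.0):
    """Track a base-stock level with a constant-step stochastic-approximation update.

    Assumed one-period cost: h * max(s - d, 0) + b * max(d - s, 0).
    Only observed demand is used; no demand distribution is assumed.
    """
    s = s0
    levels = []
    for d in demands:
        # Sample (sub)gradient of the one-period cost with respect to s.
        grad = holding_cost if s > d else -backorder_cost
        # Gradient step, keeping the base-stock level non-negative.
        s = max(0.0, s - step * grad)
        levels.append(s)
    return levels


def simulate_demand(n_periods=4000, shift_at=2000, seed=0):
    """Hypothetical non-stationary demand: Uniform(0, 40), then Uniform(60, 100)."""
    rng = random.Random(seed)
    return [
        rng.uniform(0, 40) if t < shift_at else rng.uniform(60, 100)
        for t in range(n_periods)
    ]


demands = simulate_demand()
levels = sa_base_stock(demands)

# Average base-stock level shortly before the shift and at the end of the run.
before = sum(levels[1500:2000]) / 500
after = sum(levels[3500:4000]) / 500
print(round(before, 1), round(after, 1))
```

Under these assumptions the cost-minimizing base-stock level is the b / (b + h) = 0.8 quantile of demand, that is 32 before the shift and 92 after it, so the two printed averages should sit near those values. The constant step size is what lets the level keep tracking demand after the shift, at the price of some residual fluctuation around the target.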

Original language: English
Article number: 109029
Journal: International Journal of Production Economics
State: Published - Dec 2023

Bibliographical note

Publisher Copyright:
© 2023


Keywords

  • Inventory replenishment policy
  • Reinforcement learning
  • Stochastic approximation
  • Structural properties

