Using Reinforcement Learning in a Recommendation Engine
Just as Buster could choose between actions such as running and sniffing, the agent can choose between various actions, and these actions affect its environment. Our agent influences the product recommendations on a newly loaded page of an online shop. For example, it can decide that only products from a particular brand, or only products costing at most $20, should be displayed. And just as Buster could fetch the ball while barking, the agent can apply both restrictions at the same time.
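To make the idea of an action more concrete, here is a minimal sketch in which an action is a set of optional filters on the recommendations. The names `Product`, `Action`, and the sample catalogue are assumptions made for illustration, not part of any real shop system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Product:
    name: str
    brand: str
    price: float


@dataclass(frozen=True)
class Action:
    """One hypothetical agent action: optional filters on the recommendations."""
    brand: Optional[str] = None        # None means "any brand"
    max_price: Optional[float] = None  # None means "no price cap"

    def apply(self, catalogue):
        # Keep only the products that pass every filter that is set.
        return [
            p for p in catalogue
            if (self.brand is None or p.brand == self.brand)
            and (self.max_price is None or p.price <= self.max_price)
        ]


# Made-up catalogue, for illustration only.
catalogue = [
    Product("trail shoe", "Acme", 79.0),
    Product("socks", "Acme", 12.0),
    Product("water bottle", "Borealis", 18.0),
]

# Both filters at once, like fetching the ball while barking.
combined = Action(brand="Acme", max_price=20.0)
recommended = combined.apply(catalogue)
```

Here `recommended` contains only the Acme product that costs $20 or less, while `Action()` with no filters leaves the catalogue unchanged.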
The agent’s decisions influence the product recommendations and the personalised elements that the customer sees. In doing so, it can also influence the customer’s behaviour:
- The best outcome: The customer is shown products that may interest them, so they are more likely to buy something or to buy more. If the customer does buy something, the agent receives a digital treat, i.e., the agent is told the amount that the customer has spent. This reward reinforces the agent’s behaviour: if the agent receives a similar input vector in the future, it is more likely to behave in the same way.
- The worst-case scenario: The customer is reluctant to buy anything or leaves the shop. In this case, the agent goes away empty-handed and its behaviour is not reinforced. If the agent receives a similar input vector in the future, it is less likely to perform the same action.
This procedure is repeated for lots of customers. Each individual online shopper becomes a trainer for the agent. Over time, the agent learns which product recommendations work best for which kind of customer behaviour.
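The reward loop described above can be sketched in a few lines. The article does not say which learning algorithm is used, so this sketch substitutes a simple epsilon-greedy rule with a running average of rewards per action. It ignores the input vector for brevity, and the class name, action names, and amounts are all hypothetical.

```python
import random


class RewardAverageAgent:
    """Epsilon-greedy agent that tracks the average reward of each action."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon              # probability of trying a random action
        self.rng = random.Random(seed)
        self.value = {a: 0.0 for a in self.actions}  # average reward so far
        self.count = {a: 0 for a in self.actions}    # times each action was rewarded

    def choose(self):
        # Explore occasionally, otherwise pick the action with the best average.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.value[a])

    def learn(self, action, reward):
        # Incremental mean: move the estimate towards the observed reward.
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]


agent = RewardAverageAgent(["brand_filter", "price_cap"], epsilon=0.0)
agent.learn("brand_filter", 40.0)  # customer spent 40: a digital treat
agent.learn("brand_filter", 0.0)   # customer left without buying: no treat
agent.learn("price_cap", 10.0)
best = agent.choose()
```

With exploration switched off (`epsilon=0.0`), the agent picks the action with the higher average spend. In practice a small positive `epsilon` keeps it trying alternatives, so that each new shopper can still teach it something.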
Personalising Recommendations
What’s special about the agent is that it can respond to the various situations that customers find themselves in. Customers with similar behaviour produce similar input vectors. For example, some customers are looking for something in particular and know what they want. These customers tend to view fewer category overview pages, but spend longer on average on every page they visit. For customers who want to browse and find inspiration, the opposite is true.
The agent learns not only to distinguish between such groups, but also which action is most appropriate for each group. Unlike rigid strategies that apply the same pre-set rules to every customer, this approach adapts to each group and can therefore increase sales.
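As a rough illustration of group-specific behaviour, the sketch below assigns each input vector to the nearest of two fixed behaviour profiles and keeps separate reward averages per group. This is a simplification: the agent described here learns the groups itself, whereas the profiles, feature choice, scaling, and action names below are assumptions chosen for the example.

```python
import math

# Hypothetical input vector:
# (share of page views on category overview pages, average seconds per page)
CENTROIDS = {
    "searcher": (0.1, 60.0),  # few overview pages, long time per page
    "browser": (0.6, 15.0),   # many overview pages, short time per page
}
SCALE = (1.0, 60.0)  # rough feature ranges so that neither feature dominates


def nearest_group(vector):
    """Return the name of the behaviour profile closest to the vector."""
    def dist(centroid):
        return math.sqrt(sum(((v - c) / s) ** 2
                             for v, c, s in zip(vector, centroid, SCALE)))
    return min(CENTROIDS, key=lambda g: dist(CENTROIDS[g]))


class PerGroupAgent:
    """Keeps one running reward average per (group, action) pair."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.value = {g: {a: 0.0 for a in self.actions} for g in CENTROIDS}
        self.count = {g: {a: 0 for a in self.actions} for g in CENTROIDS}

    def learn(self, vector, action, reward):
        g = nearest_group(vector)
        self.count[g][action] += 1
        self.value[g][action] += (reward - self.value[g][action]) / self.count[g][action]

    def best_action(self, vector):
        g = nearest_group(vector)
        return max(self.actions, key=lambda a: self.value[g][a])


agent = PerGroupAgent(["narrow_filter", "broad_inspiration"])
agent.learn((0.05, 70.0), "narrow_filter", 50.0)      # a searcher who bought
agent.learn((0.70, 10.0), "broad_inspiration", 30.0)  # a browser who bought
for_searcher = agent.best_action((0.10, 55.0))
for_browser = agent.best_action((0.65, 12.0))
```

Because similar customers produce similar vectors, a new visitor who behaves like an earlier searcher is served the action that paid off for searchers, and likewise for browsers.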
Our Conclusion on Using Reinforcement Learning in E-Commerce
With the right training, dogs can learn to fetch, and an agent can learn to generate relevant product recommendations in an online shop. Reinforcement learning trains the agent on the behaviour of many different shop users, so that the agent can provide improved, customer-specific recommendations.