Parameterized Action Soft Actor-Critic

Match Plan Generation in Web Search with Parameterized Action Reinforcement Learning

During my internship at Microsoft, we explored a reinforcement learning (RL)-based prototype for match plan generation. It outperformed hand-crafted match plans that experts had tuned for years and was later integrated into Bing.