I worked on this project to get some basic knowledge of reinforcement learning and to have some fun solving problems automatically. All three problems in this project were solved using Q learning.

                
    Algorithm double_q_learning:
        initialize Q_1 = Q_2, memory_deque = []
        initialize explore_rate = 1, time_step = 0
        initialize state
        repeat:
            # epsilon-greedy: random action with probability explore_rate
            action = select from (random_action, argmax(state, Q_1)) with explore_rate
            new_state, reward, done = environment.step(action)
            memory_deque.append( (state, action, new_state, reward, done) )
            # advance to the next state, or start a new game when this one ends
            state = environment.reset() if done else new_state
            time_step += 1
            explore_rate = max(explore_rate * explore_rate_decay, explore_rate_min)
            if time_step % update_q2_steps == 0:
                train_batch = sample(memory_deque)
                q_val = bellman_equation(train_batch)
                Q_2 = Q_2 + back_propagation(q_val)
            if time_step % update_q1_steps == 0:
                # sync the acting network with the trained one
                Q_1 = Q_2
        until avg_game_reward > threshold
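
As a rough illustration of the loop above, here is a minimal tabular sketch in plain Python. It is not the code used in this project: the Q networks are replaced by lookup tables, back-propagation by a simple learning-rate update, and `ChainEnv`, `train`, and all hyperparameter values are made up for the example. It also assumes that the Bellman target is computed from `Q_1`, and it runs for a fixed number of steps instead of stopping at a reward threshold.

```python
import random
from collections import deque


class ChainEnv:
    """Hypothetical toy environment: walk right along a 5-cell chain to reach the goal."""
    n_states, n_actions = 5, 2  # actions: 0 = left, 1 = right

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos = max(0, self.pos - 1) if action == 0 else self.pos + 1
        done = self.pos == self.n_states - 1
        return self.pos, (1.0 if done else 0.0), done


def train(env, steps=5000, gamma=0.9, lr=0.5, batch_size=16,
          update_q2_steps=1, update_q1_steps=50,
          explore_rate_decay=0.999, explore_rate_min=0.05, seed=0):
    rng = random.Random(seed)
    q2 = [[0.0] * env.n_actions for _ in range(env.n_states)]  # table being trained
    q1 = [row[:] for row in q2]                                # periodically synced copy
    memory = deque(maxlen=1000)
    explore_rate = 1.0
    state = env.reset()
    for time_step in range(1, steps + 1):
        # Epsilon-greedy action selection using Q_1.
        if rng.random() < explore_rate:
            action = rng.randrange(env.n_actions)
        else:
            action = max(range(env.n_actions), key=lambda a: q1[state][a])
        new_state, reward, done = env.step(action)
        memory.append((state, action, new_state, reward, done))
        state = env.reset() if done else new_state
        explore_rate = max(explore_rate * explore_rate_decay, explore_rate_min)
        if time_step % update_q2_steps == 0 and len(memory) >= batch_size:
            for s, a, s2, r, d in rng.sample(list(memory), batch_size):
                # Bellman target (assumption: bootstrapped from the synced table Q_1).
                target = r if d else r + gamma * max(q1[s2])
                q2[s][a] += lr * (target - q2[s][a])
        if time_step % update_q1_steps == 0:
            q1 = [row[:] for row in q2]
    return q2
```

Calling `train(ChainEnv())` returns the trained table; on this toy chain the "right" action should end up with the higher value in every non-terminal state.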

I used the template code from the UCB RL course homework to solve the Atari games. It is a deep Q learning algorithm with some best practices, including double Q networks. The agent consistently achieved an average reward of over 20 on Pong-v0 after 4.3M game steps.

The cart-pole problem is probably the easiest and most common one. Nothing mysterious here. It was solved in 456 steps, which is a pretty good (and lucky) score for Q learning.

I also developed an apple-picker game environment, an easy problem for applying deep Q learning with a convolutional neural network, to test the code implementation. The problem was solved relatively easily, and it should be a good starting point for more complex problems.
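
To give a feel for what a small image-like environment of this kind could look like, here is a hypothetical stand-in. It is not the actual apple-picker code: the class name `ApplePickerSketch`, the grid size, the falling-apple rule, and the reward values are all assumptions made for illustration. The observation is a 2D grid, which is the kind of input a convolutional network would consume.

```python
import random


class ApplePickerSketch:
    """Hypothetical stand-in, not the project's environment."""
    ACTIONS = (-1, 0, 1)  # move basket left, stay, move right

    def __init__(self, width=8, height=8, seed=None):
        self.width, self.height = width, height
        self.rng = random.Random(seed)

    def reset(self):
        # Basket starts in the middle of the bottom row, apple at a random top cell.
        self.basket_x = self.width // 2
        self.apple_x = self.rng.randrange(self.width)
        self.apple_y = 0
        return self._observe()

    def step(self, action):
        # Move the basket and clamp it to the grid, then let the apple fall one row.
        self.basket_x = min(self.width - 1,
                            max(0, self.basket_x + self.ACTIONS[action]))
        self.apple_y += 1
        done = self.apple_y == self.height - 1
        reward = 0.0
        if done:
            # Assumed reward scheme: +1 for a catch, -1 for a miss.
            reward = 1.0 if self.apple_x == self.basket_x else -1.0
        return self._observe(), reward, done

    def _observe(self):
        # Image-like observation: 0.5 marks the basket, 1.0 marks the apple.
        grid = [[0.0] * self.width for _ in range(self.height)]
        grid[self.height - 1][self.basket_x] = 0.5
        grid[self.apple_y][self.apple_x] = 1.0
        return grid
```

The `reset`/`step` interface mirrors the `environment.step(action)` call in the pseudocode, so a sketch like this could be plugged into the same training loop.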

RL Experiments