I worked on this project to get some basic knowledge of reinforcement learning and have some fun solving problems automatically. All three problems in this project were solved with Q learning.
Algorithm double_q_learning:
    initialize Q_1 = Q_2, memory_deque = []
    initialize explore_rate = 1, time_step = 0
    initialize state
    repeat:
        action = random_action with probability explore_rate, else argmax_a Q_1(state, a)
        new_state, reward, done = environment.step(action)
        memory_deque.append((state, action, new_state, reward, done))
        state = new_state (reset environment if done)
        time_step += 1
        explore_rate = max(explore_rate * explore_rate_decay, explore_rate_min)
        if time_step % update_q2_steps == 0:
            train_batch = sample(memory_deque)
            q_val = bellman_equation(train_batch)
            Q_2 = Q_2 + back_propagation(q_val)
        if time_step % update_q1_steps == 0:
            Q_1 = Q_2
    until avg_game_reward > threshold
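For concreteness, here is a minimal sketch of how that loop could look in Python, written against the classic Gym API (reset returning an observation, step returning a 4-tuple) with a small PyTorch network on CartPole-v1. The network architecture, hyperparameters (update_q2_steps, update_q1_steps, the decay schedule, the solved threshold), and variable names are my own assumptions for illustration, not the exact code used in this project.

    import random
    from collections import deque

    import gym
    import numpy as np
    import torch
    import torch.nn as nn

    def make_q_net(n_obs, n_actions):
        # Small fully connected Q-network; the architecture is illustrative only.
        return nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_actions))

    env = gym.make("CartPole-v1")
    n_obs, n_actions = env.observation_space.shape[0], env.action_space.n

    q2 = make_q_net(n_obs, n_actions)      # trained network (Q_2)
    q1 = make_q_net(n_obs, n_actions)      # slow-moving copy used for actions and targets (Q_1)
    q1.load_state_dict(q2.state_dict())    # initialize Q_1 = Q_2
    optimizer = torch.optim.Adam(q2.parameters(), lr=1e-3)

    memory_deque = deque(maxlen=10_000)
    explore_rate, explore_rate_decay, explore_rate_min = 1.0, 0.995, 0.05
    gamma, batch_size = 0.99, 64
    update_q2_steps, update_q1_steps = 1, 500   # assumed update frequencies
    threshold = 195.0                           # assumed "solved" reward threshold

    time_step, episode_reward = 0, 0.0
    recent_rewards = deque(maxlen=20)
    state = env.reset()

    while len(recent_rewards) < recent_rewards.maxlen or np.mean(recent_rewards) < threshold:
        # epsilon-greedy action selection from Q_1
        if random.random() < explore_rate:
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                action = q1(torch.as_tensor(state, dtype=torch.float32)).argmax().item()

        new_state, reward, done, _ = env.step(action)
        memory_deque.append((state, action, new_state, reward, done))
        episode_reward += reward
        time_step += 1
        explore_rate = max(explore_rate * explore_rate_decay, explore_rate_min)

        if time_step % update_q2_steps == 0 and len(memory_deque) >= batch_size:
            batch = random.sample(memory_deque, batch_size)
            states, actions, next_states, rewards, dones = zip(*batch)
            states = torch.as_tensor(np.array(states), dtype=torch.float32)
            actions = torch.as_tensor(actions, dtype=torch.int64)
            next_states = torch.as_tensor(np.array(next_states), dtype=torch.float32)
            rewards = torch.as_tensor(rewards, dtype=torch.float32)
            dones = torch.as_tensor(dones, dtype=torch.float32)

            # Bellman targets computed with the slow-moving network Q_1
            with torch.no_grad():
                targets = rewards + gamma * (1 - dones) * q1(next_states).max(dim=1).values
            predictions = q2(states).gather(1, actions.unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(predictions, targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if time_step % update_q1_steps == 0:
            q1.load_state_dict(q2.state_dict())   # Q_1 = Q_2

        state = new_state
        if done:
            recent_rewards.append(episode_reward)
            episode_reward = 0.0
            state = env.reset()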

I used the template code from the UCB RL course homework to solve the Atari games. It is a deep Q learning algorithm with some best practices, including double Q networks. The agent consistently achieved an average reward of over 20 on Pong-v0 after 4.3M game steps.
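One of those best practices, the double-Q target, only takes a few lines: the online network chooses the greedy next action and the other, periodically copied network evaluates it, which reduces the over-estimation bias of plain Q learning targets. Below is a hedged sketch; the function name and argument conventions are my own assumptions rather than the homework's code.

    import torch

    def double_q_target(q_online, q_frozen, next_states, rewards, dones, gamma=0.99):
        # q_online picks the greedy action, q_frozen scores it; both are callables
        # mapping a (batch, obs_dim) tensor to (batch, n_actions) Q-values.
        with torch.no_grad():
            best_actions = q_online(next_states).argmax(dim=1, keepdim=True)
            next_values = q_frozen(next_states).gather(1, best_actions).squeeze(1)
            return rewards + gamma * (1.0 - dones) * next_values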
The cart-pole problem should be the easiest and most common one. Nothing mysterious here. It was solved in 456 steps, which is a pretty good (and lucky) score for Q learning.
I also developed an apple-picker game environment, an easy problem for applying deep Q learning with a convolutional neural network, to test the code implementation. The problem was solved relatively easily, and it should be a good starting point for more complex problems.
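The apple-picker code itself is not reproduced here, but a custom Gym-style environment only needs an observation space, an action space, reset, and step. The skeleton below is an assumed, simplified stand-in (grid size, rewards, and dynamics are invented for illustration, not the actual game); its image-shaped observations are what make a convolutional Q-network applicable.

    import numpy as np
    import gym
    from gym import spaces

    class ApplePickerSketch(gym.Env):
        """Illustrative skeleton only; the real apple-picker rules are not shown here."""

        def __init__(self, size=8):
            super().__init__()
            self.size = size
            # Image-like observation so a convolutional Q-network can be applied.
            self.observation_space = spaces.Box(0, 255, shape=(size, size, 1), dtype=np.uint8)
            self.action_space = spaces.Discrete(3)  # e.g. move left, stay, move right
            self.agent_col = 0
            self.apple_col = 0

        def _obs(self):
            frame = np.zeros((self.size, self.size, 1), dtype=np.uint8)
            frame[-1, self.agent_col, 0] = 255   # agent drawn on the bottom row
            frame[0, self.apple_col, 0] = 128    # apple drawn on the top row
            return frame

        def reset(self):
            self.agent_col = self.size // 2
            self.apple_col = np.random.randint(self.size)
            return self._obs()

        def step(self, action):
            # action in {0, 1, 2} maps to a move of {-1, 0, +1} columns.
            self.agent_col = int(np.clip(self.agent_col + (action - 1), 0, self.size - 1))
            done = self.agent_col == self.apple_col   # lined up under the apple
            reward = 1.0 if done else 0.0
            return self._obs(), reward, done, {}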