Methods in Reinforcement Learning

Artificial Intelligence

  • Instincts (predefined logics)
  • Machine Learning (statistical data + improvement)
    • Supervised Learning (labelled data for every example)
    • Reinforcement Learning (only a target or reward signal)
      • Bruteforce
        • Sequential bruteforce
        • Random bruteforce
      • Intuitive Explore and Exploit (Policy Network)
        • Single action explore and exploit
          • Random actions, mixed with actions already known from past experience
        • Sequence of actions explore and exploit
          • Monte Carlo Tree Search (for max) explore and exploit
      • Explore and Exploit with Bellman (Q-learning)
        • Single action explore and exploit
        • Use a q-table (memory grows with the number of state-action pairs, but every case has its own entry)
        • Update formula:
          • q += learning\_rate \times (reward + discount \times max(q_{t+1}) - q)
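          • learning_rate scales each q-table update; the parenthesised term it multiplies is the temporal difference (TD), and max(q_{t+1}) is the largest q-value over the actions available in the next state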
      • Explore and Exploit with Bellman (Q-network)
        • Single action explore and exploit
        • Use a q-network (small memory, but it only generalises across similar cases)
        • Network expected value:
          • q_{target} = reward + discount \times max(q_{t+1})
          • learning_rate does not appear in this formula; it is set in the optimiser
    • Unsupervised Learning (no labelled data)
      • With neuralnet:
        • Autoclustering, etc.
      • Non-neuralnet:
        • K-means, etc.
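
The q-table update formula in the outline can be sketched in a few lines of Python. This is only a minimal illustration: the function name `q_update`, the dictionary-based table, the two-action toy problem and the default values for `learning_rate` and `discount` are assumptions made for the example, not part of the original notes.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, actions,
             learning_rate=0.1, discount=0.9):
    """Apply one q-table update in place and return the temporal difference."""
    # max(q_{t+1}): the best q-value reachable from the next state
    best_next = max(q_table[(next_state, a)] for a in actions)
    # Temporal difference: (reward + discount * max(q_{t+1})) - q
    td = reward + discount * best_next - q_table[(state, action)]
    # q += learning_rate * temporal difference
    q_table[(state, action)] += learning_rate * td
    return td


# Hypothetical toy problem: unseen (state, action) entries default to 0.0
q_table = defaultdict(float)
actions = ["left", "right"]
td = q_update(q_table, state=0, action="right", reward=1.0,
              next_state=1, actions=actions)
print(q_table[(0, "right")], td)
```

Because the table stores one entry per (state, action) pair, it covers every case exactly, which is also why its memory use grows quickly on large problems.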

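The q-network target from the outline can be sketched the same way. To keep the example dependency-free, a tiny linear model stands in for a real neural network and a hand-written gradient step stands in for the optimiser; both are assumptions for illustration, as are the names `q_values`, `q_target` and `sgd_step` and all the numbers used.

```python
def q_values(weights, state):
    """Hypothetical stand-in for a q-network: one linear unit per action."""
    return [w * state for w in weights]


def q_target(weights, reward, next_state, discount=0.9):
    # q_target = reward + discount * max(q_{t+1}); no learning_rate here
    return reward + discount * max(q_values(weights, next_state))


def sgd_step(weights, state, action, target, learning_rate=0.1):
    """One plain gradient step on the loss (prediction - target)^2 / 2.

    The target is treated as a fixed number, not differentiated through.
    """
    prediction = weights[action] * state
    gradient = (prediction - target) * state  # d(loss) / d(weights[action])
    new_weights = list(weights)
    # learning_rate appears only here, inside the optimiser step
    new_weights[action] -= learning_rate * gradient
    return new_weights


weights = [0.5, 1.0]  # illustrative starting weights for two actions
target = q_target(weights, reward=1.0, next_state=2.0)
new_weights = sgd_step(weights, state=1.0, action=0, target=target)
print(target, new_weights)
```

Only the weight of the chosen action moves towards the target, and the step size is controlled entirely by the optimiser's `learning_rate`, which matches the note that it is absent from the target formula.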