
Machine learning TicTacToe

TicTacToe with Q-learning, where the agent is given only the basic rules of the game. It learns how to play purely by playing thousands of matches against itself, without any interaction with a human. The program saves what it has learned to files in the 'policy' directory and uses them later to play against a human. The program works quite well: it is not easy to beat, and most matches end in a tie or a win for the computer. I trained it by letting it play 500000 matches against itself.
How it works
During training (playing games against itself), the agent saves every board state it encounters in a dictionary and updates a value that tells how good that state is. The value comes from the reward it receives at the end of each match.
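Below is a minimal sketch of this update (the original post shows it as a screenshot), assuming the agent keeps a list self.states of the board states visited during the current game (an illustrative name, not necessarily the one used in the repository):

```python
def feed_reward(self, reward):
    # Walk backwards through the states visited in this game and
    # nudge each state's value toward the (decayed) reward.
    for state in reversed(self.states):
        if self.states_value.get(state) is None:
            self.states_value[state] = 0.0
        self.states_value[state] += self.lr * (
            self.decay_rate * reward - self.states_value[state]
        )
        # Earlier moves receive a smaller share of the reward,
        # because the decay is applied repeatedly.
        reward = self.states_value[state]
```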

self.lr - Learning Rate - How quickly the agent learns new information and how much weight it gives to old information. I keep this value at 0.1.
self.decay_rate - Decay Rate - This variable decides whether early or late moves matter more for the result. It usually has a value of 0.8 or 0.9.
self.exp_rate - Exploratory Rate - The probability that the agent will choose a completely random move. A bit of spontaneity is never a bad thing. However, too high a value brings too much randomness, which leads to worse results. I tried many values, and the best range is between 0.1 and 0.3.
self.states_value - The dictionary that contains all encountered states and the values assigned to those states.

Later, when the agent decides what move to make, it checks all possible moves and chooses the one with the highest value according to the dictionary.
As a reward the agent receives:
    1 - if it won the match
    0 - for a tie
   -1 - for a loss
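Putting the exploration rate and the value lookup together, move selection could look roughly like the sketch below. The names positions (the list of free cells), board (a flat list of nine cells), and symbol are illustrative, and I use the board tuple itself as the dictionary key:

```python
import random

def choose_action(self, positions, board, symbol):
    # Exploration: with probability exp_rate, play a random legal move.
    if random.uniform(0, 1) <= self.exp_rate:
        return random.choice(positions)
    # Exploitation: simulate each legal move and pick the one whose
    # resulting state has the highest learned value (unseen states
    # default to 0).
    best_value, best_action = -float("inf"), None
    for position in positions:
        next_board = board.copy()
        next_board[position] = symbol
        value = self.states_value.get(tuple(next_board), 0)
        if value > best_value:
            best_value, best_action = value, position
    return best_action
```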
User interface
For the user interface I used PyQt, because I had already used this library a few times. I designed the buttons and other elements in Figma and exported them as PNG files.
New Game starts a new game against the computer.
Comp vs Comp starts a new game in which two computers play against each other.
Training starts agent training, which takes 5000 iterations (around 30 seconds). After that, we can start a new game against our newly trained computer. It will not play as well as the default agent, though: the default agent was trained with 500000 iterations, which is 100 times more. To go back to the default agent, you have to restart the program.
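For context, the self-play loop behind this button could be outlined as follows. agent_x, agent_o, and play_one_game are hypothetical names; the rewards match the list above:

```python
def train(self, rounds=5000):
    for _ in range(rounds):
        # play_one_game() alternates moves between the two agents
        # and returns "x", "o", or "tie".
        result = self.play_one_game()
        if result == "x":
            self.agent_x.feed_reward(1)
            self.agent_o.feed_reward(-1)
        elif result == "o":
            self.agent_x.feed_reward(-1)
            self.agent_o.feed_reward(1)
        else:  # tie
            self.agent_x.feed_reward(0)
            self.agent_o.feed_reward(0)
        # Clear the per-game state histories before the next round.
        self.agent_x.states.clear()
        self.agent_o.states.clear()
```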

To separate the logic from the code related to design, I used the MVP (Model-View-Presenter) pattern.
view.py - Contains the code that displays the user interface.
presenter.py - Contains all the business logic and connects it with the visual elements from view.py.
game.py - Code strictly related to the TicTacToe game and the machine learning.
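As a rough illustration of how these pieces talk to each other, the presenter could glue the view and the game together like this (all class and method names below are hypothetical, not the actual ones from the repository):

```python
class Presenter:
    def __init__(self, view, game):
        self.view = view    # view.py: draws the board and buttons
        self.game = game    # game.py: rules and the Q-learning agent
        # Qt-style signal/slot connection from the view to the presenter.
        view.cell_clicked.connect(self.on_cell_clicked)

    def on_cell_clicked(self, position):
        # Apply the human move, let the trained agent respond,
        # then ask the view to redraw the board.
        self.game.play_human_move(position)
        self.game.play_agent_move()
        self.view.render(self.game.board)
```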
Check my GitHub page to see the code.
Here is a link to my Google Drive with the exe file of this program.