- Author: Mohammad Javad Maheronnaghsh
- Last Update: October 25th, 2023
Adversarial search is like a strategy game between two players, where one player tries to make the best moves to win while the other tries to stop them. It's used in games like chess or tic-tac-toe, where we plan our moves while thinking about what our opponent might do to counter them.
As you can see in the following picture, this is a game between two players (red and blue) who draw the map. There is an additional rule: there are some walls (drawn in yellow), and if a player enters the yellow area, it keeps drawing walls until it enters the yellow area again. Note that a player loses when it has no moves left.
I have implemented this game in 3 modes:
- Min-max vs. Min-max
- Min-max vs. Random Walk
- Random Walk vs. Random Walk
I then compared the results of these modes with each other, which you can see in the "main.ipynb" file.
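All of the minimax modes rely on the same basic idea. Below is a minimal minimax sketch, assuming a hypothetical `game` object that exposes `legal_moves`, `apply`, `is_terminal`, and `utility`; the repository's actual implementation may differ.

```python
# Minimal minimax sketch (hypothetical `game` interface, not the repository's exact code).
def minimax(game, state, depth, maximizing):
    """Return the best achievable utility for the maximizing player."""
    if depth == 0 or game.is_terminal(state):
        return game.utility(state)

    player = "max" if maximizing else "min"
    values = []
    for move in game.legal_moves(state, player):
        child = game.apply(state, move, player)
        values.append(minimax(game, child, depth - 1, not maximizing))

    # A player with no legal moves loses, matching the rule described above.
    if not values:
        return float("-inf") if maximizing else float("inf")
    return max(values) if maximizing else min(values)
```

In the random-walk modes, the corresponding player simply picks a legal move uniformly at random instead of searching the game tree.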
In local search, we begin with a solution and try to improve it by making small changes. It's like trying to find a better route for visiting cities one by one in the Traveling Salesman Problem by slightly re-arranging their order. We keep doing this until we can't make it any better. It's a way to find a good solution, even if it's not necessarily the best one.
As you can see in the picture, we have some cities at the beginning and want to travel between them, visiting every city and getting back to the first city again.
I have implemented 3 different algorithms to do that, which you can find in the repository.
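To illustrate the kind of local move involved, here is a minimal hill-climbing sketch that applies random 2-opt reversals to a tour. It is only an illustrative example under assumed names (`dist` is a distance matrix), not necessarily one of the three implemented algorithms.

```python
import random

def tour_length(tour, dist):
    """Total length of a closed tour; dist[i][j] is the distance between cities i and j."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def hill_climb(dist, iterations=10_000, seed=0):
    """Repeatedly apply random 2-opt reversals, keeping only improving moves."""
    rng = random.Random(seed)
    tour = list(range(len(dist)))
    rng.shuffle(tour)
    best = tour_length(tour, dist)
    for _ in range(iterations):
        i, j = sorted(rng.sample(range(len(tour)), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        length = tour_length(candidate, dist)
        if length < best:  # accept only improvements (pure hill climbing)
            tour, best = candidate, length
    return tour, best
```

Because only improving moves are accepted, this can get stuck in a local optimum, which is exactly the limitation mentioned above: we find a good solution, not necessarily the best one.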
Informed search, like A* with heuristics, is like using a map to find the quickest way to our destination. It considers both how far we've traveled (the cost so far) and an estimate of how much is left (the heuristic) to guide us toward the best path. This helps us make smarter decisions and reach our goal more efficiently.
We have a table like this:
And we want to reach this one (for instance):
The starting and final tables are given in this format:
The allowed actions are shifting an arbitrary column upward or downward, or shifting an arbitrary row to the left or right. Note that the cells are shifted circularly (they wrap around).
I used my own heuristic, which you can read more about in the repository.
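For reference, here is a minimal generic A* sketch. The `neighbors` and `heuristic` callables are assumptions standing in for the puzzle's circular row/column shifts and the repository's own heuristic.

```python
import heapq
from itertools import count

def a_star(start, goal, neighbors, heuristic):
    """Generic A*: neighbors(state) yields (next_state, step_cost) pairs;
    heuristic(state, goal) estimates the remaining cost and should not overestimate it."""
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(heuristic(start, goal), next(tie), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, g
        for nxt, cost in neighbors(state):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                # f = cost so far + heuristic estimate of the remaining cost
                heapq.heappush(frontier, (new_g + heuristic(nxt, goal), next(tie),
                                          new_g, nxt, path + [nxt]))
    return None, float("inf")
```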
Optimization is a key component of ML and AI pipelines. We use it to reach the point at which our algorithm/model works best.
Gradient descent plays a pivotal role in the optimization process. In the following, I explain what I have implemented:
- Optimization: Optimization is about finding the best solution to a problem, making things as good as they can be. For example, imagine we have a limited amount of money and want to buy the most things with it; we are optimizing our spending to get the most value.
- Gradient Descent: This is a method used in optimization. Imagine we're trying to find the lowest point of a hilly terrain but can't see the whole landscape. So we take a step in the direction that looks steepest downhill, and we keep doing this until no step takes us any lower. This step-by-step process of finding the best point is gradient descent, and the size of each step is the learning rate (see the sketch after the list below).
- Learning Rate = 0.1
- Learning Rate = 0.5
- Learning Rate = 0.001
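Here is a minimal gradient descent sketch, using the simple function f(x) = x² (gradient 2x) as an assumed example to show how the learning rate changes the behaviour; the actual functions and plots are in the repository.

```python
def gradient_descent(grad, x0, learning_rate, steps=100):
    """Minimise a function by repeatedly stepping against its gradient."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)  # step downhill; step size set by the learning rate
    return x

# Assumed example: f(x) = x^2, so grad(x) = 2x, with the three learning rates listed above.
for lr in (0.1, 0.5, 0.001):
    print(lr, gradient_descent(lambda x: 2 * x, x0=10.0, learning_rate=lr, steps=50))
```

With these settings, 0.1 converges steadily, 0.5 happens to jump straight to the minimum for this particular function, and 0.001 barely moves in the same number of steps.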
Reinforcement learning (RL) is a machine learning approach based on rewarding desired behaviors and punishing undesired ones. In RL, an agent learns by interacting with its environment and receiving positive or negative feedback for its actions.
The agent seeks to maximize cumulative reward through trial and error. Key elements of reinforcement learning include:
- Agent - The learner and decision maker
- Actions - Possible moves the agent can make
- Environment - The agent's surrounding context
- States - The agent's situation at a given time
- Reward - Feedback that guides the agent's learning

Unlike supervised learning, which trains on example input-output pairs, RL agents learn from their own interaction with the environment. And unlike unsupervised learning, which finds patterns in data, RL actively maximizes a reward signal.
We implemented a Q-Learning algorithm on 2 environments:
- Before Training:
- After Training:
- Before Training:
- After Training:
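The core of the training loop is the tabular Q-learning update. Below is a minimal sketch assuming a hypothetical `env` with `reset()`, `step(action)` returning `(next_state, reward, done)`, and a list of discrete `actions`; the environments used in the repository may expose a different interface.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Tabular Q-learning over a hypothetical env interface (assumed, not the repo's API)."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)] -> estimated value, zero-initialised
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection: explore sometimes, otherwise act greedily
            if rng.random() < epsilon:
                action = rng.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # standard Q-learning update toward reward + discounted best next value
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```

After enough episodes, acting greedily with respect to `Q` gives the improved behaviour shown in the "After Training" figures.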
The Markov decision process (MDP) is a mathematical framework used for modeling decision-making problems where the outcomes are partly random and partly controllable. It’s a framework that can address most reinforcement learning (RL) problems [source].
I implemented a generic MDP solver, so any problem can be solved by passing in its transition and reward functions. I used an example problem to test the implementation (a generic value-iteration sketch is shown after the list below).
- Problem Definition
- Value Iteration
- MDP Solving
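A minimal value-iteration sketch in the same spirit, assuming `transition(s, a)` returns a list of `(probability, next_state)` pairs and `reward(s, a, s2)` gives the immediate reward; these names are illustrative, not the repository's exact API.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-6):
    """Generic value iteration over assumed transition/reward callables."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best expected reward plus discounted value of successors
            q_values = [
                sum(p * (reward(s, a, s2) + gamma * V[s2]) for p, s2 in transition(s, a))
                for a in actions
            ]
            best = max(q_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values
    policy = {
        s: max(actions, key=lambda a: sum(p * (reward(s, a, s2) + gamma * V[s2])
                                          for p, s2 in transition(s, a)))
        for s in states
    }
    return V, policy
```

Value iteration repeats the Bellman backup until the values stop changing, and the greedy policy with respect to the converged values is the solution of the MDP.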
Other Topics: uninformed search - csp - rl - mdp - bn - hmm - particle filtering - regression - classification - nn (neural networks)
- Reinforcement learning: raising a child, who learns and understands through feedback; also voice recognition of animals (signal processing)
- CSP: assigning tasks to TAs based on their abilities, interests, and the course's needs
- HMM: the unknown reason behind the head TA extending (or not extending) a deadline
- Particle Filtering: Robot Navigation
- Bayesian Networks: Cause and Effects in Psychology
- Regression: X=behaviour and attributes of people | Y=GPA
- Classification: X=different attributes of thinking | Classes=Nationalities
I kindly request other GitHub members (especially experts in AI) to contribute to this repository so that we can have a more valuable source of AI materials.
- Upload the lecture notes
- Upload useful slides
- Upload useful assignments and answers
- Add links to useful courses (videos) and their assignments with answers
- The previous suggestion can be turned into a bank of questions that is useful for teaching assistants when designing homework
- Add similar repositories