Q-Learning pseudocode | Mathematical notation #432

Open
fardinafdideh wants to merge 1 commit into huggingface:main from fardinafdideh:pseudocode

@fardinafdideh
Contributor

Hi,
My remark concerns the mathematical notation of the Q-Learning pseudocode in unit2.ipynb.
I found the following notation a little bit confusing:
Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]
The maximization should be taken over all possible values of the action variable (the second argument of the two-variable function Q). As written, however, max Q(s',a') reads as maximizing Q at the specific points s' and a', which leaves nothing to vary. The notation becomes clearer if general variables are written in lowercase and specific points in capitals: for example, the function Q(s, a) evaluated at the specific points s=S and a=A is written Q(S, A).
So:

  • Current version: max Q(s',a') implies maximizing the two-variable function Q at the specific points s' and a' (since s' has been defined as a specific point).
  • Suggested version: max_a Q(S',a) implies maximizing Q over its second variable, a, with its first variable fixed at the specific point S'.
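
To make the distinction concrete, here is a minimal Python sketch of a single tabular update under the suggested reading, where the next state S' is fixed and the maximum runs over every action a. The function name `q_learning_update`, the list-of-lists table layout, and the numeric values are illustrative assumptions, not code taken from unit2.ipynb.

```python
def q_learning_update(q_table, state, action, reward, next_state, lr, gamma):
    """Apply one tabular Q-learning update in place and return the table.

    q_table is assumed to be a list of rows, one row per state,
    with one entry per action (a hypothetical layout for this sketch).
    """
    # max_a Q(S', a): the next state is fixed, the action index varies.
    best_next_value = max(q_table[next_state])

    # Q(S, A) <- Q(S, A) + lr * [R + gamma * max_a Q(S', a) - Q(S, A)]
    td_target = reward + gamma * best_next_value
    q_table[state][action] += lr * (td_target - q_table[state][action])
    return q_table


# Illustrative values: 3 states, 2 actions, all estimates start at zero.
q_table = [[0.0, 0.0] for _ in range(3)]
q_table[1] = [0.5, 2.0]  # estimates for the next state S' = 1

q_table = q_learning_update(
    q_table, state=0, action=1, reward=1.0, next_state=1, lr=0.5, gamma=0.9
)
```

In this sketch the maximum is taken over the whole row `q_table[next_state]`, so no particular next action a' ever needs to be specified, which is the point of writing max_a Q(S',a).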

@simoninithomas mentioned this pull request Dec 12, 2023