State Function Approximation: Linear Function
In the previous posts, we used different techniques to build and update state-action tables. That approach becomes infeasible when the number of states and actions grows very large, so in this post we discuss using a parameterized function to approximate the value function.
Basic Idea of State Function Approximation
Instead of looking values up in a state-action table, we build a black box with weights inside it. We tell the black box which state we want the value of, and it computes and outputs that value. The weights can be learned from data, which makes this a typical supervised learning problem.
The input to the system is actually a set of features of the state S, so we need feature engineering (feature extraction) to represent the state. x(S) denotes the feature vector of state S.
Linear Function Approximation with an Oracle
For the black box, we can use different models. In this post, we use a linear function: the inner product of the features and the weights,
$$\hat{v}(S, w) = x(S)^\top w = \sum_{j=1}^{n} x_j(S)\, w_j$$
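As a minimal sketch of the linear estimate, the snippet below computes the inner product of a feature vector and a weight vector using plain Python lists. The function name `v_hat` and the example numbers are hypothetical and chosen only for illustration.

```python
def v_hat(features, weights):
    """Linear value estimate: inner product of x(S) and w."""
    return sum(x_j * w_j for x_j, w_j in zip(features, weights))

# Hypothetical 3-feature state and weight vector (illustrative values only).
x_s = [1.0, 0.5, -2.0]
w = [0.2, 0.4, 0.1]
value = v_hat(x_s, w)
```

Because the estimate is linear in w, its gradient with respect to w is simply the feature vector x(S), which is what keeps the update rules in the rest of the post so compact.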
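Assume for now that we are cheating and know the true state-value function $v_\pi(S)$. Then we can do gradient descent on the mean squared error between the true value and the estimate $\hat{v}(S, w)$:
$$J(w) = \mathbb{E}_\pi\left[\left(v_\pi(S) - \hat{v}(S, w)\right)^2\right]$$
$$\Delta w = -\tfrac{1}{2}\alpha \nabla_w J(w) = \alpha\, \mathbb{E}_\pi\left[\left(v_\pi(S) - \hat{v}(S, w)\right) \nabla_w \hat{v}(S, w)\right]$$
Stochastic gradient descent (SGD) samples this gradient. For a linear estimate, $\nabla_w \hat{v}(S, w) = x(S)$, so:
$$\Delta w = \alpha \left(v_\pi(S) - \hat{v}(S, w)\right) x(S)$$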
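The SGD update with an oracle can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the "oracle" is a hypothetical linear function with weights `TRUE_W`, states are represented directly by randomly sampled feature vectors, and all names (`sgd_step`, `fit_with_oracle`, `sample_features`) are made up for the example.

```python
import random

def v_hat(x, w):
    """Linear value estimate x(S)^T w."""
    return sum(xi * wi for xi, wi in zip(x, w))

def sgd_step(w, x, target, alpha):
    """One sampled update: w <- w + alpha * (target - v_hat(S, w)) * x(S)."""
    error = target - v_hat(x, w)
    return [wi + alpha * error * xi for wi, xi in zip(w, x)]

def fit_with_oracle(oracle, sample_features, n_features,
                    alpha=0.1, steps=5000, seed=0):
    """Run SGD against an oracle that returns the true value of a state."""
    rng = random.Random(seed)
    w = [0.0] * n_features
    for _ in range(steps):
        x = sample_features(rng)          # features of a sampled state
        w = sgd_step(w, x, oracle(x), alpha)
    return w

# Hypothetical oracle: the true value is itself linear in the features.
TRUE_W = [1.0, -2.0]

def oracle(x):
    return v_hat(x, TRUE_W)

def sample_features(rng):
    # A constant bias feature plus one feature drawn uniformly from [-1, 1].
    return [1.0, rng.uniform(-1.0, 1.0)]

w = fit_with_oracle(oracle, sample_features, n_features=2)
```

Since the oracle here is exactly representable by the linear model, the learned weights should approach `TRUE_W`; with a non-linear true value function, SGD would instead approach the best linear fit under the sampling distribution.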
Model-Free Value Function Approximation
Back in reality, there is no oracle to give us the true values, so the only methods we can count on are model-free algorithms. We first use Monte Carlo, replacing the true value in the SGD update with the sampled return $G_t$:
$$\Delta w = \alpha \left(G_t - \hat{v}(S_t, w)\right) x(S_t)$$
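A minimal sketch of the Monte Carlo update for one finished episode is shown below. The episode format (a list of `(features, reward)` pairs, where the reward is the one received after leaving that state), the function name `mc_update`, and the two-state example with one-hot features are assumptions made for illustration.

```python
def v_hat(x, w):
    """Linear value estimate x(S)^T w."""
    return sum(xi * wi for xi, wi in zip(x, w))

def mc_update(w, episode, alpha, gamma):
    """Apply w <- w + alpha * (G_t - v_hat(S_t, w)) * x(S_t) for every step.

    The episode is walked backwards so that each return G_t can be built
    incrementally; G_t does not depend on w, so this ordering is valid.
    """
    g = 0.0
    for x, reward in reversed(episode):
        g = reward + gamma * g            # G_t = R_{t+1} + gamma * G_{t+1}
        error = g - v_hat(x, w)
        w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical deterministic episode: A -> B -> terminal, rewards 0 then 1.
episode = [([1.0, 0.0], 0.0), ([0.0, 1.0], 1.0)]
w = [0.0, 0.0]
for _ in range(100):
    w = mc_update(w, episode, alpha=0.5, gamma=0.9)
```

With one-hot features the linear model reduces to a lookup table, so the weights here should settle at the true returns of the two states (0.9 for A and 1.0 for B).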
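We can also use TD(0) learning, where the target is the one-step bootstrapped estimate $R_{t+1} + \gamma \hat{v}(S_{t+1}, w)$. The cost function is:
$$J(w) = \mathbb{E}_\pi\left[\left(R_{t+1} + \gamma \hat{v}(S_{t+1}, w) - \hat{v}(S_t, w)\right)^2\right]$$
Treating the target as a constant when differentiating (a semi-gradient), the update is:
$$\Delta w = \alpha \left(R_{t+1} + \gamma \hat{v}(S_{t+1}, w) - \hat{v}(S_t, w)\right) x(S_t)$$
The algorithm can be described as: initialize w arbitrarily. For each step of each episode, choose an action from the policy being evaluated, observe the reward $R_{t+1}$ and the next state $S_{t+1}$, and apply the update above, taking $\hat{v}(S_{t+1}, w) = 0$ when $S_{t+1}$ is terminal. Repeat until the episode ends.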
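The TD(0) procedure can be sketched as below, assuming a linear estimate with target R + gamma * v_hat(S') and a terminal value of zero. The transition format `(x, reward, x_next, terminal)`, the function name `td0_update`, and the two-state chain are hypothetical and only serve the illustration; a real agent would generate transitions by following its policy in an environment.

```python
def v_hat(x, w):
    """Linear value estimate x(S)^T w."""
    return sum(xi * wi for xi, wi in zip(x, w))

def td0_update(w, x, reward, x_next, alpha, gamma, terminal):
    """Semi-gradient TD(0): the bootstrapped target is treated as a constant."""
    v_next = 0.0 if terminal else v_hat(x_next, w)
    td_error = reward + gamma * v_next - v_hat(x, w)
    return [wi + alpha * td_error * xi for wi, xi in zip(w, x)]

# Hypothetical chain A -> B -> terminal with one-hot features.
transitions = [
    ([1.0, 0.0], 0.0, [0.0, 1.0], False),  # A -> B, reward 0
    ([0.0, 1.0], 1.0, None, True),         # B -> terminal, reward 1
]
w = [0.0, 0.0]
for _ in range(200):                        # replay the same episode
    for x, reward, x_next, terminal in transitions:
        w = td0_update(w, x, reward, x_next, alpha=0.5, gamma=0.9,
                       terminal=terminal)
```

Unlike the Monte Carlo version, each update happens right after a single transition, so learning does not have to wait for the episode to finish.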
Model-Free Control Based on State-Action Value Function Approximation
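As with state-value function approximation, we extract features from the target problem, this time for a state-action pair, and build a feature vector:
$$x(S, A) = \left(x_1(S, A), \ldots, x_n(S, A)\right)^\top$$
Then the linear estimate of the Q-function is:
$$\hat{q}(S, A, w) = x(S, A)^\top w = \sum_{j=1}^{n} x_j(S, A)\, w_j$$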
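One simple way to build state-action features, shown here purely as an assumption and not taken from the post, is to copy the state features into a block reserved for the chosen action and leave the other blocks at zero. The names `q_features` and `q_hat` are hypothetical.

```python
def q_features(state_features, action, n_actions):
    """Build x(S, A): state features placed in the block for `action`."""
    n = len(state_features)
    x = [0.0] * (n * n_actions)
    x[action * n:(action + 1) * n] = state_features
    return x

def q_hat(state_features, action, w, n_actions):
    """Linear action-value estimate x(S, A)^T w."""
    x = q_features(state_features, action, n_actions)
    return sum(xi * wi for xi, wi in zip(x, w))

# Illustrative values: 2 state features, 2 actions, so w has 4 entries.
w = [0.1, 0.2, 0.3, 0.4]
x_sa = q_features([1.0, 2.0], action=1, n_actions=2)
q_value = q_hat([1.0, 2.0], action=1, w=w, n_actions=2)
```

This layout effectively gives each action its own weight vector; other feature designs that share weights across actions are equally compatible with the linear form.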
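To minimize the MSE cost function, we take the derivative with respect to w and replace the unknown true action value with a sampled target. With $\hat{q}(S, A, w) = x(S, A)^\top w$, the Monte Carlo update is:
$$\Delta w = \alpha \left(G_t - \hat{q}(S_t, A_t, w)\right) x(S_t, A_t)$$
SARSA update:
$$\Delta w = \alpha \left(R_{t+1} + \gamma \hat{q}(S_{t+1}, A_{t+1}, w) - \hat{q}(S_t, A_t, w)\right) x(S_t, A_t)$$
Q-learning update:
$$\Delta w = \alpha \left(R_{t+1} + \gamma \max_{a'} \hat{q}(S_{t+1}, a', w) - \hat{q}(S_t, A_t, w)\right) x(S_t, A_t)$$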
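The SARSA and Q-learning updates differ only in how the next-step value is chosen, which the following sketch tries to make visible. It assumes a linear Q estimate over precomputed state-action feature vectors; the function names, argument layout, and numbers are hypothetical.

```python
def dot(x, w):
    """Inner product, used here as the linear Q estimate x(S, A)^T w."""
    return sum(xi * wi for xi, wi in zip(x, w))

def sarsa_update(w, x, reward, x_next, alpha, gamma, terminal):
    """On-policy: bootstrap from the action A' actually taken in S'."""
    q_next = 0.0 if terminal else dot(x_next, w)
    td_error = reward + gamma * q_next - dot(x, w)
    return [wi + alpha * td_error * xi for wi, xi in zip(w, x)]

def q_learning_update(w, x, reward, next_features, alpha, gamma, terminal):
    """Off-policy: bootstrap from the best action in S'.

    `next_features` holds x(S', a') for every available action a'.
    """
    q_next = 0.0 if terminal else max(dot(xa, w) for xa in next_features)
    td_error = reward + gamma * q_next - dot(x, w)
    return [wi + alpha * td_error * xi for wi, xi in zip(w, x)]

# Illustrative numbers: one-hot features over three state-action pairs.
w0 = [0.0, 1.0, 3.0]
x = [1.0, 0.0, 0.0]                       # x(S, A)
w_sarsa = sarsa_update(w0, x, 1.0, [0.0, 1.0, 0.0],
                       alpha=0.1, gamma=0.5, terminal=False)
w_qlearn = q_learning_update(w0, x, 1.0,
                             [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                             alpha=0.1, gamma=0.5, terminal=False)
```

In this example SARSA bootstraps from the action that was taken next, while Q-learning bootstraps from the higher-valued alternative, so the two rules move the same weight by different amounts.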
References:
https://www.youtube.com/watch?v=buptHUzDKcE
https://www.youtube.com/watch?v=UoPei5o4fps&list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ&index=6
Original post: https://www.cnblogs.com/rhyswang/p/11326010.html