A Collection of Papers on Deep Reinforcement Learning
2016-11-28 15:36
1. The pioneering work: DQN
Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.
Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.
2. Improvements to DQN (algorithmic improvements)
Dueling Network Architectures for Deep Reinforcement Learning, Z. Wang et al., arXiv, 2015.
Prioritized Experience Replay, T. Schaul et al., ICLR, 2016.
Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015.
Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI, 2016.
Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop, 2016.
Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.
How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, V. François-Lavet et al., NIPS Workshop, 2015.
Learning functions across many orders of magnitudes, H. van Hasselt, A. Guez, M. Hessel, D. Silver
Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al., ICML Workshop, 2015.
State of the Art Control of Atari Games using shallow reinforcement learning
Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening (updated 11.13)
Deep Reinforcement Learning with Averaged Target DQN (updated 11.14)
3. Improvements to DQN (model improvements)
Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht and P. Stone, arXiv, 2015.
Deep Attention Recurrent Q-Network
Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML, 2016.
Progressive Neural Networks
Language Understanding for Text-based Games Using Deep Reinforcement Learning
Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Recurrent Reinforcement Learning: A Hybrid Approach
4. Policy-gradient-based deep reinforcement learning
Deep policy gradient:
End-to-End Training of Deep Visuomotor Policies
Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search
Trust Region Policy Optimization
Deep actor-critic algorithms:
Deterministic Policy Gradient Algorithms
Continuous control with deep reinforcement learning
High-Dimensional Continuous Control Using Generalized Advantage Estimation
Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies
Deep Reinforcement Learning in Parameterized Action Space
Memory-based control with recurrent neural networks
Terrain-adaptive locomotion skills using deep reinforcement learning
Sample Efficient Actor-Critic with Experience Replay (updated 11.13)
Search and supervision:
End-to-End Training of Deep Visuomotor Policies
Interactive Control of Diverse Complex Characters with Neural Networks
Improved exploration in continuous action spaces:
Curiosity-driven Exploration in DRL via Bayesian Neural Networks
Combining policy gradient and Q-learning:
Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic (updated 11.13)
PGQ: Combining Policy Gradient and Q-Learning (updated 11.13)
Other policy gradient papers:
Gradient Estimation Using Stochastic Computation Graphs
Continuous Deep Q-Learning with Model-based Acceleration
Benchmarking Deep Reinforcement Learning for Continuous Control
Learning Continuous Control Policies by Stochastic Value Gradients
5. Hierarchical DRL
Deep Successor Reinforcement Learning
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks
Stochastic Neural Networks for Hierarchical Reinforcement Learning, C. Florensa, Y. Duan, P. Abbeel (updated 11.14)
6. Multi-task and transfer learning in DRL
ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources
A Deep Hierarchical Approach to Lifelong Learning in Minecraft
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
Policy Distillation
Progressive Neural Networks
Universal Value Function Approximators
Multi-task learning with deep model based reinforcement learning (updated 11.14)
Modular Multitask Reinforcement Learning with Policy Sketches (updated 11.14)
7. DRL models with external memory modules
Control of Memory, Active Perception, and Action in Minecraft
Model-Free Episodic Control
8. Exploration vs. exploitation in DRL
Action-Conditional Video Prediction using Deep Networks in Atari Games
Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks
Deep Exploration via Bootstrapped DQN
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
Unifying Count-Based Exploration and Intrinsic Motivation
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning (updated 11.14)
Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning (updated 11.14)
9. Multi-agent DRL
Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks
Multiagent Cooperation and Competition with Deep Reinforcement Learning
10. Inverse DRL
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
Maximum Entropy Deep Inverse Reinforcement Learning
Generalizing Skills with Semi-Supervised Reinforcement Learning (updated 11.14)
11. Exploration + supervised learning
Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning
Better Computer Go Player with Neural Network and Long-term Prediction
Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.
12. Asynchronous DRL
Asynchronous Methods for Deep Reinforcement Learning
Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU (updated 11.14)
13. Harder game scenarios
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.
Strategic Attentive Writer for Learning Macro-Actions
Unifying Count-Based Exploration and Intrinsic Motivation
14. A single network playing multiple games
Policy Distillation
Universal Value Function Approximators
Learning values across many orders of magnitude
15. Texas Hold'em poker
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Fictitious Self-Play in Extensive-Form Games
Smooth UCT search in computer poker
16. Doom
ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning
Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning
Playing FPS Games with Deep Reinforcement Learning
Learning to Act by Predicting the Future (updated 11.13)
Deep Reinforcement Learning From Raw Pixels in Doom (updated 11.14)
17. Large-scale action spaces
Deep Reinforcement Learning in Large Discrete Action Spaces
18. Parameterized continuous action spaces
Deep Reinforcement Learning in Parameterized Action Space
19. Deep Model
Learning Visual Predictive Models of Physics for Playing Billiards
J. Schmidhuber, On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models, arXiv, 2015.
Learning Continuous Control Policies by Stochastic Value Gradients
Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models
Action-Conditional Video Prediction using Deep Networks in Atari Games
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
20. Applications of DRL
Robotics:
Trust Region Policy Optimization
Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control
Path Integral Guided Policy Search
Memory-based control with recurrent neural networks
Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection
Learning Deep Neural Network Policies with Continuous Memory States
High-Dimensional Continuous Control Using Generalized Advantage Estimation
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
End-to-End Training of Deep Visuomotor Policies
DeepMPC: Learning Deep Latent Features for Model Predictive Control
Deep Visual Foresight for Planning Robot Motion
Deep Reinforcement Learning for Robotic Manipulation
Continuous Deep Q-Learning with Model-based Acceleration
Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
Asynchronous Methods for Deep Reinforcement Learning
Learning Continuous Control Policies by Stochastic Value Gradients
Machine translation:
Simultaneous Machine Translation using Deep Reinforcement Learning
Object localization:
Active Object Localization with Deep Reinforcement Learning
Target-driven visual navigation:
Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning
Automatic hyperparameter tuning:
Using Deep Q-Learning to Control Optimization Hyperparameters
Dialogue systems:
Deep Reinforcement Learning for Dialogue Generation
SimpleDS: A Simple Deep Reinforcement Learning Dialogue System
Strategic Dialogue Management via Deep Reinforcement Learning
Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning
Video prediction:
Action-Conditional Video Prediction using Deep Networks in Atari Games
Text-to-speech:
WaveNet: A Generative Model for Raw Audio
Text generation:
Generating Text with Deep Reinforcement Learning
Text-based games:
Language Understanding for Text-based Games Using Deep Reinforcement Learning
Radio control and signal monitoring:
Deep Reinforcement Learning Radio Control and Signal Detection with KeRLym, a Gym RL Agent
Learning to perform physics experiments with DRL:
Learning to Perform Physics Experiments via Deep Reinforcement Learning (updated 11.13)
Accelerating convergence with DRL:
Deep Reinforcement Learning for Accelerating the Convergence Rate (updated 11.14)
Designing neural networks with DRL:
Designing Neural Network Architectures using Reinforcement Learning (updated 11.14)
Tuning Recurrent Neural Networks with Reinforcement Learning (updated 11.14)
Neural Architecture Search with Reinforcement Learning (updated 11.14)
Traffic signal control:
Using a Deep Reinforcement Learning Agent for Traffic Signal Control (updated 11.14)
21. Other directions
Avoiding dangerous states:
Combating Deep Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear (updated 11.14)
On-policy vs. off-policy comparison in DRL:
On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning (updated 11.14)