A Curated List of Deep Reinforcement Learning Papers

2016-11-28 15:36

1. The Pioneering Work: DQN

Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.

Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.

2. DQN Variants (Algorithmic Improvements)

Dueling Network Architectures for Deep Reinforcement Learning, Z. Wang et al., arXiv, 2015.

Prioritized Experience Replay, T. Schaul et al., ICLR, 2016.

Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015.

Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI, 2016.

Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop, 2016.

Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.

How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, V. François-Lavet et al., NIPS Workshop, 2015.

Learning functions across many orders of magnitudes, H. van Hasselt, A. Guez, M. Hessel, D. Silver

Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al., ICML Workshop, 2015.

State of the Art Control of Atari Games using shallow reinforcement learning

Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening (updated 11.13)

Deep Reinforcement Learning with Averaged Target DQN (updated 11.14)

3. DQN Variants (Model Improvements)

Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht and P. Stone, arXiv, 2015.

Deep Attention Recurrent Q-Network

Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML, 2016.

Progressive Neural Networks

Language Understanding for Text-based Games Using Deep Reinforcement Learning

Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Recurrent Reinforcement Learning: A Hybrid Approach

4. Policy-Gradient-Based Deep Reinforcement Learning

Deep policy gradient:

End-to-End Training of Deep Visuomotor Policies

Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search

Trust Region Policy Optimization

Deep actor-critic algorithms:

Deterministic Policy Gradient Algorithms

Continuous control with deep reinforcement learning

High-Dimensional Continuous Control Using Generalized Advantage Estimation

Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies

Deep Reinforcement Learning in Parameterized Action Space

Memory-based control with recurrent neural networks

Terrain-adaptive locomotion skills using deep reinforcement learning

Sample Efficient Actor-Critic with Experience Replay (updated 11.13)

Search and supervision:

End-to-End Training of Deep Visuomotor Policies

Interactive Control of Diverse Complex Characters with Neural Networks

Improved exploration in continuous action spaces:

Curiosity-driven Exploration in DRL via Bayesian Neural Networks

Combining policy gradient and Q-learning:

Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic (updated 11.13)

PGQ: Combining Policy Gradient and Q-Learning (updated 11.13)

Other policy gradient papers:

Gradient Estimation Using Stochastic Computation Graphs

Continuous Deep Q-Learning with Model-based Acceleration

Benchmarking Deep Reinforcement Learning for Continuous Control

Learning Continuous Control Policies by Stochastic Value Gradients

5. Hierarchical DRL

Deep Successor Reinforcement Learning

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks

Stochastic Neural Networks for Hierarchical Reinforcement Learning, Carlos Florensa, Yan Duan, Pieter Abbeel (updated 11.14)

6. Multi-task and Transfer Learning in DRL

ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources

A Deep Hierarchical Approach to Lifelong Learning in Minecraft

Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

Policy Distillation

Progressive Neural Networks

Universal Value Function Approximators

Multi-task learning with deep model based reinforcement learning (updated 11.14)

Modular Multitask Reinforcement Learning with Policy Sketches (updated 11.14)

7. DRL Models with External Memory Modules

Control of Memory, Active Perception, and Action in Minecraft

Model-Free Episodic Control

8. Exploration vs. Exploitation in DRL

Action-Conditional Video Prediction using Deep Networks in Atari Games

Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks

Deep Exploration via Bootstrapped DQN

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

Unifying Count-Based Exploration and Intrinsic Motivation

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning (updated 11.14)

Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning (updated 11.14)

9. Multi-Agent DRL

Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks

Multiagent Cooperation and Competition with Deep Reinforcement Learning

10. Inverse DRL

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

Maximum Entropy Deep Inverse Reinforcement Learning

Generalizing Skills with Semi-Supervised Reinforcement Learning (updated 11.14)

11. Exploration + Supervised Learning

Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning

Better Computer Go Player with Neural Network and Long-term Prediction

Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.

12. Asynchronous DRL

Asynchronous Methods for Deep Reinforcement Learning

Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU (updated 11.14)

13. Methods for Harder Game Scenarios

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.

Strategic Attentive Writer for Learning Macro-Actions

Unifying Count-Based Exploration and Intrinsic Motivation

14. A Single Network Playing Multiple Games

Policy Distillation

Universal Value Function Approximators

Learning values across many orders of magnitude

15. Texas Hold'em Poker

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

Fictitious Self-Play in Extensive-Form Games

Smooth UCT search in computer poker

16. Doom

ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning

Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning

Playing FPS Games with Deep Reinforcement Learning

Learning to Act by Predicting the Future (updated 11.13)

Deep Reinforcement Learning From Raw Pixels in Doom (updated 11.14)

17. Large Action Spaces

Deep Reinforcement Learning in Large Discrete Action Spaces

18. Parameterized Continuous Action Spaces

Deep Reinforcement Learning in Parameterized Action Space

19. Deep Model

Learning Visual Predictive Models of Physics for Playing Billiards

J. Schmidhuber, On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models, arXiv, 2015.

Learning Continuous Control Policies by Stochastic Value Gradients

Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models

Action-Conditional Video Prediction using Deep Networks in Atari Games

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

20. Applications of DRL

Robotics:

Trust Region Policy Optimization

Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control

Path Integral Guided Policy Search

Memory-based control with recurrent neural networks

Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection

Learning Deep Neural Network Policies with Continuous Memory States

High-Dimensional Continuous Control Using Generalized Advantage Estimation

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

End-to-End Training of Deep Visuomotor Policies

DeepMPC: Learning Deep Latent Features for Model Predictive Control

Deep Visual Foresight for Planning Robot Motion

Deep Reinforcement Learning for Robotic Manipulation

Continuous Deep Q-Learning with Model-based Acceleration

Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search

Asynchronous Methods for Deep Reinforcement Learning

Learning Continuous Control Policies by Stochastic Value Gradients

Machine translation:

Simultaneous Machine Translation using Deep Reinforcement Learning

Object localization:

Active Object Localization with Deep Reinforcement Learning

Target-driven visual navigation:

Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

Automatic hyperparameter tuning:

Using Deep Q-Learning to Control Optimization Hyperparameters

Human-machine dialogue:

Deep Reinforcement Learning for Dialogue Generation

SimpleDS: A Simple Deep Reinforcement Learning Dialogue System

Strategic Dialogue Management via Deep Reinforcement Learning

Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning