@dukebw/dlrlss-2018

# Transfer / meta / lifelong learning

- RL with policy advice. Azar et al., ECML 2013.

  - Reduction from RL to a bandit problem.

- Regret bounds: regret is the cumulative difference in reward between the
  optimal policy and the policy actually followed.
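
One common way to write this down, as a sketch only: the symbols $T$ (number of steps), $r_t$ (reward actually received at step $t$), and $\mu^{*}$ (average per-step reward of the best reference policy) are notation assumed here, not taken from the talk.

```latex
R(T) = T\,\mu^{*} - \sum_{t=1}^{T} r_t
```

Under this definition, "sublinear regret" means $R(T)/T \to 0$ as $T$ grows, i.e. the learner's average reward approaches that of the reference policy.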

- Regret scales with the number of tasks, as $\sqrt{M}$, rather than with the
  size of the state and action spaces.
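
To make the "reduction to a bandit problem" idea concrete, here is a minimal sketch that treats each candidate policy as a bandit arm and selects among them with standard UCB1. This is not the RLPA algorithm from Azar et al.; it only illustrates why the cost can depend on the number of candidate policies rather than on the state and action spaces. The names `make_toy_returns`, `ucb1_over_policies`, and `pseudo_regret`, and the Bernoulli episode returns, are assumptions made for the example.

```python
import math
import random


def make_toy_returns(true_means):
    """Hypothetical stand-in for 'run one episode with policy i'.

    Returns a callback that yields a Bernoulli return in {0.0, 1.0}
    whose mean is true_means[i].
    """
    def episode_return(policy_index, rng):
        return 1.0 if rng.random() < true_means[policy_index] else 0.0
    return episode_return


def ucb1_over_policies(episode_return, num_policies, num_episodes, seed=0):
    """Pick which policy to run each episode using UCB1 (one arm per policy)."""
    rng = random.Random(seed)
    counts = [0] * num_policies   # episodes run with each policy
    means = [0.0] * num_policies  # running mean return of each policy
    for t in range(1, num_episodes + 1):
        if t <= num_policies:
            arm = t - 1  # run every policy once before using the bonus
        else:
            arm = max(
                range(num_policies),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        r = episode_return(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
    return counts, means


def pseudo_regret(counts, true_means):
    """Expected return lost versus always running the best policy."""
    best = max(true_means)
    return sum(n * (best - m) for n, m in zip(counts, true_means))
```

For example, `ucb1_over_policies(make_toy_returns([0.2, 0.5, 0.8]), 3, 2000)` should concentrate most episodes on the third policy, and `pseudo_regret` then reports how much expected return was spent exploring the other two.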

- Brunskill and Li, UAI 2013. Reduction from RL to an (active) classification
  problem.

- https://cs.stanford.edu/people/ebrun

- Provably speeding multitask RL. Guo and Brunskill, AAAI 2015. K tasks sampled
  from M tasks. Evaluation goal: provably improve performance. Approach:
  quickly cluster, then share.
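
The "quickly cluster, then share" approach can be sketched roughly as follows. This is a toy illustration, not the procedure from Guo and Brunskill: each task is summarised by a single scalar statistic (the mean of its samples), tasks are grouped greedily when their means fall within a hypothetical `threshold`, and each group then shares one pooled estimate. The function name `cluster_then_share` and the scalar summary are assumptions made for the example.

```python
def cluster_then_share(task_samples, threshold):
    """Greedily cluster tasks by sample mean, then pool data within clusters.

    task_samples: list of non-empty lists of floats, one list per task.
    Returns (clusters, pooled), where clusters is a list of lists of task
    indices and pooled maps each task index to its cluster's pooled mean.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    clusters = []
    for idx, samples in enumerate(task_samples):
        task_mean = mean(samples)
        for cluster in clusters:
            # Compare against the first task that founded this cluster.
            representative_mean = mean(task_samples[cluster[0]])
            if abs(task_mean - representative_mean) <= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])  # no close cluster: start a new one

    pooled = {}
    for cluster in clusters:
        shared = [x for i in cluster for x in task_samples[i]]
        estimate = mean(shared)  # every task in the cluster reuses all samples
        for i in cluster:
            pooled[i] = estimate
    return clusters, pooled
```

The point of the sketch is the sample-sharing step: once tasks are grouped, each one estimates its statistic from the whole group's data instead of only its own.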

- Killian et al., NIPS 2017. Bayesian NNs for modeling MDP dynamics.

- Smooth latent policy space for cross-domain transfer.
  Ammar et al., IJCAI 2015. Limited theoretical results (some nice convergence
  results).

- Model-agnostic meta-learning. Finn et al., ICML 2017.
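
As a rough illustration of the two-level update in model-agnostic meta-learning, the sketch below runs it on a deliberately tiny problem: a scalar parameter `theta` and one quadratic loss `0.5 * (theta - c)**2` per task, so all gradients can be written by hand. This is not the implementation from Finn et al.; the function name `maml_scalar`, the quadratic tasks, and the step sizes are assumptions chosen to keep the example self-contained.

```python
def maml_scalar(task_centers, theta=0.0, inner_lr=0.1, outer_lr=0.5, steps=200):
    """Toy MAML on tasks with loss L_c(theta) = 0.5 * (theta - c)**2.

    Each outer step adapts theta to every task with one inner gradient step,
    then updates theta using the gradient of the post-adaptation loss.
    """
    for _ in range(steps):
        meta_grad = 0.0
        for c in task_centers:
            inner_grad = theta - c                   # dL_c/dtheta at theta
            adapted = theta - inner_lr * inner_grad  # one inner step
            # Chain rule through the inner step:
            #   d(adapted)/d(theta) = 1 - inner_lr
            #   dL_c/d(adapted)     = adapted - c
            meta_grad += (1.0 - inner_lr) * (adapted - c)
        theta -= outer_lr * meta_grad / len(task_centers)
    return theta
```

For these quadratic tasks the meta-objective is minimised at the mean of the task centers, so `maml_scalar([-1.0, 0.0, 4.0])` should settle near `1.0`. With neural network policies the same structure applies, but the gradient through the inner step is obtained by automatic differentiation rather than by hand.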