
XAI and Strategy Extraction via Reward Redistribution

Activity: Talk or presentation › Contributed talk (selected by application) › Science-to-science

Description

Assigning credit for a received reward to previously performed actions is one of the central tasks in reinforcement learning. Credit assignment often uses world models, in either a forward or a backward view. In a forward view, the future return is estimated by replacing the environment with a model or by rolling out sequences until the end of the episode. A backward view either learns a backward model or performs a backward analysis of a forward model that predicts or models the return of an episode. Our method RUDDER performs a backward analysis to construct a reward redistribution that credits those actions that caused a reward. Its extension Align-RUDDER learns a reward redistribution from a few demonstrations. An optimal reward redistribution has zero expected future reward and therefore immediately credits actions for everything they will cause. XAI also aims at credit assignment when it asks what caused a model to produce a particular output for a given input. Beyond that, XAI asks how and why a policy solved a task, why an agent outperforms humans, and why a decision was made. Humans best comprehend an agent's strategy if all of its actions are evaluated immediately and have no hidden consequences in the future. Reward redistributions learned by RUDDER and Align-RUDDER help to understand the task-solving strategies of both humans and machines.
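The idea of a reward redistribution can be illustrated with a small sketch. It is not the RUDDER implementation: the function names, the key-and-door toy task, and the hand-written return predictor are all hypothetical stand-ins for a learned model. The sketch assumes that a backward analysis can be approximated by taking differences of consecutive return predictions, so that each action is credited with the change it causes in the predicted episode return.

```python
from typing import Callable, List, Sequence


def redistribute_reward(
    predict_return: Callable[[Sequence[str]], float],
    actions: Sequence[str],
    episode_return: float,
) -> List[float]:
    """Credit each action with the change it causes in the predicted return."""
    # Predicted episode return after seeing 0, 1, ..., T actions.
    predictions = [predict_return(actions[:t]) for t in range(len(actions) + 1)]
    # Redistributed reward at step t is the difference of consecutive predictions.
    redistributed = [predictions[t + 1] - predictions[t] for t in range(len(actions))]
    # Assumption: any prediction error is added to the last step so that the
    # redistributed rewards still sum to the observed episode return.
    if redistributed:
        redistributed[-1] += episode_return - sum(redistributed)
    return redistributed


def toy_return_predictor(prefix: Sequence[str]) -> float:
    """Hypothetical hand-written predictor for a toy key-and-door task."""
    if "pick_key" in prefix and "open_door" in prefix:
        return 1.0  # both sub-goals reached: full return is expected
    if "pick_key" in prefix:
        return 0.5  # key collected: half of the return is already secured
    return 0.0


# The environment pays a single delayed reward of 1.0 at the end of the episode.
actions = ["move", "pick_key", "move", "open_door", "move"]
redistributed = redistribute_reward(toy_return_predictor, actions, episode_return=1.0)

for action, reward in zip(actions, redistributed):
    print(f"{action:10s} {reward:+.1f}")
```

In this toy episode the delayed reward is moved onto the two actions that caused it, while the filler moves receive no credit, which is the property that makes the agent's strategy easy to read off.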
Period: 18 July 2020
Event title: ICML 2020
Event type: Conference
Location: Austria

Fields of science

  • 101031 Approximation Theory
  • 102 Computer Science
  • 305901 Computer-Aided Diagnosis and Therapy
  • 102033 Data Mining
  • 102032 Computational Intelligence
  • 101029 Mathematical Statistics
  • 102013 Human-Computer Interaction
  • 305905 Medical Informatics
  • 101028 Mathematical Modelling
  • 101027 Dynamical Systems
  • 101004 Biomathematics
  • 101026 Time Series Analysis
  • 202017 Embedded Systems
  • 101024 Probability Theory
  • 305907 Medical Statistics
  • 102019 Machine Learning
  • 202037 Signal Processing
  • 102018 Artificial Neural Networks
  • 103029 Statistical Physics
  • 202036 Sensor Systems
  • 202035 Robotics
  • 106005 Bioinformatics
  • 106007 Biostatistics
  • 101019 Stochastics
  • 101018 Statistics
  • 101017 Game Theory
  • 101016 Optimisation
  • 102001 Artificial Intelligence
  • 101015 Operations Research
  • 102004 Bioinformatics
  • 101014 Numerical Mathematics
  • 102003 Image Processing

JKU focus areas

  • Digital Transformation