*banner
 

Markov Decision Processes with Multiple Objectives
Krishnendu Chatterjee, Rupak Majumdar, Tom Henzinger

Citation
Krishnendu Chatterjee, Rupak Majumdar, Tom Henzinger. "Markov Decision Processes with Multiple Objectives". STACS, February, 2006.

Abstract
We consider Markov decision processes (MDPs) with multiple discounted reward objectives. Such MDPs occur in design problems where one wishes to simultaneously optimize several criteria, for example, latency and power. The possible trade-offs between the different objectives are characterized by the Pareto curve. We show that every Pareto-optimal point can be achieved by a memoryless strategy; however, unlike in the single-objective case, the memoryless strategy may require randomization. Moreover, we show that the Pareto curve can be approximated in polynomial time in the size of the MDP. Additionally, we study the problem if a given value vector is realizable by any strategy, and show that it can be decided in polynomial time; but the question whether it is realizable by a deterministic memoryless strategy is NP-complete. These results provide efficient algorithms for design exploration in MDP models with multiple objectives.

Electronic downloads

Citation formats  
  • HTML
    Krishnendu Chatterjee, Rupak Majumdar, Tom Henzinger. <a
    href="http://chess.eecs.berkeley.edu/pubs/81.html"
    >Markov Decision Processes with Multiple
    Objectives</a>, STACS, February, 2006.
  • Plain text
    Krishnendu Chatterjee, Rupak Majumdar, Tom Henzinger.
    "Markov Decision Processes with Multiple
    Objectives". STACS, February, 2006.
  • BibTeX
    @inproceedings{ChatterjeeMajumdarHenzinger06_MarkovDecisionProcessesWithMultipleObjectives,
        author = {Krishnendu Chatterjee and Rupak Majumdar and Tom
                  Henzinger},
        title = {Markov Decision Processes with Multiple Objectives},
        booktitle = {STACS},
        month = {February},
        year = {2006},
        abstract = {We consider Markov decision processes (MDPs) with
                  multiple discounted reward objectives. Such MDPs
                  occur in design problems where one wishes to
                  simultaneously optimize several criteria, for
                  example, latency and power. The possible
                  trade-offs between the different objectives are
                  characterized by the Pareto curve. We show that
                  every Pareto-optimal point can be achieved by a
                  memoryless strategy; however, unlike in the
                  single-objective case, the memoryless strategy may
                  require randomization. Moreover, we show that the
                  Pareto curve can be approximated in polynomial
                  time in the size of the MDP. Additionally, we
                  study the problem if a given value vector is
                  realizable by any strategy, and show that it can
                  be decided in polynomial time; but the question
                  whether it is realizable by a deterministic
                  memoryless strategy is NP-complete. These results
                  provide efficient algorithms for design
                  exploration in MDP models with multiple
                  objectives. },
        URL = {http://chess.eecs.berkeley.edu/pubs/81.html}
    }
    

Posted by Krishnendu Chatterjee, PhD on 9 May 2006.
Groups: chess
For additional information, see the Publications FAQ or contact webmaster at chess eecs berkeley edu.

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.

©2002-2018 Chess