*Approximation Algorithms for Stochastic Optimization, 2020-01-03*

Approximation Algorithms for Stochastic Optimization Kamesh Munagala Duke University

Markov Decision Process

• Set S of states of the system
• Set A of actions
• If action a is taken in state s:
  ▫ Reward Ra(s) is received
  ▫ The system transitions to state q with probability pa(s,q)

[Diagram: taking action a in state s yields reward Ra(s) and a transition to state q]

Markov Decision Process

• Set S of states of the system
• Set A of actions
• If action a is taken in state s:
  ▫ Reward Ra(s) is drawn from a known distribution
  ▫ The system transitions to state q with probability pa(s,q)
• Input:
  ▫ Reward distributions and state transition matrices for each action
  ▫ Start state s
  ▫ Time horizon T

Policy for an MDP

• Maximize the expected reward over T steps
  ▫ Expectation is over the stochastic rewards and state transitions
• Policy: a mapping from states S to actions A
  ▫ Specifies the optimal action for each observed state
• Dynamic Programming [Bellman ’54]
  ▫ Optimal policy computable in time poly(|S|, |A|, T)
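Bellman’s finite-horizon dynamic program is short enough to sketch directly. The instance below (two states, two actions) is a made-up toy, and `optimal_policy` is an illustrative name, not notation from the talk:

```python
import numpy as np

def optimal_policy(R, P, T):
    """Finite-horizon dynamic programming for an MDP.

    R[a, s]    : expected reward for taking action a in state s
    P[a, s, q] : probability of moving from s to q under action a
    Returns (V, policy) where policy[t][s] is the optimal action
    at step t (with T - t steps to go); runs in poly(|S|,|A|,T) time.
    """
    V = np.zeros(R.shape[1])            # value with 0 steps remaining
    policy = []
    for _ in range(T):
        # Q[a, s] = immediate reward + expected value of the next state
        Q = R + np.einsum('asq,q->as', P, V)
        policy.append(Q.argmax(axis=0))
        V = Q.max(axis=0)
    policy.reverse()                    # policy[t] is the rule for step t
    return V, policy

# Toy instance (all numbers are illustrative only)
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.1, 0.9]]])
V, pi = optimal_policy(R, P, T=5)
```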

This talk

• For many problems, |S| is exponentially large in the problem parameters … or |A| is exponentially large
  ▫ Many examples to follow
• Simpler decision policies?
  ▫ Approximately optimal in a provable sense
  ▫ Efficient to compute and execute

Talk Overview

Classes of decision problems:
• Stochastic Optimization
• Covering/Ordering Problems
• Scheduling Problems
• Multi-stage Optimization

Example problems: Set Cover variants; Knapsack, Matchings, Bandits; Machine Scheduling; Bayesian Auctions; Inventory Management


Linear Programming Relaxations!

Part 1. Maximum Value Problem

• A really simple decision problem
  ▫ Illustrates the basic concepts
  ▫ Adaptive vs. non-adaptive policies
• Non-adaptive policies
  ▫ Submodularity and the greedy algorithm
• Adaptive policies
  ▫ LP relaxation and “weak coupling”
  ▫ Rounding using Markov’s inequality
• Duality
  ▫ Simple structure of the LP optimum
  ▫ Gap between adaptive and non-adaptive policies

Part 2. Weakly Coupled LPs

• General technique via LP and duality
  ▫ The LP relaxation has very few constraints
  ▫ The dual yields infeasible policies with simple structure
• Examples
  ▫ Stochastic knapsack
  ▫ Stochastic matching
  ▫ Bayesian multi-item pricing

Part 3. Sampling Scenarios

• Exponential-sized LP over all possible “scenarios” of the underlying distributions
• Solve the LP or its Lagrangian by sampling the scenarios
• Examples
  ▫ 2-stage vertex cover
  ▫ Stochastic Steiner trees
  ▫ Bayesian auctions
  ▫ Solving LPs online

Part 4. Stochastic Scheduling

• New aspect: the timing of actions matters
• Two techniques:
  ▫ Stronger LP relaxations than weak coupling
    - Stochastic scheduling on identical machines
    - Stochastic knapsack (not covered)
  ▫ Greedy policies
    - Gittins index theorem

Important Disclaimer

By no means is this comprehensive!

Part 1. The Maximum Value Problem [Guha, Munagala ’07, ’09; Dean, Goemans, Vondrák ’04]

The Maximum Value Problem

• A gambler is shown n boxes
  ▫ Box j contains a random reward Xj with a known distribution
  ▫ The gambler knows the distribution of Xj, but the box is closed
  ▫ All distributions are independent

The Maximum Value Problem

[Figure: five closed boxes with rewards X1, X2, X3, X4, X5]

• The gambler knows all the distributions
• The distributions are independent

The Maximum Value Problem

• The gambler opens some box, say Box 2, and observes X2 = 20
• He then opens another box based on observing X2 = 20

• The gambler can open at most k boxes
  ▫ Payoff = maximum reward observed in these k boxes
• Adaptivity: the gambler can choose the next box to open based on observations so far

Example: Bernoulli Boxes

• X1 = 50 with probability ½, else 0
• X2 = 60 with probability 1/3, else 0
• X3 = 25 with probability 1
• The gambler can open k = 2 boxes

Optimal Decision Policy (X1 = B(50,1/2), X2 = B(60,1/3), X3 = B(25,1))

• Opened alone, X3 has expected payoff 25 and X2 has expected payoff 60/3 = 20

• The optimal policy opens X1 first:
  ▫ If X1 = 0 (probability ½): open X3 for a guaranteed payoff of 25
  ▫ If X1 = 50 (probability ½): a payoff of 50 is guaranteed, so it is pointless to open X3; open X2 instead
    - With probability 2/3, X2 = 0 and the payoff stays 50
    - With probability 1/3, X2 = 60 and the payoff is 60

• Expected payoff = 25/2 + 50/3 + 60/6 = 39.167
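For an instance this small, the value of the optimal adaptive policy can be computed exactly by brute-force recursion over all opening decisions. The sketch below (the function name `best` is mine) reproduces the 39.167 figure:

```python
# The three Bernoulli boxes from the slides: lists of (value, probability)
BOXES = {1: [(50, 0.5), (0, 0.5)],
         2: [(60, 1/3), (0, 2/3)],
         3: [(25, 1.0)]}
K = 2

def best(opened, best_seen, remaining):
    """Optimal expected payoff when up to `remaining` more boxes may
    be opened and the maximum value seen so far is `best_seen`."""
    if remaining == 0 or len(opened) == len(BOXES):
        return best_seen
    options = [best_seen]            # stopping early is always allowed
    for b in BOXES:
        if b not in opened:
            # Expected value of opening box b, then continuing optimally
            options.append(sum(p * best(opened | {b}, max(best_seen, v), remaining - 1)
                               for v, p in BOXES[b]))
    return max(options)

opt = best(frozenset(), 0, K)
print(round(opt, 3))   # matches 25/2 + 50/3 + 60/6 = 39.167
```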

Can the Gambler be Non-adaptive?

• Choose k boxes upfront, before opening any of them
  ▫ Open these boxes and obtain the maximum value observed
• Best non-adaptive solution: pick X1 and X3 upfront
  ▫ Payoff = ½ × 50 + ½ × 25 = 37.5 < 39.167
• Adaptively choosing the next box after opening X1 is better!

Subtler point: It’s not that much better…
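The non-adaptive benchmark is easy to verify by enumerating all subsets of size k and computing the expected maximum exactly; `f` follows the slides’ notation, the rest of the names are mine:

```python
from itertools import combinations, product

# The three Bernoulli boxes from the slides: lists of (value, probability)
BOXES = {1: [(50, 0.5), (0, 0.5)],
         2: [(60, 1/3), (0, 2/3)],
         3: [(25, 1.0)]}
K = 2

def f(S):
    """Exact E[max_{i in S} X_i] for independent discrete boxes."""
    total = 0.0
    for outcome in product(*(BOXES[b] for b in S)):
        prob = 1.0
        for _, p in outcome:
            prob *= p
        total += prob * max(v for v, _ in outcome)
    return total

best_set = max(combinations(BOXES, K), key=f)
print(best_set, round(f(best_set), 3))   # (1, 3) with payoff 37.5
```

Enumerating the three pairs confirms the slide: {X1, X3} is best at 37.5, strictly below the adaptive value of 39.167.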

Benchmark

• Value of the optimal decision policy (a decision tree)
  ▫ Call this value OPT
  ▫ The optimal decision tree can have size exponential in k
• Can we design a polynomial-time algorithm that produces a poly-sized decision tree approximating OPT?

Outline for Part 1

• Approximation algorithms for Maximum Value
  ▫ Non-adaptive policy
  ▫ Linear programming relaxation
  ▫ Duality and the “adaptivity gap”
  ▫ Please ignore the constant factors!
• Later on: “weakly coupled” decision systems
  ▫ Applications to matching, pricing, scheduling, …

Non-adaptive Algorithm: Submodularity [Kempe, Kleinberg, Tardos ’03, …]

Non-adaptive Problem

• For any subset S of boxes, if the gambler opens S non-adaptively, the expected payoff is

  f(S) = E[ max_{i ∈ S} X_i ]

• Goal: find S with |S| ≤ k maximizing f(S)
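When the distributions are continuous or otherwise awkward, f(S) is easy to estimate by Monte Carlo sampling. The three samplers below are hypothetical stand-ins, not distributions from the talk:

```python
import random

random.seed(0)

# Hypothetical independent reward distributions, one sampler per box
samplers = {1: lambda: random.uniform(0, 100),
            2: lambda: random.expovariate(1 / 40),   # mean 40
            3: lambda: 25.0}                          # deterministic

def f(S, trials=100_000):
    """Monte Carlo estimate of f(S) = E[max_{i in S} X_i]."""
    return sum(max(samplers[i]() for i in S) for _ in range(trials)) / trials

print(f({3}))        # exactly 25.0
print(f({1, 3}))     # roughly E[max(U(0,100), 25)] = 53.125
```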

Submodularity of Set Functions

• Diminishing returns: for S1 ⊆ S2 and any element t ∉ S2,

  f(S1 ∪ {t}) - f(S1) ≥ f(S2 ∪ {t}) - f(S2)

• Also need non-negativity and monotonicity: for S1 ⊆ S2,

  f(S2) ≥ f(S1) ≥ 0

The Greedy Algorithm

  S ← ∅
  While |S| < k:
    t ← argmax_{q ∉ S} ( f(S ∪ {q}) - f(S) )
    S ← S ∪ {t}
  Output S
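The greedy rule transcribes directly into code. Here it is run against an exact value oracle for the Bernoulli example from earlier slides (the function names are mine):

```python
from itertools import product

# The three Bernoulli boxes from the slides: lists of (value, probability)
BOXES = {1: [(50, 0.5), (0, 0.5)],
         2: [(60, 1/3), (0, 2/3)],
         3: [(25, 1.0)]}

def f(S):
    """Exact E[max_{i in S} X_i]; f(empty set) = 0."""
    if not S:
        return 0.0
    total = 0.0
    for outcome in product(*(BOXES[b] for b in S)):
        prob = 1.0
        for _, p in outcome:
            prob *= p
        total += prob * max(v for v, _ in outcome)
    return total

def greedy(ground, k, f):
    """Repeatedly add the element with the largest marginal gain
    f(S + {q}) - f(S) until |S| = k."""
    S = set()
    while len(S) < k:
        t = max((q for q in ground if q not in S),
                key=lambda q: f(S | {q}) - f(S))
        S.add(t)
    return S

S = greedy(BOXES, 2, f)
print(sorted(S), f(S))   # [1, 3] 37.5
```

On this instance greedy recovers the best non-adaptive set {X1, X3}, so it achieves 37.5, well within the (1 - 1/e) guarantee relative to the optimal subset.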

Classical Result [Nemhauser, Wolsey, Fisher ’78]

• Greedy is a 1 - 1/e ≈ 0.632 approximation to the value of the optimal subset of size k
• Similar results hold even when:
  ▫ Different elements have different costs and there is a budget on the total cost of the chosen set S
  ▫ There are general matroid constraints on the chosen set S

Maximum Value is Submodular

• Let D = joint distribution of X1, X2, …, Xn
• Consider any sample r drawn from D
  ▫ It yields a sample of values v1r, v2r, ..., vnr
  ▫ Let f(S, r) = max_{i ∈ S} vir
  ▫ Easy to check that f(·, r) is submodular
• f(S) is the expectation over samples r of f(S, r)
  ▫ Submodularity is preserved under taking expectations!
• Note: we do not need independence of the variables!
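Submodularity of f(S) = E[max_{i ∈ S} X_i] can be sanity-checked exhaustively on the small Bernoulli example, verifying the diminishing-returns inequality for every nested pair of subsets (all helper names are mine):

```python
from itertools import chain, combinations, product

# The three Bernoulli boxes from the slides: lists of (value, probability)
BOXES = {1: [(50, 0.5), (0, 0.5)],
         2: [(60, 1/3), (0, 2/3)],
         3: [(25, 1.0)]}

def f(S):
    """Exact E[max_{i in S} X_i]; f(empty set) = 0."""
    if not S:
        return 0.0
    total = 0.0
    for outcome in product(*(BOXES[b] for b in S)):
        prob = 1.0
        for _, p in outcome:
            prob *= p
        total += prob * max(v for v, _ in outcome)
    return total

def subsets(ground):
    return chain.from_iterable(combinations(ground, r)
                               for r in range(len(ground) + 1))

ground = set(BOXES)
# Check f(S1 + t) - f(S1) >= f(S2 + t) - f(S2) for all S1 subset of S2, t outside S2
submodular = all(
    f(set(S1) | {t}) - f(set(S1)) >= f(set(S2) | {t}) - f(set(S2)) - 1e-9
    for S2 in subsets(ground)
    for S1 in subsets(set(S2))
    for t in ground - set(S2)
)
print(submodular)   # True
```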

More things that are Submodular

• Payoff from many opened boxes [Guha, Munagala ’07]:

  f(S) = E[ max { Σ_{i ∈ S} x_i X_i : x ∈ [0,1]^n, Σ_{i ∈ S} s_i x_i ≤ B } ]

  (a fractional-knapsack payoff over the opened boxes, with sizes s_i and budget B)

More things that are Submodular

• Payoff from many opened boxes [Guha, Munagala ’07]
• Payoff = Minimi