mjqm-simulator

Policy Comparison Guide

This guide explains how to systematically compare scheduling policies, interpret performance metrics, and conduct rigorous analysis of simulation results.

Overview

Comparing scheduling policies requires understanding multiple performance dimensions: aggregate efficiency, per-class fairness, stability boundaries, and parameter sensitivity. This guide provides the methodology for conducting such comparisons, focusing on measurement techniques and interpretation frameworks rather than prescriptive recommendations.

This guide covers:

Key performance metrics

Response time

Definition: Total time from job arrival to completion (waiting time + service time).

Interpretation: Directly measures user-perceived performance. Response time includes both queueing delay and actual service duration, making it sensitive to both system load and service time characteristics.

Usage in comparison:

What to look for:

Waiting time

Definition: Time spent in queue before service begins (excludes service time).

Interpretation: Isolates queueing delay from service requirements, making it particularly sensitive to scheduling policy and system saturation. Because it excludes service time, it is a more direct indicator of scheduling efficiency than response time.

Usage in comparison:

What to look for:

Throughput

Definition: Rate at which jobs complete service.

Interpretation: In stable operation, throughput equals arrival rate (all arriving jobs are eventually served). Deviations indicate instability or insufficient simulation duration.

Usage in comparison:

What to look for:

Queue length

Definition: Number of jobs waiting in buffer at a given time.

Interpretation: Direct indicator of congestion. Grows super-linearly as the system approaches capacity.

Usage in comparison:

What to look for:

Utilization

Definition: Fraction of servers occupied by jobs in service.

Interpretation: Measures resource usage efficiency. Maximum sustainable utilization depends on policy mechanisms and workload characteristics.

Usage in comparison:

Kleinrock’s Power Metric

Definition: $P(\lambda) = X(\lambda) / R(\lambda)$ where $X$ is throughput and $R$ is response time.

Interpretation: Balances throughput against delay. The arrival rate $\lambda^*$ maximizing power represents the operational stability boundary: beyond this point, throughput grows slowly while delay grows super-linearly (Kleinrock, 1979).

Usage in comparison:

What to look for:

Comparison methodologies

Aggregate comparison

Approach: Compare policies at identical arrival rates, examining system-wide averages.

When to use:

Procedure:

  1. Define arrival rate sweep from light load to beyond expected stability
  2. Run all policies at same arrival rates (use pivot configuration)
  3. Plot metrics (waiting time, response time, utilization) vs arrival rate
  4. Identify stability boundaries using Kleinrock’s power metric

Limitations: May obscure per-class disparities; a policy with excellent aggregate metrics may severely disadvantage certain job classes.

Per-class comparison

Approach: Disaggregate metrics by job class to reveal fairness characteristics.

When to use:

Procedure:

  1. Select representative arrival rate (typically where most policies are stable but load is non-negligible)
  2. Extract waiting time, response time, and throughput for each job class
  3. Compare across policies for same class
  4. Compare across classes for same policy
  5. Quantify disparity using coefficient of variation or max/min ratios

Key per-class metrics:

Fairness quantification

Coefficient of variation (CV)

\[CV = \frac{\sigma}{\mu} = \frac{\text{Std. Dev. of per-class waiting times}}{\text{Mean waiting time}}\]

Lower CV values indicate more uniform treatment across job classes. As a rough guide: CV < 2 suggests uniform treatment, 2–4 moderate dispersion, and > 4 severe dispersion.

Waiting time ratios

Compare worst-treated class to best-treated class:

\[\text{Ratio} = \frac{\max_i W_i}{\min_i W_i}\]

Higher ratios indicate greater disparity in treatment between different job classes.
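
As a sketch in Python (assuming the per-class mean waiting times have already been extracted from the simulator output; the values below are placeholders), both fairness metrics are one-liners:

import numpy as np

# Illustrative per-class mean waiting times (one entry per job class).
per_class_wait = np.array([1.2, 3.5, 8.0, 40.0])

cv = per_class_wait.std() / per_class_wait.mean()      # coefficient of variation
ratio = per_class_wait.max() / per_class_wait.min()    # worst class vs best class
print(f"CV = {cv:.2f}, max/min ratio = {ratio:.1f}")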

Identifying stability boundaries

Kleinrock’s Power Metric approach

Method: Compute power metric for each arrival rate and identify the peak.

Procedure:

  1. For each policy, compute $P(\lambda) = X(\lambda) / R(\lambda)$ across all tested arrival rates
  2. Find $\lambda^*$ where power is maximized: $\lambda^* = \arg\max_\lambda P(\lambda)$
  3. Record corresponding utilization at $\lambda^*$
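
A minimal Python sketch of this procedure, assuming throughput and mean response time have already been collected for each tested arrival rate (the arrays below are placeholders):

import numpy as np

# Illustrative sweep results for one policy: throughput X and mean response time R
# measured at each tested arrival rate (all values are placeholders).
lam = np.array([10, 20, 30, 40, 50, 60, 80, 100])
X   = np.array([10, 20, 30, 40, 50, 59, 72, 78])
R   = np.array([0.50, 0.52, 0.55, 0.60, 0.70, 0.90, 2.0, 15.0])

power = X / R                    # Kleinrock's power metric P(lambda)
i_star = int(np.argmax(power))   # index of the peak
print(f"lambda* = {lam[i_star]}, P(lambda*) = {power[i_star]:.2f}")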

Advantages:

Visual identification

Alternative method: Plot waiting time vs arrival rate and identify exponential growth onset.

Procedure:

  1. Plot mean waiting time vs arrival rate (log-log scale)
  2. Identify where curve transitions from linear to super-linear
  3. Mark this transition as stability boundary
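
A minimal plotting sketch, assuming the per-policy sweep results have already been loaded (the numbers below are placeholders):

import matplotlib.pyplot as plt

# Placeholder sweep results: arrival rates and mean waiting times per policy.
results = {
    "fifo":         ([10, 20, 40, 80, 120], [0.2, 0.4, 1.0, 8.0, 90.0]),
    "back filling": ([10, 20, 40, 80, 120], [0.2, 0.3, 0.7, 3.0, 25.0]),
}

for policy, (rates, waits) in results.items():
    plt.loglog(rates, waits, marker="o", label=policy)

plt.xlabel("arrival rate")
plt.ylabel("mean waiting time")
plt.legend()
plt.show()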

Challenges:

Theoretical policy properties

Different scheduling policies have inherent theoretical properties that affect their behavior:

FIFO (First-In, First-Out)

Theoretical properties:

SMASH (SMAll SHuffle)

Theoretical properties:

Back Filling

Theoretical properties:

Server Filling

Theoretical properties:

Most Server First

Theoretical properties:

Quick Swap, Adaptive MSF, Static MSF

Theoretical properties:

Visualization strategies

Log-log plots

When to use: Metrics spanning multiple orders of magnitude (waiting time, queue length).

Advantages:

What to plot:

Interpretation:

Linear plots with marked boundaries

When to use: Utilization, throughput, metrics with bounded ranges.

Enhancements:

Per-class comparisons

Approach: Bar charts or grouped scatter plots showing metrics for each class.

X-axis: Job class (often by server requirement)

Y-axis: Waiting time, response time, or throughput

Grouping: One group per policy
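
A minimal sketch of such a grouped bar chart; the class labels and waiting times below are placeholders:

import numpy as np
import matplotlib.pyplot as plt

# Placeholder per-class mean waiting times at one arrival rate, for two policies.
classes = ["1-server", "2-server", "4-server", "8-server"]
waits = {
    "fifo":         [1.0, 2.5, 6.0, 20.0],
    "back filling": [0.8, 1.5, 4.0, 30.0],
}

x = np.arange(len(classes))
width = 0.8 / len(waits)
for i, (policy, w) in enumerate(waits.items()):
    plt.bar(x + i * width, w, width, label=policy)

plt.xticks(x + width * (len(waits) - 1) / 2, classes)
plt.ylabel("mean waiting time")
plt.legend()
plt.show()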

What to look for:

Heatmaps

Approach: 2D heatmap with policies on one axis, metrics or classes on the other.

Use cases:

Statistical considerations

Replication

Recommendation: 20–40 independent replications per configuration.

Why:

Simulation length

Recommendation: Sufficient events for steady-state convergence.

Guidelines:

Verification: Plot metrics vs time to ensure convergence (should plateau).
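
One way to make this check concrete is to plot the running mean of the metric and confirm it flattens; the sketch below uses a synthetic series in place of real simulator output:

import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for a per-job metric time series (e.g. waiting time of each
# completed job, in completion order).
waits = np.random.exponential(2.0, size=100_000)

running_mean = np.cumsum(waits) / np.arange(1, len(waits) + 1)
plt.plot(running_mean)
plt.xlabel("completed jobs")
plt.ylabel("running mean waiting time")
plt.show()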

Confidence intervals

Computation: Standard error of mean across replications, 95% CI.

Presentation: Show error bars or confidence bands on plots.

Interpretation: Non-overlapping CIs indicate statistically significant differences; overlapping CIs do not by themselves establish that no difference exists.
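
A minimal sketch, assuming one mean value has been collected per independent replication (values are placeholders):

import numpy as np

# Placeholder: mean waiting time observed in each independent replication.
rep_means = np.array([4.1, 3.8, 4.4, 4.0, 3.9, 4.3, 4.2, 3.7])

mean = rep_means.mean()
sem = rep_means.std(ddof=1) / np.sqrt(len(rep_means))   # standard error of the mean
# 1.96 assumes a normal approximation; a t-quantile is more exact for few replications.
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem
print(f"mean = {mean:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")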

Practical workflow

Step 1: Define experimental parameters

Step 2: Configure pivot sweep

[[pivot]]
arrival.rate = [10, 20, 30, 40, 50, 60, 80, 100, 120, 150, 180, 200]

policy = [
    "fifo",
    "back filling",
    "server filling memoryful",
    "most server first",
    { name = "smash", window = 5 }
]

Step 3: Run experiments

./simulator policy_comparison

Step 4: Identify stability boundaries

For each policy:

  1. Load results
  2. Compute Kleinrock’s power metric: $P(\lambda) = X(\lambda) / R(\lambda)$
  3. Find $\lambda^* = \arg\max P(\lambda)$
  4. Record utilization at $\lambda^*$

Step 5: Aggregate comparison

Plot for each metric:

Step 6: Per-class analysis

Select arrival rate where most policies are stable:

  1. Extract per-class waiting times for each policy
  2. Plot grouped bar chart: classes on X-axis, waiting time on Y-axis, one group per policy
  3. Compute fairness metrics (CV, max/min ratio)
  4. Create fairness vs efficiency scatter plot

Step 7: Interpret and select

Questions to answer:

Selection criteria:

Advanced topics

Parameter sensitivity

For policies with parameters (SMASH window, Quick Swap threshold), sweep parameter values:

[[pivot]]
policy.window = [1, 2, 5, 10, 20, 50]  # For SMASH

Plot performance vs parameter value to identify optimal settings for your workload.
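
As a sketch, once one mean waiting time has been measured per tested window value (placeholders below), the best setting is a simple argmin:

import numpy as np

# Placeholder: mean waiting time measured at each tested SMASH window value.
windows = np.array([1, 2, 5, 10, 20, 50])
mean_wait = np.array([5.2, 4.1, 3.3, 3.5, 4.0, 6.8])

best = windows[int(np.argmin(mean_wait))]
print(f"window minimizing mean waiting time for this workload: {best}")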

Workload characterization

Workload heterogeneity impacts policy performance. Characterize your workload using:

Metrics:

Impact on policies:

Multi-dimensional optimization

Create Pareto frontier showing trade-offs:

Policies on the frontier are optimal for some trade-off preference; interior points are dominated.
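
A small sketch that identifies non-dominated policies, assuming each policy has been summarized by two objectives where lower is better (here mean waiting time and fairness CV; values are placeholders):

# Placeholder summary: (mean waiting time, fairness CV) per policy; lower is better.
points = {
    "fifo":              (6.0, 1.2),
    "back filling":      (3.5, 2.8),
    "server filling":    (2.9, 3.5),
    "most server first": (4.0, 4.1),
}

def dominated(p, others):
    # p is dominated if some other point is at least as good in both objectives
    # and strictly better in at least one.
    return any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in others)

frontier = [name for name, p in points.items() if not dominated(p, points.values())]
print("Pareto-optimal policies:", frontier)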

Common pitfalls

Insufficient simulation length

Problem: Results haven’t reached steady state.

Symptom: Metrics still changing at end of simulation.

Solution: Increase events parameter, verify convergence by plotting metrics over time.

Testing only light load

Problem: All policies perform similarly at low load.

Symptom: No performance differentiation observed.

Solution: Extend arrival rate sweep to higher loads, include rates beyond expected stability.

Ignoring per-class metrics

Problem: Aggregate metrics hide fairness issues.

Symptom: Policy with best aggregate performance has extreme per-class disparities.

Solution: Always examine per-class waiting times, compute fairness metrics.

Comparing at different loads

Problem: Comparing policies at their respective “optimal” loads.

Symptom: Unfair comparison (each policy at different utilization).

Solution: Compare at matched arrival rates, or at matched utilization levels.

Over-interpreting small differences

Problem: Declaring significance without statistical testing.

Symptom: Choosing policy based on minor average differences within confidence intervals.

Solution: Use confidence intervals, increase replications, test statistical significance.

Further reading