Benchmark Results¶

This page documents performance comparisons between solver backends for the sortition algorithms.

Test Environment¶

Date: January 2025
Platform: Linux 6.8.0-90-generic
Python: 3.11
Packages: highspy (HiGHS solver), mip 1.15.0 (CBC solver)

Maximin Algorithm: highspy vs mip¶

Small Pool (150 people, panel size 22)¶

uv run python -m benchmarks.profile_solvers \
    --backends highspy mip \
    --algorithms maximin \
    --runs 5

Backend	Mean Time	Max Memory
highspy	3.25s	0.5MB
mip	1.40s	4.6MB

Result: MIP is 2.3x faster, but uses more memory on first run (JIT warmup).

Medium Pool (300 people, panel size 45)¶

uv run python -m benchmarks.profile_solvers \
    --backends highspy mip \
    --algorithms maximin \
    --sizes 300 \
    --runs 3

Backend	Mean Time	Max Memory
highspy	30.3s	1.1MB
mip	9.6s	1.8MB

Result: MIP is 3.2x faster.

Large Pool (500 people, panel size 75)¶

uv run python -m benchmarks.profile_solvers \
    --backends highspy mip \
    --algorithms maximin \
    --sizes 500 \
    --runs 3

Backend	Mean Time	Max Memory
highspy	148.1s	1.8MB
mip	41.8s	1.9MB

Result: MIP is 3.5x faster.

Extra Large Pool (750 people, panel size 112)¶

uv run python -m benchmarks.profile_solvers \
    --backends highspy mip \
    --algorithms maximin \
    --sizes 750 \
    --runs 3

Backend	Mean Time	Max Memory
highspy	261.1s	7.2MB
mip	119.1s	7.4MB

Result: MIP is 2.2x faster.

Tight Constraints (300 people, panel size 45)¶

Testing with narrow min/max gaps to stress the solvers:

uv run python -m benchmarks.profile_solvers \
    --backends highspy mip \
    --algorithms maximin \
    --sizes 300 \
    --runs 3 \
    --tight-constraints

Backend	Mean Time	Max Memory
highspy	17.3s	0.9MB
mip	4.5s	1.7MB

Result: MIP is 3.9x faster with tight constraints.

Summary¶

Performance Comparison¶

Pool Size	Panel Size	highspy	mip	MIP Speedup
150	22	3.25s	1.40s	2.3x
300	45	30.3s	9.6s	3.2x
500	75	148.1s	41.8s	3.5x
750	112	261.1s	119.1s	2.2x

Key Findings¶

MIP (CBC) is consistently faster than highspy (HiGHS) for the maximin algorithm, by a factor of 2-4x depending on problem size.
Memory usage is comparable between both backends after initial warmup.
Scaling behaviour is similar for both backends (roughly O(n²) with pool size), but MIP maintains its performance advantage across all sizes tested.
Tight constraints make problems harder for both solvers, but MIP's advantage increases slightly.

Recommendations¶

For production use with maximin algorithm, consider using solver_backend = "mip" if execution time is critical and the mip package is available.
For ease of deployment, solver_backend = "highspy" (the default) requires no additional dependencies.
These results are specific to the maximin algorithm. Other algorithms (nash, diversimax, leximin) may show different characteristics.

Possible Explanations for Performance Difference¶

The performance gap is unexpected since HiGHS is generally considered a fast solver. Possible factors:

Column generation pattern: The maximin algorithm iteratively adds constraints. CBC may handle this pattern more efficiently than HiGHS.
Python binding overhead: The highspy bindings may have different overhead characteristics than python-mip.
Default parameters: HiGHS and CBC have different default solver parameters that may favour different problem types.
Problem structure: The committee selection ILP may have structure that CBC exploits better.

Further Work¶

Areas for additional investigation:

Algorithm Comparison¶

Nash algorithm: Does mip maintain its advantage for Nash welfare optimization?
Diversimax algorithm: How do the backends compare for the diversity-maximizing approach?
Leximin algorithm: Currently requires Gurobi for the dual LP; could benefit from pure HiGHS/mip implementation.

Performance Tuning¶

HiGHS parameters: Test different solver parameters (presolve settings, cutting planes, etc.) to see if HiGHS can be tuned for better performance on this problem type.
Warm starting: Investigate whether warm-starting the solver between iterations could improve performance.
Model formulation: The column generation pattern may benefit from different constraint formulations.

Scaling Analysis¶

Larger pools: Test with 1000+ people to understand asymptotic scaling behaviour.
More features: Test with additional demographic features to see how constraint complexity affects performance.
Panel size ratios: Test different panel-to-pool ratios (e.g., 10%, 20%, 30%).

Memory Profiling¶

Detailed memory analysis: Use memray to generate flamegraphs and identify memory hotspots.
Peak vs sustained memory: Understand whether memory spikes are transient or sustained.

Real-World Validation¶

Production datasets: Test with actual sortition datasets (with appropriate anonymisation).
Constraint patterns: Understand which constraint patterns are common in practice and optimise for those.

Raw Data¶

Full benchmark results are saved in benchmarks/results/ as CSV and JSON files with timestamps.