Command Line Interface

The CLI provides a convenient way to run sortition algorithms without writing Python code. It's ideal for:

One-off selections: Quick panel selections for events or research
Batch processing: Running multiple selections with scripts
Non-programmers: Teams who prefer command-line tools
Integration: Incorporating sortition into existing workflows

Installation

Install the CLI with optional dependencies:

# Basic installation
pip install 'sortition-algorithms[cli]'

# With Gurobi support (needed for the leximin algorithm)
pip install 'sortition-algorithms[cli,gurobi]'

Quick Start

# Check installation
python -m sortition_algorithms --help

# Basic CSV selection
python -m sortition_algorithms csv \
  --settings config.toml \
  --features-csv demographics.csv \
  --people-csv candidates.csv \
  --selected-csv selected.csv \
  --remaining-csv remaining.csv \
  --number-wanted 100

Commands Overview

The CLI provides three main commands:

$ python -m sortition_algorithms --help
Usage: python -m sortition_algorithms [OPTIONS] COMMAND [ARGS]...

  A command line tool to exercise the sortition algorithms.

Options:
  --help  Show this message and exit.

Commands:
  csv         Do sortition with CSV files
  gen-sample  Generate sample CSV file compatible with features
  gsheet      Do sortition with Google Spreadsheets

CSV Workflow

The most common usage pattern for working with local CSV files.

Command Reference

$ python -m sortition_algorithms csv --help
Usage: python -m sortition_algorithms csv [OPTIONS]

  Do sortition with CSV files.

Options:
  -S, --settings FILE             Settings file (TOML format) [required]
  -f, --features-csv FILE         CSV with demographic features [required]
  -p, --people-csv FILE           CSV with candidate pool [required]
  -s, --selected-csv FILE         Output: selected people [required]
  -r, --remaining-csv FILE        Output: remaining people [required]
  -n, --number-wanted INTEGER     Number of people to select [required]
  --help                          Show this message and exit.

Example Files

demographics.csv (feature definitions):

feature,value,min,max
Gender,Male,45,55
Gender,Female,45,55
Age,18-30,20,30
Age,31-50,35,45
Age,51+,25,35
Location,Urban,40,60
Location,Rural,40,60

candidates.csv (candidate pool):

id,Name,Email,Gender,Age,Location,Address,Postcode
p001,Alice Smith,alice@email.com,Female,18-30,Urban,123 Main St,12345
p002,Bob Jones,bob@email.com,Male,31-50,Rural,456 Oak Ave,67890
p003,Carol Davis,carol@email.com,Female,51+,Urban,789 Pine Rd,12345
...

config.toml (settings):

# Reproducible results
random_number_seed = 42

# Household diversity
check_same_address = true
check_same_address_columns = ["Address", "Postcode"]

# Algorithm choice
selection_algorithm = "maximin"
max_attempts = 10

# Output customization
columns_to_keep = ["Name", "Email"]
id_column = "id"
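The settings refer to candidate columns by name (`id_column`, `columns_to_keep`, `check_same_address_columns`), so it can help to confirm those columns exist in candidates.csv before a run. A minimal sketch, assuming a plain CSV whose header row has no quoted commas; `check_columns` is a hypothetical helper, not part of the CLI:

```shell
# Hypothetical helper (not part of sortition-algorithms): check that every
# named column appears in the header row of a plain CSV file.
check_columns() {
    local csv="$1" header col status=0
    shift
    # Header row only, Windows line endings removed, one column name per line
    header=$(head -n 1 "$csv" | tr -d '\r' | tr ',' '\n')
    for col in "$@"; do
        if ! printf '%s\n' "$header" | grep -qxF -- "$col"; then
            echo "missing column: $col"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_columns candidates.csv id Name Email Address Postcode` prints nothing and succeeds when every column is present.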

Basic Selection

python -m sortition_algorithms csv \
  --settings config.toml \
  --features-csv demographics.csv \
  --people-csv candidates.csv \
  --selected-csv selected.csv \
  --remaining-csv remaining.csv \
  --number-wanted 100

Using Environment Variables

Set commonly used paths as environment variables:

export SORTITION_SETTINGS="config.toml"
export SORTITION_FEATURES="demographics.csv"
export SORTITION_PEOPLE="candidates.csv"

python -m sortition_algorithms csv \
  -s selected.csv \
  -r remaining.csv \
  -n 100
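If you would rather not depend on the CLI reading these variables itself, a small wrapper can pass them as explicit flags. This is a sketch: `sortition_csv` is a hypothetical shell function that reuses the variable names exported above.

```shell
# Hypothetical wrapper: pass the exported paths to the CLI as explicit flags.
# ${VAR:?message} stops the script with an error if VAR is unset or empty.
sortition_csv() {
    python -m sortition_algorithms csv \
        --settings "${SORTITION_SETTINGS:?SORTITION_SETTINGS is not set}" \
        --features-csv "${SORTITION_FEATURES:?SORTITION_FEATURES is not set}" \
        --people-csv "${SORTITION_PEOPLE:?SORTITION_PEOPLE is not set}" \
        --selected-csv "$1" \
        --remaining-csv "$2" \
        --number-wanted "$3"
}
```

Usage: `sortition_csv selected.csv remaining.csv 100`.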

Batch Processing

Create a script for multiple selections:

#!/bin/bash
# batch_selection.sh

SETTINGS="config.toml"
FEATURES="demographics.csv"
PEOPLE="candidates.csv"
SIZES=(50 75 100 125 150)

for size in "${SIZES[@]}"; do
    echo "Selecting $size people..."
    python -m sortition_algorithms csv \
        --settings "$SETTINGS" \
        --features-csv "$FEATURES" \
        --people-csv "$PEOPLE" \
        --selected-csv "selected_${size}.csv" \
        --remaining-csv "remaining_${size}.csv" \
        --number-wanted "$size"
done

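Note that the min and max values in demographics.csv are absolute numbers of people, not proportions, so one features file rarely suits several panel sizes. With the example quotas, the Gender minimums alone add up to 90, so a panel of 50 cannot meet them. To run selections of different sizes, adjust the quotas for each size as well.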
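One way to adjust them is to rescale each quota proportionally before each run. The sketch below uses a hypothetical `scale_quotas` function (not part of sortition-algorithms) and assumes a plain features CSV with the header `feature,value,min,max` and no quoted fields. It rounds minimums down and maximums up, so if the original quotas satisfied "sum of minimums ≤ panel size ≤ sum of maximums" for every feature, the scaled quotas still do.

```shell
# Hypothetical helper: rescale absolute quotas written for one panel size
# (from_size) to another (to_size). Writes the new CSV to standard output.
scale_quotas() {
    local features_csv="$1" from_size="$2" to_size="$3"
    awk -F, -v OFS=, -v from="$from_size" -v to="$to_size" '
        NR == 1 { print; next }                       # keep the header row
        {
            lo = int($3 * to / from)                  # round the minimum down
            hi = int(($4 * to + from - 1) / from)     # round the maximum up
            print $1, $2, lo, hi
        }' "$features_csv"
}
```

Inside the batch loop you could run `scale_quotas demographics.csv 100 "$size" > "demographics_${size}.csv"` and pass that file to `--features-csv`. Proportional scaling is a blunt tool, so review the scaled quotas before relying on them.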

Google Sheets Workflow

For organizations using Google Sheets for data management.

Setup Requirements

Google Cloud Project: Create a project in Google Cloud Console
Enable APIs: Enable Google Sheets API and Google Drive API
Service Account: Create service account credentials
Share Sheet: Share your spreadsheet with the service account email
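If you use the `gcloud` command-line tool, the API and service-account steps can be scripted roughly as follows. This is a sketch wrapped in a hypothetical `setup_sheets_service_account` function; the project ID, account name and key path are placeholders, and it assumes `gcloud` is installed and authenticated with permission to manage the project. Sharing the spreadsheet with the printed address is still done from the sheet's Share dialog.

```shell
# Hypothetical helper wrapping standard gcloud commands.
# Arguments: PROJECT_ID ACCOUNT_NAME KEY_FILE (all placeholders you choose).
setup_sheets_service_account() {
    local project="$1" account="$2" key_file="$3" email
    email="${account}@${project}.iam.gserviceaccount.com"

    # Enable the two APIs the gsheet command relies on
    gcloud services enable sheets.googleapis.com drive.googleapis.com \
        --project="$project" || return 1

    # Create the service account
    gcloud iam service-accounts create "$account" \
        --project="$project" --display-name="Sortition CLI" || return 1

    # Download a JSON key for it (the file passed to --auth-json-file)
    gcloud iam service-accounts keys create "$key_file" \
        --iam-account="$email" --project="$project" || return 1

    echo "Share your spreadsheet with: $email"
}
```

Usage: `setup_sheets_service_account my-project sortition-bot credentials.json`.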

Command Reference

$ python -m sortition_algorithms gsheet --help
Usage: python -m sortition_algorithms gsheet [OPTIONS]

  Do sortition with Google Spreadsheets.

Options:
  -S, --settings FILE             Settings file (TOML format) [required]
  --auth-json-file FILE           Google API credentials JSON [required]
  --gen-rem-tab / --no-gen-rem-tab Generate 'Remaining' tab [default: true]
  -g, --gsheet-name TEXT          Spreadsheet name [required]
  -f, --feature-tab-name TEXT     Features tab name [default: Categories]
  -p, --people-tab-name TEXT      People tab name [default: Categories]
  -s, --selected-tab-name TEXT    Selected output tab [default: Selected]
  -r, --remaining-tab-name TEXT   Remaining output tab [default: Remaining]
  -n, --number-wanted INTEGER     Number of people to select [required]
  --help                          Show this message and exit.

Authentication Setup

Download service account credentials JSON file
Never commit this file to version control
Store securely and reference by path
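On Linux or macOS, restricting the file's permissions and, if it has to sit inside a project checkout, ignoring it in git might look like this. `secure_credentials` is a hypothetical helper, meant to be run from the repository root:

```shell
# Hypothetical helper: lock down a service account key file.
secure_credentials() {
    local cred="$1" name
    chmod 600 "$cred"          # owner can read and write; nobody else can read
    name=$(basename "$cred")
    # Add the file name to .gitignore in the current directory, only once
    if ! grep -qxF -- "$name" .gitignore 2>/dev/null; then
        printf '%s\n' "$name" >> .gitignore
    fi
}
```

Usage: `secure_credentials credentials.json`.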

Example Usage

python -m sortition_algorithms gsheet \
  --settings config.toml \
  --auth-json-file /secure/path/credentials.json \
  --gsheet-name "Citizen Panel 2024" \
  --feature-tab-name "Demographics" \
  --people-tab-name "Candidates" \
  --selected-tab-name "Selected Panel" \
  --remaining-tab-name "Reserve Pool" \
  --number-wanted 120

Spreadsheet Structure

Your Google Sheet should have tabs structured like this:

Demographics tab:

feature	value	min	max
Gender	Male	45	55
Gender	Female	45	55
Age	18-30	20	30

Candidates tab:

id	Name	Email	Gender	Age	Location
p001	Alice	alice@email.com	Female	18-30	Urban
p002	Bob	bob@email.com	Male	31-50	Rural

Sample Generation

Generate test data compatible with your feature definitions.

Command Reference

$ python -m sortition_algorithms gen-sample --help
Usage: python -m sortition_algorithms gen-sample [OPTIONS]

  Generate sample CSV file compatible with features and settings.

Options:
  -S, --settings FILE             Settings file [required]
  -f, --features-csv FILE         Features CSV file [required]
  -p, --people-csv FILE           Output: generated people CSV [required]
  -n, --number-wanted INTEGER     Number of people to generate [required]
  --help                          Show this message and exit.

Example Usage

# Generate 500 sample people
python -m sortition_algorithms gen-sample \
  --settings config.toml \
  --features-csv demographics.csv \
  --people-csv sample_candidates.csv \
  --number-wanted 500

This creates a CSV with realistic synthetic data that matches your feature definitions - useful for testing quotas and algorithms.
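To see how a generated pool is spread across one feature, and compare the counts with your quotas, a short `awk` sketch is enough. `value_counts` is a hypothetical helper; it assumes a plain CSV with a header row and no quoted commas.

```shell
# Hypothetical helper: count how many rows have each value in a named column.
value_counts() {
    awk -F, -v col="$2" '
        { sub(/\r$/, "") }                     # tolerate Windows line endings
        NR == 1 {                              # find the column in the header
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "no column named " col; exit 1 }
            next
        }
        { count[$idx]++ }
        END { for (v in count) print v, count[v] }' "$1" | sort
}
```

Usage: `value_counts sample_candidates.csv Gender`.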

Configuration Files

Settings File Format

All settings are optional and have sensible defaults:

# config.toml

# Randomization
random_number_seed = 42  # Set for reproducible results, omit for random

# Address checking for household diversity
check_same_address = true
check_same_address_columns = ["Address", "Postcode", "City"]

# Algorithm selection
selection_algorithm = "maximin"  # "maximin", "nash", "leximin", "legacy"
max_attempts = 10

# Output customization
columns_to_keep = ["Name", "Email", "Phone", "Notes"]
id_column = "id"  # Column name containing unique IDs

Algorithm Comparison

Algorithm	Pros	Cons	Use Case
`maximin`	Fair to minorities	May not optimize overall	Default choice
`nash`	Balanced overall	Complex optimization	Large diverse pools
`leximin`	Strongest fairness	Requires Gurobi license	Academic/research
`legacy`	Backwards compatible	Less sophisticated	Historical consistency
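To compare algorithms on the same data, one approach is to derive one settings file per algorithm and run the CSV command with each. `config_for_algorithm` is a hypothetical helper that rewrites only the `selection_algorithm` line (adding it if absent); a leximin run additionally needs Gurobi installed.

```shell
# Hypothetical helper: copy a settings file, changing only selection_algorithm.
config_for_algorithm() {
    local base="$1" algorithm="$2" out="$3"
    # Rewrite an existing selection_algorithm line (a trailing comment is dropped)
    sed "s/^selection_algorithm *=.*/selection_algorithm = \"$algorithm\"/" "$base" > "$out"
    # Append the setting if the base file did not contain one
    if ! grep -q '^selection_algorithm' "$out"; then
        printf 'selection_algorithm = "%s"\n' "$algorithm" >> "$out"
    fi
}

# Example: one run per algorithm on the same inputs
# for algorithm in maximin nash legacy; do
#     config_for_algorithm config.toml "$algorithm" "config_${algorithm}.toml"
#     python -m sortition_algorithms csv \
#         --settings "config_${algorithm}.toml" \
#         --features-csv demographics.csv \
#         --people-csv candidates.csv \
#         --selected-csv "selected_${algorithm}.csv" \
#         --remaining-csv "remaining_${algorithm}.csv" \
#         --number-wanted 100
# done
```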

Common Workflows

Standard Selection Process

# 1. Prepare your data files
# 2. Configure settings
# 3. Run selection
python -m sortition_algorithms csv \
  --settings config.toml \
  --features-csv demographics.csv \
  --people-csv candidates.csv \
  --selected-csv selected.csv \
  --remaining-csv remaining.csv \
  --number-wanted 100

# 4. Review results
head selected.csv      # first rows of the selected panel
wc -l remaining.csv    # line count, including the header row
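A quick sanity check is to confirm that the selected file holds exactly the number of people you asked for. This sketch assumes one person per line after a single header row (no line breaks inside quoted fields); `count_data_rows` and `check_panel_size` are hypothetical helpers.

```shell
# Hypothetical helper: count data rows, i.e. every line after the header.
count_data_rows() {
    # awk also counts a final line that lacks a trailing newline
    awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: fail unless the file has exactly the wanted row count.
check_panel_size() {
    local file="$1" wanted="$2" rows
    rows=$(count_data_rows "$file")
    if [ "$rows" -ne "$wanted" ]; then
        echo "$file has $rows rows, expected $wanted"
        return 1
    fi
    echo "$file has the expected $rows rows"
}
```

Usage: `check_panel_size selected.csv 100`.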

With Address Checking

Ensure household diversity by preventing multiple selections from the same address:

# config.toml
check_same_address = true
check_same_address_columns = ["Address", "Postcode"]
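Before enabling this, you may want to see which candidates share a household under the columns you chose. A sketch, assuming a plain CSV with no quoted commas and column names without spaces; `shared_addresses` is a hypothetical helper that prints each address combination held by more than one candidate, preceded by its count.

```shell
# Hypothetical helper: list address combinations shared by several candidates.
# Usage: shared_addresses FILE COLUMN [COLUMN...]
shared_addresses() {
    local csv="$1"
    shift
    awk -F, -v cols="$*" '
        BEGIN { ncols = split(cols, want, " ") }
        { sub(/\r$/, "") }                    # tolerate Windows line endings
        NR == 1 {                             # header: locate each column
            for (j = 1; j <= ncols; j++) {
                for (i = 1; i <= NF; i++) if ($i == want[j]) idx[j] = i
                if (!(j in idx)) { print "no column named " want[j]; exit 1 }
            }
            next
        }
        {                                     # one key per household
            key = $(idx[1])
            for (j = 2; j <= ncols; j++) key = key " | " $(idx[j])
            count[key]++
        }
        END { for (k in count) if (count[k] > 1) print count[k], k }' "$csv"
}
```

Usage: `shared_addresses candidates.csv Address Postcode`.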

Reproducible Selections

For reproducible, auditable results, set a fixed random seed and record it alongside the output files:

# config.toml
random_number_seed = 20241214  # Use today's date or similar
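To confirm that a seed really makes your selection repeatable, you can run the same command twice and compare the outputs. A sketch with a hypothetical `verify_reproducible` function; it assumes that, with a fixed seed and identical inputs, the CLI writes the same selected file each time, which is what fixing the seed is meant to achieve.

```shell
# Hypothetical helper: run the same selection twice and compare the results.
# Usage: verify_reproducible SETTINGS FEATURES PEOPLE NUMBER_WANTED
verify_reproducible() {
    local settings="$1" features="$2" people="$3" n="$4" dir run
    dir=$(mktemp -d)
    for run in 1 2; do
        python -m sortition_algorithms csv \
            --settings "$settings" \
            --features-csv "$features" \
            --people-csv "$people" \
            --selected-csv "$dir/selected_$run.csv" \
            --remaining-csv "$dir/remaining_$run.csv" \
            --number-wanted "$n" || return 1
    done
    # cmp -s is silent and succeeds only if the files are byte-identical
    if cmp -s "$dir/selected_1.csv" "$dir/selected_2.csv"; then
        echo "identical selections"
    else
        echo "selections differ"
        return 1
    fi
}
```

Usage: `verify_reproducible config.toml demographics.csv candidates.csv 100`.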

Testing Quotas

Use sample generation to test if your quotas are achievable:

# Generate large sample
python -m sortition_algorithms gen-sample \
  --settings config.toml \
  --features-csv demographics.csv \
  --people-csv test_pool.csv \
  --number-wanted 1000

# Test selection
python -m sortition_algorithms csv \
  --settings config.toml \
  --features-csv demographics.csv \
  --people-csv test_pool.csv \
  --selected-csv test_selected.csv \
  --remaining-csv test_remaining.csv \
  --number-wanted 100

Troubleshooting

Common Errors

"Selection failed"

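Check that, for each feature, the quota minimums add up to no more than the panel size and the maximums to no less than it
Verify feature values match between files
Check that the pool contains at least as many people with each value as that value's minimum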
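The first of these checks can be automated. Every selected person has exactly one value for each feature, so each feature's counts must add up to the panel size: its minimums can sum to at most the panel size and its maximums to at least the panel size. This is necessary but not sufficient, since the pool must also contain enough people and features interact. `check_quota_sums` is a hypothetical helper assuming the `feature,value,min,max` layout shown earlier and no quoted commas.

```shell
# Hypothetical helper: per-feature check that sum(min) <= panel size <= sum(max).
check_quota_sums() {
    awk -F, -v n="$2" '
        { sub(/\r$/, "") }                  # tolerate Windows line endings
        NR == 1 { next }                    # skip the header row
        { lo[$1] += $3; hi[$1] += $4 }      # accumulate totals per feature
        END {
            for (f in lo) {
                ok = (lo[f] <= n + 0 && n + 0 <= hi[f])
                printf "%s: minimums sum to %d, maximums to %d -> %s\n",
                       f, lo[f], hi[f], (ok ? "OK" : "INFEASIBLE")
                if (!ok) bad = 1
            }
            exit (bad ? 1 : 0)
        }' "$1"
}
```

Usage: `check_quota_sums demographics.csv 100`.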

"File not found"

Use absolute paths or verify working directory
Check file permissions
Ensure files exist before running
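A small guard at the top of a script catches missing or unreadable inputs before the CLI starts. `require_files` is a hypothetical helper:

```shell
# Hypothetical helper: report every file that is missing or unreadable.
require_files() {
    local f missing=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f (working directory: $(pwd))"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `require_files config.toml demographics.csv candidates.csv || exit 1`.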

"Invalid feature values"

Verify that every value in candidates.csv exactly matches a value in demographics.csv
Check for typos, differences in case, and leading or trailing spaces
Look for non-ASCII look-alikes such as non-breaking spaces or curly quotes
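This comparison can be automated too. A sketch, assuming both files are plain CSVs without quoted commas and that the candidate columns are named exactly like the features, as in the example files; `check_feature_values` is a hypothetical helper that reports each candidate value missing from the features file, in brackets so stray spaces are visible.

```shell
# Hypothetical helper: usage check_feature_values FEATURES_CSV CANDIDATES_CSV
check_feature_values() {
    awk -F, '
        { sub(/\r$/, "") }                     # tolerate Windows line endings
        FNR == NR {                            # first file: features CSV
            if (FNR > 1) { valid[$1, $2] = 1; is_feature[$1] = 1 }
            next
        }
        FNR == 1 {                             # candidates header row
            for (i = 1; i <= NF; i++) if ($i in is_feature) col[i] = $i
            next
        }
        {
            for (i in col)
                if (!((col[i], $i) in valid)) {
                    printf "line %d: %s = [%s] not in features file\n", FNR, col[i], $i
                    bad = 1
                }
        }
        END { exit (bad ? 1 : 0) }' "$1" "$2"
}
```

Usage: `check_feature_values demographics.csv candidates.csv`.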

"Authentication failed" (Google Sheets)

Verify credentials.json is correct and accessible
Check that service account has access to the spreadsheet
Ensure APIs are enabled in Google Cloud Console
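Service account key files are JSON documents whose `client_email` field is the address the spreadsheet must be shared with. A sketch that checks the file parses as a service account key and prints that address; `service_account_email` is a hypothetical helper using Python's standard `json` module.

```shell
# Hypothetical helper: print the client_email from a service account key file.
service_account_email() {
    python3 - "$1" <<'EOF'
import json
import sys

with open(sys.argv[1]) as f:
    creds = json.load(f)  # fails with an error if the file is not valid JSON

if creds.get("type") != "service_account":
    sys.exit("not a service account key (type = %r)" % creds.get("type"))
print(creds["client_email"])
EOF
}
```

Usage: `service_account_email /secure/path/credentials.json`.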

Debug Tips

Verbose output: The CLI provides detailed messages about the selection process.

Test with smaller numbers: If selection fails, try reducing --number-wanted to isolate the issue.

Check intermediate files: Use gen-sample to create test data and verify your workflow.

Environment variables: Set SORTITION_SETTINGS to avoid repeating file paths.

Getting Help

# General help
python -m sortition_algorithms --help

# Command-specific help
python -m sortition_algorithms csv --help
python -m sortition_algorithms gsheet --help
python -m sortition_algorithms gen-sample --help

Integration Examples

Shell Scripts

#!/bin/bash
# run_selection.sh

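set -euo pipefail  # Exit on errors, unset variables and failed pipelines

# Stop with a clear message if a required variable is missing
: "${SORTITION_SETTINGS:?set SORTITION_SETTINGS to the settings TOML file}"
: "${FEATURES_FILE:?set FEATURES_FILE to the features CSV}"
: "${PEOPLE_FILE:?set PEOPLE_FILE to the candidates CSV}"
: "${PANEL_SIZE:?set PANEL_SIZE to the number of people to select}"

# Compute the date stamp once so both output files share it
stamp=$(date +%Y%m%d)

echo "Starting selection process..."

python -m sortition_algorithms csv \
  --settings "${SORTITION_SETTINGS}" \
  --features-csv "${FEATURES_FILE}" \
  --people-csv "${PEOPLE_FILE}" \
  --selected-csv "selected_${stamp}.csv" \
  --remaining-csv "remaining_${stamp}.csv" \
  --number-wanted "${PANEL_SIZE}"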

echo "Selection completed successfully!"

CI/CD Integration

# .github/workflows/selection.yml
name: Run Selection
on:
  workflow_dispatch:
    inputs:
      panel_size:
        description: "Number of people to select"
        required: true
        default: "100"

jobs:
  select:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install 'sortition-algorithms[cli]'
      - name: Run selection
        env:
          # Pass the input through an environment variable rather than
          # interpolating it directly into the shell script
          PANEL_SIZE: ${{ github.event.inputs.panel_size }}
        run: |
          python -m sortition_algorithms csv \
            --settings config.toml \
            --features-csv data/demographics.csv \
            --people-csv data/candidates.csv \
            --selected-csv selected.csv \
            --remaining-csv remaining.csv \
            --number-wanted "$PANEL_SIZE"
      - uses: actions/upload-artifact@v4
        with:
          name: selection-results
          path: |
            selected.csv
            remaining.csv

Next Steps

Core Concepts - Understand the theory behind sortition
API Reference - For programmatic usage
Data Adapters - Custom data sources and formats
Advanced Usage - Complex scenarios and optimization