Command Line Interface¶
The CLI provides a convenient way to run sortition algorithms without writing Python code. It's ideal for:
- One-off selections: Quick panel selections for events or research
- Sample code: The code in the command line functions can be the basis for writing your own implementation.
- Batch processing: Running multiple selections with scripts
- Non-programmers: Teams who prefer command-line tools
- Integration: Incorporating sortition into existing workflows
Installation¶
Install the CLI with optional dependencies:
# Basic installation
pip install 'sortition-algorithms[cli]'
# With Gurobi support for the leximin algorithm
pip install 'sortition-algorithms[cli,gurobi]'
Quick Start¶
# Check installation
python -m sortition_algorithms --help
# Basic CSV selection
python -m sortition_algorithms csv \
--settings config.toml \
--features-csv demographics.csv \
--people-csv candidates.csv \
--selected-csv selected.csv \
--remaining-csv remaining.csv \
--number-wanted 100
Commands Overview¶
The CLI provides three main commands:
$ python -m sortition_algorithms --help
Usage: python -m sortition_algorithms [OPTIONS] COMMAND [ARGS]...
A command line tool to exercise the sortition algorithms.
Options:
--help Show this message and exit.
Commands:
csv Do sortition with CSV files
gen-sample Generate sample CSV file compatible with features
gsheet Do sortition with Google Spreadsheets
CSV Workflow¶
The most common usage pattern for working with local CSV files.
Command Reference¶
$ python -m sortition_algorithms csv --help
Usage: python -m sortition_algorithms csv [OPTIONS]
Do sortition with CSV files.
Options:
-S, --settings FILE Settings file (TOML format) [required]
-f, --features-csv FILE CSV with demographic features [required]
-p, --people-csv FILE CSV with candidate pool [required]
-s, --selected-csv FILE Output: selected people [required]
-r, --remaining-csv FILE Output: remaining people [required]
-n, --number-wanted INTEGER Number of people to select [required]
-v, --verbose Produce extra detailed logging.
--no-progress Suppress the live progress display.
--help Show this message and exit.
Live progress display¶
When stdout is a terminal, the CLI shows a single-line progress display backed by the rich library: a spinner, the phase description, a progress bar, a completion count, and the elapsed time. It updates in place as the selection moves through each algorithm phase, so you can see at a glance whether a long run is healthy or wedged.
The display is suppressed automatically when stdout is piped or
redirected, and you can force-suppress it on a TTY with --no-progress
(useful for log scraping or when you want plain text only). --verbose
controls log verbosity and is independent of the progress display.
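The rule described above can be summarized as a small decision function. This is a minimal illustration of the documented behavior, not the CLI's actual code. It assumes only that the output stream exposes `isatty()`, and `should_show_progress` is a hypothetical name. `--verbose` is deliberately not an input, because it only affects logging.

```python
import io
from typing import TextIO


def should_show_progress(stream: TextIO, no_progress: bool) -> bool:
    """Hypothetical helper mirroring the documented rule for the live display."""
    if no_progress:
        # --no-progress always wins, even on a terminal.
        return False
    # Pipes, files and redirected stdout report isatty() == False,
    # so the display is suppressed automatically in those cases.
    return stream.isatty()


# An in-memory buffer behaves like redirected output: no live display.
print(should_show_progress(io.StringIO(), no_progress=False))  # False
```

Passing `sys.stdout` gives the answer for the current process. This is the same check you would make if you embed the library and only want a live display on a terminal.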
The progress display is powered by RichProgressReporter from
sortition_algorithms.progress_rich, which is also available for use
from your own scripts. If you're embedding the library in another
application, see Progress Reporting for the full
ProgressReporter protocol and recipes for routing events to a
database, WebSocket, or other sink.
Example Files¶
demographics.csv (feature definitions):
feature,value,min,max
Gender,Male,45,55
Gender,Female,45,55
Age,18-30,20,30
Age,31-50,35,45
Age,51+,25,35
Location,Urban,40,60
Location,Rural,40,60
candidates.csv (candidate pool):
id,Name,Email,Gender,Age,Location,Address,Postcode
p001,Alice Smith,alice@email.com,Female,18-30,Urban,123 Main St,12345
p002,Bob Jones,bob@email.com,Male,31-50,Rural,456 Oak Ave,67890
p003,Carol Davis,carol@email.com,Female,51+,Urban,789 Pine Rd,12345
...
config.toml (settings):
id_column = "id"
# Output customization
columns_to_keep = ["Name", "Email"]
# Household diversity
check_same_address = true
check_same_address_columns = ["Address", "Postcode"]
Learn more about the settings file.
Basic Selection¶
python -m sortition_algorithms csv \
--settings config.toml \
--features-csv demographics.csv \
--people-csv candidates.csv \
--selected-csv selected.csv \
--remaining-csv remaining.csv \
--number-wanted 100
Using Environment Variables¶
Set commonly used paths as environment variables:
export SORTITION_SETTINGS="config.toml"
python -m sortition_algorithms csv \
--features-csv demographics.csv \
--people-csv candidates.csv \
-s selected.csv \
-r remaining.csv \
-n 100
Batch Processing¶
Create a script for multiple selections:
#!/bin/bash
# batch_selection.sh
SETTINGS="config.toml"
FEATURES="demographics.csv"
AREAS=(north south east west)
SIZE=50
for area in "${AREAS[@]}"; do
echo "Selecting $SIZE people for ${area}..."
python -m sortition_algorithms csv \
--settings "$SETTINGS" \
--features-csv "$FEATURES" \
--people-csv "candidates_${area}.csv" \
--selected-csv "selected_${area}.csv" \
--remaining-csv "remaining_${area}.csv" \
--number-wanted "$SIZE"
done
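If you would rather drive batch runs from Python than from bash, you can write the same loop with the standard-library `subprocess` module. This sketch uses only the `csv` options listed in the command reference above, and the per-area file names follow the bash script. `csv_command` and `run_batch` are hypothetical helpers. `sys.executable` makes the CLI run under the same Python interpreter as the script.

```python
import subprocess
import sys

SETTINGS = "config.toml"
FEATURES = "demographics.csv"
AREAS = ["north", "south", "east", "west"]
SIZE = 50


def csv_command(area: str, size: int = SIZE) -> list[str]:
    """Build the argument list for one `csv` selection (hypothetical helper)."""
    return [
        sys.executable, "-m", "sortition_algorithms", "csv",
        "--settings", SETTINGS,
        "--features-csv", FEATURES,
        "--people-csv", f"candidates_{area}.csv",
        "--selected-csv", f"selected_{area}.csv",
        "--remaining-csv", f"remaining_{area}.csv",
        "--number-wanted", str(size),
        "--no-progress",  # keep batch output as plain text
    ]


def run_batch(areas: list[str] = AREAS) -> None:
    for area in areas:
        print(f"Selecting {SIZE} people for {area}...")
        # check=True raises CalledProcessError and stops the batch on the first failure.
        subprocess.run(csv_command(area), check=True)
```

Unlike the bash loop, `check=True` stops at the first failed area rather than carrying on. Drop it if you prefer to attempt every area regardless.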
Google Sheets Workflow¶
For organizations using Google Sheets for data management.
Setup Requirements¶
- Google Cloud Project: Create a project in Google Cloud Console
- Enable APIs: Enable Google Sheets API and Google Drive API
- Service Account: Create service account credentials
- Share Sheet: Share your spreadsheet with the service account email
Command Reference¶
$ python -m sortition_algorithms gsheet --help
Usage: python -m sortition_algorithms gsheet [OPTIONS]
Do sortition with Google Spreadsheets.
Options:
-S, --settings FILE Settings file (TOML format) [required]
--auth-json-file FILE Google API credentials JSON [required]
--gen-rem-tab / --no-gen-rem-tab Generate 'Remaining' tab [default: true]
-g, --gsheet-name TEXT Spreadsheet name [required]
-f, --feature-tab-name TEXT Features tab name [default: Categories]
-p, --people-tab-name TEXT People tab name [default: Respondents]
-s, --selected-tab-name TEXT Selected output tab [default: Selected]
-r, --remaining-tab-name TEXT Remaining output tab [default: Remaining]
-n, --number-wanted INTEGER Number of people to select [required]
--help Show this message and exit.
Authentication Setup¶
- Download the service account's credentials as a JSON file
- Never commit this file to version control
- Store securely and reference by path
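A quick sanity check on the downloaded key can save you from a confusing authentication failure later. Google service account key files are JSON objects with a `"type"` of `"service_account"` and a `"client_email"` field. The sketch below reads the file you would pass to `--auth-json-file` and returns that email, which is the address to share the spreadsheet with. `service_account_email` is a hypothetical helper.

```python
import json
from pathlib import Path


def service_account_email(auth_json_file: Path) -> str:
    """Return the service account email from a Google credentials JSON file (hypothetical helper)."""
    data = json.loads(auth_json_file.read_text(encoding="utf-8"))
    if data.get("type") != "service_account":
        # OAuth client files and other JSON documents are rejected here.
        raise ValueError(
            f"{auth_json_file} is not a service account key (type={data.get('type')!r})"
        )
    email = data.get("client_email")
    if not email:
        raise ValueError(f"{auth_json_file} has no client_email field")
    # Share the spreadsheet with this address; it needs edit access to write output tabs.
    return email
```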
Example Usage¶
python -m sortition_algorithms gsheet \
--settings config.toml \
--auth-json-file /secure/path/credentials.json \
--gsheet-name "Citizen Panel 2024" \
--feature-tab-name "Demographics" \
--people-tab-name "Candidates" \
--selected-tab-name "Selected Panel" \
--remaining-tab-name "Reserve Pool" \
--number-wanted 120
Spreadsheet Structure¶
Your Google Sheet should have tabs structured like this:
Demographics tab:
| feature | value | min | max |
|---|---|---|---|
| Gender | Male | 45 | 55 |
| Gender | Female | 45 | 55 |
| Age | 18-30 | 20 | 30 |
Candidates tab:
| id | Name | Email | Gender | Age | Location |
|---|---|---|---|---|---|
| p001 | Alice | alice@email.com | Female | 18-30 | Urban |
| p002 | Bob | bob@email.com | Male | 31-50 | Rural |
Sample Generation¶
Generate test data compatible with your feature definitions.
Command Reference¶
$ python -m sortition_algorithms gen-sample --help
Usage: python -m sortition_algorithms gen-sample [OPTIONS]
Generate sample CSV file compatible with features and settings.
Options:
-S, --settings FILE Settings file [required]
-f, --features-csv FILE Features CSV file [required]
-p, --people-csv FILE Output: generated people CSV [required]
-n, --number-wanted INTEGER Number of people to generate [required]
--help Show this message and exit.
Example Usage¶
# Generate 500 sample people
python -m sortition_algorithms gen-sample \
--settings config.toml \
--features-csv demographics.csv \
--people-csv sample_candidates.csv \
--number-wanted 500
This creates a CSV of synthetic candidates whose feature values match your feature definitions, which is useful for testing quotas and algorithms.
Configuration Files¶
Settings File Format¶
All settings apart from id_column and columns_to_keep are optional and have sensible defaults:
# config.toml
id_column = "id" # Column name containing unique IDs
# Output customization
columns_to_keep = ["Name", "Email", "Phone", "Notes"]
# Address checking for household diversity
check_same_address = true
check_same_address_columns = ["Address", "Postcode", "City"]
# Algorithm selection
selection_algorithm = "maximin" # "maximin", "nash", "leximin", "legacy"
Learn more about the settings file.
Common Workflows¶
Standard Selection Process¶
# 1. Prepare your data files
# 2. Configure settings
# 3. Run selection
python -m sortition_algorithms csv \
--settings config.toml \
--features-csv demographics.csv \
--people-csv candidates.csv \
--selected-csv selected.csv \
--remaining-csv remaining.csv \
--number-wanted 100
# 4. Review results
head selected.csv
wc -l remaining.csv   # the line count includes the header row
With Address Checking¶
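Ensure household diversity by preventing multiple selections from the same address. Turn this on in the settings file with check_same_address = true, and list the columns that together make up an address in check_same_address_columns (for example Address and Postcode).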
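To see in advance which candidates share a household, you can group the pool by the same columns you would list in `check_same_address_columns`. This is an illustrative data-preparation sketch, not the library's own address check. The light normalization (trimming whitespace and ignoring case) is an assumption you may want to adjust, and `shared_addresses` is a hypothetical helper.

```python
import csv
from collections import defaultdict
from pathlib import Path


def shared_addresses(
    people_csv: Path, address_columns: list[str], id_column: str = "id"
) -> dict[tuple[str, ...], list[str]]:
    """Map each address shared by two or more candidates to their IDs (hypothetical helper)."""
    households: dict[tuple[str, ...], list[str]] = defaultdict(list)
    with people_csv.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Normalize so "123 Main St" and "123 main st " count as the same address.
            key = tuple(row[col].strip().lower() for col in address_columns)
            households[key].append(row[id_column])
    # Keep only addresses with more than one candidate.
    return {key: ids for key, ids in households.items() if len(ids) > 1}
```

For example, `shared_addresses(Path("candidates.csv"), ["Address", "Postcode"])` lists the households where, with address checking on, at most one person could end up on the panel.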
Reproducible Selections¶
For auditable results, use a fixed random seed:
Testing Quotas¶
Use sample generation to test if your quotas are achievable:
# Generate large sample
python -m sortition_algorithms gen-sample \
--settings config.toml \
--features-csv demographics.csv \
--people-csv test_pool.csv \
--number-wanted 1000
# Test selection
python -m sortition_algorithms csv \
--settings config.toml \
--features-csv demographics.csv \
--people-csv test_pool.csv \
--selected-csv test_selected.csv \
--remaining-csv test_remaining.csv \
--number-wanted 100
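After a test run, you can confirm that the panel respects every quota by counting feature values in the selected file. Below is a minimal sketch. It assumes the `demographics.csv` layout shown above (`feature,value,min,max`), and it also assumes that the selected CSV contains a column for each feature. Check your own output file for the second point, because the helper raises `KeyError` if a feature column is missing. `quota_violations` is a hypothetical helper.

```python
import csv
from pathlib import Path


def quota_violations(features_csv: Path, selected_csv: Path) -> list[str]:
    """List every feature value whose count in the selected panel falls outside [min, max]."""
    with selected_csv.open(newline="", encoding="utf-8") as f:
        panel = list(csv.DictReader(f))

    problems = []
    with features_csv.open(newline="", encoding="utf-8") as f:
        for quota in csv.DictReader(f):
            feature, value = quota["feature"], quota["value"]
            # Assumes the selected CSV has a column named after each feature.
            count = sum(1 for person in panel if person[feature] == value)
            low, high = int(quota["min"]), int(quota["max"])
            if not low <= count <= high:
                problems.append(f"{feature}={value}: {count} selected, quota {low}-{high}")
    return problems
```

An empty list from `quota_violations(Path("demographics.csv"), Path("test_selected.csv"))` means every value landed inside its quota.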
Troubleshooting¶
Common Errors¶
"Selection failed"
- For each feature, check that the minimums across its values add up to no more than the panel size, and that the maximums add up to at least the panel size.
- Verify feature values match between files.
- Review constraint feasibility.
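You can automate the first check. For each feature, a panel of size n is only possible if the value minimums sum to at most n and the value maximums sum to at least n. This sketch applies that test to a file in the `demographics.csv` layout. `infeasible_features` is a hypothetical helper. The test is necessary but not sufficient: quotas on different features interact through the actual people in the pool, so passing it does not guarantee that a selection exists.

```python
import csv
from collections import defaultdict
from pathlib import Path


def infeasible_features(features_csv: Path, number_wanted: int) -> list[str]:
    """Report each feature whose quotas cannot add up to number_wanted."""
    min_total: dict[str, int] = defaultdict(int)
    max_total: dict[str, int] = defaultdict(int)
    with features_csv.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            min_total[row["feature"]] += int(row["min"])
            max_total[row["feature"]] += int(row["max"])

    problems = []
    for feature in min_total:
        if min_total[feature] > number_wanted:
            problems.append(f"{feature}: minimums sum to {min_total[feature]} > {number_wanted}")
        if max_total[feature] < number_wanted:
            problems.append(f"{feature}: maximums sum to {max_total[feature]} < {number_wanted}")
    return problems
```

Against the example `demographics.csv`, `infeasible_features(Path("demographics.csv"), 100)` returns an empty list. Asking for 120 people reports that the Gender and Age maximums only sum to 110.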
"File not found"
- Use absolute paths or verify working directory.
- Check file permissions.
- Ensure files exist before running.
"Invalid feature values"
- Verify exact string matching between demographics.csv and candidates.csv
- Check for typos, differences in letter case, and extra spaces
- Look for non-ASCII look-alike characters, such as non-breaking spaces or accented letters
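A short script can catch most of these mismatches before a run. This sketch assumes the column layouts shown in the example files. It reports candidate values that do not exactly match any value defined for that feature, and it hints at the likely cause: stray whitespace, different letter case, or non-ASCII characters. `unmatched_values` is a hypothetical helper.

```python
import csv
from collections import defaultdict
from pathlib import Path


def unmatched_values(features_csv: Path, people_csv: Path, id_column: str = "id") -> list[str]:
    """Report candidates whose feature values do not exactly match a defined value."""
    allowed: dict[str, set[str]] = defaultdict(set)
    with features_csv.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            allowed[row["feature"]].add(row["value"])

    problems = []
    with people_csv.open(newline="", encoding="utf-8") as f:
        for person in csv.DictReader(f):
            for feature, values in allowed.items():
                value = person.get(feature)
                if value is None:
                    problems.append(f"{person[id_column]}: no {feature!r} column")
                elif value not in values:
                    # Guess the most likely cause of the mismatch.
                    hint = ""
                    if value.strip() in values:
                        hint = " (extra whitespace)"
                    elif value.strip().lower() in {v.lower() for v in values}:
                        hint = " (case differs)"
                    elif not value.isascii():
                        hint = " (contains non-ASCII characters)"
                    problems.append(f"{person[id_column]}: {feature}={value!r} not defined{hint}")
    return problems
```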
"Authentication failed" (Google Sheets)
- Verify credentials.json is correct and accessible
- Check that the service account has access to the spreadsheet
- Ensure the Google Sheets and Google Drive APIs are enabled in Google Cloud Console
Next Steps¶
- Core Concepts - Understand the theory behind sortition
- API Reference - For programmatic usage
- Data Adapters - Custom data sources and formats
- Advanced Usage - Complex scenarios and optimization