# Test Commands for Refactored Bicorder
Run these tests in order to verify the refactored code works correctly.
## Test 1: Dry Run - Single Protocol

Test that prompts are generated correctly with protocol context:

```sh
python3 scripts/bicorder_query.py data/readings/synthetic_20251116/protocols_edited.csv 1 --dry-run | head -80
```
Expected result:
- Should show "DRY RUN: Row 1, 23 gradients"
- Should show protocol descriptor and description
- Each prompt should include full protocol context
- Should show 23 gradient prompts
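The shape of those standalone prompts can be sketched in miniature. This is an illustrative sketch only: the protocol fields, gradient names, and prompt wording below are assumptions, not the tool's actual template.

```python
# Illustrative sketch: each gradient gets its own self-contained prompt
# that embeds the full protocol context (no shared conversation state).
protocol = {
    "descriptor": "Example Protocol",  # hypothetical protocol fields
    "description": "A toy protocol used only for illustration.",
}
gradients = ["explicit_vs_implicit", "precise_vs_interpretive"]  # 23 in the real tool

prompts = [
    f"Analyze this protocol: {protocol['descriptor']}\n"
    f"{protocol['description']}\n\n"
    f"Rate it from 1 to 9 on the gradient: {g.replace('_', ' ')}"
    for g in gradients
]

# Every prompt repeats the protocol context, so each query is independent
assert all(p.startswith("Analyze this protocol") for p in prompts)
print(len(prompts))
```

Because each prompt is self-contained, no system prompt or conversation history is needed between queries.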
## Test 2: Verify CSV Structure

Check that the analyze script still creates proper CSV structure:

```sh
python3 scripts/bicorder_analyze.py data/readings/synthetic_20251116/protocols_edited.csv -o test_output.csv
head -1 test_output.csv | tr ',' '\n' | grep -E "(explicit|precise|elite)" | head -5
```
Expected result:
- Should show gradient column names like:
  - Design_explicit_vs_implicit
  - Design_precise_vs_interpretive
  - Design_elite_vs_vernacular
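The `grep` above relies on the `Section_left_vs_right` column-naming convention, where gradient columns carry a `_vs_` infix. A minimal self-contained sketch of that check, using a toy header rather than the real output file:

```python
import csv
import io

# Toy CSV header following the Section_left_vs_right naming convention
header = "protocol,notes,Design_explicit_vs_implicit,Design_precise_vs_interpretive,Design_elite_vs_vernacular\n"
cols = next(csv.reader(io.StringIO(header)))

# Gradient columns are identifiable by the '_vs_' infix
gradient_cols = [c for c in cols if "_vs_" in c]
print(gradient_cols)
```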
## Test 3: Single Gradient Query (Real LLM Call)

Query just one protocol to test the full pipeline:

```sh
python3 scripts/bicorder_query.py test_output.csv 1 -m gpt-4o-mini
```
Expected result:
- Should show "Protocol: [name]"
- Should show "[1/23] Querying: Design explicit vs implicit..."
- Should complete all 23 gradients
- Should show "✓ CSV updated: test_output.csv"
- Each gradient should show a value 1-9
Verify the output:

```sh
# Check that values were written
head -2 test_output.csv | tail -1 | tr ',' '\n' | tail -25 | head -5
```
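For a stricter check than eyeballing five cells, the 1-9 range constraint can be validated directly. This is a self-contained sketch with sample values; it does not read `test_output.csv` itself:

```python
# Sample gradient values as they would appear in CSV cells;
# an empty string means the gradient was not filled
values = ["7", "3", "9", "1", ""]

def in_range(v):
    # A filled value must parse as an integer between 1 and 9
    return v.isdigit() and 1 <= int(v) <= 9

filled = [v for v in values if v]
assert all(in_range(v) for v in filled)
print(f"{len(filled)}/{len(values)} filled, all in range 1-9")
```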
## Test 4: Check for No Conversation State

Verify that the tool doesn't create any conversation files:

```sh
# Before running test
llm logs list | grep -i bicorder

# Run a query
python3 scripts/bicorder_query.py test_output.csv 2 -m gpt-4o-mini

# After running test
llm logs list | grep -i bicorder
```
Expected result:
- Should not see any "bicorder_row_*" or similar conversation IDs
- Each query should be independent
## Test 5: Batch Processing (Small Set)

Test batch processing on rows 1-3:

```sh
python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edited.csv -o test_batch_output.csv --start 1 --end 3 -m gpt-4o-mini
```
Expected result:
- Should process 3 protocols
- Should show progress for each row
- Should show "Successful: 3" at the end
- No mention of "initializing conversation"
Verify outputs:

```sh
# Check that all 3 rows have values
python3 -c "
import csv
with open('test_batch_output.csv') as f:
    reader = csv.DictReader(f)
    for i, row in enumerate(reader, 1):
        if i > 3:
            break
        gradient_cols = [k for k in row.keys() if '_vs_' in k]
        filled = sum(1 for k in gradient_cols if row[k])
        print(f'Row {i}: {filled}/23 gradients filled')
"
```
## Test 6: Dry Run with Different Model

Test that the model parameter works in dry run:

```sh
python3 scripts/bicorder_query.py data/readings/synthetic_20251116/protocols_edited.csv 5 --dry-run -m mistral | head -50
```
Expected result:
- Should show prompts (model doesn't matter in dry run, but flag should be accepted)
## Test 7: Error Handling

Test with an invalid row number:

```sh
python3 scripts/bicorder_query.py test_output.csv 999
```
Expected result:
- Should show error: "Error: Row 999 not found in CSV"
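The expected error message suggests a 1-based row lookup with an explicit bounds check. A hedged sketch of that pattern (this is an assumption about the behavior, not the script's actual code):

```python
import sys

def get_row(rows, row_num):
    # Row numbers are 1-based; out-of-bounds numbers are reported, not raised
    if not 1 <= row_num <= len(rows):
        print(f"Error: Row {row_num} not found in CSV", file=sys.stderr)
        return None
    return rows[row_num - 1]

rows = [{"protocol": "A"}, {"protocol": "B"}]
assert get_row(rows, 999) is None
assert get_row(rows, 2) == {"protocol": "B"}
print("bounds check ok")
```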
## Test 8: Compare Prompt Structure

Compare the new standalone prompts vs old system prompt approach:

```sh
# New approach - protocol context in each prompt
python3 scripts/bicorder_query.py data/readings/synthetic_20251116/protocols_edited.csv 1 --dry-run | grep -A 5 "Analyze this protocol"

# Old approach would have had protocol in system prompt only (no longer used)
# Verify that protocol context appears in EVERY gradient prompt
python3 scripts/bicorder_query.py data/readings/synthetic_20251116/protocols_edited.csv 1 --dry-run | grep -c "Analyze this protocol"
```
Expected result:
- Should show "23" (protocol context appears in all 23 prompts)
## Cleanup

Remove test files:

```sh
rm -f test_output.csv test_batch_output.csv
```
## Success Criteria

- ✅ All 23 gradients queried for each protocol
- ✅ No conversation IDs created or referenced
- ✅ Protocol context included in every prompt
- ✅ CSV values properly written (1-9)
- ✅ Batch processing works without initialization step
- ✅ Error handling works correctly