# Test Commands for Refactored Bicorder

Run these tests in order to verify that the refactored code works correctly.

## Test 1: Dry Run - Single Protocol

Test that prompts are generated correctly with protocol context:

```bash
python3 bicorder_query.py protocols_edited.csv 1 --dry-run | head -80
```

**Expected result:**
- Should show "DRY RUN: Row 1, 23 gradients"
- Should show protocol descriptor and description
- Each prompt should include full protocol context
- Should show 23 gradient prompts

## Test 2: Verify CSV Structure

Check that the analyze script still creates the proper CSV structure:

```bash
python3 bicorder_analyze.py protocols_edited.csv -o test_output.csv
head -1 test_output.csv | tr ',' '\n' | grep -E "(explicit|precise|elite)" | head -5
```

**Expected result:**
- Should show gradient column names like:
  - Design_explicit_vs_implicit
  - Design_precise_vs_interpretive
  - Design_elite_vs_vernacular

## Test 3: Single Protocol Query (Real LLM Call)

Query just one protocol to test the full pipeline:

```bash
python3 bicorder_query.py test_output.csv 1 -m gpt-4o-mini
```

**Expected result:**
- Should show "Protocol: [name]"
- Should show "[1/23] Querying: Design explicit vs implicit..."
- Should complete all 23 gradients
- Should show "✓ CSV updated: test_output.csv"
- Each gradient should show a value 1-9

**Verify the output:**

```bash
# Check that values were written
head -2 test_output.csv | tail -1 | tr ',' '\n' | tail -25 | head -5
```

## Test 4: Check for No Conversation State

Verify that the tool doesn't create any conversation state:

```bash
# Before running test
llm logs list | grep -i bicorder

# Run a query
python3 bicorder_query.py test_output.csv 2 -m gpt-4o-mini

# After running test
llm logs list | grep -i bicorder
```

**Expected result:**
- Should not see any "bicorder_row_*" or similar conversation IDs
- Each query should be independent

## Test 5: Batch Processing (Small Set)

Test batch processing on rows 1-3:

```bash
python3 bicorder_batch.py protocols_edited.csv -o test_batch_output.csv --start 1 --end 3 -m gpt-4o-mini
```

**Expected result:**
- Should process 3 protocols
- Should show progress for each row
- Should show "Successful: 3" at the end
- No mention of "initializing conversation"

**Verify outputs:**

```bash
# Check that all 3 rows have values
python3 -c "
import csv
with open('test_batch_output.csv') as f:
    reader = csv.DictReader(f)
    for i, row in enumerate(reader, 1):
        if i > 3:
            break
        gradient_cols = [k for k in row.keys() if '_vs_' in k]
        filled = sum(1 for k in gradient_cols if row[k])
        print(f'Row {i}: {filled}/23 gradients filled')
"
```

## Test 6: Dry Run with Different Model

Test that the model parameter works in a dry run:

```bash
python3 bicorder_query.py protocols_edited.csv 5 --dry-run -m mistral | head -50
```

**Expected result:**
- Should show prompts (the model doesn't matter in a dry run, but the flag should be accepted)

## Test 7: Error Handling

Test with an invalid row number:

```bash
python3 bicorder_query.py test_output.csv 999
```

**Expected result:**
- Should show the error: "Error: Row 999 not found in CSV"

## Test 8: Compare Prompt Structure

Compare the new standalone prompts against the old system-prompt approach:

```bash
# New approach - protocol context in each prompt
python3 bicorder_query.py protocols_edited.csv 1 --dry-run | grep -A 5 "Analyze this protocol"

# Old approach would have had protocol in system prompt only (no longer used)

# Verify that protocol context appears in EVERY gradient prompt
python3 bicorder_query.py protocols_edited.csv 1 --dry-run | grep -c "Analyze this protocol"
```

**Expected result:**
- Should show "23" (protocol context appears in all 23 prompts)

## Cleanup

Remove test files:

```bash
rm -f test_output.csv test_batch_output.csv
```

## Success Criteria

✅ All 23 gradients queried for each protocol
✅ No conversation IDs created or referenced
✅ Protocol context included in every prompt
✅ CSV values properly written (1-9)
✅ Batch processing works without initialization step
✅ Error handling works correctly
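The inline verification in Test 5 can also be kept as a small reusable helper. Here is a minimal sketch: the `count_filled_gradients` function and the two-column sample CSV are hypothetical (real bicorder output has 23 gradient columns per the Success Criteria above); only the `_vs_` column-name convention comes from the tests themselves.

```python
import csv
import io

def count_filled_gradients(row: dict) -> tuple[int, int]:
    """Return (filled, total) for gradient columns, i.e. those whose name contains '_vs_'."""
    gradient_cols = [k for k in row if "_vs_" in k]
    filled = sum(1 for k in gradient_cols if row[k])
    return filled, len(gradient_cols)

# Hypothetical two-gradient sample; a real output CSV would have 23 gradient columns.
sample = io.StringIO(
    "Protocol,Design_explicit_vs_implicit,Design_precise_vs_interpretive\n"
    "Demo,7,\n"
)
for row in csv.DictReader(sample):
    filled, total = count_filled_gradients(row)
    print(f"{row['Protocol']}: {filled}/{total} gradients filled")  # Demo: 1/2 gradients filled
```

A fully populated row would report 23/23 filled, matching the "CSV values properly written" criterion.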