Set up analysis scripts

2025-10-30 10:56:21 -06:00
parent d2da0425c6
commit 815ed9d6f4
14 changed files with 1427 additions and 651 deletions
--- a/analysis/WORKFLOW.md
+++ b/analysis/WORKFLOW.md
@@ -0,0 +1,134 @@
+# Protocol Bicorder Analysis Workflow
+
+This directory contains scripts for analyzing protocols using the Protocol Bicorder framework with LLM assistance.
+
+## Scripts
+
+1. **bicorder_batch.py** - **[RECOMMENDED]** Processes an entire CSV with one command
+2. **bicorder_analyze.py** - Prepares CSV with gradient columns
+3. **bicorder_query.py** - Queries LLM for each gradient value and updates CSV (each query is a new chat)
+
+## Quick Start (Recommended)
+
+### Process All Protocols with One Command
+
+```bash
+python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv
+```
+
+This will:
+1. Create the analysis CSV with gradient columns
+2. For each protocol row, query all gradients (each query is a new chat with full protocol context)
+3. Update the CSV automatically with the results
+4. Show progress and summary
+
+### Common Options
+
+```bash
+# Process only rows 1-5 (useful for testing)
+python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv --start 1 --end 5
+
+# Use a specific LLM model
+python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv -m mistral
+
+# Add analyst metadata
+python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv \
+  -a "Your Name" -s "Your analytical standpoint"
+```
+
+---
+
+## Manual Workflow (Advanced)
+
+### Step 1: Prepare the Analysis CSV
+
+Create a CSV with empty gradient columns:
+
+```bash
+python3 bicorder_analyze.py protocols_edited.csv -o analysis_output.csv
+```
+
+Optional: Add analyst metadata:
+```bash
+python3 bicorder_analyze.py protocols_edited.csv -o analysis_output.csv \
+  -a "Your Name" -s "Your analytical standpoint"
+```
+
+### Step 2: Query Gradients for a Protocol Row
+
+Query all gradients for a specific protocol:
+
+```bash
+python3 bicorder_query.py analysis_output.csv 1
+```
+
+- Replace `1` with the row number you want to analyze
+- Each gradient is queried in a new chat with full protocol context
+- Each response is automatically parsed and written to the CSV
+- Progress is shown for each gradient
+
+Optional: Specify a model:
+```bash
+python3 bicorder_query.py analysis_output.csv 1 -m mistral
+```
+
+### Step 3: Repeat for All Protocols
+
+Run the query script once for each protocol row in your CSV:
+
+```bash
+python3 bicorder_query.py analysis_output.csv 1
+python3 bicorder_query.py analysis_output.csv 2
+python3 bicorder_query.py analysis_output.csv 3
+# ... and so on
+
+# OR: Use bicorder_batch.py to automate all of this!
+```
+
+## Architecture
+
+### How It Works
+
+Each gradient query is sent to the LLM as a **new, independent chat**. Every query includes:
+- The protocol descriptor (name)
+- The protocol description
+- The gradient definition (left term, right term, and their descriptions)
+- Instructions to rate 1-9
+
+This approach:
+- **Simplifies the code** - No conversation state management
+- **Avoids carry-over bias** - Each evaluation is independent, so earlier responses cannot influence later ones
+- **Enables parallelization** - Queries share no state, so they could in principle run concurrently
+- **Makes debugging easier** - Each query/response pair is self-contained
+
+## Tips
+
+### Dry Run Mode
+
+Test prompts without calling the LLM:
+
+```bash
+python3 bicorder_query.py analysis_output.csv 1 --dry-run
+```
+
+This shows you exactly what prompt will be sent for each gradient, including the full protocol context.
+
+### Check Your Progress
+
+Count how many gradient values are still empty in each row:
+
+```bash
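+python3 -c "
+import csv
+# Gradient columns are identified by 'vs' in the header name
+with open('analysis_output.csv', newline='') as f:
+    reader = csv.DictReader(f)
+    for i, row in enumerate(reader, 1):
+        # Guard against a None key, which DictReader uses for surplus fields
+        empty = sum(1 for k, v in row.items() if k and 'vs' in k and not v)
+        print(f'Row {i}: {empty}/23 gradients empty')
+"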
+```
+
+### Batch Processing
+
+Use the `bicorder_batch.py` script (see Quick Start section above) for processing multiple protocols.
+
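Note on the "How It Works" section of `analysis/WORKFLOW.md` above: the one-chat-per-gradient design can be illustrated with a minimal Python sketch. This is not the code in `bicorder_query.py`, which this excerpt does not show. The `Gradient` class, `build_prompt`, `parse_rating`, the prompt wording, and the convention that 1 maps to the left term and 9 to the right term are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gradient:
    """One gradient definition; the field names are illustrative assumptions."""
    left_term: str
    left_description: str
    right_term: str
    right_description: str


def build_prompt(descriptor: str, description: str, gradient: Gradient) -> str:
    """Return a self-contained prompt for rating one gradient.

    Everything the model needs is passed in as arguments, so no
    conversation state carries over between queries.
    """
    return (
        f"Protocol: {descriptor}\n"
        f"Description: {description}\n\n"
        f"Gradient: {gradient.left_term} vs {gradient.right_term}\n"
        # Assumed scale direction: 1 = left term, 9 = right term
        f"1 = {gradient.left_term}: {gradient.left_description}\n"
        f"9 = {gradient.right_term}: {gradient.right_description}\n\n"
        "Rate this protocol on the gradient from 1 to 9. "
        "Reply with a single integer."
    )


def parse_rating(response: str) -> Optional[int]:
    """Return the first standalone digit 1-9 in the response, or None."""
    match = re.search(r"\b([1-9])\b", response)
    return int(match.group(1)) if match else None
```

Because `build_prompt` depends only on its arguments, each prompt can be printed and inspected on its own, which is roughly what the `--dry-run` option is described as doing.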