protocol-bicorder/analysis/WORKFLOW.md
2025-10-30 10:56:21 -06:00

Protocol Bicorder Analysis Workflow

This directory contains scripts for analyzing protocols using the Protocol Bicorder framework with LLM assistance.

Scripts

  1. bicorder_batch.py - [RECOMMENDED] Process entire CSV with one command
  2. bicorder_analyze.py - Prepares CSV with gradient columns
  3. bicorder_query.py - Queries LLM for each gradient value and updates CSV (each query is a new chat)

Process All Protocols with One Command

python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv

This will:

  1. Create the analysis CSV with gradient columns
  2. For each protocol row, query all gradients (each query is a new chat with full protocol context)
  3. Update the CSV automatically with the results
  4. Show progress and summary

Common Options

# Process only rows 1-5 (useful for testing)
python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv --start 1 --end 5

# Use specific LLM model
python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv -m mistral

# Add analyst metadata
python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv \
  -a "Your Name" -s "Your analytical standpoint"

Manual Workflow (Advanced)

Step 1: Prepare the Analysis CSV

Create a CSV with empty gradient columns:

python3 bicorder_analyze.py protocols_edited.csv -o analysis_output.csv

Optional: Add analyst metadata:

python3 bicorder_analyze.py protocols_edited.csv -o analysis_output.csv \
  -a "Your Name" -s "Your analytical standpoint"

Step 2: Query Gradients for a Protocol Row

Query all gradients for a specific protocol:

python3 bicorder_query.py analysis_output.csv 1
  • Replace 1 with the row number you want to analyze
  • Each gradient is queried in a new chat with full protocol context
  • Each response is automatically parsed and written to the CSV
  • Progress is shown for each gradient
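The parsing step can be pictured with a minimal sketch. It assumes the LLM is asked to reply with a single integer from 1 to 9; `parse_rating` is a hypothetical helper for illustration, and the parsing rules in bicorder_query.py may differ.

```python
import re
from typing import Optional


def parse_rating(response: str) -> Optional[int]:
    """Return the first standalone digit 1-9 found in the response, or None.

    Hypothetical helper: not the function bicorder_query.py actually uses.
    """
    # (?<!\d) and (?!\d) reject digits that belong to a longer number (10, 2025, ...).
    match = re.search(r"(?<!\d)[1-9](?!\d)", response)
    return int(match.group()) if match else None
```

Returning `None` instead of raising lets a caller leave the CSV cell empty and retry that gradient later.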

Optional: Specify a model:

python3 bicorder_query.py analysis_output.csv 1 -m mistral

Step 3: Repeat for All Protocols

For each protocol in your CSV:

python3 bicorder_query.py analysis_output.csv 1
python3 bicorder_query.py analysis_output.csv 2
python3 bicorder_query.py analysis_output.csv 3
# ... and so on

# OR: Use bicorder_batch.py to automate all of this!
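If you prefer to script the repetition yourself, the loop can be sketched in Python as follows. This is an illustration only: it assumes row numbers start at 1 and count data rows (the header excluded), and `build_query_commands` and `run_all` are hypothetical names, not part of the provided scripts.

```python
import csv
import subprocess
import sys
from typing import List, Optional


def build_query_commands(csv_path: str, model: Optional[str] = None) -> List[List[str]]:
    """Build one bicorder_query.py command per data row (assumed numbered from 1)."""
    with open(csv_path, newline="") as f:
        row_count = sum(1 for _ in csv.DictReader(f))
    commands = []
    for row_number in range(1, row_count + 1):
        cmd = [sys.executable, "bicorder_query.py", csv_path, str(row_number)]
        if model:
            cmd += ["-m", model]  # same -m flag as shown in Step 2
        commands.append(cmd)
    return commands


def run_all(csv_path: str, model: Optional[str] = None) -> None:
    """Run the queries one row at a time, stopping at the first failure."""
    for cmd in build_query_commands(csv_path, model):
        subprocess.run(cmd, check=True)
```

Calling `run_all("analysis_output.csv")` from the analysis directory would then issue the same commands as the manual list above.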

Architecture

How It Works

Each gradient query is sent to the LLM as a new, independent chat. Every query includes:

  • The protocol descriptor (name)
  • The protocol description
  • The gradient definition (left term, right term, and their descriptions)
  • Instructions to rate the protocol on that gradient from 1 to 9
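The items listed above could be combined into a standalone prompt roughly like this. The sketch is not the prompt text used by bicorder_query.py: the `Gradient` class, the `build_prompt` function, the wording, and the assumption that 1 maps to the left term and 9 to the right term are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class Gradient:
    """One gradient definition (hypothetical structure for illustration)."""
    left_term: str
    left_description: str
    right_term: str
    right_description: str


def build_prompt(descriptor: str, description: str, gradient: Gradient) -> str:
    """Assemble a standalone prompt that carries all context for one gradient."""
    return "\n".join([
        f"Protocol: {descriptor}",
        f"Description: {description}",
        "",
        f"Gradient: {gradient.left_term} vs {gradient.right_term}",
        # Assumption: 1 is the left end of the scale and 9 is the right end.
        f"1 = {gradient.left_term}: {gradient.left_description}",
        f"9 = {gradient.right_term}: {gradient.right_description}",
        "",
        "Rate this protocol on the gradient from 1 to 9. Reply with a single integer.",
    ])
```

Because the prompt carries the full protocol context, it can be sent without any earlier messages. Use `--dry-run` to see the prompts the script really sends.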

This approach:

  • Simplifies the code - No conversation state management
  • Reduces carry-over bias - Each evaluation is independent, not influenced by previous responses
  • Enables parallelization - Queries could theoretically run concurrently
  • Makes debugging easier - Each query/response pair is self-contained
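Because the queries are independent, concurrent execution could look like the sketch below. The provided scripts are not described as running this way; `query_all` is a hypothetical function, and `query_llm` stands for whatever function sends one prompt and returns the response text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def query_all(
    prompts: List[str],
    query_llm: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Send each prompt as an independent request; results keep the prompt order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of completion order.
        return list(pool.map(query_llm, prompts))
```

Threads suit this case because each call mostly waits on the LLM. Results would still need to be written to the CSV from a single place afterwards, since concurrent writes to one file are not safe.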

Tips

Dry Run Mode

Test prompts without calling the LLM:

python3 bicorder_query.py analysis_output.csv 1 --dry-run

This shows you exactly what prompt will be sent for each gradient, including the full protocol context.

Check Your Progress

Count how many gradients are still empty in each row:

python3 -c "
import csv
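with open('analysis_output.csv', newline='') as f:
    reader = csv.DictReader(f)
    for i, row in enumerate(reader, 1):
        # Gradient columns are identified by 'vs' in the column name
        gradients = [k for k in row if k and 'vs' in k]
        empty = sum(1 for k in gradients if not row[k])
        print(f'Row {i}: {empty}/{len(gradients)} gradients empty')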
"

Batch Processing

Use the bicorder_batch.py script (see "Process All Protocols with One Command" above) to process multiple protocols.