# Protocol Bicorder Analysis Workflow

This directory contains scripts for analyzing protocols using the Protocol Bicorder framework with LLM assistance.

The scripts automatically draw the gradients from the current state of the [bicorder.json](../bicorder.json) file.

## Scripts

1. **bicorder_batch.py** - **[RECOMMENDED]** Process entire CSV with one command
2. **bicorder_analyze.py** - Prepares CSV with gradient columns
3. **bicorder_query.py** - Queries LLM for each gradient value and updates CSV (each query is a new chat)

## Quick Start (Recommended)

### Process All Protocols with One Command

```bash
python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv
```

This will:
1. Create the analysis CSV with gradient columns
2. For each protocol row, query all gradients (each query is a new chat with full protocol context)
3. Update the CSV automatically with the results
4. Show progress and summary

### Common Options

```bash
# Process only rows 1-5 (useful for testing)
python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv --start 1 --end 5

# Use a specific LLM model
python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv -m mistral

# Add analyst metadata
python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv \
  -a "Your Name" -s "Your analytical standpoint"
```

---

## Manual Workflow (Advanced)

### Step 1: Prepare the Analysis CSV

Create a CSV with empty gradient columns:

```bash
python3 bicorder_analyze.py protocols_edited.csv -o analysis_output.csv
```

Optional: Add analyst metadata:
```bash
python3 bicorder_analyze.py protocols_edited.csv -o analysis_output.csv \
  -a "Your Name" -s "Your analytical standpoint"
```

### Step 2: Query Gradients for a Protocol Row

Query all gradients for a specific protocol:

```bash
python3 bicorder_query.py analysis_output.csv 1
```

- Replace `1` with the row number you want to analyze
- Each gradient is queried in a new chat with full protocol context
- Each response is automatically parsed and written to the CSV
- Progress is shown for each gradient

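This README does not describe how responses are parsed. Purely as an illustration, a parser for a 1-9 rating might look like the sketch below. `parse_rating` is a hypothetical helper, not a function taken from `bicorder_query.py`, and the real script may handle responses differently.

```python
import re
from typing import Optional


def parse_rating(response: str) -> Optional[int]:
    """Return the first standalone digit 1-9 found in an LLM response, or None.

    Hypothetical helper for illustration only.
    """
    # The lookarounds reject digits that belong to a longer number such as "10".
    match = re.search(r"(?<!\d)[1-9](?!\d)", response)
    return int(match.group()) if match else None
```

A parser this simple takes the first digit it sees, so a response that restates the scale before giving its answer would be misread. Asking the model to reply with a single digit keeps that risk low.
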
Optional: Specify a model:
```bash
python3 bicorder_query.py analysis_output.csv 1 -m mistral
```

### Step 3: Repeat for All Protocols

For each protocol in your CSV:

```bash
python3 bicorder_query.py analysis_output.csv 1
python3 bicorder_query.py analysis_output.csv 2
python3 bicorder_query.py analysis_output.csv 3
# ... and so on

# OR: Use bicorder_batch.py to automate all of this!
```

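If you prefer to script the manual steps yourself, the repetition above can be expressed as a short loop. This is a minimal sketch, not the implementation of `bicorder_batch.py`. It assumes row numbers are 1-indexed data rows (header excluded), and `count_rows`, `query_command`, and `query_all` are hypothetical names introduced here.

```python
import csv
import subprocess
import sys
from typing import List, Optional


def count_rows(csv_path: str) -> int:
    """Count data rows (header excluded) in the analysis CSV."""
    with open(csv_path, newline="") as f:
        return sum(1 for _ in csv.DictReader(f))


def query_command(csv_path: str, row: int, model: Optional[str] = None) -> List[str]:
    """Build the bicorder_query.py invocation shown above for one row."""
    cmd = [sys.executable, "bicorder_query.py", csv_path, str(row)]
    if model:
        cmd += ["-m", model]  # -m is the model flag documented in Step 2
    return cmd


def query_all(csv_path: str, model: Optional[str] = None) -> None:
    """Run bicorder_query.py once per data row, stopping on the first failure."""
    for row in range(1, count_rows(csv_path) + 1):
        subprocess.run(query_command(csv_path, row, model), check=True)
```

Calling `query_all("analysis_output.csv")` from the scripts directory would then run the same commands listed above, one row at a time.
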
## Architecture

### How It Works

Each gradient query is sent to the LLM as a **new, independent chat**. Every query includes:
- The protocol descriptor (name)
- The protocol description
- The gradient definition (left term, right term, and their descriptions)
- Instructions to rate the protocol on a 1-9 scale

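The four ingredients listed above can be assembled into one self-contained prompt. The sketch below shows one way to do that. The `Gradient` field names, the prompt wording, and the mapping of 1 to the left term and 9 to the right term are all assumptions made for illustration; use `--dry-run` to see the prompts the scripts actually send.

```python
from dataclasses import dataclass


@dataclass
class Gradient:
    # Hypothetical field names; the structure of bicorder.json may differ.
    left_term: str
    left_description: str
    right_term: str
    right_description: str


def build_prompt(descriptor: str, description: str, gradient: Gradient) -> str:
    """Assemble a standalone prompt carrying the full protocol context."""
    return (
        f"Protocol: {descriptor}\n"
        f"Description: {description}\n\n"
        f"Gradient: {gradient.left_term} vs {gradient.right_term}\n"
        f"- {gradient.left_term}: {gradient.left_description}\n"
        f"- {gradient.right_term}: {gradient.right_description}\n\n"
        # Assumption: 1 maps to the left term and 9 to the right term.
        f"Rate this protocol from 1 ({gradient.left_term}) to 9 "
        f"({gradient.right_term}). Reply with a single digit."
    )
```

Because the prompt repeats the protocol context every time, it can be sent without any earlier conversation history.
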
This approach:
- **Simplifies the code** - No conversation state management
- **Reduces carry-over bias** - Each evaluation is independent, not influenced by previous responses
- **Enables parallelization** - Queries could theoretically run concurrently
- **Makes debugging easier** - Each query/response pair is self-contained

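To illustrate the parallelization point: since no query depends on another, the prompts for one protocol could be dispatched from a thread pool. This is a sketch of the idea only, as the scripts are not stated to run concurrently. `run_queries` and the `ask` callable (whatever function sends a prompt to the LLM and returns its reply) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_queries(
    prompts: Dict[str, str],
    ask: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Send each prompt through `ask` concurrently; return replies keyed by gradient name."""
    names: List[str] = list(prompts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so names and replies line up.
        replies = list(pool.map(ask, (prompts[name] for name in names)))
    return dict(zip(names, replies))
```

Collecting all replies first and writing the CSV once afterwards avoids several threads writing to the same file.
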
## Tips

### Dry Run Mode

Test prompts without calling the LLM:

```bash
python3 bicorder_query.py analysis_output.csv 1 --dry-run
```

This shows you exactly what prompt will be sent for each gradient, including the full protocol context.

### Check Your Progress

Count the gradient values still empty in each row:

```bash
python3 -c "
import csv
with open('analysis_output.csv') as f:
    reader = csv.DictReader(f)
    for i, row in enumerate(reader, 1):
        # Gradient columns are identified by 'vs' in the column name
        empty = sum(1 for k, v in row.items() if 'vs' in k and not v)
        print(f'Row {i}: {empty}/23 gradients empty')
"
```

### Batch Processing

Use the `bicorder_batch.py` script (see Quick Start section above) for processing multiple protocols.