Set up analysis scripts

Nathan Schneider
2025-10-30 10:56:21 -06:00
parent d2da0425c6
commit 815ed9d6f4
14 changed files with 1427 additions and 651 deletions


@@ -1 +0,0 @@
Nathan Schneider,ntnsndr,satellite,25.10.2025 13:38,file:///home/ntnsndr/.config/libreoffice/4;


@@ -2,26 +2,78 @@
This directory concerns a synthetic data analysis conducted with the Protocol Bicorder.
Scripts were created with the assistance of Claude Code, but the data processing was done with local models.
## Purpose
This analysis has several purposes:
* To test the usefulness and limitations of the Protocol Bicorder
* To identify any patterns in a synthetic dataset derived from recent works on protocols
## Procedure
See [`prompts.md`](prompts.md) for a collection of prompts used in this process. SHOULD THESE BE INTEGRATED BELOW?
* Document chunking: Gathering raw data from protocol-focused texts, including the draft of the author's book, _The Protocol Reader_, _As for Protocols_, and _Das Protokoll_, producing a CSV list of protocols
- The dataset includes some LLM hallucinations---that is, protocols not in the texts---but the hallucinations are often acceptable examples and so some have been retained (`data/output-raw.csv`, n=776)
- Cleaning: Manual review of the protocols listed to remove overly broad or inappropriate entries (`data/output-edit.csv`, n=TKTK)
* Dataset elaboration: Expand the dataset with LLM background knowledge
- Add analyst personas: a control analyst with an academic "view from nowhere" and two others directly involved in the protocol
- Cleaning: Manual review of the elaborated protocols and analysts for quality and correctness, editing or removing problematic entries
* Bicorder diagnostic
- Automated diagnoses
  - For each protocol: start a thread; then, for each gradient:
    - Extract the term explanation
    - Pick a number and append it to the CSV, followed by a comma
### Document chunking
This stage gathered raw data from recent protocol-focused texts.
The following prompt was applied to book chapter drafts and major protocol-related books, including the draft of the author's book, _The Protocol Reader_, _As for Protocols_, and _Das Protokoll_. The texts were pasted in plain text, divided into 5000-word files, and the prompt was applied to each file with the `chunk.sh` script:
```yaml
model: "gemma3:12b"
context: "model running on ollama locally, accessed with llm on the command line"
prompt: "Return csv-formatted data (with no markdown wrapper) that consists of a list of protocols discussed or referred to in the attached text. Protocols are defined extremely broadly as 'patterns of interaction,' and may be of a nontechnical nature. Protocols should be as specific as possible, such as 'Sacrament of Reconciliation' rather than 'Religious Protocols.' The first column should provide a brief descriptor of the protocol, and the second column should describe it in a substantial paragraph of 3-5 sentences, encapsulated in quotation marks to avoid breaking on commas. Be sure to paraphrase rather than quoting directly from the source text."
```
The result was a CSV-formatted list of protocols (`protocols_raw.csv`, n=774).
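For reference, a minimal Python sketch of this chunking step (illustrative only: the actual run used the `chunk.sh` shell script, and the file names here are assumptions):
```python
"""Split a plain-text source into ~5000-word chunks and apply the
extraction prompt to each chunk via the llm CLI. Illustrative sketch."""
import subprocess
from pathlib import Path

CHUNK_WORDS = 5000
PROMPT = Path("chunk_prompt.txt").read_text()  # the prompt text shown above

def chunks(text, size=CHUNK_WORDS):
    """Yield successive chunks of roughly `size` words."""
    words = text.split()
    for i in range(0, len(words), size):
        yield " ".join(words[i:i + size])

# Hypothetical input file; the real inputs were pasted book texts
text = Path("source.txt").read_text(encoding="utf-8")
with open("protocols_raw.csv", "a", encoding="utf-8") as out:
    for chunk in chunks(text):
        # the llm CLI prepends piped stdin to the prompt argument
        result = subprocess.run(
            ["llm", "-m", "gemma3:12b", PROMPT],
            input=chunk, text=True, capture_output=True, check=True,
        )
        out.write(result.stdout.rstrip() + "\n")
```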
### Dataset cleaning
The dataset was then manually reviewed. The review involved the following:
* Removal of repetitive formatting material introduced by the LLM
* Correction of formatting errors
* Removal of rows whose contents met the following criteria:
- Repetition of entries---though some repetitions were simply merged into a single entry
- Overly broad entries that lacked meaningful context-specificity
- Overly narrow entries, e.g., referring to specific events
The cleaning process was necessarily subjective, so some entries that meet the above criteria may remain in the dataset. The dataset also appears to include some LLM hallucinations (that is, protocols not found in the texts), but the hallucinations are often acceptable examples, so some have been retained. Some degree of noise in the dataset was considered acceptable for the purposes of the study, and some degree of repetition provides the dataset with a kind of control case for evaluating the diagnostic process.
The result was a CSV-formatted list of protocols (`protocols_edited.csv`, n=419).
### Initial diagnostic
In this stage, an LLM applies the bicorder tool to the dataset: for each row, and for each gradient, a script prompts the LLM to rate the protocol on that gradient. The outputs are then added to a CSV output file.
See detailed documentation of the scripts at `WORKFLOW.md`.
- Manual audit of analyses: TKTK
- Test different models
- Perhaps use a simplified template without citations and descriptions to reduce tokens, and just provide those materials once
- Iterate on bicorder design?
### Persona elaboration
This stage expanded the dataset with LLM background knowledge by adding analyst personas to each protocol: a control analyst with an academic "view from nowhere" and two others directly involved in the protocol. The elaborated protocols and analysts were then manually reviewed for quality and correctness, with problematic entries edited or removed; a spot-check sketch follows the prompt below.
```yaml
model: "claude"
context: "Claude Code interface"
prompt: "In protocols.csv, fill in the third and fourth columns with plausible inputs that reflect diversity along cultural, professional, gender, and class lines. The 'analyst' should be a particular persona described generically, like '23-year-old male student in New Delhi,' and the 'standpoint' should more thoroughly describe the analyst's relationship to the protocol, like, 'Learning Brahmanic rituals from elders in order to maintain the tradition, but feeling pulled away from these rituals by contemporary culture.' Replicate each protocol line twice to provide a total of three plausible analyst and standpoint pairs for each protocol."
```
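One way to spot-check the elaborated dataset is to confirm that each protocol appears exactly three times with distinct analyst/standpoint pairs. A minimal sketch, assuming the `Protocol`, `Analyst`, and `Standpoint` column names used in the dataset header:
```python
import csv
from collections import defaultdict

pairs_by_protocol = defaultdict(list)
with open("protocols.csv", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        pairs_by_protocol[row["Protocol"]].append((row["Analyst"], row["Standpoint"]))

for protocol, pairs in pairs_by_protocol.items():
    if len(pairs) != 3:
        print(f"{protocol}: expected 3 analyst rows, found {len(pairs)}")
    if len(set(pairs)) < len(pairs):
        print(f"{protocol}: duplicate analyst/standpoint pair")
```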
### Results analysis
* Diagnostic results analysis
  - TKTK: need to identify relevant tests
  - Correlations: Which gradients seem to travel together? (See the sketch below.)
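As a starting point, a minimal sketch of a gradient correlation matrix with pandas, assuming the `_vs_` column naming produced by `bicorder_analyze.py` and an output file named `analysis_output.csv`:
```python
import numpy as np
import pandas as pd

df = pd.read_csv("analysis_output.csv")
gradient_cols = [c for c in df.columns if "_vs_" in c]

# Pairwise Pearson correlations across the 1-9 gradient ratings
ratings = df[gradient_cols].apply(pd.to_numeric, errors="coerce")
corr = ratings.corr()

# Rank gradient pairs by absolute correlation (upper triangle only)
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().sort_values(key=np.abs, ascending=False)
print(pairs.head(10))
```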

analysis/TEST_COMMANDS.md Normal file

@@ -0,0 +1,157 @@
# Test Commands for Refactored Bicorder
Run these tests in order to verify the refactored code works correctly.
## Test 1: Dry Run - Single Protocol
Test that prompts are generated correctly with protocol context:
```bash
python3 bicorder_query.py protocols_edited.csv 1 --dry-run | head -80
```
**Expected result:**
- Should show "DRY RUN: Row 1, 23 gradients"
- Should show protocol descriptor and description
- Each prompt should include full protocol context
- Should show 23 gradient prompts
## Test 2: Verify CSV Structure
Check that the analyze script still creates proper CSV structure:
```bash
python3 bicorder_analyze.py protocols_edited.csv -o test_output.csv
head -1 test_output.csv | tr ',' '\n' | grep -E "(explicit|precise|elite)" | head -5
```
**Expected result:**
- Should show gradient column names like:
- Design_explicit_vs_implicit
- Design_precise_vs_interpretive
- Design_elite_vs_vernacular
## Test 3: Single Protocol Query (Real LLM Call)
Query just one protocol to test the full pipeline:
```bash
python3 bicorder_query.py test_output.csv 1 -m gpt-4o-mini
```
**Expected result:**
- Should show "Protocol: [name]"
- Should show "[1/23] Querying: Design explicit vs implicit..."
- Should complete all 23 gradients
- Should show "✓ CSV updated: test_output.csv"
- Each gradient should show a value 1-9
**Verify the output:**
```bash
# Check that values were written
head -2 test_output.csv | tail -1 | tr ',' '\n' | tail -25 | head -5
```
## Test 4: Check for No Conversation State
Verify that the tool doesn't create any conversation files:
```bash
# Before running test
llm logs list | grep -i bicorder
# Run a query
python3 bicorder_query.py test_output.csv 2 -m gpt-4o-mini
# After running test
llm logs list | grep -i bicorder
```
**Expected result:**
- Should not see any "bicorder_row_*" or similar conversation IDs
- Each query should be independent
## Test 5: Batch Processing (Small Set)
Test batch processing on rows 1-3:
```bash
python3 bicorder_batch.py protocols_edited.csv -o test_batch_output.csv --start 1 --end 3 -m gpt-4o-mini
```
**Expected result:**
- Should process 3 protocols
- Should show progress for each row
- Should show "Successful: 3" at the end
- No mention of "initializing conversation"
**Verify outputs:**
```bash
# Check that all 3 rows have values
python3 -c "
import csv
with open('test_batch_output.csv') as f:
    reader = csv.DictReader(f)
    for i, row in enumerate(reader, 1):
        if i > 3:
            break
        gradient_cols = [k for k in row.keys() if '_vs_' in k]
        filled = sum(1 for k in gradient_cols if row[k])
        print(f'Row {i}: {filled}/23 gradients filled')
"
```
## Test 6: Dry Run with Different Model
Test that model parameter works in dry run:
```bash
python3 bicorder_query.py protocols_edited.csv 5 --dry-run -m mistral | head -50
```
**Expected result:**
- Should show prompts (model doesn't matter in dry run, but flag should be accepted)
## Test 7: Error Handling
Test with invalid row number:
```bash
python3 bicorder_query.py test_output.csv 999
```
**Expected result:**
- Should show error: "Error: Row 999 not found in CSV"
## Test 8: Compare Prompt Structure
Compare the new standalone prompts vs old system prompt approach:
```bash
# New approach - protocol context in each prompt
python3 bicorder_query.py protocols_edited.csv 1 --dry-run | grep -A 5 "Analyze this protocol"
# Old approach would have had protocol in system prompt only (no longer used)
# Verify that protocol context appears in EVERY gradient prompt
python3 bicorder_query.py protocols_edited.csv 1 --dry-run | grep -c "Analyze this protocol"
```
**Expected result:**
- Should show "23" (protocol context appears in all 23 prompts)
## Cleanup
Remove test files:
```bash
rm -f test_output.csv test_batch_output.csv
```
## Success Criteria
✅ All 23 gradients queried for each protocol
✅ No conversation IDs created or referenced
✅ Protocol context included in every prompt
✅ CSV values properly written (1-9)
✅ Batch processing works without initialization step
✅ Error handling works correctly

analysis/WORKFLOW.md Normal file

@@ -0,0 +1,134 @@
# Protocol Bicorder Analysis Workflow
This directory contains scripts for analyzing protocols using the Protocol Bicorder framework with LLM assistance.
## Scripts
1. **bicorder_batch.py** - **[RECOMMENDED]** Process entire CSV with one command
2. **bicorder_analyze.py** - Prepares CSV with gradient columns
3. **bicorder_query.py** - Queries LLM for each gradient value and updates CSV (each query is a new chat)
## Quick Start (Recommended)
### Process All Protocols with One Command
```bash
python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv
```
This will:
1. Create the analysis CSV with gradient columns
2. For each protocol row, query all gradients (each query is a new chat with full protocol context)
3. Update the CSV automatically with the results
4. Show progress and summary
### Common Options
```bash
# Process only rows 1-5 (useful for testing)
python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv --start 1 --end 5
# Use specific LLM model
python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv -m mistral
# Add analyst metadata
python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv \
-a "Your Name" -s "Your analytical standpoint"
```
---
## Manual Workflow (Advanced)
### Step 1: Prepare the Analysis CSV
Create a CSV with empty gradient columns:
```bash
python3 bicorder_analyze.py protocols_edited.csv -o analysis_output.csv
```
Optional: Add analyst metadata:
```bash
python3 bicorder_analyze.py protocols_edited.csv -o analysis_output.csv \
-a "Your Name" -s "Your analytical standpoint"
```
### Step 2: Query Gradients for a Protocol Row
Query all gradients for a specific protocol:
```bash
python3 bicorder_query.py analysis_output.csv 1
```
- Replace `1` with the row number you want to analyze
- Each gradient is queried in a new chat with full protocol context
- Each response is automatically parsed and written to the CSV
- Progress is shown for each gradient
Optional: Specify a model:
```bash
python3 bicorder_query.py analysis_output.csv 1 -m mistral
```
### Step 3: Repeat for All Protocols
For each protocol in your CSV:
```bash
python3 bicorder_query.py analysis_output.csv 1
python3 bicorder_query.py analysis_output.csv 2
python3 bicorder_query.py analysis_output.csv 3
# ... and so on
# OR: Use bicorder_batch.py to automate all of this!
```
## Architecture
### How It Works
Each gradient query is sent to the LLM as a **new, independent chat**. Every query includes:
- The protocol descriptor (name)
- The protocol description
- The gradient definition (left term, right term, and their descriptions)
- Instructions to rate 1-9
This approach:
- **Simplifies the code** - No conversation state management
- **Prevents bias** - Each evaluation is independent, not influenced by previous responses
- **Enables parallelization** - Queries could theoretically run concurrently (see the sketch below)
- **Makes debugging easier** - Each query/response pair is self-contained
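Because each query is independent, they could in principle run concurrently. A hypothetical sketch (not implemented in these scripts) reusing the `query_llm` pattern from `bicorder_query.py`:
```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

def query_llm(prompt, model="mistral"):
    """Send one standalone prompt to the llm CLI; return the raw response."""
    result = subprocess.run(
        ["llm", "-m", model],
        input=prompt, text=True, capture_output=True, check=True,
    )
    return result.stdout.strip()

def query_all(prompts, workers=4):
    """Run independent gradient prompts concurrently (order preserved)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(query_llm, prompts))
```
Note that a local model served through ollama may serialize requests on the backend, so the practical speedup depends on the server configuration.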
## Tips
### Dry Run Mode
Test prompts without calling the LLM:
```bash
python3 bicorder_query.py analysis_output.csv 1 --dry-run
```
This shows you exactly what prompt will be sent for each gradient, including the full protocol context.
### Check Your Progress
View completed values:
```bash
python3 -c "
import csv
with open('analysis_output.csv') as f:
    reader = csv.DictReader(f)
    for i, row in enumerate(reader, 1):
        empty = sum(1 for k, v in row.items() if '_vs_' in k and not v)
        print(f'Row {i}: {empty}/23 gradients empty')
"
```
### Batch Processing
Use the `bicorder_batch.py` script (see Quick Start section above) for processing multiple protocols.

analysis/bicorder_analyze.py Normal file

@@ -0,0 +1,155 @@
#!/usr/bin/env python3
"""
Protocol Bicorder Analysis Script

Processes a two-column CSV file (protocol descriptor and description) and adds
columns for each diagnostic gradient from bicorder.json. Values to be filled
by LLM commands.
"""

import csv
import json
import sys
import argparse
from pathlib import Path


def load_bicorder_config(bicorder_path):
    """Load and parse the bicorder.json configuration file."""
    with open(bicorder_path, 'r') as f:
        return json.load(f)


def extract_gradients(bicorder_data):
    """Extract all gradients from the diagnostic sets."""
    gradients = []
    for diagnostic_set in bicorder_data['diagnostic']:
        set_name = diagnostic_set['set_name']
        for gradient in diagnostic_set['gradients']:
            # Create a unique column name for this gradient
            col_name = f"{set_name}_{gradient['term_left']}_vs_{gradient['term_right']}"
            gradients.append({
                'column_name': col_name,
                'set_name': set_name,
                'term_left': gradient['term_left'],
                'term_left_description': gradient['term_left_description'],
                'term_right': gradient['term_right'],
                'term_right_description': gradient['term_right_description']
            })
    return gradients


def process_csv(input_csv, output_csv, bicorder_path, analyst=None, standpoint=None):
    """
    Process the input CSV and add gradient columns.

    Args:
        input_csv: Path to input CSV file
        output_csv: Path to output CSV file
        bicorder_path: Path to bicorder.json file
        analyst: Optional analyst name
        standpoint: Optional standpoint description
    """
    # Load bicorder configuration
    bicorder_data = load_bicorder_config(bicorder_path)
    gradients = extract_gradients(bicorder_data)

    with open(input_csv, 'r', encoding='utf-8') as infile, \
         open(output_csv, 'w', newline='', encoding='utf-8') as outfile:
        reader = csv.DictReader(infile)

        # Get original fieldnames from input CSV, filter out None/empty
        original_fields = [f for f in reader.fieldnames if f and f.strip()]

        # Add gradient columns and metadata columns
        gradient_columns = [g['column_name'] for g in gradients]
        output_fields = list(original_fields) + gradient_columns

        # Add metadata columns if provided
        if analyst is not None:
            output_fields.append('analyst')
        if standpoint is not None:
            output_fields.append('standpoint')

        writer = csv.DictWriter(outfile, fieldnames=output_fields)
        writer.writeheader()

        # Process each protocol row
        row_count = 0
        for protocol_row in reader:
            # Start with original row data, filter out None keys
            output_row = {k: v for k, v in protocol_row.items() if k and k.strip()}

            # Initialize all gradient columns as empty (to be filled by LLM)
            for gradient in gradients:
                output_row[gradient['column_name']] = ''

            # Add metadata if provided
            if analyst is not None:
                output_row['analyst'] = analyst
            if standpoint is not None:
                output_row['standpoint'] = standpoint

            writer.writerow(output_row)
            row_count += 1
            descriptor = protocol_row.get('Descriptor', '').strip()
            print(f"Processed protocol {row_count}: {descriptor}")

    print(f"\nOutput written to: {output_csv}")
    print(f"Total protocols: {row_count}")
    print(f"Gradient columns added: {len(gradients)}")
    print(f"\nGradient columns:")
    for i, gradient in enumerate(gradients, 1):
        print(f" {i}. {gradient['column_name']}")


def main():
    parser = argparse.ArgumentParser(
        description='Process protocol CSV and add bicorder diagnostic columns',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Example usage:
  python3 bicorder_analyze.py protocols_edited.csv -o output.csv
  python3 bicorder_analyze.py protocols_raw.csv -o output.csv -a "Jane Doe" -s "Researcher perspective"

The script will preserve all original columns and add one column per diagnostic gradient.
Each gradient column will be empty, ready to be filled by LLM commands.
"""
    )
    parser.add_argument('input_csv', help='Input CSV file with protocol data')
    parser.add_argument('-o', '--output', required=True, help='Output CSV file')
    parser.add_argument('-b', '--bicorder',
                        default='../bicorder.json',
                        help='Path to bicorder.json (default: ../bicorder.json)')
    parser.add_argument('-a', '--analyst', help='Analyst name (adds analyst column)')
    parser.add_argument('-s', '--standpoint', help='Analyst standpoint (adds standpoint column)')
    args = parser.parse_args()

    # Validate input file exists
    if not Path(args.input_csv).exists():
        print(f"Error: Input file '{args.input_csv}' not found", file=sys.stderr)
        sys.exit(1)

    # Validate bicorder.json exists
    if not Path(args.bicorder).exists():
        print(f"Error: Bicorder config '{args.bicorder}' not found", file=sys.stderr)
        sys.exit(1)

    # Process the CSV
    process_csv(
        args.input_csv,
        args.output,
        args.bicorder,
        args.analyst,
        args.standpoint
    )


if __name__ == '__main__':
    main()

analysis/bicorder_batch.py Normal file

@@ -0,0 +1,175 @@
#!/usr/bin/env python3
"""
Batch process all protocols in a CSV using the Bicorder framework.

This script orchestrates the entire analysis workflow:
1. Creates output CSV with gradient columns
2. For each protocol row:
   - Queries all 23 gradients (each in a new chat)
   - Updates CSV with results
"""

import csv
import json
import sys
import argparse
import subprocess
from pathlib import Path


def count_csv_rows(csv_path):
    """Count the number of data rows in a CSV file."""
    with open(csv_path, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        return sum(1 for _ in reader)


def run_bicorder_analyze(input_csv, output_csv, bicorder_path, analyst=None, standpoint=None):
    """Run bicorder_analyze.py to create output CSV."""
    cmd = ['python3', 'bicorder_analyze.py', input_csv, '-o', output_csv, '-b', bicorder_path]
    if analyst:
        cmd.extend(['-a', analyst])
    if standpoint:
        cmd.extend(['-s', standpoint])

    print(f"Creating analysis CSV: {output_csv}")
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"Error creating CSV: {result.stderr}", file=sys.stderr)
        return False
    print(result.stdout)
    return True


def query_gradients(output_csv, row_num, bicorder_path, model=None):
    """Query all gradients for a protocol row."""
    cmd = ['python3', 'bicorder_query.py', output_csv, str(row_num),
           '-b', bicorder_path]
    if model:
        cmd.extend(['-m', model])

    print(f"Starting gradient queries...")
    # Don't capture output - let it print in real-time for progress visibility
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print(f"Error querying gradients", file=sys.stderr)
        return False
    return True


def process_protocol_row(input_csv, output_csv, row_num, total_rows, bicorder_path, model=None):
    """Process a single protocol row through the complete workflow."""
    print(f"\n{'='*60}")
    print(f"Row {row_num}/{total_rows}")
    print(f"{'='*60}")

    # Query all gradients (each gradient gets a new chat)
    if not query_gradients(output_csv, row_num, bicorder_path, model):
        print(f"[FAILED] Could not query gradients")
        return False

    print(f"✓ Row {row_num} complete")
    return True


def main():
    parser = argparse.ArgumentParser(
        description='Batch process protocols through Bicorder analysis (each gradient uses a new chat)',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Example usage:
  # Process all protocols
  python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv

  # Process specific rows
  python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv --start 1 --end 5

  # With specific model
  python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv -m mistral

  # With metadata
  python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv -a "Your Name" -s "Your standpoint"
"""
    )
    parser.add_argument('input_csv', help='Input CSV file with protocol data')
    parser.add_argument('-o', '--output', required=True, help='Output CSV file')
    parser.add_argument('-b', '--bicorder',
                        default='../bicorder.json',
                        help='Path to bicorder.json (default: ../bicorder.json)')
    parser.add_argument('-m', '--model', help='LLM model to use')
    parser.add_argument('-a', '--analyst', help='Analyst name')
    parser.add_argument('-s', '--standpoint', help='Analyst standpoint')
    parser.add_argument('--start', type=int, default=1,
                        help='Start row number (1-indexed, default: 1)')
    parser.add_argument('--end', type=int,
                        help='End row number (1-indexed, default: all rows)')
    parser.add_argument('--resume', action='store_true',
                        help='Reuse existing output CSV instead of recreating it '
                             '(rows in the selected range are still re-queried)')
    args = parser.parse_args()

    # Validate input file exists
    if not Path(args.input_csv).exists():
        print(f"Error: Input file '{args.input_csv}' not found", file=sys.stderr)
        sys.exit(1)

    # Validate bicorder.json exists
    if not Path(args.bicorder).exists():
        print(f"Error: Bicorder config '{args.bicorder}' not found", file=sys.stderr)
        sys.exit(1)

    # Count rows in input CSV
    total_rows = count_csv_rows(args.input_csv)
    end_row = args.end if args.end else total_rows

    if args.start > total_rows or end_row > total_rows:
        print(f"Error: Row range exceeds CSV size ({total_rows} rows)", file=sys.stderr)
        sys.exit(1)

    print(f"Bicorder Batch Analysis")
    print(f"Input: {args.input_csv} ({total_rows} protocols)")
    print(f"Output: {args.output}")
    print(f"Processing rows: {args.start} to {end_row}")
    if args.model:
        print(f"Model: {args.model}")
    print()

    # Step 1: Create output CSV (unless resuming)
    if not args.resume or not Path(args.output).exists():
        if not run_bicorder_analyze(args.input_csv, args.output, args.bicorder,
                                    args.analyst, args.standpoint):
            sys.exit(1)
    else:
        print(f"Resuming from existing CSV: {args.output}")

    # Step 2: Process each protocol row
    success_count = 0
    fail_count = 0
    for row_num in range(args.start, end_row + 1):
        if process_protocol_row(args.input_csv, args.output, row_num, end_row,
                                args.bicorder, args.model):
            success_count += 1
        else:
            fail_count += 1
            print(f"[WARNING] Row {row_num} failed, continuing...")

    # Summary
    print(f"\n{'='*60}")
    print(f"BATCH COMPLETE")
    print(f"{'='*60}")
    print(f"Successful: {success_count}")
    print(f"Failed: {fail_count}")
    print(f"Output: {args.output}")


if __name__ == '__main__':
    main()

analysis/bicorder_init.py Normal file

@@ -0,0 +1,95 @@
#!/usr/bin/env python3
"""
Initialize LLM conversation with bicorder framework and protocol context.

This script reads a protocol from the CSV and the bicorder.json framework,
then generates a prompt to initialize the LLM conversation.
"""

import csv
import json
import sys
import argparse
from pathlib import Path


def load_bicorder_config(bicorder_path):
    """Load and parse the bicorder.json configuration file."""
    with open(bicorder_path, 'r') as f:
        return json.load(f)


def get_protocol_by_row(csv_path, row_number):
    """Get protocol data from CSV by row number (1-indexed)."""
    with open(csv_path, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for i, row in enumerate(reader, start=1):
            if i == row_number:
                return {
                    'descriptor': row.get('Descriptor', '').strip(),
                    'description': row.get('Description', '').strip()
                }
    return None


def generate_init_prompt(protocol, bicorder_data):
    """Generate the initialization prompt for the LLM."""
    # Ultra-minimal version for system prompt
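    # NOTE: bicorder_data is accepted but not used in this minimal variant;
    # gradient definitions are supplied per query by bicorder_query.py instead.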
prompt = f"""Analyze this protocol: "{protocol['descriptor']}"
Description: {protocol['description']}
Task: Rate this protocol on diagnostic gradients using scale 1-9 (1=left term, 5=neutral/balanced, 9=right term). Respond with just the number and brief explanation."""
return prompt
def main():
parser = argparse.ArgumentParser(
description='Initialize LLM conversation with protocol and bicorder framework',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Example usage:
# Initialize conversation for protocol in row 1
python3 bicorder_init.py protocols_edited.csv 1 | llm -m mistral --save init_1
# Initialize for row 5
python3 bicorder_init.py protocols_edited.csv 5 | llm -m mistral --save init_5
"""
)
parser.add_argument('input_csv', help='Input CSV file with protocol data')
parser.add_argument('row_number', type=int, help='Row number to analyze (1-indexed)')
parser.add_argument('-b', '--bicorder',
default='../bicorder.json',
help='Path to bicorder.json (default: ../bicorder.json)')
args = parser.parse_args()
# Validate input file exists
if not Path(args.input_csv).exists():
print(f"Error: Input file '{args.input_csv}' not found", file=sys.stderr)
sys.exit(1)
# Validate bicorder.json exists
if not Path(args.bicorder).exists():
print(f"Error: Bicorder config '{args.bicorder}' not found", file=sys.stderr)
sys.exit(1)
# Load protocol
protocol = get_protocol_by_row(args.input_csv, args.row_number)
if protocol is None:
print(f"Error: Row {args.row_number} not found in CSV", file=sys.stderr)
sys.exit(1)
# Load bicorder config
bicorder_data = load_bicorder_config(args.bicorder)
# Generate and output prompt
prompt = generate_init_prompt(protocol, bicorder_data)
print(prompt)
if __name__ == '__main__':
main()

analysis/bicorder_query.py Normal file

@@ -0,0 +1,230 @@
#!/usr/bin/env python3
"""
Query LLM for individual gradient values and update CSV.

This script generates a standalone prompt for each gradient (each query is a
new chat), sends it to the LLM, and updates the CSV with the returned values.
"""

import csv
import json
import sys
import argparse
import subprocess
import re
from pathlib import Path


def load_bicorder_config(bicorder_path):
    """Load and parse the bicorder.json configuration file."""
    with open(bicorder_path, 'r') as f:
        return json.load(f)


def extract_gradients(bicorder_data):
    """Extract all gradients from the diagnostic sets."""
    gradients = []
    for diagnostic_set in bicorder_data['diagnostic']:
        set_name = diagnostic_set['set_name']
        for gradient in diagnostic_set['gradients']:
            col_name = f"{set_name}_{gradient['term_left']}_vs_{gradient['term_right']}"
            gradients.append({
                'column_name': col_name,
                'set_name': set_name,
                'term_left': gradient['term_left'],
                'term_left_description': gradient['term_left_description'],
                'term_right': gradient['term_right'],
                'term_right_description': gradient['term_right_description']
            })
    return gradients


def get_protocol_by_row(csv_path, row_number):
    """Get protocol data from CSV by row number (1-indexed)."""
    with open(csv_path, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for i, row in enumerate(reader, start=1):
            if i == row_number:
                return {
                    'descriptor': row.get('Descriptor', '').strip(),
                    'description': row.get('Description', '').strip()
                }
    return None


def generate_gradient_prompt(protocol_descriptor, protocol_description, gradient):
    """Generate a prompt for a single gradient evaluation."""
    return f"""Analyze this protocol: "{protocol_descriptor}"

Description: {protocol_description}

Evaluate the protocol on this gradient:

**{gradient['term_left']}** (1) vs **{gradient['term_right']}** (9)

- **{gradient['term_left']}**: {gradient['term_left_description']}
- **{gradient['term_right']}**: {gradient['term_right_description']}

Provide a rating from 1 to 9, where:
- 1 = strongly {gradient['term_left']}
- 5 = neutral/balanced
- 9 = strongly {gradient['term_right']}

Respond with ONLY the number (1-9), optionally followed by a brief explanation.
"""


def query_llm(prompt, model=None):
    """Send prompt to llm CLI and get response."""
    cmd = ['llm']
    if model:
        cmd.extend(['-m', model])
    try:
        result = subprocess.run(
            cmd,
            input=prompt,
            text=True,
            capture_output=True,
            check=True
        )
        return result.stdout.strip()
    except subprocess.CalledProcessError as e:
        print(f"  Error calling llm: {e.stderr}", file=sys.stderr)
        return None


def extract_value(llm_response):
    """Extract numeric value (1-9) from LLM response."""
    # Look for a number 1-9 at the start of the response
    match = re.search(r'^(\d)', llm_response.strip())
    if match:
        value = int(match.group(1))
        if 1 <= value <= 9:
            return value
    return None


def update_csv_cell(csv_path, row_number, column_name, value):
    """Update a specific cell in the CSV."""
    # Read all rows
    rows = []
    with open(csv_path, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            rows.append(row)

    # Update the specific cell
    if row_number <= len(rows):
        rows[row_number - 1][column_name] = str(value)

        # Write back
        with open(csv_path, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        return True
    return False


def main():
    parser = argparse.ArgumentParser(
        description='Query LLM for gradient values and update CSV',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Example usage:
  # Query all gradients for protocol in row 1
  python3 bicorder_query.py analysis_output.csv 1

  # Query specific model
  python3 bicorder_query.py analysis_output.csv 1 -m mistral

  # Dry run (show prompts without calling LLM)
  python3 bicorder_query.py analysis_output.csv 1 --dry-run
"""
    )
    parser.add_argument('csv_path', help='CSV file to update')
    parser.add_argument('row_number', type=int, help='Row number to analyze (1-indexed)')
    parser.add_argument('-b', '--bicorder',
                        default='../bicorder.json',
                        help='Path to bicorder.json (default: ../bicorder.json)')
    parser.add_argument('-m', '--model', help='LLM model to use')
    parser.add_argument('--dry-run', action='store_true',
                        help='Show prompts without calling LLM or updating CSV')
    args = parser.parse_args()

    # Validate files exist
    if not Path(args.csv_path).exists():
        print(f"Error: CSV file '{args.csv_path}' not found", file=sys.stderr)
        sys.exit(1)
    if not Path(args.bicorder).exists():
        print(f"Error: Bicorder config '{args.bicorder}' not found", file=sys.stderr)
        sys.exit(1)

    # Load protocol data
    protocol = get_protocol_by_row(args.csv_path, args.row_number)
    if protocol is None:
        print(f"Error: Row {args.row_number} not found in CSV", file=sys.stderr)
        sys.exit(1)

    # Load bicorder config
    bicorder_data = load_bicorder_config(args.bicorder)
    gradients = extract_gradients(bicorder_data)

    if args.dry_run:
        print(f"DRY RUN: Row {args.row_number}, {len(gradients)} gradients")
        print(f"Protocol: {protocol['descriptor']}\n")
    else:
        print(f"Protocol: {protocol['descriptor']}")
        print(f"Loaded {len(gradients)} gradients, starting queries...")

    # Process each gradient
    for i, gradient in enumerate(gradients, 1):
        gradient_short = gradient['column_name'].replace('_', ' ')

        if not args.dry_run:
            print(f"[{i}/{len(gradients)}] Querying: {gradient_short}...", flush=True)

        # Generate prompt (including protocol context)
        prompt = generate_gradient_prompt(
            protocol['descriptor'],
            protocol['description'],
            gradient
        )

        if args.dry_run:
            print(f"[{i}/{len(gradients)}] {gradient_short}")
            print(f"Prompt:\n{prompt}\n")
            continue

        # Query LLM (new chat each time)
        response = query_llm(prompt, args.model)
        if response is None:
            print(f"[{i}/{len(gradients)}] {gradient_short}: FAILED")
            continue

        # Extract value
        value = extract_value(response)
        if value is None:
            print(f"[{i}/{len(gradients)}] {gradient_short}: WARNING - no valid value")
            continue

        # Update CSV
        if update_csv_cell(args.csv_path, args.row_number, gradient['column_name'], value):
            print(f"[{i}/{len(gradients)}] {gradient_short}: {value}")
        else:
            print(f"[{i}/{len(gradients)}] {gradient_short}: ERROR updating CSV")

    if not args.dry_run:
        print(f"\n✓ CSV updated: {args.csv_path}")


if __name__ == '__main__':
    main()

analysis/chunk.sh Normal file → Executable file


@@ -1,12 +0,0 @@
Protocol,Description,Analyst,Standpoint
"ARPANET Host Software","Early network communication rules (technical)."
"Request for Comments (RFCs)," "Open Internet standards documentation (technical/governance)."
"Diplomatic Immunity","Legal protections afforded to diplomats (legal/social)."
"Request for Proposal (RFP)," "Process for soliciting bids/proposals (business)."
"Customer Service Scripts","Standardized responses for customer interactions (business/social)."
"Military Standing Orders","Predefined procedures for military personnel (military)."
"Scientific Method","Systematic approach to research & experimentation (scientific)."
"Etiquette (Social)","Established norms for polite interaction (social)."
"Crisis Management Plans","Procedures for responding to emergencies (organizational)."
"Internal Memo Format","Standardized documentation within organizations (organizational)."

File diff suppressed because it is too large.