# Protocol Bicorder Analysis Workflow

This directory contains scripts for analyzing protocols using the Protocol Bicorder framework with LLM assistance.

## Scripts

1. **bicorder_batch.py** - **[RECOMMENDED]** Processes an entire CSV with one command
2. **bicorder_analyze.py** - Prepares the CSV with gradient columns
3. **bicorder_query.py** - Queries the LLM for each gradient value and updates the CSV (each query is a new chat)

## Quick Start (Recommended)

### Process All Protocols with One Command

```bash
python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv
```

This will:

1. Create the analysis CSV with gradient columns
2. For each protocol row, query all gradients (each query is a new chat with full protocol context)
3. Update the CSV automatically with the results
4. Show progress and a summary

### Common Options

```bash
# Process only rows 1-5 (useful for testing)
python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv --start 1 --end 5

# Use a specific LLM model
python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv -m mistral

# Add analyst metadata
python3 bicorder_batch.py protocols_edited.csv -o analysis_output.csv \
  -a "Your Name" -s "Your analytical standpoint"
```

---

## Manual Workflow (Advanced)

### Step 1: Prepare the Analysis CSV

Create a CSV with empty gradient columns:

```bash
python3 bicorder_analyze.py protocols_edited.csv -o analysis_output.csv
```

Optional: add analyst metadata:

```bash
python3 bicorder_analyze.py protocols_edited.csv -o analysis_output.csv \
  -a "Your Name" -s "Your analytical standpoint"
```

### Step 2: Query Gradients for a Protocol Row

Query all gradients for a specific protocol:

```bash
python3 bicorder_query.py analysis_output.csv 1
```

- Replace `1` with the row number you want to analyze
- Each gradient is queried in a new chat with full protocol context
- Each response is automatically parsed and written to the CSV
- Progress is shown for each gradient

Optional: specify a model:

```bash
python3 bicorder_query.py analysis_output.csv 1 -m mistral
```

### Step 3: Repeat for All Protocols

Run the query script once for each protocol in your CSV:

```bash
python3 bicorder_query.py analysis_output.csv 1
python3 bicorder_query.py analysis_output.csv 2
python3 bicorder_query.py analysis_output.csv 3
# ... and so on

# OR: use bicorder_batch.py to automate all of this!
```

## Architecture

### How It Works

Each gradient query is sent to the LLM as a **new, independent chat**. Every query includes:

- The protocol descriptor (name)
- The protocol description
- The gradient definition (left term, right term, and their descriptions)
- Instructions to give a rating from 1 to 9

This approach:

- **Simplifies the code** - No conversation state needs to be managed
- **Reduces bias** - Each evaluation is independent, so it is not influenced by earlier responses
- **Enables parallelization** - Queries could, in principle, run concurrently
- **Makes debugging easier** - Each query/response pair is self-contained

## Tips

### Dry Run Mode

Test prompts without calling the LLM:

```bash
python3 bicorder_query.py analysis_output.csv 1 --dry-run
```

This shows exactly what prompt will be sent for each gradient, including the full protocol context.

### Check Your Progress

Count how many gradient values are still empty in each row:

```bash
python3 -c "
import csv

with open('analysis_output.csv', newline='') as f:
    reader = csv.DictReader(f)
    # Gradient columns are identified by 'vs' in their header name
    gradient_cols = [k for k in reader.fieldnames if 'vs' in k]
    for i, row in enumerate(reader, 1):
        empty = sum(1 for k in gradient_cols if not row[k])
        print(f'Row {i}: {empty}/{len(gradient_cols)} gradients empty')
"
```

### Batch Processing

Use the `bicorder_batch.py` script (see the Quick Start section above) to process multiple protocols.
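
The Architecture section describes each gradient being sent as its own self-contained query. The sketch below illustrates that pattern; it is not the code in `bicorder_query.py`. The `Gradient` fields, the prompt wording, the helpers `build_prompt`, `parse_rating` and `rate_protocol`, the scale orientation (1 = left term, 9 = right term), and the example gradient "explicit vs implicit" are all hypothetical, and `ask_llm` stands in for whatever LLM call the real scripts make.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Gradient:
    # Hypothetical container for one gradient definition (field names are assumptions)
    left_term: str
    left_description: str
    right_term: str
    right_description: str


def build_prompt(descriptor: str, description: str, g: Gradient) -> str:
    """Build a self-contained prompt: full protocol context plus one gradient."""
    return (
        f"Protocol: {descriptor}\n"
        f"Description: {description}\n\n"
        f"Gradient: {g.left_term} vs {g.right_term}\n"
        f"- {g.left_term}: {g.left_description}\n"
        f"- {g.right_term}: {g.right_description}\n\n"
        # Assumed scale orientation: 1 = left term, 9 = right term
        f"Rate this protocol from 1 ({g.left_term}) to 9 ({g.right_term}). "
        "Reply with a single integer."
    )


def parse_rating(response: str) -> Optional[int]:
    """Return the first standalone digit 1-9 in the reply, or None if there is none."""
    match = re.search(r"\b([1-9])\b", response)
    return int(match.group(1)) if match else None


def rate_protocol(
    descriptor: str,
    description: str,
    gradients: List[Gradient],
    ask_llm: Callable[[str], str],
) -> Dict[str, Optional[int]]:
    """Query every gradient independently; ask_llm gets one prompt and keeps no history."""
    results: Dict[str, Optional[int]] = {}
    for g in gradients:
        reply = ask_llm(build_prompt(descriptor, description, g))
        results[f"{g.left_term} vs {g.right_term}"] = parse_rating(reply)
    return results


# Usage with a stand-in for the LLM call (each call acts like a fresh chat)
def fake_llm(prompt: str) -> str:
    return "Rating: 7"


example = [Gradient("explicit", "Rules are written down", "implicit", "Rules are tacit")]
print(rate_protocol("Handshake", "A greeting ritual", example, fake_llm))
# {'explicit vs implicit': 7}
```

Because no query depends on another's response, the loop in `rate_protocol` could be dispatched concurrently (for example with `concurrent.futures.ThreadPoolExecutor`). The regex in `parse_rating` is deliberately naive: a reply such as "On a 1-9 scale, 6" would be read as 1, so a real parser would likely need a stricter reply format.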