Reorganize directory, add manual dataset and sync tooling

- Move all scripts to scripts/, web assets to web/, analysis results into self-contained data/readings/<type>_<YYYYMMDD>/ directories
- Add data/readings/manual_20260320/ with 32 JSON readings from git.medlab.host/ntnsndr/protocol-bicorder-data
- Add scripts/json_to_csv.py to convert bicorder JSON files to CSV
- Add scripts/sync_readings.sh for one-command sync + re-analysis of any dataset backed by a .sync_source config file
- Add scripts/classify_readings.py to apply the LDA classifier to all readings and save per-reading cluster assignments
- Add --min-coverage flag to multivariate_analysis.py for sparse/shortform datasets; also applies in lda_visualization.py
- Fix lda_visualization.py NaN handling and 0-d array annotation bug
- Update README.md and WORKFLOW.md to document datasets, sync workflow, shortform handling, and new scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -7,7 +7,7 @@ Run these tests in order to verify the refactored code works correctly.
 Test that prompts are generated correctly with protocol context:

 ```bash
-python3 bicorder_query.py protocols_edited.csv 1 --dry-run | head -80
+python3 scripts/bicorder_query.py data/readings/synthetic_20251116/protocols_edited.csv 1 --dry-run | head -80
 ```

 **Expected result:**
@@ -21,7 +21,7 @@ python3 bicorder_query.py protocols_edited.csv 1 --dry-run | head -80
 Check that the analyze script still creates proper CSV structure:

 ```bash
-python3 bicorder_analyze.py protocols_edited.csv -o test_output.csv
+python3 scripts/bicorder_analyze.py data/readings/synthetic_20251116/protocols_edited.csv -o test_output.csv
 head -1 test_output.csv | tr ',' '\n' | grep -E "(explicit|precise|elite)" | head -5
 ```

@@ -36,7 +36,7 @@ head -1 test_output.csv | tr ',' '\n' | grep -E "(explicit|precise|elite)" | hea
 Query just one protocol to test the full pipeline:

 ```bash
-python3 bicorder_query.py test_output.csv 1 -m gpt-4o-mini
+python3 scripts/bicorder_query.py test_output.csv 1 -m gpt-4o-mini
 ```

 **Expected result:**
@@ -61,7 +61,7 @@ Verify that the tool doesn't create any conversation files:
 llm logs list | grep -i bicorder

 # Run a query
-python3 bicorder_query.py test_output.csv 2 -m gpt-4o-mini
+python3 scripts/bicorder_query.py test_output.csv 2 -m gpt-4o-mini

 # After running test
 llm logs list | grep -i bicorder
@@ -76,7 +76,7 @@ llm logs list | grep -i bicorder
 Test batch processing on rows 1-3:

 ```bash
-python3 bicorder_batch.py protocols_edited.csv -o test_batch_output.csv --start 1 --end 3 -m gpt-4o-mini
+python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edited.csv -o test_batch_output.csv --start 1 --end 3 -m gpt-4o-mini
 ```

 **Expected result:**
@@ -106,7 +106,7 @@ with open('test_batch_output.csv') as f:
 Test that model parameter works in dry run:

 ```bash
-python3 bicorder_query.py protocols_edited.csv 5 --dry-run -m mistral | head -50
+python3 scripts/bicorder_query.py data/readings/synthetic_20251116/protocols_edited.csv 5 --dry-run -m mistral | head -50
 ```

 **Expected result:**
@@ -117,7 +117,7 @@ python3 bicorder_query.py protocols_edited.csv 5 --dry-run -m mistral | head -50
 Test with invalid row number:

 ```bash
-python3 bicorder_query.py test_output.csv 999
+python3 scripts/bicorder_query.py test_output.csv 999
 ```

 **Expected result:**
@@ -129,11 +129,11 @@ Compare the new standalone prompts vs old system prompt approach:

 ```bash
 # New approach - protocol context in each prompt
-python3 bicorder_query.py protocols_edited.csv 1 --dry-run | grep -A 5 "Analyze this protocol"
+python3 scripts/bicorder_query.py data/readings/synthetic_20251116/protocols_edited.csv 1 --dry-run | grep -A 5 "Analyze this protocol"

 # Old approach would have had protocol in system prompt only (no longer used)
 # Verify that protocol context appears in EVERY gradient prompt
-python3 bicorder_query.py protocols_edited.csv 1 --dry-run | grep -c "Analyze this protocol"
+python3 scripts/bicorder_query.py data/readings/synthetic_20251116/protocols_edited.csv 1 --dry-run | grep -c "Analyze this protocol"
 ```

 **Expected result:**
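
Every test command in this diff now depends on the relocated script and data paths, so a quick pre-flight check can catch a stale or partial checkout before the tests are run. The following is a minimal Python sketch, assuming it is run from the repository root. `RELOCATED_PATHS` and `missing_paths` are hypothetical names for illustration and are not part of the repository; the paths themselves are copied from the updated commands above.

```python
from pathlib import Path

# Paths copied from the updated test commands in this diff.
# The list name and the helper below are illustrative assumptions.
RELOCATED_PATHS = [
    "scripts/bicorder_query.py",
    "scripts/bicorder_analyze.py",
    "scripts/bicorder_batch.py",
    "data/readings/synthetic_20251116/protocols_edited.csv",
]


def missing_paths(repo_root, relative_paths=RELOCATED_PATHS):
    """Return the relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in relative_paths if not (root / p).exists()]
```

Calling `missing_paths(".")` from the repository root returns an empty list when the reorganized layout is in place; any entries it returns are the paths the test commands would fail on.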