Flatten data/readings/ → data/

Remove the intermediate readings/ subdirectory level — dataset naming
(synthetic_YYYYMMDD, manual_YYYYMMDD) already encodes what the data is.
Update all path references across scripts and docs accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -12,7 +12,7 @@ This guide explains how to integrate the cluster classification system into the
 
 **Version-based compatibility**: The model includes a `bicorder_version` field. The classifier checks that versions match. When bicorder.json structure changes:
 1. Increment the version number in bicorder.json
-2. Retrain the model with `python3 scripts/export_model_for_js.py data/readings/synthetic_20251116/readings.csv`
+2. Retrain the model with `python3 scripts/export_model_for_js.py data/synthetic_20251116/readings.csv`
 3. The new model will have the updated version
 
 This ensures the web app and model stay in sync without complex backward compatibility.
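As an aside on the mechanism this hunk documents: the version gate amounts to a strict equality check at load time. The sketch below is a hypothetical illustration, not code from the repository; the function name, file layout, and the assumption that bicorder.json exposes its version under a `version` key are all mine.

```python
import json

def load_model(model_path: str, bicorder_path: str) -> dict:
    """Load the exported model, refusing to run against a mismatched bicorder.json.

    Illustrative only: assumes the exported model JSON carries a top-level
    `bicorder_version` and bicorder.json a top-level `version` key.
    """
    with open(model_path) as f:
        model = json.load(f)
    with open(bicorder_path) as f:
        bicorder = json.load(f)
    # Strict equality: any structural change to bicorder.json bumps the version,
    # so a stale model fails fast instead of classifying against the wrong schema.
    if model.get("bicorder_version") != bicorder.get("version"):
        raise ValueError(
            f"Model trained against bicorder version {model.get('bicorder_version')}, "
            f"but bicorder.json is {bicorder.get('version')}; "
            "retrain with export_model_for_js.py."
        )
    return model
```

The strictness is the point: no backward-compatibility shims, just retrain and re-export.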
@@ -25,7 +25,7 @@ This ensures the web app and model stay in sync without complex backward compati
 The model is the only artifact produced by this analysis directory that the app consumes. Regenerate it after re-running analysis on the synthetic dataset:
 
 ```bash
-python3 scripts/export_model_for_js.py data/readings/synthetic_20251116/readings.csv
+python3 scripts/export_model_for_js.py data/synthetic_20251116/readings.csv
 ```
 
 ## Quick Start
@@ -6,26 +6,26 @@ Scripts were created with the assistance of Claude Code. Data processing was don
 
 ## Datasets
 
-Readings are organized under `data/readings/<type>_<YYYYMMDD>/`, each self-contained with its own `readings.csv`, `analysis/`, and `json/` subdirectories:
+Readings are organized under `data/<type>_<YYYYMMDD>/`, each self-contained with its own `readings.csv`, `analysis/`, and `json/` subdirectories:
 
-- **`data/readings/synthetic_20251116/`** — 411 protocols from synthetic LLM-generated readings (see detailed procedure below)
-- **`data/readings/manual_20260320/`** — manual readings collected at [git.medlab.host/ntnsndr/protocol-bicorder-data](https://git.medlab.host/ntnsndr/protocol-bicorder-data), continuously expanding
+- **`data/synthetic_20251116/`** — 411 protocols from synthetic LLM-generated readings (see detailed procedure below)
+- **`data/manual_20260320/`** — manual readings collected at [git.medlab.host/ntnsndr/protocol-bicorder-data](https://git.medlab.host/ntnsndr/protocol-bicorder-data), continuously expanding
 
 ### Syncing the manual dataset
 
 The manual dataset is kept current via a `.sync_source` config file and a one-command sync script:
 
 ```bash
-scripts/sync_readings.sh data/readings/manual_20260320
+scripts/sync_readings.sh data/manual_20260320
 ```
 
 This clones the remote repository, copies JSON reading files, regenerates `readings.csv`, runs multivariate analysis (filtering to well-covered dimensions), generates an LDA visualization, and saves per-reading cluster classifications to `analysis/classifications.csv`.
 
 Options:
 ```bash
-scripts/sync_readings.sh data/readings/manual_20260320 --min-coverage 0.8 # default
-scripts/sync_readings.sh data/readings/manual_20260320 --no-analysis # sync JSON only
-scripts/sync_readings.sh data/readings/manual_20260320 --training data/readings/synthetic_20251116/readings.csv
+scripts/sync_readings.sh data/manual_20260320 --min-coverage 0.8 # default
+scripts/sync_readings.sh data/manual_20260320 --no-analysis # sync JSON only
+scripts/sync_readings.sh data/manual_20260320 --training data/synthetic_20251116/readings.csv
 ```
 
 ### Handling shortform readings
@@ -59,7 +59,7 @@ context: "model running on ollama locally, accessed with llm on the command line
 prompt: "Return csv-formatted data (with no markdown wrapper) that consists of a list of protocols discussed or referred to in the attached text. Protocols are defined extremely broadly as 'patterns of interaction,' and may be of a nontechnical nature. Protocols should be as specific as possible, such as 'Sacrament of Reconciliation' rather than 'Religious Protocols.' The first column should provide a brief descriptor of the protocol, and the second column should describe it in a substantial paragraph of 3-5 sentences, encapsulated in quotation marks to avoid breaking on commas. Be sure to paraphrase rather than quoting directly from the source text."
 ```
 
-The result was a CSV-formatted list of protocols (`data/readings/synthetic_20251116/protocols_raw.csv`, n=774 total protocols listed).
+The result was a CSV-formatted list of protocols (`data/synthetic_20251116/protocols_raw.csv`, n=774 total protocols listed).
 
 ### Dataset cleaning
 
@@ -74,7 +74,7 @@ The dataset was then manually reviewed. The review involved the following:
 
 The cleaning process was carried out in a subjective manner, so some entries that meet the above criteria may remain in the dataset. The dataset also appears to include some LLM hallucinations---that is, protocols not in the texts---but the hallucinations are often acceptable examples, and so some have been retained. Some degree of noise in the dataset was considered acceptable for the purposes of the study. Some degree of repetition, also, provides the dataset with a kind of control case for evaluating the diagnostic process.
 
-The result was a CSV-formatted list of protocols (`data/readings/synthetic_20251116/protocols_edited.csv`, n=411).
+The result was a CSV-formatted list of protocols (`data/synthetic_20251116/protocols_edited.csv`, n=411).
 
 
 ### Initial diagnostic
@@ -83,24 +83,24 @@ This diagnostic used the file now at `bicorder_analyzed.json`, though the script
 
 For each row in the dataset, and on each gradient, a series of scripts prompts the LLM to apply each gradient to the protocol. The outputs are then added to a CSV output file.
 
-The result was a CSV-formatted list of protocols (`data/readings/synthetic_20251116/readings.csv`, n=411).
+The result was a CSV-formatted list of protocols (`data/synthetic_20251116/readings.csv`, n=411).
 
 See detailed documentation of the scripts at `WORKFLOW.md`.
 
 ### Manual and alternate model audit
 
-To test the output, a manual review of the first 10 protocols in the `data/readings/synthetic_20251116/protocols_edited.csv` dataset was produced in the file `data/readings/synthetic_20251116/readings_manual.csv`. (Alphabetization in this case seems a reasonable proxy for a random sample of protocols. It includes some partially overlapping protocols, as does the dataset as a whole.) Additionally, three models were tested on the same cases:
+To test the output, a manual review of the first 10 protocols in the `data/synthetic_20251116/protocols_edited.csv` dataset was produced in the file `data/synthetic_20251116/readings_manual.csv`. (Alphabetization in this case seems a reasonable proxy for a random sample of protocols. It includes some partially overlapping protocols, as does the dataset as a whole.) Additionally, three models were tested on the same cases:
 
 ```bash
-python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edited.csv -o data/readings/synthetic_20251116/readings_mistral.csv -m mistral -a "Mistral" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision" --start 1 --end 10
+python3 scripts/bicorder_batch.py data/synthetic_20251116/protocols_edited.csv -o data/synthetic_20251116/readings_mistral.csv -m mistral -a "Mistral" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision" --start 1 --end 10
 ```
 
 ```bash
-python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edited.csv -o data/readings/synthetic_20251116/readings_gpt-oss.csv -m gpt-oss -a "GPT-OSS" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision" --start 1 --end 10
+python3 scripts/bicorder_batch.py data/synthetic_20251116/protocols_edited.csv -o data/synthetic_20251116/readings_gpt-oss.csv -m gpt-oss -a "GPT-OSS" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision" --start 1 --end 10
 ```
 
 ```bash
-python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edited.csv -o data/readings/synthetic_20251116/readings_gemma3-12b.csv -m gemma3:12b -a "Gemma3:12b" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision" --start 1 --end 10
+python3 scripts/bicorder_batch.py data/synthetic_20251116/protocols_edited.csv -o data/synthetic_20251116/readings_gemma3-12b.csv -m gemma3:12b -a "Gemma3:12b" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision" --start 1 --end 10
 ```
 
 A Euclidean distance analysis (`python3 scripts/compare_analyses.py`) found that the `gpt-oss` model was closer to the manual example than the others. It was therefore selected to be the model used for conducting the bicorder diagnostic on the dataset.
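The comparison step described above can be illustrated with a minimal average-Euclidean-distance sketch between two readings files. This is not the actual `compare_analyses.py`: the function name, the pairing of rows by order, and the explicit list of gradient columns are assumptions for illustration.

```python
import csv
import math

def avg_euclidean_distance(path_a: str, path_b: str, gradient_cols: list[str]) -> float:
    """Average per-row Euclidean distance between two readings CSVs,
    computed over the given gradient columns, with rows paired by order.

    Illustrative sketch only, not the repository's compare_analyses.py.
    """
    with open(path_a) as fa, open(path_b) as fb:
        rows_a = list(csv.DictReader(fa))
        rows_b = list(csv.DictReader(fb))
    distances = []
    for ra, rb in zip(rows_a, rows_b):
        # Squared difference on each gradient, then the usual square root.
        sq = sum((float(ra[c]) - float(rb[c])) ** 2 for c in gradient_cols)
        distances.append(math.sqrt(sq))
    return sum(distances) / len(distances)
```

A lower average distance to `readings_manual.csv` is what favored `gpt-oss` over the other models.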
@@ -112,13 +112,13 @@ Average Euclidean Distance:
 3. readings_mistral.csv - Avg Distance: 13.33
 ```
 
-Command used to produce `data/readings/synthetic_20251116/readings.csv` (using the Ollama cloud service for the `gpt-oss` model):
+Command used to produce `data/synthetic_20251116/readings.csv` (using the Ollama cloud service for the `gpt-oss` model):
 
 ```bash
-python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edited.csv -o data/readings/synthetic_20251116/readings.csv -m gpt-oss:20b-cloud -a "GPT-OSS" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision"
+python3 scripts/bicorder_batch.py data/synthetic_20251116/protocols_edited.csv -o data/synthetic_20251116/readings.csv -m gpt-oss:20b-cloud -a "GPT-OSS" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision"
 ```
 
-The result was a CSV-formatted list of protocols (`data/readings/synthetic_20251116/readings.csv`, n=411).
+The result was a CSV-formatted list of protocols (`data/synthetic_20251116/readings.csv`, n=411).
 
 ### Further analysis
 
@@ -126,7 +126,7 @@ The result was a CSV-formatted list of protocols (`data/readings/synthetic_20251
 
 Per-protocol values are meaningful for the bicorder because, despite varying levels of appropriateness, all of the gradients are structured as ranging from "hardness" to "softness"---with lower values associated with greater rigidity. The average value for a given protocol, therefore, provides a rough sense of the protocol's hardness.
 
-Basic averages appear in `data/readings/synthetic_20251116/readings-analysis.ods`.
+Basic averages appear in `data/synthetic_20251116/readings-analysis.ods`.
 
 #### Univariate analysis
 
@@ -136,7 +136,7 @@ First, a plot of average values for each protocol:
 
 This reveals a linear distribution of values among the protocols, aside from exponential curves only at the extremes. Perhaps the most interesting finding is a skew toward the higher end of the scale, associated with softness. Even relatively hard, technical protocols appear to have significant soft characteristics.
 
-The protocol value averages have a mean of 5.45 and a median of 5.48. In comparison to the midpoint of 5, the normalized midpoint deviation is 0.11. By contrast, the Pearson coefficient measures the skew at just -0.07, which means that the relative skew of the data is actually slightly downward. So the distribution of protocol values is very balanced but has a consistent upward deviation from the scale's baseline. (These calculations are in `data/readings/synthetic_20251116/readings-analysis.odt[averages]`.)
+The protocol value averages have a mean of 5.45 and a median of 5.48. In comparison to the midpoint of 5, the normalized midpoint deviation is 0.11. By contrast, the Pearson coefficient measures the skew at just -0.07, which means that the relative skew of the data is actually slightly downward. So the distribution of protocol values is very balanced but has a consistent upward deviation from the scale's baseline. (These calculations are in `data/synthetic_20251116/readings-analysis.odt[averages]`.)
 
 Second, a plot of average values for each gradient (with gaps to indicate the three groupings of gradients):
 
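The summary statistics quoted in this hunk can be reproduced with the standard library. The spreadsheet's exact formulas are not shown, so the definitions below (mean deviation from the scale midpoint normalized by half the scale range, and Pearson's second skewness coefficient) are plausible readings of the terms used, not confirmed implementations.

```python
import statistics

def midpoint_deviation(values: list[float], midpoint: float, half_range: float) -> float:
    """Deviation of the mean from the scale midpoint, normalized by half the
    scale range. (An assumed reading of 'normalized midpoint deviation'.)"""
    return (statistics.mean(values) - midpoint) / half_range

def pearson_skew(values: list[float]) -> float:
    """Pearson's second skewness coefficient: 3 * (mean - median) / stdev.
    Negative when the mean sits below the median, as with the reported -0.07."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)
```

With mean 5.45 and median 5.48, the mean-minus-median term is negative, matching the sign of the reported coefficient.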
@@ -165,21 +165,21 @@ Claude Code created a `multivariate_analysis.py` tool to conduct this analysis.
 
 ```bash
 # Run all analyses (default)
-python3 scripts/multivariate_analysis.py data/readings/synthetic_20251116/readings.csv
+python3 scripts/multivariate_analysis.py data/synthetic_20251116/readings.csv
 
 # Run specific analyses only
-python3 scripts/multivariate_analysis.py data/readings/synthetic_20251116/readings.csv --analyses
+python3 scripts/multivariate_analysis.py data/synthetic_20251116/readings.csv --analyses
 clustering pca
 ```
 
 Initial manual observations:
 
 * The correlations generally seem predictable; for example, the strongest is between `Design_static_vs_malleable` and `Experience_predictable_vs_emergent`, which is not surprising
-* The elite vs. vernacular distinction appears to be the most predictive gradient (`data/readings/synthetic_20251116/analysis/plots/feature_importances.png`)
+* The elite vs. vernacular distinction appears to be the most predictive gradient (`data/synthetic_20251116/analysis/plots/feature_importances.png`)
 
 
 
 
 Claude's interpretation:
 
@@ -7,7 +7,7 @@ Run these tests in order to verify the refactored code works correctly.
 Test that prompts are generated correctly with protocol context:
 
 ```bash
-python3 scripts/bicorder_query.py data/readings/synthetic_20251116/protocols_edited.csv 1 --dry-run | head -80
+python3 scripts/bicorder_query.py data/synthetic_20251116/protocols_edited.csv 1 --dry-run | head -80
 ```
 
 **Expected result:**
@@ -21,7 +21,7 @@ python3 scripts/bicorder_query.py data/readings/synthetic_20251116/protocols_edi
 Check that the analyze script still creates proper CSV structure:
 
 ```bash
-python3 scripts/bicorder_analyze.py data/readings/synthetic_20251116/protocols_edited.csv -o test_output.csv
+python3 scripts/bicorder_analyze.py data/synthetic_20251116/protocols_edited.csv -o test_output.csv
 head -1 test_output.csv | tr ',' '\n' | grep -E "(explicit|precise|elite)" | head -5
 ```
 
@@ -76,7 +76,7 @@ llm logs list | grep -i bicorder
 Test batch processing on rows 1-3:
 
 ```bash
-python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edited.csv -o test_batch_output.csv --start 1 --end 3 -m gpt-4o-mini
+python3 scripts/bicorder_batch.py data/synthetic_20251116/protocols_edited.csv -o test_batch_output.csv --start 1 --end 3 -m gpt-4o-mini
 ```
 
 **Expected result:**
@@ -106,7 +106,7 @@ with open('test_batch_output.csv') as f:
 Test that model parameter works in dry run:
 
 ```bash
-python3 scripts/bicorder_query.py data/readings/synthetic_20251116/protocols_edited.csv 5 --dry-run -m mistral | head -50
+python3 scripts/bicorder_query.py data/synthetic_20251116/protocols_edited.csv 5 --dry-run -m mistral | head -50
 ```
 
 **Expected result:**
@@ -129,11 +129,11 @@ Compare the new standalone prompts vs old system prompt approach:
 
 ```bash
 # New approach - protocol context in each prompt
-python3 scripts/bicorder_query.py data/readings/synthetic_20251116/protocols_edited.csv 1 --dry-run | grep -A 5 "Analyze this protocol"
+python3 scripts/bicorder_query.py data/synthetic_20251116/protocols_edited.csv 1 --dry-run | grep -A 5 "Analyze this protocol"
 
 # Old approach would have had protocol in system prompt only (no longer used)
 # Verify that protocol context appears in EVERY gradient prompt
-python3 scripts/bicorder_query.py data/readings/synthetic_20251116/protocols_edited.csv 1 --dry-run | grep -c "Analyze this protocol"
+python3 scripts/bicorder_query.py data/synthetic_20251116/protocols_edited.csv 1 --dry-run | grep -c "Analyze this protocol"
 ```
 
 **Expected result:**
@@ -27,10 +27,10 @@ The scripts automatically draw the gradients from the current state of the [bico
 
 ## Syncing a manual readings dataset
 
-If the dataset has a `.sync_source` file (e.g., `data/readings/manual_20260320/`), one command handles everything:
+If the dataset has a `.sync_source` file (e.g., `data/manual_20260320/`), one command handles everything:
 
 ```bash
-scripts/sync_readings.sh data/readings/manual_20260320
+scripts/sync_readings.sh data/manual_20260320
 ```
 
 This fetches new JSON files from the remote repo, regenerates `readings.csv`, runs multivariate analysis (with `--min-coverage 0.8` to handle shortform readings), generates the LDA visualization, and saves cluster classifications to `analysis/classifications.csv`.
@@ -39,15 +39,15 @@ This fetches new JSON files from the remote repo, regenerates `readings.csv`, ru
 
 ```bash
 # Full analysis pipeline
-python3 scripts/multivariate_analysis.py data/readings/manual_20260320/readings.csv \
+python3 scripts/multivariate_analysis.py data/manual_20260320/readings.csv \
 --min-coverage 0.8 \
 --analyses clustering pca correlation importance
 
 # LDA visualization (cluster separation plot)
-python3 scripts/lda_visualization.py data/readings/manual_20260320/readings.csv
+python3 scripts/lda_visualization.py data/manual_20260320/readings.csv
 
 # Classify all readings (uses synthetic dataset as training data by default)
-python3 scripts/classify_readings.py data/readings/manual_20260320/readings.csv
+python3 scripts/classify_readings.py data/manual_20260320/readings.csv
 ```
 
 Use `--min-coverage` (0.0–1.0) to drop dimension columns below the given coverage fraction before analysis. This is important for datasets with many shortform readings where most dimensions are sparsely filled.
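The coverage filter described above amounts to keeping only the columns whose non-empty fraction meets the threshold. A minimal sketch follows; the real logic inside `multivariate_analysis.py` is not shown in this diff, so treat the function below as illustrative.

```python
def filter_by_coverage(rows: list[dict], min_coverage: float) -> list[str]:
    """Return column names whose fraction of non-empty values is >= min_coverage.

    With --min-coverage 0.8, a dimension column left blank in more than 20% of
    readings (common with shortform readings) is dropped before analysis.
    Illustrative sketch, not the repository's implementation.
    """
    if not rows:
        return []
    kept = []
    for col in rows[0]:
        filled = sum(1 for r in rows if (r.get(col) or "").strip())
        if filled / len(rows) >= min_coverage:
            kept.append(col)
    return kept
```

Raising the threshold toward 1.0 trades dimensional richness for rows that are fully comparable.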
@@ -57,8 +57,8 @@ Use `--min-coverage` (0.0–1.0) to drop dimension columns below the given cover
 If you have a directory of individual bicorder JSON reading files:
 
 ```bash
-python3 scripts/json_to_csv.py data/readings/manual_20260320/json/ \
--o data/readings/manual_20260320/readings.csv
+python3 scripts/json_to_csv.py data/manual_20260320/json/ \
+-o data/manual_20260320/readings.csv
 ```
 
 ---
@@ -68,7 +68,7 @@ python3 scripts/json_to_csv.py data/readings/manual_20260320/json/ \
 ### Process All Protocols with One Command
 
 ```bash
-python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edited.csv -o analysis_output.csv
+python3 scripts/bicorder_batch.py data/synthetic_20251116/protocols_edited.csv -o analysis_output.csv
 ```
 
 This will:
@@ -81,13 +81,13 @@ This will:
 
 ```bash
 # Process only rows 1-5 (useful for testing)
-python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edited.csv -o analysis_output.csv --start 1 --end 5
+python3 scripts/bicorder_batch.py data/synthetic_20251116/protocols_edited.csv -o analysis_output.csv --start 1 --end 5
 
 # Use specific LLM model
-python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edited.csv -o analysis_output.csv -m mistral
+python3 scripts/bicorder_batch.py data/synthetic_20251116/protocols_edited.csv -o analysis_output.csv -m mistral
 
 # Add analyst metadata
-python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edited.csv -o analysis_output.csv \
+python3 scripts/bicorder_batch.py data/synthetic_20251116/protocols_edited.csv -o analysis_output.csv \
 -a "Your Name" -s "Your analytical standpoint"
 ```
 
@@ -100,12 +100,12 @@ python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edi
 Create a CSV with empty gradient columns:
 
 ```bash
-python3 scripts/bicorder_analyze.py data/readings/synthetic_20251116/protocols_edited.csv -o analysis_output.csv
+python3 scripts/bicorder_analyze.py data/synthetic_20251116/protocols_edited.csv -o analysis_output.csv
 ```
 
 Optional: Add analyst metadata:
 ```bash
-python3 scripts/bicorder_analyze.py data/readings/synthetic_20251116/protocols_edited.csv -o analysis_output.csv \
+python3 scripts/bicorder_analyze.py data/synthetic_20251116/protocols_edited.csv -o analysis_output.csv \
 -a "Your Name" -s "Your analytical standpoint"
 ```
 
(30 binary image files moved into the flattened layout; file sizes unchanged.)