Flatten data/readings/ → data/
Remove the intermediate readings/ subdirectory level — dataset naming (synthetic_YYYYMMDD, manual_YYYYMMDD) already encodes what the data is. Update all path references across scripts and docs accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
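A minimal sketch of the migration this commit describes, demonstrated in a scratch directory (hypothetical; the actual commit edited the repository directly, and the dataset and file names below are the only details taken from it):

```python
import shutil
import tempfile
from pathlib import Path

# Build a toy copy of the old layout in a temporary directory.
root = Path(tempfile.mkdtemp())
for name in ("synthetic_20251116", "manual_20260320"):
    (root / "data" / "readings" / name).mkdir(parents=True)
(root / "README.md").write_text(
    "see data/readings/synthetic_20251116/readings.csv\n"
)

# 1. Move each dataset up one level, then drop the empty readings/ directory.
readings = root / "data" / "readings"
for d in list(readings.iterdir()):
    shutil.move(str(d), str(root / "data" / d.name))
readings.rmdir()

# 2. Rewrite path references in docs/scripts (here, just the toy README).
readme = root / "README.md"
readme.write_text(readme.read_text().replace("data/readings/", "data/"))

print(readme.read_text())  # -> see data/synthetic_20251116/readings.csv
```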
@@ -6,26 +6,26 @@ Scripts were created with the assistance of Claude Code. Data processing was don
 
 ## Datasets
 
-Readings are organized under `data/readings/<type>_<YYYYMMDD>/`, each self-contained with its own `readings.csv`, `analysis/`, and `json/` subdirectories:
+Readings are organized under `data/<type>_<YYYYMMDD>/`, each self-contained with its own `readings.csv`, `analysis/`, and `json/` subdirectories:
 
-- **`data/readings/synthetic_20251116/`** — 411 protocols from synthetic LLM-generated readings (see detailed procedure below)
-- **`data/readings/manual_20260320/`** — manual readings collected at [git.medlab.host/ntnsndr/protocol-bicorder-data](https://git.medlab.host/ntnsndr/protocol-bicorder-data), continuously expanding
+- **`data/synthetic_20251116/`** — 411 protocols from synthetic LLM-generated readings (see detailed procedure below)
+- **`data/manual_20260320/`** — manual readings collected at [git.medlab.host/ntnsndr/protocol-bicorder-data](https://git.medlab.host/ntnsndr/protocol-bicorder-data), continuously expanding
 
 ### Syncing the manual dataset
 
 The manual dataset is kept current via a `.sync_source` config file and a one-command sync script:
 
 ```bash
-scripts/sync_readings.sh data/readings/manual_20260320
+scripts/sync_readings.sh data/manual_20260320
 ```
 
 This clones the remote repository, copies JSON reading files, regenerates `readings.csv`, runs multivariate analysis (filtering to well-covered dimensions), generates an LDA visualization, and saves per-reading cluster classifications to `analysis/classifications.csv`.
 
 Options:
 ```bash
-scripts/sync_readings.sh data/readings/manual_20260320 --min-coverage 0.8 # default
-scripts/sync_readings.sh data/readings/manual_20260320 --no-analysis # sync JSON only
-scripts/sync_readings.sh data/readings/manual_20260320 --training data/readings/synthetic_20251116/readings.csv
+scripts/sync_readings.sh data/manual_20260320 --min-coverage 0.8 # default
+scripts/sync_readings.sh data/manual_20260320 --no-analysis # sync JSON only
+scripts/sync_readings.sh data/manual_20260320 --training data/synthetic_20251116/readings.csv
 ```
 
 ### Handling shortform readings
@@ -59,7 +59,7 @@ context: "model running on ollama locally, accessed with llm on the command line
 prompt: "Return csv-formatted data (with no markdown wrapper) that consists of a list of protocols discussed or referred to in the attached text. Protocols are defined extremely broadly as 'patterns of interaction,' and may be of a nontechnical nature. Protocols should be as specific as possible, such as 'Sacrament of Reconciliation' rather than 'Religious Protocols.' The first column should provide a brief descriptor of the protocol, and the second column should describe it in a substantial paragraph of 3-5 sentences, encapsulated in quotation marks to avoid breaking on commas. Be sure to paraphrase rather than quoting directly from the source text."
 ```
 
-The result was a CSV-formatted list of protocols (`data/readings/synthetic_20251116/protocols_raw.csv`, n=774 total protocols listed).
+The result was a CSV-formatted list of protocols (`data/synthetic_20251116/protocols_raw.csv`, n=774 total protocols listed).
 
 ### Dataset cleaning
 
@@ -74,7 +74,7 @@ The dataset was then manually reviewed. The review involved the following:
 
 The cleaning process was carried out in a subjective manner, so some entries that meet the above criteria may remain in the dataset. The dataset also appears to include some LLM hallucinations---that is, protocols not in the texts---but the hallucinations are often acceptable examples, so some have been retained. Some degree of noise in the dataset was considered acceptable for the purposes of the study. Some degree of repetition, likewise, provides the dataset with a kind of control case for evaluating the diagnostic process.
 
-The result was a CSV-formatted list of protocols (`data/readings/synthetic_20251116/protocols_edited.csv`, n=411).
+The result was a CSV-formatted list of protocols (`data/synthetic_20251116/protocols_edited.csv`, n=411).
 
 
 ### Initial diagnostic
@@ -83,24 +83,24 @@ This diagnostic used the file now at `bicorder_analyzed.json`, though the script
 
 For each row in the dataset, and on each gradient, a series of scripts prompts the LLM to apply each gradient to the protocol. The outputs are then added to a CSV output file.
 
-The result was a CSV-formatted list of protocols (`data/readings/synthetic_20251116/readings.csv`, n=411).
+The result was a CSV-formatted list of protocols (`data/synthetic_20251116/readings.csv`, n=411).
 
 See detailed documentation of the scripts at `WORKFLOW.md`.
 
 ### Manual and alternate model audit
 
-To test the output, a manual review of the first 10 protocols in the `data/readings/synthetic_20251116/protocols_edited.csv` dataset was produced in the file `data/readings/synthetic_20251116/readings_manual.csv`. (Alphabetization in this case seems a reasonable proxy for a random sample of protocols. It includes some partially overlapping protocols, as does the dataset as a whole.) Additionally, three models were tested on the same cases:
+To test the output, a manual review of the first 10 protocols in the `data/synthetic_20251116/protocols_edited.csv` dataset was produced in the file `data/synthetic_20251116/readings_manual.csv`. (Alphabetization in this case seems a reasonable proxy for a random sample of protocols. It includes some partially overlapping protocols, as does the dataset as a whole.) Additionally, three models were tested on the same cases:
 
 ```bash
-python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edited.csv -o data/readings/synthetic_20251116/readings_mistral.csv -m mistral -a "Mistral" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision" --start 1 --end 10
+python3 scripts/bicorder_batch.py data/synthetic_20251116/protocols_edited.csv -o data/synthetic_20251116/readings_mistral.csv -m mistral -a "Mistral" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision" --start 1 --end 10
 ```
 
 ```bash
-python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edited.csv -o data/readings/synthetic_20251116/readings_gpt-oss.csv -m gpt-oss -a "GPT-OSS" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision" --start 1 --end 10
+python3 scripts/bicorder_batch.py data/synthetic_20251116/protocols_edited.csv -o data/synthetic_20251116/readings_gpt-oss.csv -m gpt-oss -a "GPT-OSS" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision" --start 1 --end 10
 ```
 
 ```bash
-python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edited.csv -o data/readings/synthetic_20251116/readings_gemma3-12b.csv -m gemma3:12b -a "Gemma3:12b" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision" --start 1 --end 10
+python3 scripts/bicorder_batch.py data/synthetic_20251116/protocols_edited.csv -o data/synthetic_20251116/readings_gemma3-12b.csv -m gemma3:12b -a "Gemma3:12b" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision" --start 1 --end 10
 ```
 
 A Euclidean distance analysis (`python3 scripts/compare_analyses.py`) found that the `gpt-oss` model was closer to the manual example than the others. It was therefore selected as the model for conducting the bicorder diagnostic on the dataset.
@@ -112,13 +112,13 @@ Average Euclidean Distance:
 3. readings_mistral.csv - Avg Distance: 13.33
 ```
 
-Command used to produce `data/readings/synthetic_20251116/readings.csv` (using the Ollama cloud service for the `gpt-oss` model):
+Command used to produce `data/synthetic_20251116/readings.csv` (using the Ollama cloud service for the `gpt-oss` model):
 
 ```bash
-python3 scripts/bicorder_batch.py data/readings/synthetic_20251116/protocols_edited.csv -o data/readings/synthetic_20251116/readings.csv -m gpt-oss:20b-cloud -a "GPT-OSS" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision"
+python3 scripts/bicorder_batch.py data/synthetic_20251116/protocols_edited.csv -o data/synthetic_20251116/readings.csv -m gpt-oss:20b-cloud -a "GPT-OSS" -s "A careful ethnographer and outsider aspiring to achieve a neutral stance and a high degree of precision"
 ```
 
-The result was a CSV-formatted list of protocols (`data/readings/synthetic_20251116/readings.csv`, n=411).
+The result was a CSV-formatted list of protocols (`data/synthetic_20251116/readings.csv`, n=411).
 
 ### Further analysis
 
@@ -126,7 +126,7 @@ The result was a CSV-formatted list of protocols (`data/readings/synthetic_20251
 
 Per-protocol values are meaningful for the bicorder because, despite varying levels of appropriateness, all of the gradients are structured as ranging from "hardness" to "softness"---with lower values associated with greater rigidity. The average value for a given protocol, therefore, provides a rough sense of the protocol's hardness.
 
-Basic averages appear in `data/readings/synthetic_20251116/readings-analysis.ods`.
+Basic averages appear in `data/synthetic_20251116/readings-analysis.ods`.
 
 #### Univariate analysis
 
@@ -136,7 +136,7 @@ First, a plot of average values for each protocol:
 
 This reveals a linear distribution of values among the protocols, aside from exponential curves at the extremes. Perhaps the most interesting finding is a skew toward the higher end of the scale, associated with softness. Even relatively hard, technical protocols appear to have significant soft characteristics.
 
-The protocol value averages have a mean of 5.45 and a median of 5.48. In comparison to the midpoint of 5, the normalized midpoint deviation is 0.11. In comparison, the Pearson coefficient measures the skew at just -0.07, which means that the relative skew of the data is actually slightly downward. So the distribution of protocol values is very balanced but has a consistent upward deviation from the scale's baseline. (These calculations are in `data/readings/synthetic_20251116/readings-analysis.odt[averages]`.)
+The protocol value averages have a mean of 5.45 and a median of 5.48. In comparison to the midpoint of 5, the normalized midpoint deviation is 0.11. In comparison, the Pearson coefficient measures the skew at just -0.07, which means that the relative skew of the data is actually slightly downward. So the distribution of protocol values is very balanced but has a consistent upward deviation from the scale's baseline. (These calculations are in `data/synthetic_20251116/readings-analysis.odt[averages]`.)
 
 Second, a plot of average values for each gradient (with gaps to indicate the three groupings of gradients):
 
@@ -165,21 +165,21 @@ Claude Code created a `multivariate_analysis.py` tool to conduct this analysis.
 
 ```bash
 # Run all analyses (default)
-python3 scripts/multivariate_analysis.py data/readings/synthetic_20251116/readings.csv
+python3 scripts/multivariate_analysis.py data/synthetic_20251116/readings.csv
 
 # Run specific analyses only
-python3 scripts/multivariate_analysis.py data/readings/synthetic_20251116/readings.csv --analyses
+python3 scripts/multivariate_analysis.py data/synthetic_20251116/readings.csv --analyses
 clustering pca
 ```
 
 Initial manual observations:
 
 * The correlations generally seem predictable; for example, the strongest is between `Design_static_vs_malleable` and `Experience_predictable_vs_emergent`, which is not surprising
-* The elite vs. vernacular distinction appears to be the most predictive gradient (`data/readings/synthetic_20251116/analysis/plots/feature_importances.png`)
+* The elite vs. vernacular distinction appears to be the most predictive gradient (`data/synthetic_20251116/analysis/plots/feature_importances.png`)
 
-
+
 
-
+
 
 Claude's interpretation:
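The model audit in the diff above selects a model by average Euclidean distance to the manual readings. The internals of `scripts/compare_analyses.py` are not shown here, so the following is only an assumed reconstruction of that averaging step:

```python
import math

def avg_euclidean_distance(model_rows, manual_rows):
    """Average per-protocol Euclidean distance between two sets of gradient
    readings; each row is one protocol's numeric scores.
    (Assumed sketch; compare_analyses.py itself is not in this diff.)"""
    distances = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(m, n)))
        for m, n in zip(model_rows, manual_rows)
    ]
    return sum(distances) / len(distances)

# Toy values, not from the dataset: the candidate with the smallest average
# distance to the manual readings would be the one selected.
manual = [[5, 6, 4], [7, 3, 5]]
candidate = [[5, 5, 4], [6, 3, 5]]
print(avg_euclidean_distance(candidate, manual))  # -> 1.0
```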