# Protocol virtues study
The purpose of this study is to produce an inductive sample of virtues for living among protocols by drawing on two recent, open-access books on protocols: _[The Protocol Reader](https://summerofprotocols.com/research/protocol-reader)_ and _[As for Protocols](https://www.fulcrum.org/concern/monographs/3t945t729?locale=en)_. It experiments with the use of LLMs.
This document describes the study's method in detail.
## Preliminary experiments
The `script.sh` file contains the instructions used to deploy an LLM on the texts. It was used to test several local models on the introductions (and then full text) of both books.
Observations:
* The ministral-3 run took considerably more time than the gemma3 and lfm2.5-thinking runs, which were comparable to each other.
* The ministral-3 output (`outputs/output-ministral3-20260310.csv`) is fairly nonsensical and extremely repetitive; it attempts to cite the source text but does so inaccurately.
* The lfm2.5-thinking output (`outputs/output-lfm25-20260310.csv`) is hallucinatory and does not appear to draw meaningfully on the source text.
* The gemma3 output (`outputs/output-gemma3-20260310.csv`) is constructive and plausible; although the source-text quotations are not exact, they resemble actual passages closely enough that most can be located and confirmed.
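Confirming an inexact LLM quotation against the source can be partly mechanized. The following is a minimal Python sketch, not the procedure used in this study: it slides a word window across the source text and reports the passage most similar to the quoted string, using the standard-library `difflib.SequenceMatcher`. The function name `locate_quote` and the 0.8 similarity threshold are assumptions chosen for illustration, and any match would still need to be read and confirmed by hand.

```python
from difflib import SequenceMatcher


def locate_quote(quote: str, source: str, threshold: float = 0.8):
    """Return (similarity, passage) for the closest match of quote in source.

    Returns None when no window reaches the (assumed) similarity threshold.
    """
    q_words = quote.lower().split()
    s_words = source.split()
    window = len(q_words)
    if window == 0 or len(s_words) < window:
        return None
    target = " ".join(q_words)
    best_ratio, best_passage = 0.0, ""
    # Slide a window as long as the quote (in words) across the source text.
    for i in range(len(s_words) - window + 1):
        passage = " ".join(s_words[i:i + window])
        ratio = SequenceMatcher(None, target, passage.lower()).ratio()
        if ratio > best_ratio:
            best_ratio, best_passage = ratio, passage
    return (best_ratio, best_passage) if best_ratio >= threshold else None


# Invented sample text, not a quotation from either book.
source_text = "Protocols are engineered arguments that shape how strangers coordinate over time."
llm_quote = "protocols are engineered argument that shapes how strangers coordinate"
match = locate_quote(llm_quote, source_text)
```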
Based on these observations, along with a broader recognition of the limits of LLM interpretation, I am opting for a more manual method, while using closely scrutinized LLM outputs as a corrective.
However, the gemma3 output from the introductions to both books will be retained.
## Method
The method for producing a list of virtues is as follows:
* **Highlighting** through a manual re-reading of _The Protocol Reader_ and _As for Protocols_, marking any passages that state or imply virtues for living well among protocols
* **Coding** through manual grouping of the passages according to a set of concise candidate virtues
* **Analysis** of the coding to identify patterns
This method seeks to derive a list of virtues from the researcher's own manual reading, while consulting LLM interpretations to catch anything the researcher overlooked.
### Highlighting
Highlighting involved re-reading the books in KOReader. I highlighted passages that seemed to relate, directly or indirectly, to virtues for life among protocols (n=134). Of those, 62 were from _As for Protocols_ and 72 were from _The Protocol Reader_. The highlights were then exported to text files and gathered into `text_coding/snippets.csv`.
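The gathering step can be sketched in Python as follows. This is an illustration under stated assumptions, not the script used here: it assumes each exported text file separates highlights with blank lines (the actual KOReader export layout may differ), and the column names `book` and `snippet` and the function name `gather_snippets` are hypothetical. The per-book counts it returns are the kind of tally that would be checked against the 62 and 72 reported above.

```python
import csv
from pathlib import Path


def gather_snippets(export_files: dict[str, Path], out_csv: Path) -> dict[str, int]:
    """Write one CSV row per highlight and return the snippet count per book."""
    counts: dict[str, int] = {}
    with out_csv.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["book", "snippet"])  # hypothetical column names
        for book, path in export_files.items():
            text = path.read_text(encoding="utf-8")
            # Assumption: highlights are separated by blank lines in the export.
            blocks = [" ".join(block.split()) for block in text.split("\n\n")]
            snippets = [block for block in blocks if block]
            for snippet in snippets:
                writer.writerow([book, snippet])
            counts[book] = len(snippets)
    return counts
```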
### Coding
The `text_coding/snippets.csv` data was imported into `text_coding/coding.ods` for coding. An initial list of virtues was derived from the gemma3 analyses of the introductions to the books (`outputs/output-gemma3-20260310.csv`, with 49 virtues, and `outputs/output-gemma3-AsForProtocolsIntro-20260315.csv`, with 22 virtues). Duplicates were removed, along with entries that did not seem to qualify as virtues, reducing the 71 entries to a combined set of 56 virtues.
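The mechanical part of that merge, dropping exact duplicates that differ only in case or spacing, could look like the sketch below. The function name `merge_candidates` is hypothetical and the sample names are placeholders rather than the actual gemma3 output. Judging which entries qualify as virtues, or which near-synonyms count as duplicates, is an interpretive step that this sketch does not attempt.

```python
def merge_candidates(*lists: list[str]) -> list[str]:
    """Merge virtue lists, dropping case- and whitespace-insensitive duplicates.

    The first spelling encountered is kept, and input order is preserved.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for names in lists:
        for name in names:
            key = " ".join(name.lower().split())  # normalized comparison key
            if key and key not in seen:
                seen.add(key)
                merged.append(name.strip())
    return merged


# Placeholder lists for illustration only.
first_list = ["Adaptability", "Care "]
second_list = ["care", "Consent", ""]
combined = merge_candidates(first_list, second_list)
```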
I then reviewed all of the highlighted snippets (in the `coding` tab of `text_coding/coding.ods`), coding each snippet with whatever virtue names seemed relevant to it. Digital copies of the books were on hand for consulting the surrounding context.
Additional virtues were added if the text appeared to communicate something not previously represented on the list (n=36). They were placed at the bottom of the list, which is seen first during coding, to prioritize the use of manually identified virtues.
Virtues were identified interpretively; their identification depended on the sense of the text, not necessarily the literal use of words, though I made efforts to use words from the texts where appropriate.
Nineteen of the LLM-suggested virtues were not applied to any of the snippets.
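A count like this can be reproduced from the coded snippets with a simple tally. The sketch below assumes the coding has been loaded as one list of virtue names per snippet; the function name `tally_codes` and the sample data are hypothetical and are not drawn from `text_coding/coding.ods`.

```python
from collections import Counter


def tally_codes(coded_snippets: list[list[str]], candidates: list[str]):
    """Return (frequency per virtue, candidates never applied to any snippet)."""
    # Count each virtue at most once per snippet.
    counts = Counter(v for codes in coded_snippets for v in set(codes))
    unused = [c for c in candidates if counts[c] == 0]
    return counts, unused


# Placeholder coding for illustration only.
coded = [["Adaptability", "Care"], ["Adaptability"], ["Care", "Consent"]]
candidate_list = ["Adaptability", "Care", "Consent", "Placeholder virtue"]
frequencies, never_applied = tally_codes(coded, candidate_list)
```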
### Analysis
An initial analysis (`results` tab of the above spreadsheet, or `text_coding/results.csv`), aided by several LLM tools (kimi-k2.5, glm-5, minimax-m2.5), reveals a distribution with several clusters alongside the outlier "Adaptability." However, the groupings do not produce any clear, natural cutoffs. It appears best to treat these virtues as a continuum rather than leaning too hard on the clustering, which is not statistically significant.
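One informal way to look for natural cutoffs is to rank the virtues by frequency and inspect the drops between neighbours. The sketch below uses invented frequencies and a hypothetical function name, `largest_gaps`; it is not the analysis behind `text_coding/results.csv` and says nothing about statistical significance. An outlier shows up as a single large drop at the top of the ranking, while a continuum shows up as many small drops of similar size.

```python
def largest_gaps(frequencies: dict[str, int], top: int = 3):
    """Rank virtues by frequency and return the largest drops between neighbours.

    Each result is (drop, higher-ranked virtue, lower-ranked virtue).
    """
    ranked = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    gaps = [
        (ranked[i][1] - ranked[i + 1][1], ranked[i][0], ranked[i + 1][0])
        for i in range(len(ranked) - 1)
    ]
    return sorted(gaps, reverse=True)[:top]


# Invented frequencies for illustration only.
sample = {"A": 30, "B": 12, "C": 11, "D": 10, "E": 4}
top_gaps = largest_gaps(sample, top=2)
```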
A multivariate analysis of the raw coding (`coding` tab of the spreadsheet, or `text_coding/coding.csv`) suggests, again, that "Adaptability" is not only high in frequency but also a central hub. "Care" and "Consent" show the strongest association, although neither is very frequent. Interestingly, the _As for Protocols_ snippets have higher network density than those from _The Protocol Reader_. See the multivariate analysis produced by [kimi-k2.5](https://ollama.com/library/kimi-k2.5) in `text_coding/analysis/`.
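For readers unfamiliar with these terms, the sketch below shows one common way such measures can be computed from coded snippets. It is an assumption about the general approach, not a reproduction of the kimi-k2.5 analysis: two virtues are linked when they are applied to the same snippet, a hub is a virtue linked to many others (high degree), density is the share of possible links that actually occur, and association strength is measured here with the Jaccard index. The function names and sample data are hypothetical, and "Humility" is a placeholder name.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(coded_snippets):
    """Count how often each unordered pair of virtues shares a snippet."""
    pairs = Counter()
    for codes in coded_snippets:
        pairs.update(combinations(sorted(set(codes)), 2))
    return pairs


def network_summary(coded_snippets):
    """Return (degree per virtue, network density, Jaccard index per pair)."""
    pairs = cooccurrence(coded_snippets)
    freq = Counter(v for codes in coded_snippets for v in set(codes))
    degree = Counter()
    for a, b in pairs:
        degree[a] += 1
        degree[b] += 1
    n = len(freq)
    possible = n * (n - 1) / 2  # number of possible links among n virtues
    density = len(pairs) / possible if possible else 0.0
    # Jaccard: snippets coded with both virtues / snippets coded with either.
    jaccard = {
        (a, b): count / (freq[a] + freq[b] - count)
        for (a, b), count in pairs.items()
    }
    return degree, density, jaccard


# Placeholder coding for illustration only.
coded = [
    ["Adaptability", "Care", "Consent"],
    ["Adaptability", "Humility"],
    ["Care", "Consent"],
    ["Adaptability"],
]
degree, density, jaccard = network_summary(coded)
```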
## Data stewardship
Both books are available freely on the internet in open-access editions. Initial processing was done with a local LLM, without transferring data to a cloud provider. Subsequent analysis was conducted with the Ollama cloud service, which does not permit model training on prompt data and does not retain prompts or responses.
The source texts are not included in this repository.