Protocol virtues study

The purpose of this study is to produce an inductive sample of virtues for living among protocols by drawing deductively on two recent, open-access books on protocols: The Protocol Reader and As for Protocols. It experiments with the use of LLMs.

This document describes in detail the method of the study.

Preliminary experiments

The script.sh file contains the instructions used to deploy an LLM on the texts. It was used to test several local models on the introductions (and then the full texts) of both books.

Observations:

The ministral-3 run took considerably longer than the gemma3 and lfm2.5-thinking runs, which took comparable amounts of time.
The ministral-3 output (outputs/output-ministral3-20260310.csv) is fairly nonsensical and extremely repetitive; it attempts to cite the source text but does so inaccurately.
The lfm2.5-thinking output (outputs/output-lfm25-20260310.csv) is hallucinatory and does not appear to draw meaningfully on the source text.
The gemma3 output (outputs/output-gemma3-20260310.csv) is constructive and plausible; although its source-text quotations are not exact, they resemble actual passages closely enough that most can be located and confirmed.

Based on this, along with a broader recognition of the limits of LLM interpretation, I am opting for a more manual method, while using closely scrutinized LLM outputs as a corrective.

However, gemma3 output from the introductions to both books will be retained.

Method

The method for producing a list of virtues is as follows:

Highlighting through a manual re-reading of The Protocol Reader and As for Protocols, marking any passages that state or imply virtues for living well among protocols
Coding through manual grouping of the passages according to a set of concise candidate virtues
Analysis of the coding to identify patterns

This method seeks to obtain a list of virtues based on manual reading by the researcher, while consulting an LLM interpretation to identify any oversights on the part of the researcher.

Highlighting

Highlighting involved re-reading the books in KOReader. I highlighted passages that seemed to relate, directly or indirectly, to virtues for life among protocols (n=134): 62 from As for Protocols and 72 from The Protocol Reader. The highlights were then exported to text files and gathered into text_coding/snippets.csv.
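The gathering step can be sketched as follows. This is a minimal illustration, not the script actually used: it assumes (hypothetically) that each book's highlights were exported to a plain-text file with one highlight per block, separated by blank lines. The real KOReader export format and the columns of text_coding/snippets.csv may differ, and the `id`/`book`/`snippet` column names are illustrative only.

```python
import csv
import io


def parse_highlights(text: str) -> list[str]:
    """Split one exported file into snippets.

    Hypothetical format: highlights separated by blank lines.
    """
    blocks = [block.strip() for block in text.split("\n\n")]
    return [block for block in blocks if block]  # drop empty blocks


def gather_snippets(exports: dict[str, str]) -> str:
    """Combine per-book exports into one CSV string (hypothetical columns)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["id", "book", "snippet"])
    snippet_id = 1
    for book, text in exports.items():
        for snippet in parse_highlights(text):
            writer.writerow([snippet_id, book, snippet])
            snippet_id += 1
    return out.getvalue()


# Toy input, not the study's highlights.
exports = {
    "As for Protocols": "First highlight.\n\nSecond highlight.\n",
    "The Protocol Reader": "Third highlight.\n",
}
print(gather_snippets(exports))
```

With real exports, the per-book counts (62 and 72 in this study) could be checked by tallying the `book` column.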

Coding

The text_coding/snippets.csv data was imported into text_coding/coding.ods for coding. An initial list of virtues was derived from the gemma3 analyses of the introductions to the books (outputs/output-gemma3-20260310.csv, with 49 virtues, and outputs/output-gemma3-AsForProtocolsIntro-20260315.csv, with 22 virtues). Removing duplicates and entries that did not seem to qualify as virtues reduced these 71 candidates to a combined set of 56 virtues.
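The merge can be sketched as below. This sketch assumes each gemma3 output reduces to a list of virtue names. Deduplication here is a simple case- and whitespace-insensitive match, and the `exclude` argument stands in for the manual judgment about which entries are not virtues. Both are illustrative assumptions, and the names in the example are toy data, not the study's lists.

```python
def normalize(name: str) -> str:
    """Collapse case and whitespace so near-duplicates such as 'Care' and ' care' match."""
    return " ".join(name.split()).lower()


def merge_virtue_lists(lists, exclude=()):
    """Merge several virtue lists, keeping the first spelling of each name.

    `exclude` is a hypothetical stand-in for entries judged not to be virtues.
    """
    excluded = {normalize(name) for name in exclude}
    seen = set()
    merged = []
    for virtues in lists:
        for name in virtues:
            key = normalize(name)
            if key in seen or key in excluded:
                continue
            seen.add(key)
            merged.append(name.strip())
    return merged


# Toy lists; "Humility" and "Interoperability" are invented examples.
reader_intro = ["Adaptability", "Care", "Consent"]
as_for_intro = ["care", "Humility", "Interoperability"]
merged = merge_virtue_lists([reader_intro, as_for_intro], exclude=["Interoperability"])
print(merged)  # ['Adaptability', 'Care', 'Consent', 'Humility']
```

Exact-match deduplication will miss synonyms (for example, two phrasings of the same virtue), which is one reason the manual review described above remains necessary.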

I then reviewed all of the highlighted snippets (in the coding tab of text_coding/coding.ods), coding each snippet with whatever virtue names seemed relevant to it. Digital copies of the books were on hand for consulting the surrounding context.

Additional virtues were added when a snippet appeared to communicate something not yet represented on the list (n=36). These were placed at the bottom of the list, which is the part seen first during coding, so that manually identified virtues would be prioritized.

Virtues were identified interpretively; their identification depended on the sense of the text, not necessarily the literal use of words, though I made efforts to use words from the texts where appropriate.

Of the 56 LLM-suggested virtues, 19 were not applied to any snippet.
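Finding the suggested virtues that were never applied amounts to a set difference between the suggested list and the codes actually used. The sketch below assumes that the coding tab can be exported as rows of (snippet id, list of virtue codes). That row shape is an assumption, and the data are a toy example rather than the study's coding.

```python
from collections import Counter


def code_frequencies(coding):
    """Count how many snippets each virtue was applied to."""
    counts = Counter()
    for _snippet_id, virtues in coding:
        counts.update(set(virtues))  # a virtue counts at most once per snippet
    return counts


def unused_virtues(suggested, coding):
    """Return suggested virtues never applied to any snippet, in list order."""
    applied = code_frequencies(coding)
    return [name for name in suggested if applied[name] == 0]


# Toy coding rows (assumed shape: snippet id, list of virtue codes).
coding = [
    (1, ["Adaptability", "Care"]),
    (2, ["Adaptability", "Consent", "Care"]),
    (3, ["Adaptability"]),
]
suggested = ["Adaptability", "Care", "Consent", "Patience"]
print(unused_virtues(suggested, coding))  # ['Patience']
```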

Analysis

An initial analysis (results tab of the above spreadsheet, or text_coding/results.csv), aided by several LLM tools (kimi-k2.5, glm-5, minimax-m2.5), reveals a distribution with several clusters, with "Adaptability" as an outlier. However, the groupings do not produce clear, natural cutoffs. It therefore seems best to treat these virtues as a continuum rather than to lean too heavily on the clustering, which is not statistically significant.
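One rough way to check for natural cutoffs is to sort the virtue frequencies and see whether any drop between consecutive ranks stands well above the typical drop. This is an illustrative heuristic only. It is not the analysis carried out with the LLM tools, and the frequencies below are invented.

```python
from statistics import mean


def largest_drop(frequencies):
    """Rank virtues by frequency and locate the largest drop between neighbours.

    Returns (virtue just above the drop, size of the drop, mean drop).
    Requires at least two virtues.
    """
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    drops = [
        (name, count - next_count)
        for (name, count), (_next_name, next_count) in zip(ranked, ranked[1:])
    ]
    name, size = max(drops, key=lambda item: item[1])
    return name, size, mean(drop for _name, drop in drops)


# Invented frequencies for illustration only.
freqs = {"Adaptability": 30, "Care": 14, "Consent": 12, "Virtue D": 11, "Virtue E": 9}
print(largest_drop(freqs))  # ('Adaptability', 16, 5.25)
```

A single large drop after the top entry, with small drops elsewhere, would fit the pattern of one outlier followed by a continuum. A formal test would still be needed before calling any cutoff significant.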

A multivariate analysis of the raw coding (coding tab of the spreadsheet, or text_coding/coding.csv) again suggests that "Adaptability" is not only frequent but also a central hub. "Care" and "Consent" show the strongest association, although neither is very frequent. Interestingly, the As for Protocols snippets have a higher network density than The Protocol Reader snippets. See the multivariate analysis produced by kimi-k2.5 in text_coding/analysis/, which was double-checked with minimax-m2.5.
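The network terms used here can be illustrated with a small co-occurrence graph. In this graph, virtues are nodes, and two virtues are linked when they are coded on the same snippet. The sketch below assumes that snippets are coded as lists of virtue names. It computes degree (for spotting hubs), density (existing edges divided by the N(N - 1)/2 possible edges of an undirected graph with N nodes), and a Jaccard score for how strongly two virtues co-occur. These are common choices, not necessarily the measures used in the kimi-k2.5 analysis, and the toy data are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_edges(snippets):
    """Count how many snippets each unordered pair of virtues shares."""
    edges = Counter()
    for virtues in snippets:
        for pair in combinations(sorted(set(virtues)), 2):
            edges[pair] += 1
    return edges


def density(snippets):
    """Undirected density: distinct edges / possible edges among all coded virtues."""
    nodes = {v for virtues in snippets for v in virtues}
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(cooccurrence_edges(snippets)) / (n * (n - 1) / 2)


def degrees(snippets):
    """Number of distinct virtues each virtue co-occurs with (isolated virtues omitted)."""
    neighbours = defaultdict(set)
    for a, b in cooccurrence_edges(snippets):
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {v: len(ns) for v, ns in neighbours.items()}


def jaccard(snippets, a, b):
    """Share of snippets coded with a or b that are coded with both."""
    with_a = {i for i, vs in enumerate(snippets) if a in vs}
    with_b = {i for i, vs in enumerate(snippets) if b in vs}
    union = with_a | with_b
    return len(with_a & with_b) / len(union) if union else 0.0


# Invented toy coding for two books.
book_a = [["Adaptability", "Care", "Consent"], ["Care", "Consent"], ["Adaptability", "Virtue D"]]
book_b = [["Adaptability", "Care"], ["Consent"], ["Virtue D"]]
print(round(density(book_a), 2), round(density(book_b), 2))  # 0.67 0.17
print(degrees(book_a))
print(jaccard(book_a, "Care", "Consent"))  # 1.0
```

Because the Jaccard score ignores overall frequency, two rarely used virtues can still show the strongest association, as reported for "Care" and "Consent" above.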

Data stewardship

Both books are freely available on the internet in open-access editions. Initial processing was done with a local LLM, without transferring data to a cloud provider. Subsequent analysis was conducted with the Ollama cloud service, which does not permit model training on prompt data and does not retain prompts or responses.

The source texts are not included in this repository.
