diff --git a/analysis/README.md b/analysis/README.md index 77b293a..445edb3 100644 --- a/analysis/README.md +++ b/analysis/README.md @@ -95,17 +95,17 @@ Basic averages appear in `diagnostic_output-analysis.ods`. #### Univariate analysis -First, a histogram of average values for each protocol: +First, a plot of average values for each protocol: -![Protocol averages histogram](img/protocol_averages.png) +![Protocol averages plot](img/protocol_averages.png) This reveals a linear distribution of values among the protocols, aside from exponential curves only at the extremes. Perhaps the most interesting finding is a skew toward the higher end of the scale, associated with softness. Even relatively hard, technical protocols appear to have significant soft characteristics. The protocol value averages have a mean of 5.45 and a median of 5.48. In comparison to the midpoint of 5, the normalized midpoint deviation is 0.11. In comparison, the Pearson coefficient measures the skew at just -0.07, which means that the relative skew of the data is actually slightly downward. So the distribution of protocol values is very balanced but has a consistent upward deviation from the scale's baseline. (These calculations are in `diagnostic_output-analysis.odt[averages]`.) -Second, a histogram of average values for each gradient: +Second, a plot of average values for each gradient (with gaps to indicate the three groupings of gradients): -![Gradient averages histogram](img/gradient_averages.png) +![Gradient averages plot](img/gradient_averages.png) This indicates that a few of the gradients appear to have outsized responsibility for the high skew of the protocol averages. @@ -142,6 +142,10 @@ Initial manual observations: * The correlations generally seem predictable; for example, the strongest is between `Design_static_vs_malleable` and `Experience_predictable_vs_emergent`, which is not surprising * The elite vs. vernacular distinction appears to be the most predictive gradient (`analysis_results/plots/feature_importances.png`) +![Correlation heatmap](analysis_results/plots/correlation_heatmap_full.png) + +![Importance ranking](analysis_results/plots/feature_importances.png) + Claude's interpretation: > 1. Two Fundamental Protocol Types (K-Means Clustering) @@ -149,15 +153,13 @@ Claude's interpretation: > The data reveals two distinct protocol families (216 vs 192 protocols): > > Cluster 1: "Vernacular/Emergent Protocols" -> - Examples: Marronage, Songlines, Access-Centered Practices, Ethereum Proof of -> Work, Sangoma Healing Practices +> - Examples: Marronage, Songlines, Access-Centered Practices, Ethereum Proof of Work, Sangoma Healing Practices > - Characteristics: > - HIGH: elite→vernacular (6.4), malleable (7.3), flocking→swarming (6.4) > - LOW: self-enforcing→enforced (3.4), sovereign→subsidiary (2.9) > > Cluster 2: "Institutional/Standardized Protocols" -> - Examples: ISO standards, Greenwich Mean Time, Building Codes, German -> Bureaucratic Prose, Royal Access Protocol +> - Examples: ISO standards, Greenwich Mean Time, Building Codes, German Bureaucratic Prose, Royal Access Protocol > - Characteristics: > - HIGH: self-enforcing (7.1), sovereign (6.0) > - LOW: elite→vernacular (1.8), flocking→swarming (2.3), static (3.5) @@ -177,45 +179,35 @@ Claude's interpretation: > > 3. Strong Correlations (Most Significant Relationships) > -> 1. Static ↔ Predictable (r=0.61): Unchanging protocols create predictable -> experiences -> 2. Elite ↔ Self-enforcing (r=-0.58): Elite protocols need external -> enforcement; vernacular ones self-enforce -> 3. Self-enforcing ↔ Flocking (r=-0.56): Self-enforcing protocols resist -> swarming dynamics -> 4. Exclusion ↔ Kafka (r=0.52): Exclusionary protocols feel Kafka-esque +> 1. Static ↔ Predictable (r=0.61): Unchanging protocols create predictable experiences +> 2. Elite ↔ Self-enforcing (r=-0.58): Elite protocols need external enforcement; vernacular ones self-enforce +> 3. Self-enforcing ↔ Flocking (r=-0.56): Self-enforcing protocols resist swarming dynamics +> 4. Exclusion ↔ Kafka (r=0.52): Exclusionary protocols feel Kafka-esque > > 4. Most Discriminative Dimension (Feature Importance) > -> Design_elite_vs_vernacular (22.7% importance) is by far the most powerful -> predictor of protocol type, followed by Entanglement_flocking_vs_swarming -> (13.8%). +> Design_elite_vs_vernacular (22.7% importance) is by far the most powerful predictor of protocol type, followed by Entanglement_flocking_vs_swarming (13.8%). > > 5. Most "Central" Protocols (Network Analysis) > > These protocols share the most dimensional similarities with others: -> 1. VPN Usage (Circumvention Protocol) - bridges many protocol types -> 2. Access Check-in - connects accessibility and participation patterns -> 3. Quadratic Voting - spans governance dimensions +> 1. VPN Usage (Circumvention Protocol) - bridges many protocol types +> 2. Access Check-in - connects accessibility and participation patterns +> 3. Quadratic Voting - spans governance dimensions > > 6. Outliers (DBSCAN found 281!) > -> Most protocols are actually quite unique - DBSCAN identified 281 outliers, -> suggesting the dataset contains many distinctive protocol configurations that -> don't fit neat clusters. Only 10 tight sub-clusters exist. +> Most protocols are actually quite unique - DBSCAN identified 281 outliers, suggesting the dataset contains many distinctive protocol configurations that don't fit neat clusters. Only 10 tight sub-clusters exist. > > 7. Category Prediction Power > -> - Design dimensions predict clustering with 90.4% accuracy -> - Entanglement dimensions: 89.2% accuracy -> - Experience dimensions: only 78.3% accuracy +> - Design dimensions predict clustering with 90.4% accuracy +> - Entanglement dimensions: 89.2% accuracy +> - Experience dimensions: only 78.3% accuracy > > This suggests Design and Entanglement are more fundamental than Experience. > -> The core insight: Protocols fundamentally divide between -> vernacular/emergent/malleable forms and institutional/standardized/static -> forms, with the elite↔vernacular dimension being the strongest predictor of -> all other characteristics. +> The core insight: Protocols fundamentally divide between vernacular/emergent/malleable forms and institutional/standardized/static forms, with the elite↔vernacular dimension being the strongest predictor of all other characteristics. Comments: @@ -223,8 +215,6 @@ Comments: * Strange that "Ethereum Proof of Work" appears in the "Vernacular/Emergent" family - - ## Conclusions ### Improvements to the bicorder @@ -256,8 +246,7 @@ Claude report on possible improvements: > Tier 3 - Supplementary (<5%): > - All remaining 17 dimensions > -> Recommendation: Reorganize the tool to present Tier 1 dimensions first, or -> mark them as "core diagnostics" vs. "supplementary diagnostics." +> Recommendation: Reorganize the tool to present Tier 1 dimensions first, or mark them as "core diagnostics" vs. "supplementary diagnostics." > > 2. Consider Reducing Low-Value Dimensions > @@ -269,27 +258,22 @@ Claude report on possible improvements: > - Design_durable_vs_ephemeral (1.2% importance, σ=2.41) > - Entanglement_defensible_vs_exposed (1.1% importance, σ=2.41) > -> Recommendation: Either remove these or combine into composite measures. Going -> from 23→18 dimensions would reduce analyst burden by ~20% with minimal -> information loss. +> Recommendation: Either remove these or combine into composite measures. Going from 23→18 dimensions would reduce analyst burden by ~20% with minimal information loss. > > 3. Add Composite Scores 📊 > > Since PC1 explains the main variance, create derived metrics: > > "Protocol Type Score" (based on PC1 loadings): -> Score = elite_vs_vernacular(0.36) + static_vs_malleable(0.33) + -> flocking_vs_swarming(0.31) - self-enforcing_vs_enforced(0.29) +> Score = elite_vs_vernacular(0.36) + static_vs_malleable(0.33) + flocking_vs_swarming(0.31) - self-enforcing_vs_enforced(0.29) > - High score = Institutional/Standardized > - Low score = Vernacular/Emergent > > "Protocol Completeness Score" (based on PC2): -> Score = sufficient_vs_insufficient(0.43) + crystallized_vs_contested(0.38) - -> Kafka_vs_Whitehead(0.36) +> Score = sufficient_vs_insufficient(0.43) + crystallized_vs_contested(0.38) - Kafka_vs_Whitehead(0.36) > - Measures how "finished" vs. "kafkaesque" a protocol feels > -> Recommendation: Display these composite scores alongside individual dimensions -> to provide quick high-level insights. +> Recommendation: Display these composite scores alongside individual dimensions to provide quick high-level insights. > > 4. Highlight Key Correlations 🔗 > @@ -311,12 +295,10 @@ Claude report on possible improvements: > > Two dimension pairs show moderate correlation within the same category: > -> 1. Entanglement_abstract_vs_embodied ↔ Entanglement_flocking_vs_swarming -> (r=-0.56) +> 1. Entanglement_abstract_vs_embodied ↔ Entanglement_flocking_vs_swarming (r=-0.56) > 2. Design_documenting_vs_enabling ↔ Design_static_vs_malleable (r=0.53) > -> Recommendation: Consider merging these or making one primary and the other -> optional. +> Recommendation: Consider merging these or making one primary and the other optional. > > 6. Rebalance Categories ⚖️ > @@ -340,8 +322,7 @@ Claude report on possible improvements: > - Entanglement_exclusive_vs_non-exclusive: 93% of protocols rate 7-9 > - Design_technical_vs_social: Mean=7.6, heavily skewed toward "social" > -> Recommendation: Consider revising these gradient definitions or endpoints to -> achieve better distribution. +> Recommendation: Consider revising these gradient definitions or endpoints to achieve better distribution. > > 8. Create Shortened Version 🎯 > @@ -360,14 +341,12 @@ Claude report on possible improvements: > > 9. Add Comparison Features > -> The network analysis shows some protocols are highly "central" (similar to -> many others): +> The network analysis shows some protocols are highly "central" (similar to many others): > - VPN Usage (Circumvention Protocol) > - Access Check-in > - Quadratic Voting > -> Recommendation: After rating a protocol, show: "This protocol is most similar -> to: [X, Y, Z]" based on dimensional proximity. +> Recommendation: After rating a protocol, show: "This protocol is most similar to: [X, Y, Z]" based on dimensional proximity. > Summary of Recommendations @@ -381,8 +360,7 @@ Claude report on possible improvements: > 5. 💡 Add contextual tooltips about correlations > 6. 📊 Show similar protocols after assessment -> This would make the tool ~20% faster to use while maintaining 95%+ of its -> discriminative power. +> This would make the tool ~20% faster to use while maintaining 95%+ of its discriminative power. Questions: diff --git a/analysis/bicorder_analyze.py b/analysis/bicorder_analyze.py old mode 100755 new mode 100644 diff --git a/analysis/bicorder_batch.py b/analysis/bicorder_batch.py old mode 100755 new mode 100644 diff --git a/analysis/bicorder_init.py b/analysis/bicorder_init.py old mode 100755 new mode 100644 diff --git a/analysis/bicorder_query.py b/analysis/bicorder_query.py old mode 100755 new mode 100644 diff --git a/analysis/chunk.sh b/analysis/chunk.sh old mode 100644 new mode 100755 diff --git a/analysis/multivariate_analysis.py b/analysis/multivariate_analysis.py old mode 100755 new mode 100644