Chapter 11: ConceptMap-Text — Model Building and Clustering — ThinkNavi User Manual

11.1 Step 4: Feature Settings

This step adjusts the weight (influence) of each dimension, letting you emphasize specific analysis axes or dampen noisy ones.

Operations

Sliders are displayed for each dimension:

| Parameter | Range | Default | Description |
|---|---|---|---|
| Dimension Weight | 0.0-2.0 | 1.0 | Each dimension's influence on model building |
  • Slider right (>1.0): Emphasize this dimension. Differences along this axis are amplified
  • Slider left (<1.0): De-emphasize this dimension. Differences along this axis are reduced
  • 0.0: Completely ignore this dimension

Steps:

  1. Adjust sliders as needed
  2. Defaults (all 1.0) are usually sufficient
  3. “Reset All” button restores all dimensions to 1.0
  4. “Save Weights” button confirms settings

Usage Examples:

  • Want to emphasize “Theory ↔ Practice” dimension → Set its weight to 1.5
  • A dimension seems noisy and meaningless → Set its weight to 0.3
  • Want all dimensions equally weighted → Keep defaults
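One plausible way to picture what a dimension weight does (a sketch, not ThinkNavi's internal implementation): multiply each coordinate of the reduced-dimension data by its weight before model building, so distances along that axis are amplified or suppressed. The coordinates and weights below are hypothetical.

```python
import numpy as np

# Hypothetical reduced-dimension coordinates: 4 items x 3 dimensions
coords = np.array([
    [0.2,  1.0, -0.5],
    [0.8,  0.9,  0.3],
    [-0.4, 0.1,  0.7],
    [0.5, -0.6, -0.2],
])

# Weights per dimension: emphasize dim 0 (1.5), dampen dim 1 (0.3), ignore dim 2 (0.0)
weights = np.array([1.5, 0.3, 0.0])

# Scaling an axis rescales distances along it: a weight > 1 amplifies
# differences on that dimension, and 0.0 removes the dimension entirely.
weighted = coords * weights
print(weighted[:, 2])  # every value on the ignored dimension becomes 0.0
```

This is why a weight of 0.0 makes a dimension invisible to model building: all items collapse onto the same value along that axis.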

11.2 Step 5: Model Building

Learn the concept network using the GNG (Growing Neural Gas) algorithm. GNG is a neural network algorithm that adaptively places nodes (concept representative points) according to data distribution.

Parameter List

Basic Parameters:

| Parameter | Range | Default | Description |
|---|---|---|---|
| Max Nodes | 10-500 | Data count × 0.6 | Maximum number of nodes GNG places. More nodes create a finer model but may add noise |
| Max Iterations | 100-50,000 | 4,000 | Number of learning iterations. More iterations ensure convergence but increase processing time |
| Lambda | 1-200 | 20 | Interval for inserting new nodes. Smaller → more frequent insertion → more nodes |
| Max Age | 5-200 | 50 | Threshold for deleting unused edges. Smaller → more aggressive pruning → sparser network |

Algorithm Selection:

| Algorithm | Description | Recommended Use |
|---|---|---|
| Default (GNG) | Standard GNG. Hard assignment (each data point is assigned to its nearest single node) | Normal use (recommended) |
| Fuzzy | Fuzzy membership. Each data point probabilistically belongs to multiple nodes | Data with ambiguous boundaries |
| Enhanced Fuzzy | Extended version with repulsion and merge features | Advanced analysis |

Fuzzy Additional Parameters:

| Parameter | Range | Default | Description |
|---|---|---|---|
| Temperature | 0.1-2.0 | 0.5 | Fuzziness. Higher → membership more evenly distributed |
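To see what a temperature parameter typically does, here is a common formulation of fuzzy membership: a softmax over negative node distances, divided by the temperature. This is an illustration of the general technique, not necessarily ThinkNavi's exact formula; the distances are hypothetical.

```python
import numpy as np

def fuzzy_membership(dists, temperature):
    """Softmax over negative distances: one standard way to turn
    node distances into probabilistic (fuzzy) memberships."""
    logits = -np.asarray(dists, dtype=float) / temperature
    logits -= logits.max()          # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

dists = [0.2, 0.5, 1.0]             # distances from one item to three nodes
low  = fuzzy_membership(dists, 0.1) # low temperature: mass concentrates on the nearest node
high = fuzzy_membership(dists, 2.0) # high temperature: mass spreads more evenly
print(low.round(3), high.round(3))
```

Raising the temperature flattens the distribution, which matches the table above: higher → membership more evenly distributed.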

Enhanced Fuzzy Additional Parameters:

| Parameter | Description |
|---|---|
| Temperature End | Temperature value at the end of learning |
| Fuzzifier | Fuzzy membership function parameter |
| Repulsion Beta | Repulsion force strength between nodes |
| Merge Epsilon | Merge threshold for nodes too close together |
| Inertia Alpha | Node movement inertia |

Steps

  1. Set parameters (defaults are usually sufficient)
  2. Click “Build Model”
  3. A progress bar is displayed during building
  4. After completion, a 2D preview of the built network is shown

2D Preview Guide:

  • Each circle is a GNG node. Larger nodes have more assigned data
  • Lines between nodes are MST (Minimum Spanning Tree) edges
  • Dropdown selects X-axis and Y-axis dimensions to view the network from different angles
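The MST edges shown in the preview can be illustrated with SciPy: given node positions, the minimum spanning tree is the sparsest set of edges that keeps every node connected with minimum total length. The node coordinates below are hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

# Hypothetical GNG node positions in the reduced-dimension space
nodes = np.array([[0.0, 0.0], [1.0, 0.1], [0.1, 1.0], [2.0, 2.0]])

# Pairwise Euclidean distances, then the MST over the complete graph
dist = squareform(pdist(nodes))
mst = minimum_spanning_tree(dist)

edges = np.argwhere(mst.toarray() > 0)
print(len(edges))  # an MST over N nodes always has N - 1 edges → 3
```

This is why the preview never shows cycles: an MST over N nodes has exactly N − 1 edges.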

Parameter Tuning Guidelines:

  • Too many nodes (model too fine) → Reduce Max Nodes
  • Too few nodes (model too coarse) → Increase Max Nodes
  • Rule of thumb: Set Max Nodes to 50-70% of data row count
  • 50 rows of data → Max Nodes 25-35
  • 200 rows of data → Max Nodes 100-140
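The rule of thumb above can be expressed as a small helper (a hypothetical function, not part of ThinkNavi; the 0.6 fraction matches the UI default of data count × 0.6, clamped to the 10-500 slider range):

```python
def recommended_max_nodes(n_rows: int, fraction: float = 0.6) -> int:
    """Manual's rule of thumb: Max Nodes ≈ 50-70% of the data row count,
    clamped to the parameter's 10-500 range."""
    return max(10, min(500, round(n_rows * fraction)))

print(recommended_max_nodes(50))   # 30  — within the suggested 25-35
print(recommended_max_nodes(200))  # 120 — within the suggested 100-140
```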

Credit Cost: 20 credits for model building (10 credits with your own API key)

11.3 Step 6: Clustering

Classify the built GNG nodes into thematic groups (clusters).

Clustering Methods

| Method | Description | Features |
|---|---|---|
| Ward | Hierarchical agglomerative. Minimizes within-cluster variance | Considers MST structure. Most stable. Recommended |
| K-Means | Centroid-based clustering | Fast. Suited for spherical clusters |
| HDBSCAN | Density-based. Auto-detects number of clusters | Suited for irregular cluster shapes |
| Hierarchical | General hierarchical clustering | Suited for dendrogram analysis |
| DBSCAN | Density-based. Detects noise | Suited for data with many outliers |
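For intuition, the three main method families behave like their scikit-learn equivalents shown below (an illustration of the general algorithms, not ThinkNavi's internals; the two-blob node positions are synthetic):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, DBSCAN

rng = np.random.default_rng(0)
# Hypothetical GNG node positions: two well-separated blobs in 2-D
nodes = np.vstack([rng.normal(0, 0.3, (15, 2)),
                   rng.normal(3, 0.3, (15, 2))])

# Ward: hierarchical agglomerative, minimizes within-cluster variance.
# (AgglomerativeClustering also accepts a `connectivity` sparse matrix,
# which is how an MST-based constraint like "Strict Connectivity"
# could be enforced.)
ward = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(nodes)

# K-Means: centroid-based, fast, favors spherical clusters
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(nodes)

# DBSCAN: density-based, labels sparse points as noise (-1)
db = DBSCAN(eps=1.0, min_samples=3).fit_predict(nodes)

print(len(set(ward)), len(set(km)))  # both recover the 2 blobs
```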

Settings

| Setting | Description |
|---|---|
| Number of Clusters | Specify via dropdown. An auto-recommended value is also shown |
| Strict Connectivity | When checked, only MST-connected nodes can belong to the same cluster (Ward only) |
| Min Cluster Size | Minimum nodes per cluster (HDBSCAN / DBSCAN) |
| EPS | Density threshold (DBSCAN) |

Cluster Count Guidelines:

  • 30 nodes or fewer → 3-5 clusters
  • 30-100 nodes → 5-8 clusters
  • 100+ nodes → 7-12 clusters
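The guideline above can be written as a small helper (a hypothetical function for illustration, not part of the product):

```python
def suggested_cluster_range(n_nodes: int) -> tuple[int, int]:
    """Cluster-count guideline from the manual, as (min, max)."""
    if n_nodes <= 30:
        return (3, 5)
    if n_nodes <= 100:
        return (5, 8)
    return (7, 12)

print(suggested_cluster_range(25))   # (3, 5)
print(suggested_cluster_range(60))   # (5, 8)
print(suggested_cluster_range(150))  # (7, 12)
```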

Steps

  1. Select a clustering method
  2. Specify number of clusters (or use auto-recommended value)
  3. Click “Run Clustering”
  4. Resulting clusters are color-coded in the preview

Cluster Labeling

Auto-Labeling:

  1. Click “Auto Label”
  2. AI analyzes data assigned to each cluster’s nodes and suggests theme names
  3. Suggested labels auto-fill each cluster’s input field

Manual Labeling:

  • Enter custom names directly in each cluster’s input field

Label Examples:

  • “Technological Innovation and Implementation Challenges”
  • “User-Centered Design”
  • “Organizational Culture and Change Management”
  • “Market Trends and Business Models”

11.4 Troubleshooting

| Issue | Cause and Solution |
|---|---|
| Too many nodes, hard to read | Reduce Max Nodes in Step 5 and rebuild |
| Cluster assignments don't match intuition | Change the number of clusters and re-run. Try methods other than Ward (K-Means, HDBSCAN) |
| Auto-labels are off-target | AI suggestions are reference only. Correct them manually based on your understanding of the data |
| "Model Build" takes too long | Max Iterations may be too high. 4,000 is sufficient in most cases |
| Going back and re-running clears later steps | By design. Re-running from dimension reduction resets feature settings and everything after. Save important results beforehand |

11.5 Model Building Tips — Understanding and Optimizing the Pipeline

ConceptMap-Text model building follows this pipeline. Understanding each step’s role and impact helps build better models.

Pipeline Overview

Text Data
  ↓ OpenAI Embedding (1536-3072 dimensions)
  ↓ UMAP Dimension Reduction (3-8 dimensions)
  ↓ GNG Learning (places nodes in reduced-dimension space)
  ↓ MST Connection (connects nodes via minimum spanning tree)
  ↓ Clustering (classifies nodes into groups)

The key point is that GNG-MST learns in the multi-dimensional space (3-8 dimensions) after UMAP dimension reduction. The 3D graph in the Explorer selects 3 dimensions for display, but the model itself is built with more dimensions.
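The pipeline above can be sketched end to end with standard libraries. Everything here is a stand-in chosen for runnability, not ThinkNavi's implementation: random vectors replace OpenAI embeddings, PCA replaces UMAP (umap-learn is not assumed installed), and k-means centroids crudely stand in for GNG node placement; the MST and Ward steps use SciPy and scikit-learn directly.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, AgglomerativeClustering
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(42)

# 1. Embedding: stand-in random vectors (real pipeline: OpenAI, 1536-3072 dims)
embeddings = rng.normal(size=(60, 256))

# 2. Dimension reduction to 3-8 dims (real pipeline: UMAP; PCA as stand-in)
reduced = PCA(n_components=5, random_state=0).fit_transform(embeddings)

# 3. Node placement (real pipeline: GNG; k-means centroids as a crude stand-in,
#    using the ~60% rule of thumb for node count)
n_nodes = int(len(reduced) * 0.6)
nodes = KMeans(n_clusters=n_nodes, n_init=10,
               random_state=0).fit(reduced).cluster_centers_

# 4. MST connection between the placed nodes
mst = minimum_spanning_tree(squareform(pdist(nodes)))

# 5. Cluster the nodes (not the raw data) into thematic groups
labels = AgglomerativeClustering(n_clusters=6, linkage="ward").fit_predict(nodes)

print(nodes.shape, int((mst.toarray() > 0).sum()), len(set(labels)))
```

Note that steps 3-5 operate in the full reduced space (5 dimensions here), mirroring the point above: the 3D view is only a projection of a higher-dimensional model.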

UMAP’s Role and Impact

UMAP is not just preprocessing — it determines the very structure of the space GNG learns in. Changing UMAP parameters:

  • Changes distance relationships between concepts → Changes GNG node placement
  • Changes cluster separation → Changes clustering results

This means the same data can produce different models depending on UMAP settings.

Practical UMAP Parameter Guidelines:

| Goal | n_neighbors | min_dist | Effect |
|---|---|---|---|
| Default | 15 | 0.1 | Good for most cases |
| Clearly separate clusters | 5-10 | 0.01-0.05 | Emphasizes local structure. Small concept groups separate easily |
| Uniform distribution | 20-30 | 0.3-0.5 | GNG nodes spread evenly across the space |
| Emphasize global structure | 30-50 | 0.3-0.5 | Captures overall trends. Shows the big picture of concepts |

Why uniform distribution matters: With default settings, semantically different concepts may be pulled extremely far apart. This causes GNG nodes to cluster in dense areas while leaving distant regions uncovered. Increasing min_dist or n_neighbors makes the space more uniform, allowing GNG to cover the entire dataset more evenly.
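The table's guidelines can be kept as a small set of presets (hypothetical names and mid-range values chosen for illustration; the values are what you would pass as UMAP's `n_neighbors` and `min_dist` parameters):

```python
# Presets derived from the guideline table; values sit inside the
# recommended ranges (e.g. "separate_clusters" uses 8 from 5-10).
UMAP_PRESETS = {
    "default":           {"n_neighbors": 15, "min_dist": 0.1},
    "separate_clusters": {"n_neighbors": 8,  "min_dist": 0.03},
    "uniform_spread":    {"n_neighbors": 25, "min_dist": 0.4},
    "global_structure":  {"n_neighbors": 40, "min_dist": 0.4},
}

print(UMAP_PRESETS["default"])
```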

Node Count Considerations

GNG’s Max Nodes setting determines the model’s “granularity.”

Fewer nodes (about 1/3 of data count):

  • Easier to grasp the “big picture” of knowledge
  • Similar concepts merge into one node, creating a “summary-like” model
  • Easier to see relationships between clusters
  • Suitable for publishing as Mindware (coarser concept granularity leads to more natural conversations in chat)

More nodes (close to or exceeding data count):

  • Can distinguish fine conceptual differences
  • Useful for research where nuance matters
  • However, differences between nodes become small, making interpretation harder

Difference from SOM (Self-Organizing Maps): In SOM, even with more nodes than data points, nodes spread evenly across the map, and empty nodes form smooth gradients. In GNG-MST, nodes gather where the data exists. This is GNG's correct behavior: it directly reflects the structure inherent in the data. While the MST visualization may show nodes clustering locally, each node's values remain distinct.

Criteria for a Good Model

There is no theoretical formula for the “optimal number of nodes.” Instead, use these practical criteria:

If clusters in the Explorer view have intuitively understandable meanings and the differences between clusters can be clearly explained, the model is appropriate.

Specific checkpoints:

  1. Cluster interpretability: Can you assign natural theme names to each cluster?
  2. Inter-cluster distinction: Do adjacent clusters have different themes?
  3. Node representativeness: Is the data assigned to each node consistent with its position (dimension values)?
  4. Overall coverage: Are important themes from the original data reflected in the model?

Iterative Improvement Process

The optimal model may not be achieved on the first try. The following iterative process is recommended:

  1. Build with default settings first — Grasp the overall picture
  2. Cluster and explore — Check model granularity and structure
  3. Adjust as needed:
     • Clusters too large → Increase node count or decrease UMAP n_neighbors
     • Too many nodes, hard to interpret → Decrease node count
     • Certain concept groups don't separate → Decrease UMAP min_dist
  4. “Save” to preserve results — Save and compare results with different settings
  7. “Save” to preserve results — Save and compare results with different settings

Always save important results before changing parameters. Going back and re-running resets subsequent steps.