Chapter 8: Concept Extraction
Concept Extraction automatically extracts concept structures (concept names, logical structures, implications, etc.) from text data using AI. It’s the primary data preparation method for building models with ConceptMap-Text.
8.1 Concept Extraction Overview
Text (books, papers, manuals, etc.) is split into “chunks” (fragments), and AI extracts the following concept elements from each chunk:
Common Fields (Full / Compact modes):
| Field Name | Description | Bilingual |
|---|---|---|
| chapter_section | Chapter/section name | No |
| key_terms | Domain-specific keywords (5-10) | No |
| concept_name | Short label capturing the essence of the concept (5-15 words) | Yes |
| trigger_context | Human challenge/need addressed by this concept (15-30 words) | Yes |
| abstract_structure | Logical mechanism expressed with variables A, B, C (15-40 words) | Yes |
| action_implication | Action suggestion in “If X then Y” format (15-30 words) | Yes |
Full Mode Additional Fields:
| Field Name | Description | Bilingual |
|---|---|---|
| summary | Faithful summary of text content (120-180 words) | Yes |
| logical_structure | Causal relationships and reasoning chains (120-180 words) | Yes |
| implication | Conclusions, applications, consequences (120-180 words) | Yes |
Each field is generated in both English (_en) and Japanese (_jp) versions (except chapter_section and key_terms).
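To make the field layout concrete, here is one hypothetical Compact-mode record as a Python dict. All values are invented for illustration; only the `_en`/`_jp` suffix convention and the monolingual status of chapter_section and key_terms come from the tables above.

```python
# Hypothetical Compact-mode record; every value below is invented for illustration.
record = {
    "chapter_section": "Chapter 1: Habits",            # monolingual
    "key_terms": "habit, cue, routine, reward, loop",  # monolingual, comma-separated
    "concept_name_en": "Habit loop of cue, routine, and reward",
    "concept_name_jp": "きっかけ・行動・報酬の習慣ループ",
    "trigger_context_en": "People who want to change behavior but keep relapsing into old patterns.",
    "trigger_context_jp": "行動を変えたいのに元のパターンに戻ってしまう人。",
    "abstract_structure_en": "Cue A triggers routine B, which yields reward C, reinforcing A->B.",
    "abstract_structure_jp": "きっかけAが行動Bを誘発し、報酬Cが得られてA→Bが強化される。",
    "action_implication_en": "If you change routine B while keeping cue A and reward C, then the habit changes.",
    "action_implication_jp": "きっかけAと報酬Cを保ちつつ行動Bを変えれば、習慣は変わる。",
}

# Bilingual fields appear as _en/_jp pairs; collect their base names.
bilingual = {k.rsplit("_", 1)[0] for k in record if k.endswith(("_en", "_jp"))}
print(sorted(bilingual))
```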
8.2 Text Input
Direct Input:
- Select the “Concept Extraction” tab in the left sidebar
- Paste text into the text area
- Enter book title and author name (used as CSV metadata)
File Upload:
- Upload a PDF using the “Select File” button
- Text is automatically extracted from the PDF
- PDF page numbers are recorded for each chunk
Supported Formats: PDF
8.3 Chunk Splitting Settings
Configure how text is split.
Chunk Mode Selection:
- By Number of Chunks: Splits the entire text into the specified number of roughly equal parts (e.g., 30 chunks yields 30 roughly equal parts)
- By Chunk Size: Splits at a specified character interval (e.g., 4000 characters yields a split roughly every 4000 characters)
How Chunk Splitting Works (internal process):
- For PDFs, first split by page boundaries
- Further split by paragraph breaks (blank lines)
- If no paragraphs exist, split by line breaks
- If no line breaks, split at sentence endings (。.!?)
- Split paragraphs are combined into chunks using the specified size as a target
- If the last chunk is too small (below 20% of target), it’s merged with the previous chunk
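The splitting steps above can be sketched roughly as follows. This is a simplified sketch, not the tool's actual implementation; the soft target size and the 20% merge threshold follow the description above, and the PDF page-boundary step is omitted.

```python
import re

def split_into_chunks(text: str, target_size: int) -> list[str]:
    """Simplified sketch of the chunk-splitting heuristic described above."""
    # 1. Prefer paragraph breaks (blank lines); fall back to line breaks,
    #    then to sentence endings.
    parts = re.split(r"\n\s*\n", text)
    if len(parts) == 1:
        parts = text.split("\n")
    if len(parts) == 1:
        parts = re.split(r"(?<=[。.!?])", text)
    parts = [p for p in parts if p.strip()]

    # 2. Pack parts into chunks, using target_size as a soft limit.
    chunks, current = [], ""
    for part in parts:
        if current and len(current) + len(part) > target_size:
            chunks.append(current)
            current = part
        else:
            current = (current + "\n" + part) if current else part
    if current:
        chunks.append(current)

    # 3. Merge a too-small trailing chunk (below 20% of target) into the previous one.
    if len(chunks) >= 2 and len(chunks[-1]) < 0.2 * target_size:
        chunks[-2] += "\n" + chunks[-1]
        chunks.pop()
    return chunks
```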
Recommended Settings:
- Full book (300 pages): 30-50 chunks
- Single paper (20 pages): 5-10 chunks
- Manual (100 pages): 20-30 chunks
8.4 Extraction Settings
Include Source Text: When checked, the original text fragments (source_text column) are included in the output CSV. Useful when you want to reference original text during model building.
Extraction Mode:
- Full: Detailed analysis including summary, logical structure, and implications. Each chunk's response is approximately 500-800 tokens
- Compact: Only concept_name, trigger_context, abstract_structure, action_implication. Faster and lower cost
8.5 Browser Processing — Real-time Extraction
Run concept extraction in real-time in the browser.
Steps:
- Complete text input, chunk settings, and extraction settings
- Click the “Start Extraction” button
- Each chunk’s processing is displayed in real-time
- After completion, the results table is shown
- Save results with “Save as CSV”
Note: The browser must remain open. Closing the browser interrupts processing.
Progress Display:
- Currently processing chunk number (e.g., “Processing chunk 15/30…”)
- Preview of processed chunks
- Estimated remaining time
8.6 Server Processing — Background Extraction
Process on the server side in the background. Processing continues even if you close the browser.
Steps:
- Complete text input, chunk settings, and extraction settings
- Click the “Process on Server” button
- A job starts and a job ID is displayed
- Progress is polled automatically every 5 seconds
- After completion, the result CSV is automatically saved to the project
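The 5-second polling in the steps above can be sketched generically. Here `fetch_status` stands in for whatever status endpoint the app actually calls, and the returned dict shape is an assumption for illustration, not the documented API payload.

```python
import time

def poll_job(fetch_status, interval: float = 5.0, sleep=time.sleep):
    """Poll a background job until it finishes.

    fetch_status() is assumed (for this sketch) to return a dict like
    {"status": "processing" | "completed" | "failed", "done": int, "total": int}.
    """
    while True:
        s = fetch_status()
        print(f'{s["done"]}/{s["total"]} chunks processed ({s["status"]})')
        if s["status"] in ("completed", "failed"):
            return s
        sleep(interval)  # poll every `interval` seconds (5 s in the app)
```

Injecting `sleep` keeps the loop testable without real delays.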
Progress Display:
- Status badge: “Processing” / “Completed” / “Failed”
- Progress: “15/30 chunks processed”
- Estimated remaining time
Benefits of Server Processing:
- Processing continues even if you close the browser
- Work on other tasks in parallel
- More resilient to network disconnections (server retries)
8.7 Error Handling and Retry
Concept extraction internally handles errors as follows:
- Retry: Each chunk is retried up to 2 times
- Consecutive Error Limit: If 3 consecutive chunks fail, the entire job stops as failed
- Credit Check: Credit balance is checked before starting and before each chunk. If credits run out mid-process, the CSV is saved up to the processed chunks and the rest stops
- Partial Result Saving: Even when interrupted by errors, the CSV for successfully processed chunks is saved
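The retry and consecutive-error policy can be sketched as follows. This is a simplified sketch of the stated rules, not the app's internals; `extract_chunk` and `save_partial_csv` are placeholder callables, and the per-chunk credit check is omitted.

```python
MAX_RETRIES = 2       # each chunk is retried up to 2 times (3 attempts total)
MAX_CONSECUTIVE = 3   # 3 consecutive failed chunks abort the whole job

def run_job(chunks, extract_chunk, save_partial_csv):
    """Simplified sketch of the retry / consecutive-error policy."""
    results, consecutive_failures = [], 0
    for chunk in chunks:
        for attempt in range(1 + MAX_RETRIES):
            try:
                results.append(extract_chunk(chunk))
                consecutive_failures = 0
                break
            except Exception:
                if attempt == MAX_RETRIES:   # all attempts used up
                    consecutive_failures += 1
        if consecutive_failures >= MAX_CONSECUTIVE:
            save_partial_csv(results)        # partial results are still saved
            raise RuntimeError("job failed: 3 consecutive chunks failed")
    save_partial_csv(results)
    return results
```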
8.8 Extraction Result CSV Structure
Output CSV column structure:
| Column | Description |
|---|---|
| book | Entered book title |
| author | Entered author name |
| chunk_index | Chunk number (0-based) |
| page_number | Page number for PDFs |
| source_text | Original text fragment (when “Include Source Text” is on) |
| chapter_section | Chapter/section name |
| key_terms | Key terms (comma-separated) |
| concept_name_en | Concept name (English) |
| concept_name_jp | Concept name (Japanese) |
| trigger_context_en | Trigger context (English) |
| trigger_context_jp | Trigger context (Japanese) |
| abstract_structure_en | Abstract structure (English) |
| abstract_structure_jp | Abstract structure (Japanese) |
| action_implication_en | Action implication (English) |
| action_implication_jp | Action implication (Japanese) |
| summary_en | Summary (English, Full mode only) |
| summary_jp | Summary (Japanese, Full mode only) |
| logical_structure_en | Logical structure (English, Full mode only) |
| logical_structure_jp | Logical structure (Japanese, Full mode only) |
| implication_en | Implication (English, Full mode only) |
| implication_jp | Implication (Japanese, Full mode only) |
chapter_section Auto-fill: Empty chapter_section cells are automatically filled with the previous chunk’s chapter_section value. Placeholders like “N/A” and “None” are automatically removed.
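The auto-fill rule can be reproduced on an exported CSV with a small forward-fill pass. A pure-Python sketch, assuming rows as produced by `csv.DictReader` on the result CSV; the placeholder list follows the description above.

```python
PLACEHOLDERS = {"", "N/A", "None"}  # treated as empty, per the auto-fill rule

def ffill_chapter_section(rows: list[dict]) -> list[dict]:
    """Forward-fill chapter_section in place, treating placeholders as empty."""
    last = ""
    for row in rows:
        value = (row.get("chapter_section") or "").strip()
        if value in PLACEHOLDERS:
            row["chapter_section"] = last   # inherit the previous chunk's value
        else:
            last = value
    return rows
```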
8.9 Concept Extraction Troubleshooting
| Issue | Cause and Solution |
|---|---|
| Only 1 chunk created | Text has no paragraph breaks (blank lines), so the entire text becomes one chunk. Switch to “By Number of Chunks” mode |
| Japanese extraction results are unnatural | Quality depends on the AI model's responses. Building models from the English versions of concept names and trigger contexts is recommended |
| “Credits exhausted” stops processing midway | Credits are insufficient. Purchase additional credits and reprocess the remaining chunks; a CSV of the chunks processed so far has already been saved |
| PDF text not correctly extracted | Image-based PDFs (scanned documents) cannot have text extracted. Use OCR-processed PDFs |
| Server processing status shows “Failed” | Check the error message. For temporary API errors, wait and re-run |
| “Confirm: Unsaved items” dialog | Current extraction results have not been saved to CSV. “Continue” to discard and start new extraction, or “Cancel” to go back and save |