Chapter 8: Concept Extraction

Chapter 8: Concept Extraction — ThinkNavi User Manual

Concept Extraction automatically extracts concept structures (concept names, logical structures, implications, etc.) from text data using AI. It’s the primary data preparation method for building models with ConceptMap-Text.

8.1 Concept Extraction Overview

Text (books, papers, manuals, etc.) is split into “chunks” (fragments), and AI extracts the following concept elements from each chunk:

Common Fields (Full / Compact modes):

| Field Name | Description | Bilingual |
| --- | --- | --- |
| chapter_section | Chapter/section name | No |
| key_terms | Domain-specific keywords (5-10) | No |
| concept_name | Short label capturing the essence of the concept (5-15 words) | Yes |
| trigger_context | Human challenge/need addressed by this concept (15-30 words) | Yes |
| abstract_structure | Logical mechanism expressed with variables A, B, C (15-40 words) | Yes |
| action_implication | Action suggestion in “If X then Y” format (15-30 words) | Yes |

Full Mode Additional Fields:

| Field Name | Description | Bilingual |
| --- | --- | --- |
| summary | Faithful summary of text content (120-180 words) | Yes |
| logical_structure | Causal relationships and reasoning chains (120-180 words) | Yes |
| implication | Conclusions, applications, consequences (120-180 words) | Yes |

Each field is generated in both English (_en) and Japanese (_jp) versions (except chapter_section and key_terms).
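The naming rule above can be sketched in a few lines of Python. The field lists come from the two tables in this section; the helper name `extraction_columns` and the mode strings `"full"` and `"compact"` are hypothetical and are not part of ThinkNavi itself.

```python
# Field lists taken from the tables in section 8.1.
# The helper and the mode strings are illustrative assumptions.
MONOLINGUAL = ["chapter_section", "key_terms"]
COMMON_BILINGUAL = [
    "concept_name",
    "trigger_context",
    "abstract_structure",
    "action_implication",
]
FULL_ONLY_BILINGUAL = ["summary", "logical_structure", "implication"]


def extraction_columns(mode: str) -> list[str]:
    """Return the AI-generated column names for 'full' or 'compact' mode."""
    if mode not in ("full", "compact"):
        raise ValueError(f"unknown mode: {mode!r}")
    bilingual = list(COMMON_BILINGUAL)
    if mode == "full":
        bilingual += FULL_ONLY_BILINGUAL
    columns = list(MONOLINGUAL)
    for field in bilingual:
        # Every bilingual field yields an English and a Japanese column.
        columns += [f"{field}_en", f"{field}_jp"]
    return columns
```

Under these assumptions, Compact mode yields 10 generated columns and Full mode yields 16, not counting metadata columns such as the book title or chunk index.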

8.2 Text Input

Direct Input:

  1. Select the “Concept Extraction” tab in the left sidebar
  2. Paste text into the text area
  3. Enter book title and author name (used as CSV metadata)

File Upload:

  1. Upload a PDF using the “Select File” button
  2. Text is automatically extracted from the PDF
  3. PDF page numbers are recorded for each chunk

Supported Formats: PDF

8.3 Chunk Splitting Settings

Configure how text is split.

Chunk Mode Selection:

  • By Number of Chunks: Splits the entire text into the specified number of roughly equal parts. Example: a setting of 30 splits the text into about 30 parts
  • By Chunk Size: Splits at the specified character interval. Example: a setting of 4000 splits the text roughly every 4000 characters

How Chunk Splitting Works (internal process):

  1. For PDFs, first split by page boundaries
  2. Further split by paragraph breaks (blank lines)
  3. If no paragraphs exist, split by line breaks
  4. If no line breaks, split at sentence endings (。.!?)
  5. The resulting pieces are combined into chunks, using the specified size as a target
  6. If the last chunk is too small (below 20% of target), it’s merged with the previous chunk
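The steps above can be sketched roughly as follows. This is one reading of the described behavior, not ThinkNavi's actual implementation: the PDF page step is omitted, the fallback order (blank lines, then line breaks, then sentence endings) is interpreted as "use the first separator that produces more than one piece", and the function names are hypothetical.

```python
import re


def split_units(text: str) -> list[str]:
    """Steps 2-4: split on blank lines, else line breaks, else sentence endings."""
    for pattern in (r"\n\s*\n", r"\n", r"(?<=[。.!?])"):
        units = [u.strip() for u in re.split(pattern, text) if u.strip()]
        if len(units) > 1:
            return units
    stripped = text.strip()
    return [stripped] if stripped else []


def build_chunks(text: str, target_size: int, min_ratio: float = 0.2) -> list[str]:
    """Steps 5-6: pack units into chunks near target_size, then merge a tiny tail."""
    chunks: list[str] = []
    current = ""
    for unit in split_units(text):
        if current and len(current) + len(unit) > target_size:
            # Adding this unit would overshoot the target, so close the chunk.
            chunks.append(current)
            current = unit
        else:
            current = f"{current}\n\n{unit}" if current else unit
    if current:
        chunks.append(current)
    # Merge the last chunk into the previous one if it is below 20% of target.
    if len(chunks) > 1 and len(chunks[-1]) < target_size * min_ratio:
        tail = chunks.pop()
        chunks[-1] = f"{chunks[-1]}\n\n{tail}"
    return chunks


def target_for_count(text: str, n_chunks: int) -> int:
    """Assumed mapping for 'By Number of Chunks': text length divided by the count."""
    return max(1, -(-len(text) // n_chunks))  # ceiling division
```

Because units are never cut in the middle, chunk sizes only approximate the target, which matches the "roughly every 4000 characters" wording above.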

Recommended Settings:

  • Full book (300 pages): 30-50 chunks
  • Single paper (20 pages): 5-10 chunks
  • Manual (100 pages): 20-30 chunks

8.4 Extraction Settings

Include Source Text: When checked, the original text fragments (source_text column) are included in the output CSV. Useful when you want to reference original text during model building.

Extraction Mode:

  • Full: Detailed analysis including summary, logical structure, and implications. Each chunk’s response is approximately 500-800 tokens
  • Compact: Only concept_name, trigger_context, abstract_structure, action_implication. Faster and lower cost
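As a rough planning aid, the Full mode figure above can be multiplied by the chunk count. The sketch below covers only the response side, using the 500-800 token range stated above; input tokens (the chunk text and prompt) are not included, and the function name is hypothetical.

```python
def full_mode_response_tokens(n_chunks: int, low: int = 500, high: int = 800) -> tuple[int, int]:
    """Estimate the total response-token range for a Full mode run."""
    if n_chunks < 0:
        raise ValueError("n_chunks must be non-negative")
    return n_chunks * low, n_chunks * high


# A 30-chunk run would produce roughly 15,000 to 24,000 response tokens.
low_total, high_total = full_mode_response_tokens(30)
```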

8.5 Browser Processing — Real-time Extraction

Run concept extraction in real time in the browser.

Steps:

  1. Complete text input, chunk settings, and extraction settings
  2. Click the “Start Extraction” button
  3. Each chunk’s processing is displayed in real time
  4. After completion, the results table is shown
  5. Save results with “Save as CSV”

Note: The browser must remain open. Closing the browser interrupts processing.

Progress Display:

  • Currently processing chunk number (e.g., “Processing chunk 15/30…”)
  • Preview of processed chunks
  • Estimated remaining time

8.6 Server Processing — Background Extraction

Process on the server side in the background. Processing continues even if you close the browser.

Steps:

  1. Complete text input, chunk settings, and extraction settings
  2. Click the “Process on Server” button
  3. A job starts and a job ID is displayed
  4. Progress is polled automatically every 5 seconds
  5. After completion, the result CSV is automatically saved to the project

Progress Display:

  • Status badge: “Processing” / “Completed” / “Failed”
  • Progress: “15/30 chunks processed”
  • Estimated remaining time
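The polling behavior described above can be illustrated with a minimal loop. This is a sketch only: `fetch_status` stands in for whatever call returns the job status, and the `"state"` key is an assumed shape for that response. The status values mirror the badges listed above.

```python
import time
from typing import Callable

TERMINAL_STATES = ("Completed", "Failed")


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal state and return the final status."""
    while True:
        status = fetch_status()
        if status["state"] in TERMINAL_STATES:
            return status
        # Still "Processing": wait before asking again (5 seconds by default).
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A production version would also want an overall timeout, which this sketch leaves out.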

Benefits of Server Processing:

  • Processing continues even if you close the browser
  • You can work on other tasks in parallel
  • More resilient to network disconnections (server retries)

8.7 Error Handling and Retry

Concept extraction internally handles errors as follows:

  • Retry: Each chunk is retried up to 2 times
  • Consecutive Error Limit: If 3 consecutive chunks fail, the entire job stops as failed
  • Credit Check: Credit balance is checked before starting and before each chunk. If credits run out mid-process, the CSV is saved for the chunks processed so far and the remaining chunks are not processed
  • Partial Result Saving: Even when interrupted by errors, the CSV for successfully processed chunks is saved
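The rules above can be sketched as a single loop. This is an illustration under stated assumptions, not the actual job runner: "retried up to 2 times" is read as one initial attempt plus two retries, a skipped chunk is assumed to be left out of the results, and `extract`, `has_credits`, and the status strings are hypothetical.

```python
from typing import Callable


def run_job(
    chunks: list[str],
    extract: Callable[[str], dict],
    has_credits: Callable[[], bool],
    max_retries: int = 2,
    max_consecutive_failures: int = 3,
) -> tuple[list[dict], str]:
    """Process chunks in order and return (rows saved so far, final status)."""
    rows: list[dict] = []
    consecutive_failures = 0
    for index, chunk in enumerate(chunks):
        # Credit check before each chunk; partial rows are still returned.
        if not has_credits():
            return rows, "credits_exhausted"
        succeeded = False
        for _attempt in range(1 + max_retries):
            try:
                result = extract(chunk)
                succeeded = True
                break
            except Exception:
                continue  # retry the same chunk
        if not succeeded:
            consecutive_failures += 1
            if consecutive_failures >= max_consecutive_failures:
                return rows, "failed"
            continue
        consecutive_failures = 0
        rows.append({"chunk_index": index, **result})
    return rows, "completed"
```

In every exit path the rows collected so far are returned, which corresponds to the partial result saving described above.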

8.8 Extraction Result CSV Structure

Output CSV column structure:

| Column | Description |
| --- | --- |
| book | Entered book title |
| author | Entered author name |
| chunk_index | Chunk number (0-based) |
| page_number | Page number for PDFs |
| source_text | Original text fragment (when “Include Source Text” is on) |
| chapter_section | Chapter/section name |
| key_terms | Key terms (comma-separated) |
| concept_name_en | Concept name (English) |
| concept_name_jp | Concept name (Japanese) |
| trigger_context_en | Trigger context (English) |
| trigger_context_jp | Trigger context (Japanese) |
| abstract_structure_en | Abstract structure (English) |
| abstract_structure_jp | Abstract structure (Japanese) |
| action_implication_en | Action implication (English) |
| action_implication_jp | Action implication (Japanese) |
| summary_en | Summary (English, Full mode only) |
| summary_jp | Summary (Japanese, Full mode only) |
| logical_structure_en | Logical structure (English, Full mode only) |
| logical_structure_jp | Logical structure (Japanese, Full mode only) |
| implication_en | Implication (English, Full mode only) |
| implication_jp | Implication (Japanese, Full mode only) |
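For readers who want to post-process the exported file, the following sketch loads a result CSV with Python's standard `csv` module, converts chunk_index to an integer, and splits the comma-separated key_terms cell into a list. The sample row is invented for illustration, and the function name is hypothetical.

```python
import csv
import io


def load_results(csv_text: str) -> list[dict]:
    """Parse an extraction result CSV and normalize a few columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["chunk_index"] = int(row["chunk_index"])  # 0-based chunk number
        # key_terms is stored as one comma-separated cell.
        row["key_terms"] = [t.strip() for t in row["key_terms"].split(",") if t.strip()]
    return rows


# Invented sample data using a subset of the columns listed above.
sample = (
    "book,author,chunk_index,page_number,chapter_section,key_terms,concept_name_en\n"
    'Example Book,Jane Doe,0,1,Chapter 1,"feedback, loop",Feedback Loop\n'
)
results = load_results(sample)
```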

chapter_section Auto-fill: Empty chapter_section cells are automatically filled with the previous chunk’s chapter_section value. Placeholders like “N/A” and “None” are automatically removed.
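A minimal sketch of this auto-fill behavior is shown below. It assumes that placeholders are cleared first and that the resulting empty cell is then filled from the previous chunk; the manual does not state the order, and the placeholder set contains only the two examples named above. The function name is hypothetical.

```python
# Only the two placeholders named in the manual; the real list may be longer.
PLACEHOLDERS = {"n/a", "none"}


def fill_chapter_sections(rows: list[dict]) -> list[dict]:
    """Clear placeholder values, then carry the last seen chapter_section forward."""
    previous = ""
    for row in rows:
        value = (row.get("chapter_section") or "").strip()
        if value.lower() in PLACEHOLDERS:
            value = ""  # treat the placeholder as an empty cell
        if not value:
            value = previous  # inherit from the previous chunk
        row["chapter_section"] = value
        previous = value
    return rows
```

If the very first chunk has no chapter_section, it stays empty in this sketch because there is no earlier value to inherit.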

8.9 Concept Extraction Troubleshooting

| Issue | Cause and Solution |
| --- | --- |
| Only 1 chunk created | The text has no paragraph breaks (blank lines), so the entire text becomes one chunk. Switch to “By Number of Chunks” mode. |
| Japanese extraction results are unnatural | This depends on the AI model’s response quality. Building models based on the English versions of concept names and trigger contexts is recommended. |
| “Credits exhausted” stops processing midway | Credits are insufficient. Purchase additional credits and reprocess the remaining chunks. The CSV for the chunks already processed has been saved. |
| PDF text not correctly extracted | Text cannot be extracted from image-based PDFs (scanned documents). Use OCR-processed PDFs. |
| Server processing status shows “Failed” | Check the error message. For temporary API errors, wait and re-run. |
| “Confirm: Unsaved items” dialog | The current extraction results have not been saved to CSV. Click “Continue” to discard them and start a new extraction, or “Cancel” to go back and save. |