Chapter 8: Concept Extraction
Concept Extraction automatically extracts concept structures (concept names, logical structures, implications, etc.) from text data using AI. It’s the primary data preparation method for building models with ConceptMap-Text.
8.1 Concept Extraction Overview
Text (books, papers, manuals, etc.) is split into “chunks” (fragments), and AI extracts the following concept elements from each chunk:
Common Fields (Full / Compact modes):
| Field Name | Description | Bilingual |
|---|---|---|
| chapter_section | Chapter/section name | No |
| key_terms | Domain-specific keywords (5-10) | No |
| concept_name | Short label capturing the essence of the concept (5-15 words) | Yes |
| trigger_context | Human challenge/need addressed by this concept (15-30 words) | Yes |
| abstract_structure | Logical mechanism expressed with variables A, B, C (15-40 words) | Yes |
| action_implication | Action suggestion in “If X then Y” format (15-30 words) | Yes |
Full Mode Additional Fields:
| Field Name | Description | Bilingual |
|---|---|---|
| summary | Faithful summary of text content (120-180 words) | Yes |
| logical_structure | Causal relationships and reasoning chains (120-180 words) | Yes |
| implication | Conclusions, applications, consequences (120-180 words) | Yes |
Each field is generated in both English (_en) and Japanese (_jp) versions (except chapter_section and key_terms).
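To make the field layout concrete, here is one hypothetical Compact-mode record as a Python dict. All values are invented for illustration; only the `_en`/`_jp` suffix convention and the monolingual status of chapter_section and key_terms come from the tables above.

```python
# Hypothetical Compact-mode record; every value below is invented for illustration.
record = {
    "chapter_section": "Chapter 1: Habits",            # monolingual
    "key_terms": "habit, cue, routine, reward, loop",  # monolingual, comma-separated
    "concept_name_en": "Habit loop of cue, routine, and reward",
    "concept_name_jp": "きっかけ・行動・報酬の習慣ループ",
    "trigger_context_en": "People who want to change behavior but keep relapsing into old patterns.",
    "trigger_context_jp": "行動を変えたいのに元のパターンに戻ってしまう人。",
    "abstract_structure_en": "Cue A triggers routine B, which yields reward C, reinforcing A->B.",
    "abstract_structure_jp": "きっかけAが行動Bを誘発し、報酬Cが得られてA→Bが強化される。",
    "action_implication_en": "If you change routine B while keeping cue A and reward C, then the habit changes.",
    "action_implication_jp": "きっかけAと報酬Cを保ちつつ行動Bを変えれば、習慣は変わる。",
}

# Bilingual fields appear as _en/_jp pairs; collect their base names.
bilingual = {k.rsplit("_", 1)[0] for k in record if k.endswith(("_en", "_jp"))}
print(sorted(bilingual))
```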
8.2 Text Input
Direct Input:
- Select the “Concept Extraction” tab in the left sidebar
- Paste text into the text area
- Enter book title and author name (used as CSV metadata)
File Upload:
- Upload a PDF using the “Select File” button
- Text is automatically extracted from the PDF
- PDF page numbers are recorded for each chunk
Supported Formats: PDF
8.3 Chunk Splitting Settings
Configure how text is split.
Chunk Mode Selection:
- By Number of Chunks: Splits the entire text into the specified number of roughly equal parts (e.g., 30 chunks yields 30 roughly equal parts)
- By Chunk Size: Splits at a specified character interval (e.g., 4000 characters yields a split roughly every 4000 characters)
How Chunk Splitting Works (internal process):
- For PDFs, first split by page boundaries
- Further split by paragraph breaks (blank lines)
- If no paragraphs exist, split by line breaks
- If no line breaks, split at sentence endings (。.!?)
- Split paragraphs are combined into chunks using the specified size as a target
- If the last chunk is too small (below 20% of target), it’s merged with the previous chunk
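The splitting steps above can be sketched roughly as follows. This is a simplified sketch, not the tool's actual implementation; the soft target size and the 20% merge threshold follow the description above, and the PDF page-boundary step is omitted.

```python
import re

def split_into_chunks(text: str, target_size: int) -> list[str]:
    """Simplified sketch of the chunk-splitting heuristic described above."""
    # 1. Prefer paragraph breaks (blank lines); fall back to line breaks,
    #    then to sentence endings.
    parts = re.split(r"\n\s*\n", text)
    if len(parts) == 1:
        parts = text.split("\n")
    if len(parts) == 1:
        parts = re.split(r"(?<=[。.!?])", text)
    parts = [p for p in parts if p.strip()]

    # 2. Pack parts into chunks, using target_size as a soft limit.
    chunks, current = [], ""
    for part in parts:
        if current and len(current) + len(part) > target_size:
            chunks.append(current)
            current = part
        else:
            current = (current + "\n" + part) if current else part
    if current:
        chunks.append(current)

    # 3. Merge a too-small trailing chunk (below 20% of target) into the previous one.
    if len(chunks) >= 2 and len(chunks[-1]) < 0.2 * target_size:
        chunks[-2] += "\n" + chunks[-1]
        chunks.pop()
    return chunks
```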
Recommended Settings:
- Full book (300 pages): 30-50 chunks
- Single paper (20 pages): 5-10 chunks
- Manual (100 pages): 20-30 chunks
8.4 Extraction Settings
Include Source Text: When checked, the original text fragments (source_text column) are included in the output CSV. Useful when you want to reference original text during model building.
Extraction Mode:
- Full: Detailed analysis including summary, logical structure, and implications. Each chunk's response is approximately 500-800 tokens
- Compact: Only concept_name, trigger_context, abstract_structure, action_implication. Faster and lower cost
8.5 Browser Processing — Real-time Extraction
Run concept extraction in real-time in the browser.
Steps:
- Complete text input, chunk settings, and extraction settings
- Click the “Start Extraction” button
- Each chunk’s processing is displayed in real-time
- After completion, the results table is shown
- Save results with “Save as CSV”
Note: The browser must remain open. Closing the browser interrupts processing.
Progress Display:
- Currently processing chunk number (e.g., “Processing chunk 15/30…”)
- Preview of processed chunks
- Estimated remaining time
8.6 Server Processing — Background Extraction
Process on the server side in the background. Processing continues even if you close the browser.
Steps:
- Complete text input, chunk settings, and extraction settings
- Click the “Process on Server” button
- A job starts and a job ID is displayed
- Progress is polled automatically every 5 seconds
- After completion, the result CSV is automatically saved to the project
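The 5-second polling in the steps above can be sketched generically. Here `fetch_status` stands in for whatever status endpoint the app actually calls, and the returned dict shape is an assumption for illustration, not the documented API payload.

```python
import time

def poll_job(fetch_status, interval: float = 5.0, sleep=time.sleep):
    """Poll a background job until it finishes.

    fetch_status() is assumed (for this sketch) to return a dict like
    {"status": "processing" | "completed" | "failed", "done": int, "total": int}.
    """
    while True:
        s = fetch_status()
        print(f'{s["done"]}/{s["total"]} chunks processed ({s["status"]})')
        if s["status"] in ("completed", "failed"):
            return s
        sleep(interval)  # poll every `interval` seconds (5 s in the app)
```

Injecting `sleep` keeps the loop testable without real delays.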
Progress Display:
- Status badge: “Processing” / “Completed” / “Failed”
- Progress: “15/30 chunks processed”
- Estimated remaining time
Benefits of Server Processing:
- Processing continues even if you close the browser
- Work on other tasks in parallel
- More resilient to network disconnections (server retries)
8.7 Error Handling and Retry
Concept extraction internally handles errors as follows:
- Retry: Each chunk is retried up to 2 times
- Consecutive Error Limit: If 3 consecutive chunks fail, the entire job stops as failed
- Credit Check: Credit balance is checked before starting and before each chunk. If credits run out mid-process, the CSV is saved up to the processed chunks and the rest stops
- Partial Result Saving: Even when interrupted by errors, the CSV for successfully processed chunks is saved
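The retry and consecutive-error policy can be sketched as follows. This is a simplified sketch of the stated rules, not the app's internals; `extract_chunk` and `save_partial_csv` are placeholder callables, and the per-chunk credit check is omitted.

```python
MAX_RETRIES = 2       # each chunk is retried up to 2 times (3 attempts total)
MAX_CONSECUTIVE = 3   # 3 consecutive failed chunks abort the whole job

def run_job(chunks, extract_chunk, save_partial_csv):
    """Simplified sketch of the retry / consecutive-error policy."""
    results, consecutive_failures = [], 0
    for chunk in chunks:
        for attempt in range(1 + MAX_RETRIES):
            try:
                results.append(extract_chunk(chunk))
                consecutive_failures = 0
                break
            except Exception:
                if attempt == MAX_RETRIES:   # all attempts used up
                    consecutive_failures += 1
        if consecutive_failures >= MAX_CONSECUTIVE:
            save_partial_csv(results)        # partial results are still saved
            raise RuntimeError("job failed: 3 consecutive chunks failed")
    save_partial_csv(results)
    return results
```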
8.8 Extraction Result CSV Structure
Output CSV column structure:
| Column | Description |
|---|---|
| book | Entered book title |
| author | Entered author name |
| chunk_index | Chunk number (0-based) |
| page_number | Page number for PDFs |
| source_text | Original text fragment (when “Include Source Text” is on) |
| chapter_section | Chapter/section name |
| key_terms | Key terms (comma-separated) |
| concept_name_en | Concept name (English) |
| concept_name_jp | Concept name (Japanese) |
| trigger_context_en | Trigger context (English) |
| trigger_context_jp | Trigger context (Japanese) |
| abstract_structure_en | Abstract structure (English) |
| abstract_structure_jp | Abstract structure (Japanese) |
| action_implication_en | Action implication (English) |
| action_implication_jp | Action implication (Japanese) |
| summary_en | Summary (English, Full mode only) |
| summary_jp | Summary (Japanese, Full mode only) |
| logical_structure_en | Logical structure (English, Full mode only) |
| logical_structure_jp | Logical structure (Japanese, Full mode only) |
| implication_en | Implication (English, Full mode only) |
| implication_jp | Implication (Japanese, Full mode only) |
chapter_section Auto-fill: Empty chapter_section cells are automatically filled with the previous chunk’s chapter_section value. Placeholders like “N/A” and “None” are automatically removed.
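The auto-fill rule can be reproduced on an exported CSV with a small forward-fill pass. A pure-Python sketch, assuming rows as produced by `csv.DictReader` on the result CSV; the placeholder list follows the description above.

```python
PLACEHOLDERS = {"", "N/A", "None"}  # treated as empty, per the auto-fill rule

def ffill_chapter_section(rows: list[dict]) -> list[dict]:
    """Forward-fill chapter_section in place, treating placeholders as empty."""
    last = ""
    for row in rows:
        value = (row.get("chapter_section") or "").strip()
        if value in PLACEHOLDERS:
            row["chapter_section"] = last   # inherit the previous chunk's value
        else:
            last = value
    return rows
```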
8.9 Concept Extraction Troubleshooting
| Issue | Cause and Solution |
|---|---|
| Only 1 chunk created | Text has no paragraph breaks (blank lines), so the entire text becomes one chunk. Switch to “By Number of Chunks” mode |
| Japanese extraction results are unnatural | Quality depends on the AI model's responses. Building models from the English versions of concept names and trigger contexts is recommended |
| “Credits exhausted” stops processing midway | Credits are insufficient. Purchase additional credits and reprocess the remaining chunks; a CSV of the chunks processed so far has already been saved |
| PDF text not correctly extracted | Image-based PDFs (scanned documents) cannot have text extracted. Use OCR-processed PDFs |
| Server processing status shows “Failed” | Check the error message. For temporary API errors, wait and re-run |
| “Confirm: Unsaved items” dialog | Current extraction results have not been saved to CSV. “Continue” to discard and start new extraction, or “Cancel” to go back and save |