The root cause of RAG/GraphRAG not working properly

Most of us naturally believe that the world we see is the world as it truly is.

There is a desk in front of us.
There is a dog.
There is a company.
There is a customer.
There is a good product.
There is a bad decision.

When we speak this way, we tend to assume that things such as “desk,” “dog,” “company,” “customer,” “good product,” and “bad decision” exist in the world with clear labels already attached to them.

But if we think more carefully, the matter is not so simple.

Where, for example, does a “desk” begin and end? Is a folding table a desk? If a cardboard box is used as a work surface, is it a desk? If a board is placed on the floor and a computer is put on top of it, is that a desk?

The same applies to “dog.” A Chihuahua and a Saint Bernard look completely different, yet we call both of them dogs. On the other hand, a robot that looks like a dog may not be considered a dog at all.

In other words, when we perceive the world, we are not simply receiving what appears before our eyes. We group together countless differences, treat similar things as belonging to the same category, give them names, and classify them according to use, experience, and purpose.

This act of grouping is what we call a concept.

A concept is not something that exists in the world with a label already attached to it. It is a unit of recognition that human beings create in order to understand, remember, judge, and communicate about the world.

This does not mean that everything is an illusion. Desks, dogs, and companies do exist. But what we call a desk, what we regard as a dog, and where we draw the boundary of a company’s problem are shaped by human perception, culture, experience, and purpose.

Yet most people are not very conscious of this.

We often mistake the concepts we use for the structure of reality itself. We say, “This is the correct classification,” “This is the objective meaning,” or “This is the fact.” But behind many of these judgments lies a human act of interpretation.

This illusion also makes AI and data science harder to understand.

In machine learning, we attach labels to images, texts, and other data, and then train AI systems on those labels. A cat image is labeled “cat,” a dog image is labeled “dog,” an unwanted email is labeled “spam,” and a positive review is labeled “high rating.”

These labels are often called “ground truth” or “teacher signals.” Because of this terminology, many people feel as if these labels represent objective and absolute truth.

But in many cases, labels are human conceptual judgments.

Is a certain sentence “offensive”?
Is a certain review “positive”?
Is a certain customer a “high-value customer”?
Is a certain business idea “promising”?
Is a certain action a “risk”?

These are not facts that exist in nature in a pure and self-evident form. They depend on what human beings value, the context in which they judge, and the purpose for which they classify.
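To make this concrete, here is a minimal Python sketch of how a single "ground truth" label is often produced from several human judgments. The reviews, the three annotators, and their votes are invented for illustration. Majority voting is assumed here as the aggregation scheme; it is one common choice, not the only one.

```python
from collections import Counter

# Hypothetical annotations: three annotators judge the sentiment of each review.
annotations = {
    "The battery lasts all day.": ["positive", "positive", "positive"],
    "It works, I guess.": ["positive", "neutral", "negative"],
    "Cheaper than expected, but flimsy.": ["positive", "negative", "negative"],
}

def aggregate(labels):
    """Return the majority label and the share of annotators who chose it.

    Counter.most_common orders equal counts by first appearance,
    so a full three-way split silently resolves to the first label seen.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

for text, labels in annotations.items():
    label, agreement = aggregate(labels)
    print(f"{label:<9} agreement={agreement:.2f}  {text}")
```

The second review has no majority at all, yet the dataset still records it as "positive" with one-third agreement. A model trained only on the aggregated column sees the label, not the disagreement behind it.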

In this sense, AI is not simply learning objective facts. In many cases, it is learning how human beings divide, name, and give meaning to the world.

This is especially important when we think about today’s large language models.

LLMs learn from vast amounts of text how human beings use words, relate things to one another, and construct meaning. In other words, we may say that LLMs have learned a great deal about the conceptual world of human beings.

However, when we try to use this power in practical business settings, we often rely on embeddings.

An embedding is a technique that converts words, sentences, or documents into numerical vectors. Texts with similar meanings tend to be placed near each other in vector space. This makes it possible for AI systems to search for “documents close to this question” or “information related to this text.”
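A minimal sketch of what "search for documents close to this question" means, in plain Python. The three-dimensional vectors are hand-made stand-ins with hypothetical values, not the output of any real embedding model (real models produce hundreds or thousands of dimensions). Cosine similarity is assumed as the closeness measure; it is common, but not the only option.

```python
import math

# Hypothetical toy "embeddings" for three document titles.
documents = {
    "How to reset your password": [0.9, 0.1, 0.0],
    "Account login troubleshooting": [0.8, 0.3, 0.1],
    "Quarterly revenue report": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query_vec, docs, k=2):
    """Rank documents by cosine similarity to the query vector, highest first."""
    scored = [(cosine(query_vec, vec), title) for title, vec in docs.items()]
    return sorted(scored, reverse=True)[:k]

query = [0.85, 0.2, 0.05]  # pretend embedding of "I can't log in"
for score, title in search(query, documents):
    print(f"{score:.3f}  {title}")
```

The ranking says the two account-related titles lie near the query. It says nothing about why they belong together or what counts as an "account problem" for a particular organization.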

This is an extremely useful technology. But there is a trap here.

Many people think of embedding space as a "semantic space." It is true that items located close to one another often have similar meanings. But distance in this space represents closeness of meaning, not the concept itself.

Human concepts are not merely distances.

A concept has a center. It has a periphery. It has exceptions. It has use cases. It has context. It has criteria for judgment. It has history. It has value.

Consider the concept of a “customer.”

Some people may call anyone who has purchased a product once a customer. Others may reserve the word for people with whom there is an ongoing relationship. A customer from the perspective of the sales department, a customer from the perspective of the support team, and a customer from the perspective of management may all mean slightly different things, even though the same word is being used.
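These differing perspectives can be written down as explicit rules. The sketch below is hypothetical: the `Account` fields and each department's criteria are invented to show how one word can draw three different boundaries over the same records.

```python
from dataclasses import dataclass

@dataclass
class Account:
    name: str
    purchases: int
    has_active_contract: bool
    open_support_tickets: int

# Hypothetical criteria: same word "customer", three different boundaries.
CUSTOMER_DEFINITIONS = {
    "sales": lambda a: a.purchases >= 1,
    "support": lambda a: a.has_active_contract or a.open_support_tickets > 0,
    "management": lambda a: a.has_active_contract and a.purchases >= 3,
}

def who_counts_as_customer(account):
    """Return the perspectives under which this account counts as a 'customer'."""
    return [dept for dept, rule in CUSTOMER_DEFINITIONS.items() if rule(account)]

one_time_buyer = Account("A", purchases=1, has_active_contract=False, open_support_tickets=0)
long_term_client = Account("B", purchases=12, has_active_contract=True, open_support_tickets=0)

print(who_counts_as_customer(one_time_buyer))    # ['sales']
print(who_counts_as_customer(long_term_client))  # ['sales', 'support', 'management']
```

An embedding of the word "customer" carries none of these criteria; they have to be stated by the people who use the word.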

Embedding search can find texts that are close to the word “customer.” But by itself, it cannot fully handle questions such as: What does “customer” mean for this particular company? How is a high-value customer different from a one-time buyer? How are customer satisfaction and profitability related?

This is one reason why ordinary RAG systems often fail to work as well as expected.

RAG retrieves fragments of documents that seem relevant to a question and passes them to an LLM to generate an answer. But when retrieval is based mainly on fragments and similarity scores, the AI often fails to grasp the deeper conceptual structure behind the documents.
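The retrieve-then-generate loop can be sketched in a few lines. To keep the example self-contained, a crude word-overlap (Jaccard) score stands in for embedding similarity, and the chunks and prompt template are hypothetical. A production RAG system would use an embedding model, a vector index, and an actual LLM call, none of which is shown here.

```python
import re

# Hypothetical document chunks, as if split from longer internal documents.
chunks = [
    "Complaints about late delivery rose in March.",
    "The product team uses complaints as input for the roadmap.",
    "Delivery partners were changed in April.",
    "Office hours are 9 to 5.",
]

def tokens(text):
    """Lowercase alphabetic words only."""
    return set(re.findall(r"[a-z]+", text.lower()))

def score(query, chunk):
    """Jaccard word overlap: a crude stand-in for embedding similarity."""
    q, c = tokens(query), tokens(chunk)
    return len(q & c) / len(q | c)

def retrieve(query, k=2):
    """Score every chunk independently and keep the top k."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

def build_prompt(query, k=2):
    """Paste the retrieved fragments into a prompt for the LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(query, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"

print(build_prompt("Why did complaints about delivery increase?"))
```

Each chunk is scored on its own against the question. The second chunk, which records how this organization treats complaints, shares only one word with the query and is left out, so the LLM never sees how the organization frames complaints in the first place.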

A human reader can look at multiple documents and understand that “these are discussing the same problem from different angles,” “this case is an exception,” or “this company uses this term in a special way.” But simple vector search tends to treat information as isolated fragments.

In other words, within embedding space, what human beings had organized as concepts is once again broken apart into pieces. Similar items may be located near one another, but deciding where to draw the boundary, how to group them, and what kind of conceptual unit they form requires another level of processing.

GraphRAG is one attempt to address this problem.

GraphRAG extracts people, organizations, places, events, and relationships from documents, and structures them as a graph. This makes it easier to handle connections between pieces of information than with fragment-based search alone.
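A toy version of the graph side, assuming entities and relationships have already been extracted as (subject, relation, object) triples. The triples and entity names are hypothetical, and this is a plain-Python illustration of neighborhood retrieval, not the API of any particular GraphRAG library.

```python
from collections import defaultdict

# Hypothetical triples, as an entity-extraction step might produce them.
triples = [
    ("Acme Corp", "employs", "Dana"),
    ("Dana", "manages", "Project Orion"),
    ("Project Orion", "located_in", "Osaka"),
    ("Acme Corp", "acquired", "Beta Labs"),
]

# Index each triple under both of its entities so the graph can be walked either way.
index = defaultdict(set)
for t in triples:
    subj, _, obj = t
    index[subj].add(t)
    index[obj].add(t)

def neighborhood(entity, hops=2):
    """Return the triples reachable from `entity` within `hops` edges (breadth-first)."""
    seen, frontier, found = {entity}, {entity}, set()
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for subj, rel, obj in index[node]:
                found.add((subj, rel, obj))
                for other in (subj, obj):
                    if other not in seen:
                        seen.add(other)
                        next_frontier.add(other)
        frontier = next_frontier
    return found

for t in sorted(neighborhood("Dana", hops=2)):
    print(t)
```

Expanding Dana's neighborhood to two hops links her to Osaka and to Beta Labs, connections that fragment search alone would not surface. What the graph still does not record is what "project" or "acquisition" means to this organization.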

This is an important step forward. But GraphRAG also has limitations.

What GraphRAG is good at is representing how one thing relates to another. In other words, it structures relationships among existing entities. This is close to an ontological organization of information.

Human concepts, however, are not limited to that.

Human concepts are not only about what exists. They are also about how human beings perceive things, why they group them in a certain way, and what kinds of judgments they use those groupings for.

Take the word “complaint,” for example. In one company, a complaint may be seen as a troublesome objection. In another company, it may be seen as an important clue for product improvement. The event itself may be the same, but the concept through which it is understood can completely change the organization’s behavior.

For AI to truly support human work, it is not enough to retrieve data. Nor is it enough to connect entities in a graph.

What is needed is a way to clarify the concepts through which human beings understand the world.

A concept is a tool for compressing information.
A concept is a criterion for judgment.
A concept is a framework for organizing experience.
A concept is a map through which human beings understand the world.

In the age of AI, the important question is not only whether AI understands meaning. The more important question may be: through what concepts do we human beings understand the world?

This matters because AI is learning human concepts.

But if human beings themselves are not aware of their own concepts, the output of AI will also remain vague. If we mistake teacher signals for objective facts, embedding space for meaning itself, and search results for knowledge, AI will not be able to realize its full potential.

What we need in the next stage of AI use is not simply more data. Nor is it simply larger models.

What we need is a way to design, structure, and update concepts.

How do human beings divide the world? What do they regard as the same? What do they regard as different? What criteria do they use when making decisions? By making these things explicit, AI can become not just a search engine, but a tool that supports human thinking.

The age of AI does not only question the intelligence of machines.

It also questions how deeply we understand our own concepts.

True AI utilization begins there.
