DocMaster — User Guide

1

Upload Your Documents

Head to the live demo and drag and drop your documents into the upload area. DocMaster automatically parses each document, extracts its structure, and builds a semantic index, all in the background.

What happens behind the scenes: Each document is parsed with MinerU to extract headings, paragraphs, figures, and tables. A hierarchical document tree is built, embeddings are computed, and a FAISS index is created for fast retrieval.
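To make the pipeline above concrete, here is a minimal sketch of a hierarchical document tree with a small searchable index built over its nodes. It is not DocMaster's implementation: the `Node` class, `embed`, and `search` are hypothetical names, the embedding is a toy bag-of-words vector rather than a learned model, and the brute-force cosine search stands in for the FAISS index.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the document tree (section, paragraph, figure, or table)."""
    kind: str
    text: str
    children: list["Node"] = field(default_factory=list)


def flatten(node):
    """Yield every node in the tree, parents before children."""
    yield node
    for child in node.children:
        yield from flatten(child)


def embed(text):
    """Toy embedding: L2-normalised word counts (placeholder for a real model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def search(index, query, k=3):
    """Brute-force cosine search; a vector index such as FAISS would replace this."""
    q = embed(query)
    scored = [
        (sum(w * vec.get(token, 0.0) for token, w in q.items()), node)
        for vec, node in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# A hand-built tree standing in for parser output.
doc = Node("section", "Master Services Agreement", [
    Node("section", "Termination", [
        Node("paragraph", "Either party may terminate with 30 days notice"),
    ]),
    Node("section", "Liability", [
        Node("paragraph", "The liability cap equals fees paid in the prior 12 months"),
        Node("table", "Liability cap by service tier"),
    ]),
])

index = [(embed(n.text), n) for n in flatten(doc)]
hits = search(index, "liability cap", k=2)
for score, node in hits:
    print(f"{score:.3f}  {node.kind}: {node.text}")
```

Indexing every node, not only the leaves, lets a query land on a whole section as well as on a single paragraph or table.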

Tip: You can upload multiple documents at once. Each file can be up to 50 MB.
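If you prepare a large batch, a quick size check before uploading can save a failed attempt. The helper below is a sketch only: `split_by_size` and `MAX_FILE_BYTES` are hypothetical names, and it assumes "50 MB" means 50 × 1024 × 1024 bytes, which may differ from how the demo counts.

```python
# Assumption: the per-file limit is 50 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 50 * 1024 * 1024


def split_by_size(sizes, limit=MAX_FILE_BYTES):
    """Split a {filename: size_in_bytes} mapping into (accepted, rejected) name lists."""
    accepted = [name for name, size in sizes.items() if size <= limit]
    rejected = [name for name, size in sizes.items() if size > limit]
    return accepted, rejected


sizes = {
    "contract.pdf": 12 * 1024 * 1024,
    "scan.pdf": 80 * 1024 * 1024,
    "edge.pdf": MAX_FILE_BYTES,  # exactly at the limit, still accepted
}
accepted, rejected = split_by_size(sizes)
print("upload:", accepted)
print("too large:", rejected)
```

For files on disk, the mapping can be built with `os.path.getsize(path)` for each path.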

2

Explore Document Structure

Select any uploaded document from the dropdown in the left panel and click Load Tree. The interactive tree viewer reveals how DocMaster understands your document's hierarchy.

What you'll see: A visual tree showing sections, subsections, text blocks, figures, and tables — each color-coded by type.

Interact: Click any node to expand its content, summary, and metadata. Use the toggle arrows to explore branches.

3

Write a Filter Condition

In the right panel, describe what you're looking for in plain English. Think of it as asking: "Which of my documents match this description?"

Example conditions you can try:

"Find contracts with non-compete clauses longer than 12 months and jurisdiction outside California"
"Identify companies mentioning declining margins AND supply chain disruption in the same reporting period"
"Papers proposing retrieval-augmented generation (RAG) methods with efficiency improvements over dense retrieval"

Click Filter All Documents and watch the system evaluate each document using three complementary strategies: Document Tree traversal, Hyperedge search, and a Combined approach.

4

Review Your Results

The results panel shows you exactly which documents matched — and how they matched. Three summary cards display match counts per strategy, followed by a detailed per-document breakdown.

Understanding the strategies:

Document Tree — traverses the hierarchy top-down, pruning irrelevant branches early (fast, structure-aware)
Hyperedge — finds cross-section semantic relationships that tree traversal might miss (deep, relation-aware)
Combined — fuses both signals for the most reliable result
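The three strategies can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration: keyword overlap stands in for the LLM's relevance judgment, a hyperedge is modeled as a group of nodes from different sections, and the combined score is a plain weighted average. The guide does not specify how DocMaster scores or fuses the signals.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def terms(s):
    return set(s.lower().split())


def own_terms(node):
    return terms(node.title) | terms(node.text)


def subtree_terms(node):
    """Words in a node and all descendants (stand-in for a stored branch summary)."""
    found = own_terms(node)
    for child in node.children:
        found |= subtree_terms(child)
    return found


def tree_score(node, wanted, stats):
    """Top-down traversal: best coverage by a single node, pruning unrelated branches."""
    stats["visited"] += 1
    if not (subtree_terms(node) & wanted):
        return 0.0  # pruned: the children are never visited
    best = len(own_terms(node) & wanted) / len(wanted)
    for child in node.children:
        best = max(best, tree_score(child, wanted, stats))
    return best


def hyperedge_score(hyperedges, wanted):
    """Best coverage by the nodes of one hyperedge taken together."""
    best = 0.0
    for group in hyperedges:
        covered = set().union(*(own_terms(n) for n in group)) & wanted
        best = max(best, len(covered) / len(wanted))
    return best


def combined_score(tree, hyper, weight=0.5):
    """Assumed fusion rule: weighted average of the two scores."""
    return weight * tree + (1 - weight) * hyper


margins = Node("Margins", "gross margins declining this year")
logistics = Node("Logistics", "supply chain disruption delayed shipments")
report = Node("Annual Report", children=[
    Node("Financials", children=[margins]),
    Node("Operations", children=[logistics]),
    Node("Governance", children=[
        Node("Board", "two directors joined"),
        Node("Audit", "auditor unchanged"),
    ]),
])

wanted = terms("declining margins supply disruption")
stats = {"visited": 0}
t = tree_score(report, wanted, stats)
h = hyperedge_score([[margins, logistics]], wanted)
c = combined_score(t, h)
print(f"tree={t:.2f} hyperedge={h:.2f} combined={c:.2f} visited={stats['visited']}/8")
```

In this example the Governance branch is pruned after one check, so only 6 of the 8 nodes are visited. No single node covers the whole condition, but the hyperedge linking Margins and Logistics does, which is the kind of cross-section relationship tree traversal alone can miss.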

Token Usage: Check the metrics section to see how many LLM tokens each strategy consumed. Document Tree traversal should typically consume the fewest, because branches pruned early are not evaluated any further.

5

Ask Questions with RAG Q&A

Now that you've identified relevant documents, use the built-in chat to ask follow-up questions. DocMaster retrieves the most relevant passages and generates answers with source citations.

Query scope: Toggle between All documents and Matched documents only to focus your questions on filtered results.
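The retrieval step and the query scope toggle can be sketched as follows. This is an illustration under stated assumptions, not DocMaster's code: `Passage`, `retrieve`, and `build_prompt` are hypothetical names, word overlap replaces embedding similarity, and `top_k` plays the role of a retrieval depth setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc: str
    section: str
    text: str


def overlap(query, text):
    """Number of distinct query words that also appear in the passage."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(passages, query, top_k=3, matched_docs=None):
    """Return up to top_k passages; restrict to matched_docs when a set is given."""
    pool = [p for p in passages if matched_docs is None or p.doc in matched_docs]
    ranked = sorted(pool, key=lambda p: overlap(query, p.text), reverse=True)
    return [p for p in ranked[:top_k] if overlap(query, p.text) > 0]


def build_prompt(query, hits):
    """Number each passage so a generated answer can cite it as [1], [2], ..."""
    sources = "\n".join(
        f"[{i}] ({p.doc}, {p.section}) {p.text}" for i, p in enumerate(hits, 1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"{sources}\nQuestion: {query}"
    )


passages = [
    Passage("paper_a.pdf", "Methods", "we use sparse retrieval methods with caching"),
    Passage("paper_b.pdf", "Methods", "our methods combine dense retrieval and reranking"),
    Passage("paper_c.pdf", "Intro", "retrieval methods are widely used"),
]
query = "which retrieval methods are used"

all_hits = retrieve(passages, query, top_k=2)
scoped_hits = retrieve(
    passages, query, top_k=2, matched_docs={"paper_a.pdf", "paper_b.pdf"}
)
print(build_prompt(query, scoped_hits))
```

With the scope set to all documents, the unmatched `paper_c.pdf` ranks first; restricting the pool to matched documents keeps the answer grounded in the filtered set.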

Pro tip: Try asking comparative questions like "What methods are used across the matched papers?" to leverage the multi-document context.

Keyboard shortcut: Press Enter to send, Shift+Enter for a new line.

Get Started with DocMaster

Demo System Layout

Five Steps to Document Intelligence

Advanced Settings

Filter Top K

RAG Top K
