Get Started with DocMaster

DocMaster makes it easy to filter and analyze large document collections using natural language. Follow these five simple steps to go from raw documents to intelligent insights.

Demo System Layout

The DocMaster web interface: users can issue natural-language filter queries, explore the document-tree index, tune hyperparameters, and compare filtering results side by side.

Five Steps to Document Intelligence

Each step builds on the last. You can complete the entire workflow in under a minute.

1

Upload Your Documents

Head to the live demo and drag and drop your documents into the upload area. DocMaster automatically parses each document, extracts its structure, and builds a semantic index, all in the background.

What happens behind the scenes: Each document is parsed with MinerU to extract headings, paragraphs, figures, and tables. A hierarchical document tree is built, embeddings are computed, and a FAISS index is created for fast retrieval.
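
The pipeline described above can be pictured with a small, self-contained sketch. This is not DocMaster's code: the `Node` class, the word-count `embed` function, and the `FlatIndex` class are hypothetical stand-ins chosen so the example runs with the Python standard library alone. A real deployment would use MinerU output for the tree, a learned embedding model, and a FAISS index in place of the brute-force search shown here.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a hypothetical document tree."""
    kind: str                      # e.g. "section", "paragraph", "figure", "table"
    text: str
    children: list["Node"] = field(default_factory=list)

    def walk(self):
        """Yield this node and every descendant, depth-first."""
        yield self
        for child in self.children:
            yield from child.walk()


def embed(text):
    """Toy embedding: L2-normalized word counts (stand-in for a real model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def dot(a, b):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


class FlatIndex:
    """Brute-force similarity search (stand-in for a FAISS index)."""

    def __init__(self):
        self.entries = []          # list of (vector, node) pairs

    def add(self, node):
        self.entries.append((embed(node.text), node))

    def search(self, query, k):
        q = embed(query)
        scored = [(dot(q, vec), node) for vec, node in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Build a tiny tree, index every node, and run one query.
tree = Node("section", "Termination", [
    Node("paragraph", "Either party may terminate the agreement with thirty days notice."),
    Node("table", "Fee schedule: monthly license fee and support fee."),
])
index = FlatIndex()
for node in tree.walk():
    index.add(node)

best_score, best_node = index.search("terminate agreement notice", k=1)[0]
print(best_node.kind, round(best_score, 3))
```

Indexing every node, not only the leaves, keeps headings searchable alongside body text. Whether DocMaster indexes at that granularity is not stated here.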

Tip: You can upload multiple documents at once. Each file can be up to 50 MB.

2

Explore Document Structure

Select any uploaded document from the dropdown in the left panel and click Load Tree. The interactive tree viewer reveals how DocMaster understands your document's hierarchy.

What you'll see: A visual tree showing sections, subsections, text blocks, figures, and tables — each color-coded by type.

Interact: Click any node to expand its content, summary, and metadata. Use the toggle arrows to explore branches.

3

Write a Filter Condition

In the right panel, describe what you're looking for in plain English. Think of it as asking: "Which of my documents match this description?"

Example conditions you can try:
  • "Find contracts with non-compete clauses longer than 12 months and jurisdiction outside California"
  • "Identify companies mentioning declining margins AND supply chain disruption in the same reporting period"
  • "Papers proposing retrieval-augmented generation (RAG) methods with efficiency improvements over dense retrieval"

Click Filter All Documents and watch the system evaluate each document using three complementary strategies: Document Tree traversal, Hyperedge search, and a Combined approach.

4

Review Your Results

The results panel shows you exactly which documents matched — and how they matched. Three summary cards display match counts per strategy, followed by a detailed per-document breakdown.

Understanding the strategies:
  • Document Tree — traverses the hierarchy top-down, pruning irrelevant branches early (fast, structure-aware)
  • Hyperedge — finds cross-section semantic relationships that tree traversal might miss (deep, relation-aware)
  • Combined — fuses both signals for the most reliable result
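
To make the Document Tree strategy concrete, here is a minimal sketch of top-down traversal with early pruning. It rests on assumptions: the `Node` layout, the `looks_relevant` keyword check (standing in for an LLM relevance judgment), and the sample contract are all hypothetical and do not come from DocMaster itself. The Hyperedge and Combined strategies are not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    summary: str                   # short description of everything under this node
    text: str = ""                 # leaf content; empty for pure section nodes
    children: list["Node"] = field(default_factory=list)


def looks_relevant(summary, keywords):
    """Hypothetical stand-in for an LLM judging a node summary."""
    lowered = summary.lower()
    return any(keyword in lowered for keyword in keywords)


def matching_leaves(node, keywords, visited):
    """Traverse top-down, skipping a whole branch once its summary looks irrelevant."""
    visited.append(node.summary)   # record every node that was actually evaluated
    if not looks_relevant(node.summary, keywords):
        return []                  # prune: descendants are never evaluated
    if not node.children:
        return [node]
    hits = []
    for child in node.children:
        hits.extend(matching_leaves(child, keywords, visited))
    return hits


contract = Node("employment contract: compensation, non-compete, confidentiality", children=[
    Node("compensation and benefits", children=[
        Node("base salary", "Salary is paid monthly."),
        Node("bonus plan", "Bonus is discretionary."),
    ]),
    Node("restrictive covenants including non-compete", children=[
        Node("non-compete clause", "Employee shall not compete for 18 months."),
        Node("confidentiality", "Employee keeps trade secrets confidential."),
    ]),
])

visited = []
hits = matching_leaves(contract, ["non-compete"], visited)
print([leaf.summary for leaf in hits], f"{len(visited)} of 7 nodes evaluated")
```

In this toy tree the compensation branch is rejected at its section node, so its two leaves are never evaluated. That is the sense in which pruning saves work.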

Token Usage: Check the metrics section to see how many LLM tokens each strategy consumed. Tree-based pruning should generally show the lowest count, since it discards irrelevant branches early.

5

Ask Questions with RAG Q&A

Now that you've identified relevant documents, use the built-in chat to ask follow-up questions. DocMaster retrieves the most relevant passages and generates answers with source citations.

Query scope: Toggle between All documents and Matched documents only to focus your questions on filtered results.
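
The effect of the query scope can be sketched in a few lines. Everything here is illustrative: `Passage`, `overlap`, and `retrieve` are hypothetical names, the word-overlap score stands in for embedding similarity, and the file names are made up. The answer-generation step is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str
    text: str


def overlap(query, text):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query, passages, matched_docs=None, top_k=5):
    """Return up to top_k passages; restrict to matched_docs when a set is given."""
    pool = [p for p in passages if matched_docs is None or p.doc_id in matched_docs]
    ranked = sorted(pool, key=lambda p: overlap(query, p.text), reverse=True)
    return ranked[:top_k]


passages = [
    Passage("paper_a.pdf", "we propose a retrieval method with a sparse index"),
    Passage("paper_b.pdf", "our retrieval method uses dense vectors"),
    Passage("memo_c.pdf", "quarterly budget review"),
]
question = "which retrieval method is proposed"

# "All documents" scope versus "Matched documents only" scope.
everything = retrieve(question, passages, top_k=2)
scoped = retrieve(question, passages, matched_docs={"paper_b.pdf"}, top_k=2)

# Number the retrieved passages so an answer can cite its sources.
citations = [f"[{i}] {p.doc_id}" for i, p in enumerate(scoped, start=1)]
print(citations)
```

Filtering the candidate pool before ranking, instead of after, ensures that the `top_k` slots are all spent on documents inside the chosen scope.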

Pro tip: Try asking comparative questions like "What methods are used across the matched papers?" to leverage the multi-document context.

Keyboard shortcut: Press Enter to send, Shift+Enter for a new line.

Advanced Settings

Fine-tune the system via the collapsible settings panel on the demo page. Most users won't need to change these.

Filter Top K

Controls how many of the highest-ranked chunks are retrieved as evidence for each filter decision. Higher values provide more evidence but use more LLM tokens. Default: 10.

RAG Top K

Controls how many passages are retrieved for Q&A answers. Higher values give broader context. Default: 5.
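
The trade-off behind the two settings above can be illustrated with a rough sketch. The `Settings` class and `select_evidence` function are hypothetical, and the word count is only a crude proxy for LLM tokens. Only the default values (10 and 5) come from the settings panel described here.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    filter_top_k: int = 10         # default for Filter Top K
    rag_top_k: int = 5             # default for RAG Top K


def select_evidence(ranked_chunks, k):
    """Keep the k best chunks and estimate prompt size with a crude word count."""
    evidence = ranked_chunks[:k]
    approx_tokens = sum(len(chunk.split()) for chunk in evidence)
    return evidence, approx_tokens


settings = Settings()

# Pretend these 20 chunks are already sorted best-first by a retriever.
ranked_chunks = [f"chunk {i} with some example text" for i in range(20)]

filter_evidence, filter_tokens = select_evidence(ranked_chunks, settings.filter_top_k)
rag_evidence, rag_tokens = select_evidence(ranked_chunks, settings.rag_top_k)
print(len(filter_evidence), filter_tokens, len(rag_evidence), rag_tokens)
```

With equally sized chunks, doubling K doubles the text sent to the model, which is why a larger value costs more tokens.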

Ready to get started?

Upload your first document and experience DocMaster's filtering and Q&A in action.

Try the Live Demo