VLDB 2026 — Demo Track

DocMaster: A Hierarchical Structure-Aware System for Document Analysis

We present DocMaster, a hierarchical structure-aware document analysis system. DocMaster parses documents into hierarchical document trees preserving original layouts and constructs a structure-aware semantic index that enables accurate document filtering and in-depth analysis.

Key Features

DocMaster combines structural and semantic analysis to deliver efficient, accurate document analysis.

Document Tree Search

Builds a hierarchical tree from a document's structural elements. An LLM traverses the tree top-down, pruning irrelevant branches to minimise token usage.
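
The top-down pruning idea can be sketched in a few lines. This is an illustration under stated assumptions, not DocMaster's implementation: the `Node` schema, its `summary` field, and the `keyword_judge` function are hypothetical, and the keyword check stands in for the LLM's relevance decision.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One structural element of a document (hypothetical schema)."""
    title: str
    summary: str = ""
    children: List["Node"] = field(default_factory=list)


def tree_search(root: Node, is_relevant: Callable[[Node], bool]) -> List[Node]:
    """Return relevant leaves, skipping every subtree whose root is judged irrelevant."""
    if not is_relevant(root):
        return []  # prune: descendants of this node are never inspected
    if not root.children:
        return [root]  # relevant leaf
    hits: List[Node] = []
    for child in root.children:
        hits.extend(tree_search(child, is_relevant))
    return hits


calls: List[str] = []  # records which nodes the judge was asked about


def keyword_judge(node: Node) -> bool:
    """Stand-in for an LLM relevance call (assumption): simple keyword match."""
    calls.append(node.title)
    return "revenue" in node.summary.lower()


doc = Node("Annual Report", "Finance, revenue, and staffing overview", [
    Node("Finance", "Revenue and costs by quarter", [
        Node("Revenue", "Quarterly revenue figures"),
        Node("Costs", "Operating costs"),
    ]),
    Node("Staffing", "Hiring plans", [
        Node("Hiring", "Open roles"),
    ]),
])

hits = tree_search(doc, keyword_judge)
print([n.title for n in hits])  # ['Revenue']
print(calls)  # 'Hiring' is absent: its parent was pruned first
```

In this toy tree the judge is consulted for five of the six nodes; the saving grows with the size of the pruned subtrees.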

Hyperedge Search

Extracts cross-chunk semantic relationships as hyperedges. FAISS-based retrieval finds the most relevant hyperedges for relation-aware filtering.
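
As a rough sketch of hyperedge retrieval, the snippet below ranks hyperedges by cosine similarity to a query vector. It makes several assumptions: the `Hyperedge` schema is hypothetical, the three-dimensional embeddings are made up by hand, and the brute-force scan stands in for the FAISS index that the system uses.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Hyperedge:
    """A relation linking two or more chunks (hypothetical schema)."""
    relation: str
    chunk_ids: Tuple[int, ...]
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_hyperedges(query: Sequence[float], edges: List[Hyperedge], k: int) -> List[Hyperedge]:
    """Brute-force nearest-neighbour search (stand-in for a vector index)."""
    ranked = sorted(edges, key=lambda e: cosine(query, e.embedding), reverse=True)
    return ranked[:k]


edges = [
    Hyperedge("acquired_by", (0, 4), [1.0, 0.0, 0.0]),
    Hyperedge("reports_revenue", (1, 2, 7), [0.0, 1.0, 0.0]),
    Hyperedge("employs", (3, 5), [0.0, 0.0, 1.0]),
]

top = top_k_hyperedges([0.1, 0.9, 0.2], edges, k=2)
print([e.relation for e in top])  # ['reports_revenue', 'employs']

# Chunks touched by the retrieved hyperedges become filtering candidates.
candidate_chunks = sorted({cid for e in top for cid in e.chunk_ids})
print(candidate_chunks)  # [1, 2, 3, 5, 7]
```

Because one hyperedge spans several chunks, a single match can surface evidence that is scattered across a document.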

Combined Strategy

Aggregates evidence from both strategies. Fusing structural and relational signals achieves higher precision and recall than either alone.
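
One simple way to picture the aggregation step is shown below. The fusion rule is an assumption made for illustration (union the chunk ids from both searches and rank chunks found by both first); the page does not specify how DocMaster weighs the two signals, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def combine_evidence(tree_hits: Iterable[int], hyperedge_hits: Iterable[int]) -> Dict[int, Set[str]]:
    """Map each chunk id to the set of strategies that retrieved it."""
    evidence: Dict[int, Set[str]] = {}
    for cid in tree_hits:
        evidence.setdefault(cid, set()).add("tree")
    for cid in hyperedge_hits:
        evidence.setdefault(cid, set()).add("hyperedge")
    return evidence


def rank_chunks(evidence: Dict[int, Set[str]]) -> List[int]:
    """Chunks supported by both strategies come first; ties break by chunk id."""
    return sorted(evidence, key=lambda cid: (-len(evidence[cid]), cid))


evidence = combine_evidence(tree_hits=[2, 5, 9], hyperedge_hits=[5, 7])
ranked = rank_chunks(evidence)
print(ranked)  # [5, 2, 7, 9]
```

The union keeps chunks that only one strategy found, while the ordering favours chunks that both agree on.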

Token-Efficient Design

Tree traversal prunes irrelevant subtrees early, so the LLM reads far fewer tokens than it would under naive full-document retrieval.

RAG Q&A Integration

After filtering, the system runs retrieval-augmented Q&A over the matched documents, reusing the embeddings already indexed for filtering.

Multi-Document Collections

Upload an entire collection of documents. The system processes, indexes, and evaluates the filter condition across all documents in a single query.
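
Conceptually, a collection-level query applies one filter condition to every document and returns those that satisfy it. The sketch below assumes a hypothetical `filter_collection` helper and uses a plain keyword predicate in place of the system's structure-aware evaluation.

```python
from typing import Callable, Dict, List


def filter_collection(collection: Dict[str, str], satisfies: Callable[[str], bool]) -> List[str]:
    """Evaluate one filter condition against every document; return matching names."""
    return [name for name, text in collection.items() if satisfies(text)]


collection = {
    "q1_report.pdf": "Revenue grew in the first quarter.",
    "hr_policy.pdf": "Leave and travel policy for employees.",
    "q2_report.pdf": "Second-quarter revenue was flat.",
}

# Stand-in predicate (assumption): the real condition is evaluated by the system.
matched = filter_collection(collection, lambda text: "revenue" in text.lower())
print(matched)  # ['q1_report.pdf', 'q2_report.pdf']
```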

System at a Glance

From document ingestion to semantic filtering and RAG-powered analysis.
