DocMaster: A Hierarchical Structure-Aware System for Document Analysis

Capabilities

Key Features

DocMaster combines structural and semantic analysis to deliver efficient, accurate document analysis.

Builds a hierarchical tree from the document's structural elements. The LLM traverses the tree top-down, pruning irrelevant branches to minimise token usage.
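The top-down traversal with pruning described above can be sketched as follows. This is a minimal illustration, not DocMaster's implementation: the `Node` shape, the `traverse` function, and the `keyword_judge` callback (a stand-in for the LLM's relevance decision) are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One structural element (hypothetical shape: title, optional text, children)."""
    title: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def traverse(node: Node, is_relevant: Callable[[Node], bool]) -> List[Node]:
    """Return relevant leaves, skipping any subtree whose root is judged irrelevant."""
    if not is_relevant(node):
        return []  # prune: descendants of this node are never inspected
    if not node.children:
        return [node]
    hits: List[Node] = []
    for child in node.children:
        hits.extend(traverse(child, is_relevant))
    return hits


# Toy document tree, invented for illustration.
doc = Node("Annual report", children=[
    Node("Finance", children=[
        Node("Revenue", "Revenue grew in Q4."),
        Node("Costs", "Costs were flat."),
    ]),
    Node("Legal", children=[
        Node("Litigation", "No open cases."),
    ]),
])

inspected: List[str] = []  # records every node the judge was asked about


def keyword_judge(node: Node) -> bool:
    """Stand-in for the LLM call: keeps the root, 'Finance', and 'Revenue'."""
    inspected.append(node.title)
    return node.title in {"Annual report", "Finance", "Revenue"}


leaves = traverse(doc, keyword_judge)
print([n.title for n in leaves])  # ['Revenue']
print(inspected)  # 'Litigation' is absent because 'Legal' was pruned
```

In a real system each `is_relevant` call would cost LLM tokens, so every pruned subtree is content the model never has to read.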

Extracts cross-chunk semantic relationships as hyperedges. FAISS-based retrieval finds the most relevant hyperedges for relation-aware filtering.
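As a rough sketch of hyperedge retrieval, the example below ranks hyperedges by cosine similarity to a query embedding. It deliberately uses a brute-force pure-Python search in place of FAISS so that it runs without extra dependencies; the `Hyperedge` shape, the `top_k` helper, and the three-dimensional toy embeddings are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Hyperedge:
    """A relation linking two or more chunks (hypothetical shape)."""
    relation: str
    chunk_ids: Tuple[int, ...]
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], edges: List[Hyperedge], k: int) -> List[Tuple[float, Hyperedge]]:
    """Brute-force nearest-neighbour search (stand-in for a vector index)."""
    scored = [(cosine(query, e.embedding), e) for e in edges]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy hyperedges with hand-made embeddings.
edges = [
    Hyperedge("supplier of", (1, 4, 7), [1.0, 0.0, 0.0]),
    Hyperedge("acquired by", (2, 3), [0.0, 1.0, 0.0]),
    Hyperedge("subsidiary of", (3, 5, 9), [0.0, 0.8, 0.6]),
]

query = [0.0, 0.9, 0.1]  # pretend embedding of the filter condition
results = top_k(query, edges, k=2)
for score, edge in results:
    print(f"{edge.relation}: chunks {edge.chunk_ids}")
```

Because a hyperedge carries the ids of all the chunks it connects, a single retrieved hyperedge points the filter at several related passages at once.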

Aggregates evidence from both strategies. Fusing structural and relational signals achieves higher precision and recall than either strategy alone.
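The source does not specify how the two signals are fused. One simple possibility, shown here purely as an assumption, is a weighted sum of per-chunk scores; the `fuse` function, the `weight` parameter, and the example scores are hypothetical.

```python
from typing import Dict


def fuse(structural: Dict[str, float],
         relational: Dict[str, float],
         weight: float = 0.5) -> Dict[str, float]:
    """Weighted sum per chunk id; a chunk missing from one strategy scores 0 there."""
    ids = set(structural) | set(relational)
    return {
        cid: weight * structural.get(cid, 0.0)
        + (1.0 - weight) * relational.get(cid, 0.0)
        for cid in ids
    }


# Invented scores: c2 is found by both strategies, c1 and c3 by only one.
structural = {"c1": 0.9, "c2": 0.4}
relational = {"c2": 0.8, "c3": 0.6}

fused = fuse(structural, relational)
ranked = sorted(fused, key=fused.get, reverse=True)
print(ranked)  # ['c2', 'c1', 'c3']
```

In this toy case the chunk supported by both strategies outranks a chunk that scores highly under only one, which is the intuition behind combining the signals.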

Tree traversal prunes irrelevant subtrees early, so the LLM reads only the branches that may contain relevant content and consumes fewer tokens than naive full-document retrieval.

After filtering, runs retrieval-augmented Q&A over the matched documents, reusing the same indexed embeddings.

Upload an entire collection of documents. The system processes, indexes, and evaluates the filter condition across all documents in a single query.

How It Works

From document ingestion to semantic filtering and RAG-powered analysis.