System Architecture & Pipeline

DocMaster processes documents in four stages: (1) document parsing, (2) document tree construction, (3) semantic index construction, and (4) filtering and retrieval.
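Stage (2) turns the flat output of a parser into a hierarchy. The page does not describe DocMaster's parser output or tree schema, so the following is a minimal sketch under stated assumptions: the parser is assumed to emit `(level, title, text)` blocks with heading levels already detected, and `TreeNode`, `build_document_tree`, and `section_paths` are hypothetical names. It uses the standard stack technique for nesting headings by level.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TreeNode:
    """One node of a hypothetical document tree: a section and its subsections."""
    title: str
    level: int  # 0 = document root, 1 = top-level section, 2 = subsection, ...
    text: str = ""
    children: List["TreeNode"] = field(default_factory=list)


def build_document_tree(blocks: List[Tuple[int, str, str]]) -> TreeNode:
    """Nest flat (level, title, text) blocks into a tree.

    Assumes the parser has already assigned each heading a level >= 1.
    """
    root = TreeNode(title="<document>", level=0)
    stack = [root]  # path from the root to the most recently added node
    for level, title, text in blocks:
        if level < 1:
            raise ValueError(f"heading level must be >= 1, got {level}")
        node = TreeNode(title=title, level=level, text=text)
        # Pop until the stack top is a shallower heading, i.e. this node's parent.
        while stack[-1].level >= level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def section_paths(node: TreeNode, prefix: str = "") -> List[str]:
    """List every section as a 'Parent > Child' path, depth first."""
    paths = []
    for child in node.children:
        path = f"{prefix} > {child.title}" if prefix else child.title
        paths.append(path)
        paths.extend(section_paths(child, path))
    return paths


if __name__ == "__main__":
    blocks = [
        (1, "Introduction", "..."),
        (1, "Method", ""),
        (2, "Clustering", "..."),
        (2, "Retrieval", "..."),
        (1, "Experiments", "..."),
    ]
    tree = build_document_tree(blocks)
    print(section_paths(tree))
```

Run on the sample blocks, this prints `['Introduction', 'Method', 'Method > Clustering', 'Method > Retrieval', 'Experiments']`. The root sits at level 0, so it is never popped and every heading always finds a parent.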

Workflow Pipeline

The pipeline runs from document upload through semantic filtering to RAG-powered question answering.

Workflow pipeline of DocMaster

Index Construction

The user uploads AI papers. Each PDF is parsed into a hierarchical document tree, which is then enriched with semantic indices (PC-KMeans clusters and hyper-edges); tri-modal retrieval over these indices filters the papers against the user's query. The user can then ask follow-up questions about the filtered papers.
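PC-KMeans is the pairwise-constrained k-means of Basu et al.: ordinary k-means whose assignment step also charges a penalty `w` for each broken must-link pair and `w_bar` for each broken cannot-link pair. The page does not say which vectors DocMaster clusters or where its constraints come from, so this minimal sketch assumes plain numeric vectors and hand-supplied index pairs. It also simplifies the original algorithm: points are visited in a fixed order, and initial centroids come from the caller (or the first `k` points) rather than from constraint neighborhoods.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pc_kmeans(
    points: List[Vector],
    k: int,
    must_link: List[Tuple[int, int]],
    cannot_link: List[Tuple[int, int]],
    w: float = 1.0,
    w_bar: float = 1.0,
    init: Optional[List[Vector]] = None,
    max_iter: int = 20,
) -> Tuple[List[int], List[List[float]]]:
    """Pairwise-constrained k-means: k-means plus penalties for broken constraints."""
    n = len(points)
    # Simplification: start from caller-supplied centroids or the first k points.
    centroids = [list(c) for c in (init if init is not None else points[:k])]
    ml = [set() for _ in range(n)]
    cl = [set() for _ in range(n)]
    for i, j in must_link:
        ml[i].add(j)
        ml[j].add(i)
    for i, j in cannot_link:
        cl[i].add(j)
        cl[j].add(i)

    assign = [-1] * n  # -1 means "not yet assigned"
    for _ in range(max_iter):
        changed = False
        # Assignment step: place each point where distance + penalties is lowest,
        # counting only constraint partners that already have a cluster.
        for i, p in enumerate(points):
            def cost(h: int) -> float:
                broken_ml = sum(1 for j in ml[i] if assign[j] not in (-1, h))
                broken_cl = sum(1 for j in cl[i] if assign[j] == h)
                return sq_dist(p, centroids[h]) + w * broken_ml + w_bar * broken_cl

            best = min(range(k), key=cost)
            if best != assign[i]:
                assign[i] = best
                changed = True
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for h in range(k):
            members = [points[i] for i in range(n) if assign[i] == h]
            if members:
                centroids[h] = [sum(col) / len(members) for col in zip(*members)]
        if not changed:
            break
    return assign, centroids
```

With both weights at zero the penalties vanish and this reduces to ordinary k-means; raising `w` or `w_bar` trades cluster compactness for constraint satisfaction.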

Index construction pipeline
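The page says filtering uses tri-modal retrieval but does not name the three modalities or how their results are combined. One standard way to merge several ranked lists is reciprocal rank fusion (RRF), sketched here as an assumption, not as DocMaster's documented method: each list adds `1 / (k + rank)` to every paper it ranks, with `k = 60` as the commonly used default. The paper IDs and the three input lists are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of paper IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) for every paper it ranks (rank starts at 1).
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, paper_id in enumerate(ranking, start=1):
            scores[paper_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by paper ID for determinism.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


if __name__ == "__main__":
    # Three hypothetical retrievers, one list per modality.
    first = ["p1", "p2", "p3"]
    second = ["p2", "p1", "p4"]
    third = ["p2", "p3", "p1"]
    print(reciprocal_rank_fusion([first, second, third]))
```

Here `p2` wins because two of the three lists rank it first, and `p4` comes last because only one list returns it. RRF needs only ranks, not scores, so it sidesteps calibrating scores from retrievers that work in different modalities.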

Demo System Layout

The DocMaster web interface: users can issue natural-language filter queries, explore the document-tree index, tune hyperparameters, and compare filtering results side by side.

Demo system layout