DocMaster proceeds through four stages: (1) document parsing, (2) document tree construction, (3) semantic index construction, and (4) filtering and retrieval.
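To make the four-stage flow concrete, the following is a minimal sketch and not DocMaster's actual implementation. All names (`Node`, `parse_document`, `build_tree`, `build_index`, `matches`) are hypothetical. It assumes plain-text input with `#`-style headings in place of real PDF parsing, and it substitutes a simple per-node word set for the semantic index and keyword containment for retrieval.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the hierarchical document tree (hypothetical structure)."""
    title: str
    level: int
    text: str = ""
    children: list[Node] = field(default_factory=list)


def parse_document(raw: str) -> list[tuple[int, str, str]]:
    """Stage 1: split raw text into (level, title, body) sections.

    Assumption: headings start with '#' characters; a real PDF parser
    would replace this function.
    """
    sections: list[tuple[int, str, str]] = []
    for line in raw.splitlines():
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            sections.append((level, line.lstrip("#").strip(), ""))
        elif sections and line.strip():
            level, title, body = sections[-1]
            sections[-1] = (level, title, (body + " " + line.strip()).strip())
    return sections


def build_tree(doc_title: str, sections: list[tuple[int, str, str]]) -> Node:
    """Stage 2: nest each section under its nearest shallower heading."""
    root = Node(doc_title, 0)
    stack = [root]
    for level, title, body in sections:
        while stack[-1].level >= level:
            stack.pop()
        node = Node(title, level, body)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def build_index(root: Node) -> dict[str, set[str]]:
    """Stage 3: map each node title to its lowercase word set.

    Stand-in for the semantic index; no clustering or hyper-edges here.
    """
    index: dict[str, set[str]] = {}
    pending = [root]
    while pending:
        node = pending.pop()
        index[node.title] = set(re.findall(r"\w+", f"{node.title} {node.text}".lower()))
        pending.extend(node.children)
    return index


def matches(index: dict[str, set[str]], query: str) -> bool:
    """Stage 4: keep a document if some node contains every query word."""
    words = set(re.findall(r"\w+", query.lower()))
    return any(words <= node_words for node_words in index.values())


# Toy corpus (invented for illustration only).
papers = {
    "paper_a": "# Method\nWe train a graph neural network.\n## Ablation\nRemoving attention hurts.",
    "paper_b": "# Method\nWe fine-tune a language model.",
}
trees = {name: build_tree(name, parse_document(raw)) for name, raw in papers.items()}
indices = {name: build_index(tree) for name, tree in trees.items()}
kept = [name for name, idx in indices.items() if matches(idx, "graph network")]
```

In this toy run only `paper_a` survives the filter, because its "Method" node contains both query words. The same stage boundaries would hold if the stand-ins were swapped for a PDF parser, a clustering-based index, and a multi-modal retriever.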
The DocMaster workflow, from document upload to semantic filtering and RAG-powered question answering.
The user uploads AI papers; each PDF is parsed into a hierarchical document tree and enriched with semantic indices (PC-KMeans clusters and hyper-edges), and the collection is then filtered through tri-modal retrieval. The user can then ask follow-up questions about the filtered papers.
The DocMaster web interface: users can issue natural-language filter queries, explore the document-tree index, tune hyperparameters, and compare filtering results side by side.