Three domain-specific scenarios demonstrating DocMaster's semantic filtering pipeline and structure-aware RAG over large document collections.
Domain: Legal
Corpus: Employment agreements, NDAs, vendor contracts, SLAs
N: 1,200
Expected output: ~85 docs
Semantic Filter (matches of 1,200): Tree 0, Hyperedge 0, Combined 0
vendor_agreement_acme.pdf (Tree | Hyper | Comb)
employment_nda_ny.pdf (Tree | Hyper | Comb)
sla_contract_texas.pdf (Tree | Hyper | Comb)
consulting_agreement_fl.pdf (Tree | Hyper | Comb)
employment_ca_standard.pdf (Tree | Hyper | Comb)
+ 1,195 more documents
Observation: Hyperedge captures consulting_agreement_fl.pdf, which Tree traversal misses because the non-compete duration appears in an appendix rather than under a top-level clause heading.
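The observation above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not DocMaster's implementation: the section contents, the 18-month threshold, the `hyperedges` mapping, and the functions `tree_filter` and `hyperedge_filter` are all hypothetical. It assumes the Tree filter inspects only sections whose heading names the concept, while a hyperedge groups every section that bears on the concept wherever it sits in the document.

```python
import re
from dataclasses import dataclass

# Matches durations such as "24 months" (illustrative pattern only)
DURATION = re.compile(r"(\d+)\s*months?")


@dataclass
class Section:
    heading: str
    text: str


# Hypothetical stand-in for consulting_agreement_fl.pdf
doc = [
    Section("1. Services", "Consultant provides advisory services."),
    Section("2. Restrictive Covenants", "Consultant agrees to the terms in Appendix B."),
    Section("Appendix B", "Non-compete period: 24 months after termination."),
]

# Hypothetical hyperedge: one edge links all sections related to a concept
hyperedges = {"non-compete": {"2. Restrictive Covenants", "Appendix B"}}


def has_long_duration(text, min_months):
    """True if the text states a duration of at least min_months."""
    match = DURATION.search(text)
    return bool(match) and int(match.group(1)) >= min_months


def tree_filter(sections, concept, min_months):
    """Assumed Tree behavior: inspect only sections whose heading names the concept."""
    return any(
        concept in s.heading.lower() and has_long_duration(s.text, min_months)
        for s in sections
    )


def hyperedge_filter(sections, edges, concept, min_months):
    """Assumed Hyperedge behavior: inspect every section on the concept's edge."""
    members = edges.get(concept, set())
    return any(
        s.heading in members and has_long_duration(s.text, min_months)
        for s in sections
    )


tree_hit = tree_filter(doc, "non-compete", 18)
hyper_hit = hyperedge_filter(doc, hyperedges, "non-compete", 18)
print(tree_hit, hyper_hit)  # prints: False True
```

In this toy document no heading mentions the non-compete, so the heading-guided check never reaches Appendix B, while the hyperedge reaches it directly.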
Follow-up Q&A over 85 filtered documents
Summarize key risk patterns across these contracts.
Most contracts combine long non-compete durations (18–24 months) with employer-favorable jurisdictions (NY, TX). Key patterns:
Broad scope — “industry-wide” restrictions rather than role-specific, increasing enforceability risk
Extended duration — 60% specify 18+ month non-compete periods
Jurisdiction stacking — multiple non-CA jurisdictions referenced to maximize enforceability
Corpus: Earnings call transcripts, 10-Ks, analyst reports
N: 500
Expected output: ~40 docs
Semantic Filter (matches of 500): Tree 0, Hyperedge 0, Combined 0
acme_corp_10K_2024.pdf (Tree | Hyper | Comb)
globaltech_earnings_q3.pdf (Tree | Hyper | Comb)
midwest_mfg_analyst.pdf (Tree | Hyper | Comb)
+ 497 more documents
Observation: Tree misses midwest_mfg_analyst.pdf, where the margin decline appears in the financial summary and the supply chain risks appear in a separate section. Hyperedge links the evidence across sections.
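The cross-section case above can be sketched as a two-condition filter. This is an illustrative sketch only: the section text, the phrase-list form of the conditions, the `hyperedges` list, and both filter functions are assumptions rather than DocMaster's API. It assumes the Tree filter requires one section to satisfy every condition, whereas a hyperedge lets each condition be met by a different linked section.

```python
# Hypothetical stand-in for midwest_mfg_analyst.pdf
sections = {
    "Financial Summary": "Gross margin declined year over year.",
    "Supply Chain": "Supplier delays remain a material supply chain risk.",
    "Outlook": "Management expects stable demand.",
}

# Assumed query form: each condition is a list of phrases that must all appear
conditions = [["margin", "declined"], ["supply chain", "risk"]]

# Hypothetical hyperedge linking sections that carry related evidence
hyperedges = [{"Financial Summary", "Supply Chain"}]


def satisfies(text, phrases):
    """True if every phrase occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return all(p in lowered for p in phrases)


def tree_filter(sections, conditions):
    """Assumed Tree behavior: a single section must satisfy every condition."""
    return any(
        all(satisfies(text, c) for c in conditions)
        for text in sections.values()
    )


def hyperedge_filter(sections, hyperedges, conditions):
    """Assumed Hyperedge behavior: within one edge, each condition
    may be satisfied by a different member section."""
    for edge in hyperedges:
        texts = [sections[name] for name in edge]
        if all(any(satisfies(t, c) for t in texts) for c in conditions):
            return True
    return False


tree_hit = tree_filter(sections, conditions)
hyper_hit = hyperedge_filter(sections, hyperedges, conditions)
print(tree_hit, hyper_hit)  # prints: False True
```

Neither section alone mentions both the margin decline and the supply chain risk, so the single-section check fails, while the edge joining the two sections satisfies both conditions.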
Follow-up Q&A over 40 filtered documents
What are the main drivers behind margin decline?
Dominant drivers across the matched filings:
Logistics cost increases — freight and shipping up 15–30% YoY
Observation: Tree misses adaptive_retrieval_icml.pdf, where the RAG method is described in the approach section and the efficiency comparison against dense retrieval appears in the experiments section. Hyperedge links the evidence across sections.
Follow-up Q&A over 120 filtered papers
What techniques are commonly used to improve efficiency?
Common techniques across the matched papers:
Hierarchical indexing — multi-level structures to reduce search space