VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction
Businesses often need to query visually rich documents (VRDs), such as purchase receipts, medical records, and insurance forms from multiple vendors, to make informed decisions. Several techniques have therefore been proposed to automatically extract independent entities of interest from VRDs, for example price tags from purchase receipts. However, for extracting semantically linked entities, such as finding the corresponding price tag for each item, these techniques either have limited capability in handling new layouts (template-based approaches) or require extensive amounts of pre-training data and do not perform well (deep-learning approaches).
In this work, we introduce a program synthesis method, namely VRDSynth, to automatically generate programs to extract entity relations from multilingual VRDs. Two key novelties, which empower VRDSynth to tackle flexible layouts while requiring no pre-training data for extracting entity relations, include: (1) a new domain-specific language (DSL) to effectively capture the spatial and textual relations between document entities, and (2) a novel synthesis algorithm that makes use of frequent spatial relations between entities to construct initial programs, equivalent reduction to prune the search space, and a combination of positive, negative, and mutually exclusive programs to improve the coverage of programs.
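To make the idea of programs over spatial and textual relations concrete, the following is a minimal Python sketch. It is not VRDSynth's DSL or synthesis algorithm: the `Entity` fields, the predicates, the 200-unit gap threshold, and the way positive and negative programs are combined are all assumptions chosen for illustration. It only shows how a hand-written positive program can propose links between entities and a negative program can veto some of them.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Entity:
    """Hypothetical document entity: text, coarse label, bounding box."""
    idx: int
    text: str
    label: str  # e.g. "question" or "answer"
    x0: float
    y0: float
    x1: float
    y1: float


# In this sketch a "program" is simply a predicate over an ordered entity pair.
Program = Callable[[Entity, Entity], bool]


def right_of_same_row(a: Entity, b: Entity) -> bool:
    """Spatial relation: b starts right of a and they overlap vertically."""
    vertical_overlap = min(a.y1, b.y1) - max(a.y0, b.y0)
    return b.x0 >= a.x1 and vertical_overlap > 0


def question_then_answer(a: Entity, b: Entity) -> bool:
    """Label relation: a is a question and b is an answer."""
    return a.label == "question" and b.label == "answer"


def key_value_on_row(a: Entity, b: Entity) -> bool:
    """Positive program (assumed): conjunction of the two relations above."""
    return question_then_answer(a, b) and right_of_same_row(a, b)


def too_far_apart(a: Entity, b: Entity) -> bool:
    """Negative program (assumed): veto pairs with a large horizontal gap."""
    return b.x0 - a.x1 > 200


def extract_links(
    entities: List[Entity],
    positive: List[Program],
    negative: List[Program],
) -> Set[Tuple[int, int]]:
    """Keep a pair if any positive program fires and no negative one does."""
    links: Set[Tuple[int, int]] = set()
    for a in entities:
        for b in entities:
            if a.idx == b.idx:
                continue
            proposed = any(p(a, b) for p in positive)
            vetoed = any(n(a, b) for n in negative)
            if proposed and not vetoed:
                links.add((a.idx, b.idx))
    return links


# Toy receipt-like layout; coordinates are made up for the example.
ENTITIES = [
    Entity(0, "Item:", "question", 10, 10, 50, 20),
    Entity(1, "Coffee", "answer", 60, 10, 110, 20),
    Entity(2, "Price:", "question", 10, 30, 50, 40),
    Entity(3, "3.50", "answer", 60, 30, 90, 40),
    Entity(4, "Page 1", "answer", 400, 10, 450, 20),
]

LINKS = extract_links(ENTITIES, [key_value_on_row], [too_far_apart])
print(sorted(LINKS))
```

In this toy layout the positive program alone would also link "Item:" to the distant "Page 1" entity on the same row; the negative program removes that pair. A synthesizer would search for such programs automatically rather than relying on hand-written ones.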
We evaluate our method on the semantic entity linking task using two popular VRD understanding benchmarks, FUNSD and XFUND, which together consist of 1,600 forms in 8 different languages. Experiments show that VRDSynth, despite having no prior pre-training data, outperforms the state-of-the-art pre-trained deep-learning approach, namely LayoutXLM, in 5 out of 8 languages. Notably, VRDSynth achieved an improvement of 42% over LayoutXLM in terms of F1 score on FUNSD while being complementary to LayoutXLM in 7 out of 8 languages. Regarding efficiency, VRDSynth significantly reduces the memory footprint required for storage and inference compared with LayoutXLM (1M and 380MB versus the 1.48GB and 3GB required by LayoutXLM), while maintaining similar time efficiency despite the speed differences between the languages used for implementation (Python vs C++).
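For readers unfamiliar with how F1 applies to entity linking, the sketch below computes precision, recall, and F1 over sets of predicted and gold links. Representing a link as an `(source_index, target_index)` pair and the function name `link_f1` are assumptions for illustration; the benchmarks' official evaluation scripts may differ in detail.

```python
from typing import Set, Tuple

Link = Tuple[int, int]


def link_f1(predicted: Set[Link], gold: Set[Link]) -> float:
    """F1 score over entity links, treating each link as one prediction."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example: two of three predicted links are correct,
# and two of three gold links are recovered.
GOLD = {(0, 1), (2, 3), (4, 5)}
PREDICTED = {(0, 1), (2, 3), (2, 5)}
SCORE = link_f1(PREDICTED, GOLD)
print(round(SCORE, 3))
```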
Wed 18 Sep
15:30 - 17:10 | Static Analysis and Verification, Technical Papers at EI 3 Sahulka. Chair(s): Jian Zhang (Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences)
16:10, 20m Talk | VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction (Technical Papers). Thanh-Dat Nguyen (University of Melbourne), Tung Do-Viet (Cinnamon AI), Hung Nguyen-Duy (Independent Researcher), Tuan-Hai Luu (Cinnamon AI), Hung Le (Deakin University), Xuan-Bach D. Le (University of Melbourne), Patanamon Thongtanunam (University of Melbourne)