Selected research publications spanning intelligent assistants, knowledge graphs, and data integration.
For a full list, see my Google Scholar and DBLP profiles.
Knowledge Graphs
Taxonomy & Ontology
Xinyang Zhang, Chenwei Zhang, Xian Li, Xin Luna Dong, Jingbo Shang, Christos Faloutsos, Jiawei Han.
OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision. WebConf, 2022.
[Link]
Xinyang Zhang, Chenwei Zhang, Xin Luna Dong, Jingbo Shang, Jiawei Han.
Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks. WebConf, 2021.
[Link]
Xin Luna Dong, Xiang He, Andrey Kan, Xian Li, Yan Liang, Jun Ma, Yifan Ethan Xu, Chenwei Zhang, Tong Zhao, Gabriel Blanco Saldana, Saurabh Deshpande, Alexandre Michetti Manduca, Jay Ren, Surender Pal Singh, Fan Xiao, Haw-Shiuan Chang, Giannis Karamanolakis, Yuning Mao, Yaqing Wang, Christos Faloutsos, Andrew McCallum, Jiawei Han.
AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types. KDD, 2020.
[Link]
Knowledge Extraction
Rongmei Lin, Xiang He, Jie Feng, Nasser Zalmout, Yan Liang, Li Xiong, Xin Luna Dong.
PAM: Understanding Product Images in Cross Product Category Attribute Extraction. SIGKDD, 2021.
[Link]
Jun Yan, Nasser Zalmout, Yan Liang, Christan Grant, Xiang Ren, Xin Luna Dong.
AdaTag: Multi-Attribute Value Extraction from Product Profiles with Adaptive Decoding. ACL, 2021.
[Link]
Daheng Wang, Prashant Shiralkar, Colin Lockard, Binxuan Huang, Xin Luna Dong, Meng Jiang.
TCN: Table Convolutional Network for Web Table Interpretation. WebConf, 2021.
[Link]
Giannis Karamanolakis, Jun Ma, Xin Luna Dong.
TXtract: Taxonomy-Aware Knowledge Extraction for Thousands of Product Categories. ACL, 2020.
[Link]
Colin Lockard, Prashant Shiralkar, Hannaneh Hajishirzi, Xin Luna Dong.
ZeroShotCeres: Zero-Shot Relation Extraction from Semi-Structured Webpages. ACL, 2020.
[Link]
Colin Lockard, Prashant Shiralkar, Xin Luna Dong.
OpenCeres: When Open Information Extraction Meets the Semi-Structured Web. NAACL, 2019.
[PDF]
Xiaolan Wang, Xin Luna Dong, Yang Li, Alexandra Meliou.
MIDAS: Finding the Right Web Sources to Fill Knowledge Gaps. ICDE, 2019.
[PDF][Slides]
Colin Lockard, Xin Luna Dong, Arash Einolghozati, Prashant Shiralkar.
Ceres: Distantly Supervised Relation Extraction from the Semi-Structured Web. VLDB, 2018.
[Link]
Guineng Zheng, Subhabrata Mukherjee, Xin Luna Dong, Feifei Li.
OpenTag: Open Attribute Value Extraction from Product Profiles. SIGKDD, 2018.
[Link]
Disheng Qiu, Luciano Barbosa, Xin Luna Dong, Yanyan Shen, Divesh Srivastava.
DEXTER: Large-Scale Discovery and Extraction of Product Specifications on the Web. VLDB, 2016.
[PDF]
Knowledge Integration
Di Jin, Bunyamin Sisman, Hao Wei, Xin Luna Dong, Danai Koutra.
Deep Transfer Learning for Multi-Source Entity Linkage via Domain Adaptation. VLDB, 2022.
[Link]
Zhengbao Jiang, Jialong Han, Bunyamin Sisman, Xin Luna Dong.
CoRI: Collective Relation Integration with Data Augmentation for Open Information Extraction. ACL, 2021.
[Link]
Zhengyang Wang, Bunyamin Sisman, Hao Wei, Xin Luna Dong, Shuiwang Ji.
CorDEL: A Contrastive Deep Learning Approach for Entity Linkage. ICDM, 2020.
[Link]
Wei Zhang, Hao Wei, Bunyamin Sisman, Xin Luna Dong, Christos Faloutsos, David Page.
AutoBlock: A Hands-Off Blocking Framework for Entity Matching. WSDM, 2020.
[Link]
Varun R. Embar, Bunyamin Sisman, Hao Wei, Xin Luna Dong, Christos Faloutsos, Lise Getoor.
Contrastive Entity Linkage: Mining Variational Attributes from Large Catalogs for Entity Linkage. AKBC, 2020.
[Link]
Dongxu Zhang, Subhabrata Mukherjee, Colin Lockard, Xin Luna Dong, Andrew McCallum.
OpenKI: Integrating Open Information Extraction and Knowledge Bases with Relation Inference. NAACL, 2019.
[Link]
Rakshit Trivedi, Bunyamin Sisman, Jun Ma, Christos Faloutsos, Hongyuan Zha, Xin Luna Dong.
LinkNBed: Multi-Graph Representation Learning with Entity Linkage. ACL, 2018.
[Link]
Knowledge Fusion, Cleaning & Evaluation
Kewei Cheng, Xian Li, Zhengyang Wang, Chenwei Zhang, Binxuan Huang, Yifan Xu, Xin Luna Dong, Yizhou Sun.
Tab-Cleaner: Weakly Supervised Tabular Data Cleaning via Pre-Training for E-Commerce Catalog. ACL, 2023.
[Link]
Yaqing Wang, Yifan Ethan Xu, Xian Li, Xin Luna Dong, Jing Gao.
Automatic Validation of Textual Attribute Values in eCommerce Catalog by Learning with Limited Labeled Data. KDD, 2020.
[Link]
Junyang Gao, Xian Li, Yifan Ethan Xu, Bunyamin Sisman, Xin Luna Dong, and Jun Yang.
Efficient Knowledge Graph Accuracy Evaluation. VLDB, 2019.
[Link]
Furong Li, Xin Luna Dong, Anno Langen, and Yang Li.
Knowledge Verification for Long Tail Verticals. VLDB, 2017.
[PDF][Report]
Xin Luna Dong, Evgeniy Gabrilovich, Kevin Murphy, Van Dang, Wilko Horn, Camillo Lugaresi, Shaohua Sun, and Wei Zhang.
Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources. VLDB, 2015.
[PDF][Slides]
Xiaolan Wang, Xin Luna Dong, Alexandra Meliou.
Data X-Ray: A Diagnostic Tool for Data Errors. SIGMOD, 2015.
[PDF][Slides][Demo]
Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang.
Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion. SIGKDD, 2014.
[PDF]
Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Kevin Murphy, Shaohua Sun, and Wei Zhang.
From Data Fusion to Knowledge Fusion. VLDB, 2014.
[PDF][Slides]
Knowledge Mining & Search
Liqiang Xiao, Jun Ma, Xin Luna Dong, Pascual Martinez-Gomez, Nasser Zalmout, Wei Chen, Tong Zhao, Hao He, Yaohui Jin.
End-to-End Conversational Search for Online Shopping with Utterance Transfer. EMNLP, 2021.
[Link]
Yikun Xian, Tong Zhao, Jin Li, Jim Chan, Andrey Kan, Jun Ma, Xin Luna Dong, Christos Faloutsos, George Karypis, S. Muthukrishnan, Yongfeng Zhang.
EX3: Explainable Attribute-Aware Item-Set Recommendations. RecSys, 2021.
[Link]
Junheng Hao, Tong Zhao, Jin Li, Xin Luna Dong, Christos Faloutsos, Yizhou Sun, Wei Wang.
P-Companion: A Principled Framework for Diversified Complementary Product Recommendation. CIKM, 2020.
[Link]
Namyong Park, Andrey Kan, Christos Faloutsos, Xin Luna Dong.
J-Recs: Principled and Scalable Recommendation Justification. ICDM, 2020.
[Link]
Namyong Park, Andrey Kan, Tong Zhao, Christos Faloutsos, Xin Luna Dong.
MultiImport: Inferring Node Importance in a Knowledge Graph from Multiple Input Signals. SIGKDD, 2020.
[Link]
Namyong Park, Andrey Kan, Xin Luna Dong, Tong Zhao, and Christos Faloutsos.
Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks. SIGKDD, 2019.
[Link]
Qi Song, Yinghui Wu, and Xin Luna Dong.
Mining Summaries for Knowledge Graph Search. ICDM, 2016.
[PDF]
Tim Althoff, Xin Luna Dong, Kevin Murphy, Safa Alai, Van Dang, and Wei Zhang.
TimeMachine: Timeline Generation for Knowledge-Base Entities. SIGKDD, 2015.
[PDF][Slides]
Data Integration
Big Data Integration
Theodoros Rekatsinas, Xin Luna Dong, Lise Getoor, Divesh Srivastava.
Finding Quality in Quantity: The Challenge of Discovering Valuable Sources for Integration. CIDR, 2015.
[PDF][Slides]
Theodoros Rekatsinas, Xin Luna Dong, Divesh Srivastava.
Characterizing and Selecting Fresh Data Sources. VLDB, 2014.
[PDF][Slides]
Xin Luna Dong, Barna Saha, and Divesh Srivastava.
Less Is More: Selecting Sources Wisely for Integration. VLDB, 2013.
[PDF][Report]
Mariam Salloum, Xin Luna Dong, Divesh Srivastava, Vassilis J. Tsotras.
Online Ordering of Overlapping Data Sources. VLDB, 2014.
[PDF][Slides]
Data Fusion
Ravali Pochampally, Anish Das Sarma, Xin Luna Dong, Alexandra Meliou, and Divesh Srivastava.
Fusing Data with Correlations. SIGMOD, 2014.
[PDF][Slides]
Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, and Divesh Srivastava.
Scaling Up Copy Detection. ICDE, 2015.
[PDF][Report]
Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, and Divesh Srivastava.
Truth Finding on the Deep Web: Is the Problem Solved? VLDB, 2013.
[PDF][Report][Slides]
Xin Luna Dong and Divesh Srivastava.
Compact Explanation of Data Fusion Decisions. WWW, 2013.
[PDF][Report][Slides]
Xuan Liu, Xin Luna Dong, Beng Chin Ooi, and Divesh Srivastava.
Online Data Fusion. VLDB, 2011.
[PDF][Slides]
Anish Das Sarma, Xin Luna Dong, Alon Halevy.
Data Integration with Dependent Sources. EDBT, 2011.
[PDF][Slides]
Xin Luna Dong, Laure Berti-Equille, Yifan Hu, and Divesh Srivastava.
Global Detection of Complex Copying Relationships Between Sources. VLDB, 2010.
[PDF][Slides]
Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava.
Truth Discovery and Copying Detection in a Dynamic World. VLDB, 2009.
[PDF][Slides]
Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava.
Integrating Conflicting Data: The Role of Source Dependence. VLDB, 2009.
[PDF][Slides]
Laure Berti-Equille, Anish Das Sarma, Xin Luna Dong, Amelie Marian, and Divesh Srivastava.
Sailing the Information Ocean with Awareness of Currents: Discovery and Application of Source Dependence. CIDR, 2009.
[PDF][Slides]
Record Linkage
Wenfei Fan, Zhe Fan, Chao Tian, and Xin Luna Dong.
Keys for Graphs. VLDB, 2015.
[PDF][Slides]
Pei Li, Xin Luna Dong, Songtao Guo, Andrea Maurino, and Divesh Srivastava.
Robust Group Linkage. WWW, 2015.
[PDF][Slides][Report]
Anja Gruenheid, Xin Luna Dong, and Divesh Srivastava.
Incremental Record Linkage. VLDB, 2014.
[PDF][Report][Slides]
Pei Li, Xin Luna Dong, Andrea Maurino, and Divesh Srivastava.
Linking Temporal Records. VLDB, 2011.
[PDF][Slides][Journal]
Songtao Guo, Xin Luna Dong, Divesh Srivastava, and Remi Zajac.
Record Linkage with Uniqueness Constraints and Erroneous Values. VLDB, 2010.
[PDF][Slides]
Xin Dong, Alon Y. Halevy, and Jayant Madhavan.
Reference Reconciliation in Complex Information Spaces. SIGMOD, 2005.
[PDF][Slides]
Schema Mapping & Dataspaces
Anish Das Sarma, Xin Dong, and Alon Y. Halevy.
Bootstrapping Pay-As-You-Go Data Integration Systems. SIGMOD, 2008.
[PDF]
Xin Dong, Alon Y. Halevy, and Cong Yu.
Data Integration with Uncertainties. VLDB, 2007.
Best Papers of VLDB 2007 [PDF][Slides][Journal]
Xin Dong and Alon Y. Halevy.
Indexing Dataspaces. SIGMOD, 2007.
[PDF][Slides]
Xin Dong and Alon Y. Halevy.
A Platform for Personal Information Management and Integration. CIDR, 2005.
[PDF][Slides]
Xin Dong, Alon Y. Halevy, Jayant Madhavan, Ema Nemes, and Jun Zhang.
Similarity Search for Web Services. VLDB, 2004.
[PDF][Slides]
Xin Dong, Alon Y. Halevy, and Igor Tatarinov.
Containment of Nested XML Queries. VLDB, 2004.
[PDF][Slides][Tech-report]