Professional Life:

 

Xin Luna Dong
lunadong@fb.com
9845 Willows Rd.
Redmond, WA 98052
Tel: (650) 788-0228

I am a Principal Scientist at Meta Reality Labs, leading the ML efforts in building an intelligent personal assistant. We innovate and productionize techniques on contextual AI, multi-modal conversations, search, question answering, recommendation and personalization, knowledge collection and mining.

 

Prior to joining Meta, I spent nearly a decade working on knowledge graphs at Amazon and Google. Before that, I spent another decade working on data integration and cleaning at AT&T Labs and at the University of Washington, where I received my Ph.D. in Computer Science.

 

I have had the great honor of being named an ACM Fellow and an IEEE Fellow, and of receiving the VLDB Women in Database Research Award and the VLDB Early Career Research Contribution Award, for my contributions to "knowledge graph construction and data integration". I am an ACM Distinguished Speaker. You can find an interview with me for People of ACM (2024), a podcast episode on ACM ByteCast (2024), and an interview in the IEEE Industry Leaders in Signal Processing and Machine Learning Series (2021).

You can find my (possibly out-of-date) C.V. here and resume here.

Research Areas and Selected Publications

 

Below is a list of my projects and selected papers categorized by research area. You can find the full list of my publications here, my DBLP entry here, and my Google Scholar entry here.

 

Intelligent Assistant

 

  • Projects 

Wearables AI Assistant: We are building AI assistants for wearable devices such as Ray-Ban Meta, developing techniques to enable trustworthy, multi-modal, and personalized assistants.
[Tutorial] [Wall Street video] [Talk1] [Talk2]

CRAG: We are building a comprehensive benchmark for RAG (Retrieval-Augmented Generation) techniques, as an important step toward increasing LLM trustworthiness through RAG.
[KDD Cup Competition] [Paper] [Talk] [News 1, 2, 3] [Vision of Dual Neural KG]

  • Tutorials
    • Xin Luna Dong, Seungwhan Shane Moon, Yifan Ethan Xu, Kshitiz Malik, Zhou Yu. Towards next-generation intelligent assistants leveraging LLM techniques. Tutorial in WebConf'2023, KDD'2023. [Website]

     

  • Papers on trustworthy assistants
    • Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, Wen-tau Scott Yih, Xin Luna Dong. CRAG--Comprehensive RAG Benchmark. In NeurIPS, 2024. [Link][Hugging Face Daily Papers][Poster]
    • Yushi Sun, Hao Xin, Kai Sun, Yifan Ethan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, Lei Chen. Are Large Language Models A Good Replacement of Taxonomies? In PVLDB, 2024. [Link]
    • Kai Sun, Yifan Ethan Xu, Hanwen Zha, Yue Liu, Xin Luna Dong. Head-to-Tail: How knowledgeable are Large Language Models? A.K.A. Will LLMs replace knowledge graphs? In NAACL, 2024. [Link]

     

  • Papers on multi-modal assistants
    • Xindi Wu, Uriel Singer, Zhaojiang Lin, Xide Xia, Andrea Madotto, Yifan Ethan Xu, Paul A. Crook, Xin Luna Dong, Shane Moon. Corgi: Cached Memory Guided Video Generation. In WACV, 2025. [Link]
    • Jielin Qiu, Andrea Madotto, Zhaojiang Lin, Paul A. Crook, Yifan Ethan Xu, Xin Luna Dong, Christos Faloutsos, Lei Li, Babak Damavandi, Seungwhan Moon. SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM. In EMNLP, 2024. [Link]
    • Ashish Shenoy, Yichao Lu, Srihari Jayakumar, Debojeet Chatterjee, Mohsen Moslehpour, Pierce Chuang, Abhay Harpale, Vikas Bhardwaj, Di Xu, Shicong Zhao, Longfang Zhao, Ankit Ramchandani, Xin Luna Dong, Anuj Kumar. Lumos: Empowering Multimodal LLMs with Scene Text Recognition. In SigKDD, 2024. [Link]

     

  • Papers on personal memory search and personalization
    • Wang Bill Zhu, Deqing Fu, Kai Sun, Yi Lu, Zhaojiang Lin, Seungwhan Moon, Kanika Narang, Mustafa Canim, Yue Liu, Anuj Kumar, Xin Luna Dong. VisualLens: Personalization through Visual History. In arXiv, 2024. [Link]

     

  • Papers on general voice assistants
    • Zekun Li, Zhiyu Zoey Chen, Mike Ross, Patrick Huber, Seungwhan Moon, Zhaojiang Lin, Xin Luna Dong, Adithya Sagar, Xifeng Yan, Paul A. Crook. Large Language Models as Zero-shot Dialogue State Tracker through Function Calling. In ACL, 2024. [Link]

 

Knowledge collection, fusion, mining, and search

 

  • Position paper: Xin Luna Dong. Generations of Knowledge Graphs: The crazy ideas and the business impact. VLDB, 2023. Invited paper for VLDB Women in Database Research Award. [Paper][Talk]
  • Book: Gerhard Weikum, Xin Luna Dong, Simon Razniewski and Fabian Suchanek. Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases. Barnes&Noble, 2021. [Link]
  • Interview at SEMANTiCS, 2023. [Video]
  • Podcast: Building the Product Knowledge Graph at Amazon. This Week in Machine Learning & AI (TWIML), 2021. [Video]
  • Benchmark: Extended SWDE Benchmark for knowledge extraction from semi-structured websites
  • Benchmark: DI2KG Benchmark for knowledge integration

 

  • Projects 

Amazon Product Graph: We are building an authoritative knowledge graph for every product in the world, with the goal of answering any question about products and related knowledge.
[Podcast][Amazon Blog 1, Blog 2] [Talk1] [Talk2]

Ceres extracts knowledge from semi-structured websites, which contain a huge volume of factual knowledge. It supports both ClosedIE and OpenIE: the former identifies new facts and new entities, and the latter adds new relations and even new domains. [Talk][Benchmark]

Knowledge Vault (KV) / Knowledge-based Trust (KBT): knowledge fusion and trustworthiness evaluation. KV collects knowledge from the Web to build a probabilistic knowledge base. KBT evaluates Web source quality from a new angle: the correctness of its factual information. [Talk 1][Talk 2][Talk 3]

Quotes from the Washington Post [1, 2, 3]: Still, even the possibility of a search engine that evaluates truth is a pretty incredible breakthrough. And it definitely gives new meaning to the phrase "let me Google that for you."

  • Tutorials
    • [e-Commerce] Nasser Zalmout, Chenwei Zhang, Xian Li, Yan Liang, Xin Luna Dong. All you need to know to build a product knowledge graph. Tutorial in KDD'2021. [Website]
    • Colin Lockard, Prashant Shiralkar, Xin Luna Dong, Hannaneh Hajishirzi. Multi-modal information extraction from text, semi-structured, and tabular data on the web. Tutorial in VLDB'2025, WSDM'2020, ACL'2020, KDD'2020. [Website]
    • Xin Luna Dong, Christos Faloutsos, Andrey Kan, Jun Ma, Subhabrata Mukherjee. Graph and tensor mining: for fun and for profit. Tutorial in SigKDD'18. [Website]
    • Xin Luna Dong, Christos Faloutsos, Xian Li, Subhabrata Mukherjee, Prashant Shiralkar. Fact checking: theory and practice. Tutorial in SigKDD'18. [Website]
    • Xin Luna Dong and Divesh Srivastava. Knowledge curation and knowledge fusion: challenges, models, and applications. Tutorial in Sigmod'15. [PDF][Presentation]

     

  • Papers on taxonomy and ontology
    • [e-Commerce] Xinyang Zhang, Chenwei Zhang, Xian Li, Xin Luna Dong, Jingbo Shang, Christos Faloutsos, Jiawei Han. OA-Mine: Open-world attribute mining for e-Commerce products with weak supervision. In WebConf, 2022. [Link]
    • [e-Commerce] Xinyang Zhang, Chenwei Zhang, Xin Luna Dong, Jingbo Shang, Jiawei Han. Minimally-supervised structure-rich text categorization via learning on text-rich networks. In WebConf, 2021. [Link]
    • [e-Commerce] Yuning Mao, Tong Zhao, Andrey Kan, Chenwei Zhang, Xin Luna Dong, Christos Faloutsos, Jiawei Han. Octet: Online catalog taxonomy enrichment with self-supervision. In SigKDD, 2020. [Link]
    • [e-Commerce] Xin Luna Dong, Xiang He, Andrey Kan, Xian Li, Yan Liang, Jun Ma, Yifan Ethan Xu, Chenwei Zhang, Tong Zhao, Gabriel Blanco Saldana, Saurabh Deshpande, Alexandre Michetti Manduca, Jay Ren, Surender Pal Singh, Fan Xiao, Haw-Shiuan Chang, Giannis Karamanolakis, Yuning Mao, Yaqing Wang, Christos Faloutsos, Andrew McCallum, Jiawei Han. AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types. In KDD, 2020. [Link]

     

  • Papers on knowledge extraction
    • [e-Commerce] Rongmei Lin, Xiang He, Jie Feng, Nasser Zalmout, Yan Liang, Li Xiong, Xin Luna Dong. PAM: Understanding product images in cross product category attribute extraction. In SigKDD, 2021. [Link]
    • [e-Commerce] Jun Yan, Nasser Zalmout, Yan Liang, Christan Grant, Xiang Ren, Xin Luna Dong. AdaTag: Multi-attribute value extraction from product profiles with adaptive decoding. In ACL, 2021. [Link]
    • Daheng Wang, Prashant Shiralkar, Colin Lockard, Binxuan Huang, Xin Luna Dong, Meng Jiang. TCN: Table convolutional network for web table interpretation. In WebConf, 2021. [Link]
    • [e-Commerce] Giannis Karamanolakis, Jun Ma, Xin Luna Dong. TXtract: Taxonomy-aware knowledge extraction for thousands of product categories. In ACL, 2020. [Link]
    • Colin Lockard, Prashant Shiralkar, Hannaneh Hajishirzi, Xin Luna Dong. ZeroShotCeres: Zero-shot relation extraction from semi-structured webpages. In ACL, 2020. [Link]
    • Colin Lockard, Prashant Shiralkar, Xin Luna Dong. OpenCeres: When open information extraction meets the semi-structured web. In NAACL, 2019. [Link]
    • Xiaolan Wang, Xin Luna Dong, Yang Li, Alexandra Meliou. MIDAS: Finding the right web sources to fill knowledge gaps. In ICDE, 2019. [PDF][Presentation]
    • Colin Lockard, Xin Luna Dong, Arash Einolghozati, Prashant Shiralkar. Ceres: Distantly supervised relation extraction from the semi-structured web. In VLDB, 2018. [Link]
    • [e-Commerce] Guineng Zheng, Subhabrata Mukherjee, Xin Luna Dong, Feifei Li. OpenTag: Open attribute value extraction from product profiles. In SigKDD, 2018. [Link]
    • [e-Commerce] Disheng Qiu, Luciano Barbosa, Xin Luna Dong, Yanyan Shen, Divesh Srivastava. DEXTER: Large-scale discovery and extraction of product specifications on the Web. In VLDB, 2016. [PDF]

     

  • Papers on knowledge integration
    • Di Jin, Bunyamin Sisman, Hao Wei, Xin Luna Dong, Danai Koutra. Deep transfer learning for multi-source entity linkage via domain adaptation. In VLDB, 2022. [Link]
    • Zhengbao Jiang, Jialong Han, Bunyamin Sisman, Xin Luna Dong. CoRI: Collective relation integration with data augmentation for open information extraction. In ACL, 2021. [Link]
    • Zhengyang Wang, Bunyamin Sisman, Hao Wei, Xin Luna Dong, Shuiwang Ji. CorDEL: A contrastive deep learning approach for entity linkage. In ICDM, 2020. [Link]
    • Qi Zhu, Hao Wei, Bunyamin Sisman, Xin Luna Dong, Christos Faloutsos, Jiawei Han. Collective multi-type entity alignment between knowledge graphs. In WebConf, 2020. [Link]
    • Wei Zhang, Hao Wei, Bunyamin Sisman, Xin Luna Dong, Christos Faloutsos, David Page. AutoBlock: A Hands-off Blocking Framework for Entity Matching. In WSDM, 2020. [Link]
    • [e-Commerce] Varun R. Embar, Bunyamin Sisman, Hao Wei, Xin Luna Dong, Christos Faloutsos, Lise Getoor. Contrastive Entity Linkage: Mining Variational Attributes from Large Catalogs for Entity Linkage. In AKBC, 2020. [Link]
    • Dongxu Zhang, Subhabrata Mukherjee, Colin Lockard, Xin Luna Dong, Andrew McCallum. OpenKI: Integrating Open Information Extraction and Knowledge Bases with Relation Inference. In NAACL, 2019. [Link]
    • Rakshit Trivedi, Bunyamin Sisman, Jun Ma, Christos Faloutsos, Hongyuan Zha, Xin Luna Dong. LinkNBed: Multi-graph representation learning with entity linkage. In ACL, 2018. [Link]

     

  • Papers on knowledge fusion, cleaning and evaluation
    • [e-Commerce] Kewei Cheng, Xian Li, Zhengyang Wang, Chenwei Zhang, Binxuan Huang, Yifan Xu, Xin Luna Dong, Yizhou Sun. Tab-Cleaner: Weakly supervised tabular data cleaning via pre-training for E-commerce catalog. In ACL, 2023. [Link]
    • [e-Commerce] Kewei Cheng, Xian Li, Yifan Xu, Xin Luna Dong, Yizhou Sun. PGE: Robust product graph embedding learning for error detection. In VLDB, 2022. [Link]
    • [e-Commerce] Yaqing Wang, Yifan Ethan Xu, Xian Li, Xin Luna Dong, Jing Gao. Automatic validation of textual attribute values in eCommerce Catalog by learning with limited labeled data. In KDD, 2020. [Link]
    • Junyang Gao, Xian Li, Yifan Ethan Xu, Bunyamin Sisman, Xin Luna Dong, and Jun Yang. Efficient knowledge graph accuracy evaluation. In VLDB, 2019. [Link]
    • Furong Li, Xin Luna Dong, Anno Langen, and Yang Li. Knowledge verification for long tail verticals. In VLDB, 2017. [PDF] [Report]
    • Xin Luna Dong, Evgeniy Gabrilovich, Kevin Murphy, Van Dang, Wilko Horn, Camillo Lugaresi, Shaohua Sun, and Wei Zhang. Knowledge-based trust: estimating the trustworthiness of web sources. In VLDB, 2015. [PDF][Presentation]
    • Xiaolan Wang, Xin Luna Dong, Alexandra Meliou. Data X-Ray: A diagnostic tool for data errors. In Sigmod, 2015. [PDF][Presentation][Demo]
    • Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion. In SIGKDD, 2014. [PDF]
    • Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Kevin Murphy, Shaohua Sun, and Wei Zhang. From data fusion to knowledge fusion. In VLDB, 2014. [PDF][Presentation]

     

  • Papers on knowledge mining and search
    • [e-Commerce] Liqiang Xiao, Jun Ma, Xin Luna Dong, Pascual Martinez-Gomez, Nasser Zalmout, Wei Chen, Tong Zhao, Hao He, Yaohui Jin. End-to-end conversational search for online shopping with utterance transfer. In EMNLP, 2021. [Link]
    • [e-Commerce] Yikun Xian, Tong Zhao, Jin Li, Jim Chan, Andrey Kan, Jun Ma, Xin Luna Dong, Christos Faloutsos, George Karypis, S. Muthukrishnan, Yongfeng Zhang. EX3: Explainable Attribute-aware Item-set Recommendations. In RecSys, 2021. [Link]
    • [e-Commerce] Junheng Hao, Tong Zhao, Jin Li, Xin Luna Dong, Christos Faloutsos, Yizhou Sun, Wei Wang. P-Companion: A principled framework for diversified complementary product recommendation. In CIKM, 2020. [Link]
    • Namyong Park, Andrey Kan, Christos Faloutsos, Xin Luna Dong. J-Recs: Principled and scalable recommendation justification. In ICDM, 2020. [Link]
    • Namyong Park, Andrey Kan, Tong Zhao, Christos Faloutsos, Xin Luna Dong. MultiImport: Inferring node importance in a knowledge graph from multiple input signals. In SigKDD, 2020. [Link]
    • Namyong Park, Andrey Kan, Xin Luna Dong, Tong Zhao, and Christos Faloutsos. Estimating node importance in knowledge graphs using graph neural networks. In SigKDD, 2019. [Link]
    • Qi Song, Yinghui Wu, and Xin Luna Dong. Mining summaries for knowledge graph search. In ICDM, 2016. [PDF]
    • Tim Althoff, Xin Luna Dong, Kevin Murphy, Safa Alai, Van Dang, and Wei Zhang. TimeMachine: Timeline generation for knowledge-base entities. In SIGKDD 2015. [PDF][Presentation]

Data integration (Aspects of data integration)

 

  • Book: Xin Luna Dong and Divesh Srivastava. Big Data Integration (Synthesis Lectures on Data Management). Morgan Claypool Publishers. 2015. [Link]
  • Sigmod blog interview: Courting ML: Witnessing the marriage of relational & web data systems to machine learning. 2018. [Link]
  • Sigmod blog interview: The elephant in the room: getting value from Big Data. 2015. [Link]


  • Projects

 

  • Tutorials
    • Xin Luna Dong and Theodoros Rekatsinas: Data integration and machine learning: a natural synergy. Tutorial in Sigmod'2018, VLDB'2018, KDD'2019. [Slides][Sigmod video]
    • Xin Luna Dong and Wang-Chiew Tan: A Time Machine for Information: Looking Back to Look Forward. Tutorial in VLDB, 2015. [PDF][Slides][Survey]
    • Xin Luna Dong and Divesh Srivastava. Big data integration. Tutorial in ICDE'13, VLDB'13. [PDF] [Slides (short)] [Slides (long)]
    • Xin Luna Dong and Divesh Srivastava. Large-Scale Copy Detection. Tutorial in ICDE'12, DASFAA'12, Sigmod'11. [PDF][Presentation]
    • Xin Luna Dong and Felix Naumann. Data fusion--Resolving data conflicts for integration. In VLDB, 2009. [PDF][Presentation]

 

  • Papers on big data integration
    • Theodoros Rekatsinas, Xin Luna Dong, Lise Getoor, Divesh Srivastava. Finding Quality in Quantity: The Challenge of Discovering Valuable Sources for Integration. In CIDR, 2015. [PDF][Presentation]
    • Theodoros Rekatsinas, Xin Luna Dong, Divesh Srivastava. Characterizing and selecting fresh data sources. In VLDB, 2014. [PDF][Presentation]
    • Xin Luna Dong, Barna Saha, and Divesh Srivastava. Less is more: Selecting sources wisely for integration. In VLDB, 2013. [PDF][Report] [Slides (short)] [Slides (long)]
    • Mariam Salloum, Xin Luna Dong, Divesh Srivastava, Vassilis J. Tsotras. Online ordering of overlapping data sources. In VLDB, 2014. [PDF][Presentation]
  • Papers on data fusion (Resolving value heterogeneity)
    • Ravali Pochampally, Anish Das Sarma, Xin Luna Dong, Alexandra Meliou, and Divesh Srivastava. Fusing data with correlations. In Sigmod, 2014. [PDF][Presentation][Poster]
    • Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, and Divesh Srivastava. Scaling up Copy Detection. In ICDE, 2015. [PDF][Report]
    • Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, and Divesh Srivastava. Truth finding on the Deep Web: Is the problem solved? In VLDB, 2013. [PDF][Report][Presentation]
    • Xin Luna Dong and Divesh Srivastava. Compact explanation of data fusion decisions. In WWW, 2013. [PDF][Report][Presentation]
    • Xuan Liu, Xin Luna Dong, Beng Chin Ooi, and Divesh Srivastava: Online data fusion. In VLDB, 2011. [PDF][Presentation]
    • Anish Das Sarma, Xin Luna Dong, Alon Halevy. Data integration with dependent sources. In EDBT, 2011. [PDF][Presentation]
    • Xin Luna Dong, Laure Berti-Equille, Yifan Hu, and Divesh Srivastava. Global detection of complex copying relationships between sources. In VLDB, 2010. [PDF][Presentation]
    • Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava. Truth discovery and copying detection in a dynamic world. In VLDB, 2009. [PDF][Presentation]
    • Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava. Integrating conflicting data: the role of source dependence. In VLDB, 2009. [PDF][Presentation]
    • Laure Berti-Equille, Anish Das Sarma, Xin Luna Dong, Amelie Marian, and Divesh Srivastava. Sailing the information ocean with awareness of currents: discovery and application of source dependence. In CIDR, 2009. [PDF][Presentation]

 

  • Papers on record linkage (Resolving instance heterogeneity)
    • Wenfei Fan, Zhe Fan, Chao Tian, and Xin Luna Dong. Keys for Graphs. In VLDB 2015. [PDF][Presentation]
    • Pei Li, Xin Luna Dong, Songtao Guo, Andrea Maurino, and Divesh Srivastava. Robust group linkage. In WWW 2015. [PDF][Presentation][Report]
    • Anja Gruenheid, Xin Luna Dong, and Divesh Srivastava. Incremental record linkage. In VLDB 2014. [PDF][Report][Presentation]
    • Pei Li, Xin Luna Dong, Andrea Maurino, and Divesh Srivastava. Linking Temporal Records. In VLDB 2011. [PDF][Presentation][Journal version]
    • Songtao Guo, Xin Luna Dong, Divesh Srivastava, and Remi Zajac. Record Linkage with Uniqueness Constraints and Erroneous Values. In VLDB, 2010. [PDF][Presentation]
    • Xin Dong, Alon Y. Halevy and Jayant Madhavan: Reference Reconciliation in Complex Information Spaces. In SIGMOD 2005. [PDF][Presentation]

 

  • Papers on schema mapping and Dataspaces (Resolving structure heterogeneity)
    • Anish Das Sarma, Xin Dong, and Alon Y. Halevy: Bootstrapping Pay-as-you-go Data Integration Systems. In SIGMOD, 2008. [PDF]
    • Xin Dong, Alon Y. Halevy and Cong Yu: Data Integration with Uncertainties. In VLDB, 2007. [PDF][Presentation][DBClip][JournalVersion in "Best papers of VLDB 2007"]
    • Xin Dong and Alon Y. Halevy: Indexing Dataspaces. In SIGMOD, 2007. [PDF][Presentation]
    • Xin Dong and Alon Y. Halevy: A Platform for Personal Information Management and Integration. In CIDR 2005. [PDF][Presentation]
    • Xin Dong, Alon Y. Halevy, Jayant Madhavan, Ema Nemes and Jun Zhang: Similarity Search for Web Services. In VLDB 2004. [PDF][Presentation]
    • Xin Dong, Alon Y. Halevy and Igor Tatarinov: Containment of Nested XML Queries. In VLDB 2004. [PDF][Presentation][Tech-report]

Recent Talks

  • Where are we in the journey to a Knowledgeable Assistant? [PPT]

o   Keynote at International Conference on Data Science and Management of Data (CODS-COMAD). Jodhpur, India, December 2024.

o   Keynote at International Conference on Web Information Systems Engineering (WISE). Doha, Qatar, December 2024.

o   Invited talk at Oracle Cloud AI Distinguished Speaker Series. Online, November, 2024.

o   Keynote at ACL Workshop on Knowledge Graphs and LLMs (KaLLM). Bangkok, Thailand, August 2024.

o   Keynote at Sigmod. Santiago, Chile, June 2024.

o   Keynote at WSDM Industry Day. Merida, Mexico, March 2024.

o   Keynote at KDD Workshop on Knowledge Augmented Methods for NLP, Long Beach, CA, August 2023.

  • Next-Generation Intelligent Assistants for Wearable Devices [PPT1] [PPT2]

o   Invited talk at UPenn Database Group, Philadelphia, Pennsylvania, October 2024.

o   KDD Applied Data Science (ADS) Invited Talk, Barcelona, Spain, August 2024.

o   Keynote at WSDM Workshop on LLMs for Individuals, Groups, and Society (LLM-IGS), Merida, Mexico, March 2024.

o   Keynote at KDD Workshop on Multimodal Learning, Long Beach, CA, August 2023.

o   Invited talk at Stanford Data Science Affiliates Program--Graph Learning Workshop, Palo Alto, CA, September 2022.

o   Keynote at RecSys Knowledge-Aware Recommender System (KaRS) Workshop, Seattle, WA, September 2022.

o   Invited talk at ML Lab of Danish Business Authority, Virtual, September 2022.

o   Invited talk at NAACL Workshop on Structured and Unstructured Knowledge Integration (SUKI), Seattle, WA, July 2022.

o   Keynote at SIAM International Conference on Data Mining (SDM), Online, April 2022.

o   Invited talk at Trustworthy Data Science and AI Seminar Series, Simon Fraser University, April 2022.

  • Generations of Knowledge Graphs: The Crazy Ideas and the Business Impact [PPT]

o   Invited talk at WSDM workshop on Interactive and Scalable Information Retrieval Methods for E-Commerce (ISIR-eComm), Merida, Mexico, March 2024.

o   Keynote at Semantics Conference, Leipzig, Germany, September 2023.

o   VLDB Women in Database Research Award talk at VLDB, Vancouver, Canada, August 2023.

o   Keynote at North East Database Day, Boston, MA, March 2023.

o   Keynote at The Extraction and Knowledge Management (EGC) Conference, Lyon, France, January 2023.

 

Previous Talks

  • Zero to One Billion: The Path to a Rich Product Knowledge Graph. Northwest DB Society Annual Meeting 2022 Keynote, AAAI DGL'2022 Keynote, WSDM MLoG'2022 keynote, AKBC Unstructured/Structured KBs Workshop 2021 keynote, SigKDD DQAML'2021 keynote, SigIR'2021 Industry keynote, ESWC'2021 keynote, Lecture at Stanford CS520 Knowledge Graphs--Data Models, Knowledge Acquisition, Inference and Applications 2021. [PPT]
  • Ceres: Harvesting Knowledge from Semi-Structured Web. SigKDD WIT'2021 Invited talk, TAC-KBP'2021 keynote, UCSB NLP Seminar invited talk 2021, CIKM'2020 keynote, AKBC'2020 invited talk, Invited talk at Northwest DB Society Annual Meeting 2020, SigKDD workshop on Truth Discovery and Fact Checking 2019 keynote, SigIR EARS'2019 keynote, Sigmod'2019 Amsterdam Data Science invited talk, ICML LLD'2019 invited talk, SigKDD MLG'2018 keynote. [PPT1] [PPT2]
  • Self-Driving Product Understanding for Thousands of Categories. PHKG'2021 Keynote, DeMaL'2021 Keynote, NorthEast Univ. DATA lab speaker series 2020, KG & E-Commerce Workshop 2020 Keynote, KR2ML'2019 Invited talk. [PPT]
  • Knowledge Graph And Machine Learning: A Natural Synergy. Lecture at Stanford CS520 Knowledge Graphs--How should AI explicitly represent knowledge, 2020 [PPT, Class notes by course organizer]
  • Building A Broad Knowledge Graph for Products. ICDE'19 Keynote, SigIR eCOM'19 Keynote, AI NEXTCon Seattle'2019 Invited talk, EMNLP FEVER workshop'2019 Invited talk, Duke CS Colloquium 2018, Berkeley RISE Seminar 2018, SigKDD ADS Invited talk 2018, VLDB PhD workshop'2018 Keynote [PPT]
  • Challenges and Innovations in Building a Product Knowledge Graph. GRADES'18 Keynote, BIG'18 Invited talk, MoDas'18 Invited talk, KBCOM'18 Invited talk, Northwest DB Day'18 Invited talk, AKBC'17 Invited talk. [PPT]
  • Leaving No Valuable Data Behind: the Crazy Ideas and the Business. AMW'17 Keynote, Machine Learning Conference Seattle'17 Invited talk, Distinguished speaker series: Oxford Women in CS 2017, Methods to Manage Heterogeneous Big Data and Polystore Databases Workshop 2016 Keynote, VLDB Early Career Research Contribution Award talk 2016 [PPT]
  • How Far Are We from Collecting the Knowledge in the World. WebDB'16 Keynote, ICWE'16 Keynote [PPT]
  • Knowledge Fusion and Knowledge-Based Trust. Quora Invited talk'15, Stanford Computer Systems Colloquium (EE380)'15 Invited talk, NorCal DB Day'15 [PPT]
  • From Data Fusion to Knowledge Fusion. WISA'14 Keynote, APWeb'14 Tutorial, WACCK'14 Keynote, DEOS'14 Keynote. [PPT]
  • Truth Finding on the Deep Web. WAIM'13 Distinguished Young Lecturer Series, DESWEB'13 Keynote. [PPT]
  • Linking Records w. Value Diversity. [PPT]
  • Develop Your Big Ideas. Sigmod new-researcher symposium'11. [PPT]
  • Large-Scale Copy Detection. Tutorial at DASFAA'12, ICDE'12, Sigmod'11. [PPT]
  • Solomon: Seeking the Truth Via Copying Detection. BEWEB'11 Invited talk,  QDB'10 Keynote. [PPT][Video]
  • Sailing the information ocean with awareness of currents: discovery and application of source dependence. Invited talk at Person Validation and Entity Resolution Conference'11 (US Census Bureau),  ISAT "What's Data Worth?" Workshop'10, NDBC'09, SKG'09. [PPT]
  • Data fusion--Resolving data conflicts for integration. Tutorial at VLDB'09, NDBC'09. [PPT]
  • Data integration with uncertainty. [PPT]
  • Managing a space of heterogeneous data. [PPT]
  • Semex: A platform for personal information management and integration. [PPT]

Patents

  • Similar but Different (SBD): Presenting Item Recommendations in Dynamically Generated Groups with Explanations. Andrey Kan, Christos Faloutsos, and Xin Dong. United States Patent 10,891,676, issued 1/2021.
  • Providing User-Interactive Graphical Timelines. Xin Dong, Tim Althoff, Kevin Murphy, Safa Alai, Van Dang, and Wei Zhang. United States Patent 20160313876, issued 10/27/2016.
  • Method and Apparatus for Exploring and Selecting Data Sources. Xin Dong and Divesh Srivastava. United States Patent 20130138480, issued 5/30/2013.
  • Detecting Dependence Between Sources in Truth Discovery. Xin Dong, Laure Berti-Equille, Divesh Srivastava. United States Patent 8190546, issued 5/29/2012.
  • Securing Database Content. Su Chen, Xin Dong, Laks Lakshmanan, and Divesh Srivastava. United States Patent 20120036136, 2/9/2012.
  • Online Data Fusion. Xuan Liu, Xin Dong, Beng Chin Ooi and Divesh Srivastava. United States Patent 20130144843, issued 12/5/2011.
  • Minimal difference query and view matching. Raghav Kaushik, Venkatesh Ganti and Xin Dong. United States Patent 7251646, issued 7/31/2007.
  • Method and apparatus for updating XML views of relational data. Philip L. Bohannon, Xin Dong, Henry F. Korth, Suryanarayan Perinkulam. United States Patent 20050165866, issued 7/28/2005.

Recent Professional Activities

  • Board of Trustees of the VLDB Endowment.
  • Chair of DBCares 2018-2023; Member of DBCares since 2023.
  • Member of WSDM Test-of-time Award Committee 2025, VLDB Conference Strategy Committee, SDM Award Committee 2024, WSDM Best-Paper Award Committee 2024, PVLDB Advisory Committee 2018-2023, KDD Best-Paper Award Committee 2022, KDD 10-Year-Best-Paper Award Committee 2021, TCDE Award Committee 2017-2019, CIKM Best Paper Award Committee 2017.
  • PC Co-chair of KDD'22 Applied Data Science (ADS) Track, WSDM'22, VLDB'21, Sigmod'18.
  • Co-chair of ICDE'25 Panel Track, SigKDD'20 Applied Data Science Invited Talks, VLDB'19 Tutorial track, ICDE'19 Industry track, APWeb'16 Industry track, WAIM'15, APWeb'14 Distinguished Lecture Series, CIKM'13 Demo track, Workshop on Integrating Big Data in the DIMACS workshop series, ACM SIGMOD New Researcher Symposium'12-13, Sigmod/PODS Ph.D. Symposium'12-13, QDB'12 [Report], WebDB'10 [Report], SKG'09 [Issue].
  • PC area chair / senior member in Sigmod'26, VLDB'25, CIKM'24, Sigmod'24, WebConf'23, AAAI'23, VLDB'20, IJCAI'19, CIKM'19, APWEB-WAIM'19, AKBC'19, CIKM'18, Sigmod'17, Sigmod'15, ICDE'13, CIKM'11.
  • Associate editor of IEEE Data Engineering Bulletin 12/2024, 12/2023, 9/2011, SigKDD Explorations, VLDB Journal, TKDE, JDIQ "Web Data Quality" special issue.
  • PC member in SigKDD'19, PVLDB'18, PVLDB'17, PVLDB'16, Sigmod'16, PVLDB'15, PVLDB'14, Sigmod'14, PVLDB'13, Sigmod'12, VLDB'12, EDBT'12 Industry track, Sigmod'11, PVLDB'11, WAIM'11, AMW'11, PVLDB'10, ICDE'10, WWW'10, VLDB Demo'10, NTII'10, PVLDB'09, VLDB'09, CIKM'09, WebDB'09, WWW'08, CIKM'08, VLDB Demo'08, WebDB'08.
  • Referee for Nature, VLDB Journal, TODS, TCS, TOIT, TOIS, TKDE, IS.
  • NSF panelist, 2011.
  • NIH contract reviewer, 2008.

Resources

  • Here is a long and growing list of papers in database, IR and AI that I have collected during my research and my readings.
  • Here is a collection of wisdoms on career, research, life, etc.

Personal Life:

 

Xin Luna Dong 董欣 
lunadong@gmail.com
Tel: (201) 650-3494

In my personal life, I am

Here is what I have learned about research from my life.

  • Challenge yourself.

"I want to prove P<>NP!"

"Got it! It's because of the 'N'!"

  • Work hard.

  • Don't offend reviewers.

Last update: 3/2013