Professional Life:


Xin Luna Dong
345 Boren Ave.
Seattle, WA 98109
Tel: (206) 577-8580


My STEM video from AT&T



Xin Luna Dong


I have been a principal scientist at Amazon since July 2016, leading the effort to build the Amazon Product Knowledge Graph and managing a team of scientists who conduct research on knowledge management, data cleaning and integration, information extraction, graph mining and embedding, and knowledge-based search and recommendation.


Prior to joining Amazon, I worked at Google and at AT&T Labs - Research. I received my Ph.D. in Computer Science and Engineering from the University of Washington. Before coming to the United States, I obtained an M.S. in Computer Science from Peking University and a B.S. in Computer Science from Nankai University in China. I am honored to have been named an ACM Distinguished Member and to have received the VLDB Early Career Research Contribution Award for "advancing the state of the art of knowledge fusion".


You can find my (possibly out-of-date) C.V. here and resume here. 



Research Areas and Selected Publications


Below is a list of my projects and selected papers categorized by research area. You can find the full list of my publications here, my DBLP entry here, and my Google Scholar entry here.


Knowledge collection, fusion, mining, and search


  • Projects 

Amazon Product Graph. We are building an authoritative knowledge graph for every product in the world, with the goal of answering any question about products and related knowledge. [Amazon Blog][Talk1][Talk2]

Knowledge Vault / Knowledge-Based Trust: knowledge fusion and trustworthiness evaluation. KV collects knowledge from the Web to build a probabilistic knowledge base. KBT evaluates the quality of Web sources from a new angle: the correctness of the factual information they provide. [Talk 1][Talk 2][Talk 3]

Quotes from the Washington Post [1, 2, 3]: "Still, even the possibility of a search engine that evaluates truth is a pretty incredible breakthrough. And it definitely gives new meaning to the phrase 'let me Google that for you.'"

  • Tutorials
    • Xin Luna Dong, Christos Faloutsos, Andrey Kan, Jun Ma, Subhabrata Mukherjee. Graph and tensor mining: for fun and for profit. Tutorial in SigKDD'18. [Website]
    • Xin Luna Dong, Christos Faloutsos, Xian Li, Subhabrata Mukherjee, Prashant Shiralkar. Fact checking: theory and practice. Tutorial in SigKDD'18. [Website]
    • Xin Luna Dong and Divesh Srivastava. Knowledge curation and knowledge fusion: challenges, models, and applications. Tutorial in Sigmod'15. [PDF][Presentation]


  • Papers on knowledge extraction
    • Colin Lockard, Xin Luna Dong, Arash Einolghozati, Prashant Shiralkar. Ceres: Distantly supervised relation extraction from the semi-structured web. In VLDB, 2018. [Link]
    • Guineng Zheng, Subhabrata Mukherjee, Xin Luna Dong, Feifei Li. OpenTag: Open attribute value extraction from product profiles. In SigKDD, 2018. [Link]
    • Disheng Qiu, Luciano Barbosa, Xin Luna Dong, Yanyan Shen, Divesh Srivastava. DEXTER: Large-scale discovery and extraction of product specifications on the Web. In VLDB, 2016. [PDF]


  • Papers on knowledge fusion
    • Furong Li, Xin Luna Dong, Anno Langen, and Yang Li. Knowledge verification for long tail verticals. In VLDB, 2017. [PDF] [Report]
    • Xin Luna Dong, Evgeniy Gabrilovich, Kevin Murphy, Van Dang, Wilko Horn, Camillo Lugaresi, Shaohua Sun, and Wei Zhang. Knowledge-based trust: estimating the trustworthiness of web sources. In VLDB, 2015. [PDF][Presentation]
    • Xiaolan Wang, Xin Luna Dong, Alexandra Meliou. Data X-Ray: A diagnostic tool for data errors. In Sigmod, 2015. [PDF][Presentation][Demo]
    • Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion. In SIGKDD, 2014. [PDF]
    • Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Kevin Murphy, Shaohua Sun, and Wei Zhang. From data fusion to knowledge fusion. In VLDB, 2014. [PDF][Presentation]


  • Papers on knowledge mining and embedding
    • LinkNBed: Multi-graph representation learning with entity linkage. In ACL, 2018. [PDF]
    • Qi Song, Yinghui Wu, and Xin Luna Dong. Mining summaries for knowledge graph search. In ICDM, 2016. [PDF]
    • Tim Althoff, Xin Luna Dong, Kevin Murphy, Safa Alai, Van Dang, and Wei Zhang. TimeMachine: Timeline generation for knowledge-base entities. In SIGKDD, 2015. [PDF][Presentation]



Data integration (Aspects of data integration)


  • Book: Xin Luna Dong and Divesh Srivastava. Big Data Integration (Synthesis Lectures on Data Management). Morgan & Claypool Publishers. 2015. [Link]
  • Sigmod blog interview: Courting ML: Witnessing the marriage of relational & web data systems to machine learning. 2018. [Link]
  • Sigmod blog interview: The elephant in the room: getting value from Big Data. 2015. [Link]


  • Tutorials
    • Xin Luna Dong and Theodoros Rekatsinas: Data integration and machine learning: a natural synergy. Tutorial in Sigmod'18, VLDB'18. [Website][Sigmod video]
    • Xin Luna Dong and Wang-Chiew Tan: A Time Machine for Information: Looking Back to Look Forward. Tutorial in VLDB, 2015. [PDF][Slides][Survey]
    • Xin Luna Dong and Divesh Srivastava. Big data integration. Tutorial in ICDE'13, VLDB'13. [PDF] [Slides (short)] [Slides (long)]
    • Xin Luna Dong and Divesh Srivastava. Large-Scale Copy Detection. Tutorial in ICDE'12, DASFAA'12, Sigmod'11. [PDF][Presentation]
    • Xin Luna Dong and Felix Naumann. Data fusion--Resolving data conflicts for integration. In VLDB, 2009. [PDF][Presentation]


  • Papers on big data integration
    • Theodoros Rekatsinas, Xin Luna Dong, Lise Getoor, Divesh Srivastava. Finding Quality in Quantity: The Challenge of Discovering Valuable Sources for Integration. In CIDR, 2015. [PDF][Presentation]
    • Theodoros Rekatsinas, Xin Luna Dong, Divesh Srivastava. Characterizing and selecting fresh data sources. In VLDB, 2014. [PDF][Presentation]
    • Xin Luna Dong, Barna Saha, and Divesh Srivastava. Less is more: Selecting sources wisely for integration. In VLDB, 2013. [PDF][Report] [Slides (short)] [Slides (long)]
    • Mariam Salloum, Xin Luna Dong, Divesh Srivastava, Vassilis J. Tsotras. Online ordering of overlapping data sources. In VLDB, 2014. [PDF][Presentation]
  • Papers on data fusion (Resolving value heterogeneity)
    • Ravali Pochampally, Anish Das Sarma, Xin Luna Dong, Alexandra Meliou, and Divesh Srivastava. Fusing data with correlations. In Sigmod, 2014. [PDF][Presentation][Poster]
    • Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, and Divesh Srivastava. Scaling up Copy Detection. In ICDE, 2015. [PDF][Report]
    • Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, and Divesh Srivastava. Truth finding on the Deep Web: Is the problem solved? In VLDB, 2013. [PDF][Report][Presentation]
    • Xin Luna Dong and Divesh Srivastava. Compact explanation of data fusion decisions. In WWW, 2013. [PDF][Report][Presentation]
    • Xuan Liu, Xin Luna Dong, Beng Chin Ooi, and Divesh Srivastava: Online data fusion. In VLDB, 2011. [PDF][Presentation]
    • Anish Das Sarma, Xin Luna Dong, Alon Halevy. Data integration with dependent sources. In EDBT, 2011. [PDF][Presentation]
    • Xin Luna Dong, Laure Berti-Equille, Yifan Hu, and Divesh Srivastava. Global detection of complex copying relationships between sources. In VLDB, 2010. [PDF][Presentation]
    • Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava. Truth discovery and copying detection in a dynamic world. In VLDB, 2009. [PDF][Presentation]
    • Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava. Integrating conflicting data: the role of source dependence. In VLDB, 2009. [PDF][Presentation]
    • Laure Berti-Equille, Anish Das Sarma, Xin Luna Dong, Amelie Marian, and Divesh Srivastava. Sailing the information ocean with awareness of currents: discovery and application of source dependence. In CIDR, 2009. [PDF][Presentation]


  • Papers on record linkage (Resolving instance heterogeneity)
    • Wenfei Fan, Zhe Fan, Chao Tian, and Xin Luna Dong. Keys for Graphs. In VLDB, 2015. [PDF][Presentation]
    • Pei Li, Xin Luna Dong, Songtao Guo, Andrea Maurino, and Divesh Srivastava. Robust group linkage. In WWW, 2015. [PDF][Presentation][Report]
    • Anja Gruenheid, Xin Luna Dong, and Divesh Srivastava. Incremental record linkage. In VLDB, 2014. [PDF][Report][Presentation]
    • Pei Li, Xin Luna Dong, Andrea Maurino, and Divesh Srivastava. Linking Temporal Records. In VLDB, 2011. [PDF][Presentation][JournalVersion]
    • Songtao Guo, Xin Luna Dong, Divesh Srivastava, and Remi Zajac. Record Linkage with Uniqueness Constraints and Erroneous Values. In VLDB, 2010. [PDF][Presentation]
    • Xin Dong, Alon Y. Halevy and Jayant Madhavan: Reference Reconciliation in Complex Information Spaces. In SIGMOD, 2005. [PDF][Presentation]


  • Papers on schema mapping and Dataspaces (Resolving structure heterogeneity)
    • Anish Das Sarma, Xin Dong, and Alon Y. Halevy: Bootstrapping Pay-as-you-go Data Integration Systems. In SIGMOD, 2008. [PDF]
    • Xin Dong, Alon Y. Halevy and Cong Yu: Data Integration with Uncertainties. In VLDB, 2007. [PDF][Presentation][DBClip][JournalVersion in "Best papers of VLDB 2007"]
    • Xin Dong and Alon Y. Halevy: Indexing Dataspaces. In SIGMOD, 2007. [PDF][Presentation]
    • Xin Dong and Alon Y. Halevy: A Platform for Personal Information Management and Integration. In CIDR, 2005. [PDF][Presentation]
    • Xin Dong, Alon Y. Halevy, Jayant Madhavan, Ema Nemes and Jun Zhang: Similarity Search for Web Services. In VLDB, 2004. [PDF][Presentation]
    • Xin Dong, Alon Y. Halevy and Igor Tatarinov: Containment of Nested XML Queries. In VLDB, 2004. [PDF][Presentation][Tech-report]





Recent Talks

  • Building a Broad Knowledge Graph for Products [PPT]

o   Keynote at IEEE International Conference on Data Engineering (ICDE), Macau, China, April 2019.

o   Invited talk at AI NEXTCon Seattle, WA, January 2019.

o   Invited talk at Amazon Research Days, Seattle, WA, November 2018.

o   Invited talk at Fact Extraction and VERification (FEVER) Workshop, Brussels, Belgium, November 2018.

o   Invited talk at Joint Computer Science Colloquium / Everything Data Science Seminar at Duke, NC, October 2018.

o   Invited talk at Berkeley RISE Seminar, CA, October 2018.

o   Invited talk at Applied Data Science Invited Talks at SigKDD, London, UK, August 2018.

o   Keynote at PhD Workshop at VLDB, Rio de Janeiro, Brazil, August 2018.

  • Harvesting Knowledge from Semi-Structured Web Data [PPT]

o   Keynote at Mining and Learning with Graphs (MLG) Workshop, London, UK, August 2018.

  • Challenges and Innovations in Building a Product Knowledge Graph [PPT]

o   Keynote at Graph Data Management Experiences & Systems (GRADES) Workshop, Houston, TX, June 2018.

o   Invited talk at BigData Innovators Gathering (BIG) at WWW, Lyon, France, April 2018.

o   Invited talk at MoDas Workshop, Eilat, Israel, February 2018.

o   Invited talk at Workshop on Knowledge Base Construction, Reasoning, and Mining (KBCOM), Los Angeles, CA, February 2018.

o   Invited talk at Northwest DB Day, Seattle, WA, January 2018.

o   Invited talk at Workshop on Automated Knowledge Base Construction (AKBC), Long Beach, CA, December 2017.


Previous Talks

  • Leaving No Valuable Data Behind: the Crazy Ideas and the Business. AMW'17 Keynote, Machine Learning Conference Seattle'17 Invited talk, Distinguished speaker series: Oxford Women in CS 2017, Methods to Manage Heterogeneous Big Data and Polystore Databases Workshop 2016 Keynote, VLDB Early Career Research Contribution Award talk 2016 [PPT]
  • How Far Are We from Collecting the Knowledge in the World. WebDB'16 Keynote, ICWE'16 Keynote [PPT]
  • Knowledge Fusion and Knowledge-Based Trust. Quora invited talk'15, Stanford Computer Systems Colloquium (EE380)'15 invited talk, NorCal DB Day'15 [PPT]
  • From Data Fusion to Knowledge Fusion. WISA'14 Keynote, APWeb'14 Tutorial, WACCK'14 Keynote, DEOS'14 Keynote. [PPT]
  • Truth Finding on the Deep Web. WAIM'13 Distinguished Young Lecturer Series, DESWEB'13 Keynote. [PPT]
  • Linking Records w. Value Diversity. [PPT]
  • Develop Your Big Ideas. Sigmod new-researcher symposium'11. [PPT]
  • Large-Scale Copy Detection. Tutorial at DASFAA'12, ICDE'12, Sigmod'11. [PPT]
  • Solomon: Seeking the Truth Via Copying Detection. BEWEB'11 invited talk, QDB'10 keynote talk. [PPT][Video]
  • Sailing the information ocean with awareness of currents: discovery and application of source dependence. Invited talk at Person Validation and Entity Resolution Conference'11 (US Census Bureau), ISAT "What's Data Worth?" Workshop'10, NDBC'09, SKG'09. [PPT]
  • Data fusion--Resolving data conflicts for integration. Tutorial at VLDB'09, NDBC'09. [PPT]
  • Data integration with uncertainty. [PPT]
  • Managing a space of heterogeneous data. [PPT]
  • Semex: A platform for personal information management and integration. [PPT]



Patents

  • Similar but Different (SBD): Presenting Item Recommendations in Dynamically Generated Groups with Explanations. Andrey Kan, Christos Faloutsos, and Xin Dong. United States Patent, filed 7/2018, to be issued.
  • Providing User-Interactive Graphical Timelines. Xin Dong, Tim Althoff, Kevin Murphy, Safa Alai, Van Dang, and Wei Zhang. United States Patent, filed 9/2015, to be issued.
  • Method and Apparatus for Exploring and Selecting Data Sources. Xin Dong and Divesh Srivastava. United States Patent 20130138480, issued 5/30/2013.
  • Online Data Fusion. Xuan Liu, Xin Dong, Beng Chin Ooi and Divesh Srivastava. United States Patent 20130144843, issued 12/5/2011.
  • Update Certificates. Su Chen, Xin Dong, Laks Lakshmanan, and Divesh Srivastava. United States Patent, filed 9/2010, to be issued.
  • Detecting Dependence Between Sources in Truth Discovery. Xin Dong, Laure Berti-Equille, Divesh Srivastava. United States Patent 8190546, issued 5/29/2012.
  • Minimal difference query and view matching. Raghav Kaushik, Venkatesh Ganti and Xin Dong. United States Patent 7251646, issued 7/31/2007.
  • Method and apparatus for updating XML views of relational data. Philip L. Bohannon, Xin Dong, Henry F. Korth, Suryanarayan Perinkulam. United States Patent 20050165866, filed Jan 28, 2004, to be issued.




Recent Professional Activities

  • Board of Trustees of the VLDB Endowment.
  • Chair of DBCares.
  • Member of VLDB Conference Strategy Committee, PVLDB Advisory Committee, TCDE Award 2017-2019, CIKM Best Paper Award 2017.
  • Co-chair of VLDB'21, VLDB'19 Tutorial track, ICDE'19 Industry track, Sigmod'18, APWeb'16 Industry track, WAIM'15, APWeb'14 Distinguished Lecture Series, CIKM'13 Demo track, Workshop on Integrating Big Data in the DIMACS workshop series, ACM SIGMOD New Researcher Symposium'12-13, Sigmod/PODS Ph.D. Symposium'12-13, QDB'12 [Report], WebDB'10 [Report], SKG'09 [Issue].
  • PC area chair in VLDB'20, APWEB-WAIM'19, AKBC'19, CIKM'18, Sigmod'17, Sigmod'15, ICDE'13, CIKM'11.
  • Associate editor of SigKDD Explorations, VLDB Journal, TKDE, JDIQ "Web Data Quality" special issue, IEEE Data Engineering Bulletin 9/2011.
  • PC member in SigKDD'19, PVLDB'18, PVLDB'17, PVLDB'16, Sigmod'16, PVLDB'15, PVLDB'14, Sigmod'14, PVLDB'13, Sigmod'12, VLDB'12, EDBT'12 Industry track, Sigmod'11, PVLDB'11, WAIM'11, AMW'11, PVLDB'10, ICDE'10, WWW'10, VLDB Demo'10, NTII'10, PVLDB'09, VLDB'09, CIKM'09, WebDB'09, WWW'08, CIKM'08, VLDB Demo'08, WebDB'08.
  • Referee for VLDB Journal, TODS, TCS, TOIT, TOIS, TKDE, IS.
  • NSF panelist, 2011.
  • NIH contract reviewer, 2008.





  • Here is a long and growing list of papers in databases, IR, and AI that I have collected over the course of my research and reading.
  • Here is a collection of wisdom on career, research, life, etc.





Personal Life:


Xin Luna Dong 董欣
Tel: (201) 650-3494


In my personal life, I am



Here is what I have learned about research from my life.

  • Challenge yourself.

"I want to prove P<>NP!"

"Got it! It's because of the 'N'!"

  • Work hard.


  • No matter how steep the slope is, you can take only one turn at a time.


  • Don't offend reviewers.












Last update: 3/2013