Professional Life:

 

Xin Luna Dong
lunadong@google.com
1600 Amphitheatre Pkwy
Mountain View, CA 94043
Tel: (650)253-5271

 

My STEM video from AT&T

 


 

Xin Luna Dong

 

I have been a research scientist at Google since Jan 2013. I work on enriching and cleaning knowledge for the Google Knowledge Graph. My research interests include data integration, data cleaning, and knowledge management.

 

Prior to joining Google, I worked for AT&T Labs - Research. I received my Ph.D. in Computer Science and Engineering from the University of Washington. Before coming to the United States, I obtained an M.S. in Computer Science at Peking University and a B.S. in Computer Science at Nankai University in China.

 

You can find my C.V. here. 

 


 

Research Areas and Selected Publications

 

My research areas correspond to the different aspects of data integration. Below is a list of my projects and selected papers categorized by research area. You can find the full list of my publications here, my DBLP entry here, and my Google Scholar entry here.

Big data integration

 

  • Sigmod blog interview: The elephant in the room: getting value from Big Data. [Link]
  • Book: Xin Luna Dong and Divesh Srivastava. Big Data Integration (Synthesis Lectures on Data Management). Morgan & Claypool Publishers. 2015. [Link]
  • Tutorials
  • Project: Alexander—Source exploration and selection

In the big data environment, we often lack the resources to manage data rather than the data themselves. Alexander aims to help administrators explore available data sources and select those that balance the quality of integration against its cost.

  • Papers
    • Theodoros Rekatsinas, Xin Luna Dong, Lise Getoor, Divesh Srivastava. Finding Quality in Quantity: The Challenge of Discovering Valuable Sources for Integration. In CIDR, 2015. [PDF][Presentation]
    • Mariam Salloum, Xin Luna Dong, Divesh Srivastava, Vassilis J. Tsotras. Online ordering of overlapping data sources. In VLDB, 2014. [PDF][Presentation]
    • Theodoros Rekatsinas, Xin Luna Dong, Divesh Srivastava. Characterizing and selecting fresh data sources. In VLDB, 2014. [PDF][Presentation]
    • Xin Luna Dong, Barna Saha, and Divesh Srivastava. Less is more: Selecting sources wisely for integration. In VLDB, 2013. [PDF][Report] [Slides (short)] [Slides (long)]

 

 

Time machine

     

  • Project: Time Machine--Managing and exploring temporal information

Our time machine collects, integrates, and cleans temporal information available on the Web. It then generates a timeline of events and relations for entities, and mines models and trends from it. The moonshot goal is to record and preserve history accurately, and to help people "look back" so as to "look forward". [Demo]

  • Tutorial
    • Xin Luna Dong and Wang-Chiew Tan: A Time Machine for Information: Looking Back to Look Forward. Tutorial in VLDB, 2015. [PDF]
  • Papers
    • Tim Althoff, Xin Luna Dong, Kevin Murphy, Safa Alai, Van Dang, and Wei Zhang. TimeMachine: Timeline generation for knowledge-base entities. In SIGKDD 2015. [PDF][Presentation]
    • Theodoros Rekatsinas, Xin Luna Dong, Divesh Srivastava. Characterizing and selecting fresh data sources. In VLDB, 2014. [PDF][Presentation]
    • Pei Li, Xin Luna Dong, Andrea Maurino, and Divesh Srivastava. Linking Temporal Records. In VLDB 2011. [PDF][Presentation][JournalVersion]
    • Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava. Truth discovery and copying detection in a dynamic world. In VLDB, 2009. [PDF][Presentation]

 

 

Data fusion and knowledge fusion (Resolving value heterogeneity)

 

  • Projects: 

Knowledge Vault (KV) / Knowledge-based Trust (KBT)--Knowledge fusion and trustworthiness evaluation. KV collects knowledge from the Web for building a probabilistic knowledge base. KBT evaluates Web source quality from a new angle--correctness of factual information. [Talk 1][Talk 2]

Quotes from The Washington Post [1, 2, 3]: Still, even the possibility of a search engine that evaluates truth is a pretty incredible breakthrough. And it definitely gives new meaning to the phrase “let me Google that for you.”

Solomon--Truth discovery with copy detection. The Web has made it easy to publish and spread false information across multiple sources, making it hard to separate the wheat from the chaff. Solomon aims to detect copying between data sources and to leverage that knowledge for deciding the truth from conflicting information. [Vision paper][Talk][Demo][Data sets]

 

  • Tutorials
    • Xin Luna Dong and Divesh Srivastava. Knowledge curation and knowledge fusion: challenges, models, and applications. Tutorial in Sigmod'15. [PDF][Presentation]
    • Xin Luna Dong and Divesh Srivastava. Large-Scale Copy Detection. Tutorial in ICDE'12, DASFAA'12, Sigmod'11. [PDF][Presentation]
    • Xin Luna Dong and Felix Naumann. Data fusion--Resolving data conflicts for integration. In VLDB, 2009. [PDF][Presentation]
  • Papers on knowledge fusion
    • Xin Luna Dong, Evgeniy Gabrilovich, Kevin Murphy, Van Dang, Wilko Horn, Camillo Lugaresi, Shaohua Sun, and Wei Zhang. Knowledge-based trust: estimating the trustworthiness of web sources. In VLDB, 2015. [PDF][Presentation]
    • Xiaolan Wang, Xin Luna Dong, Alexandra Meliou. Data X-Ray: A diagnostic tool for data errors. In Sigmod, 2015. [PDF][Presentation][Demo]
    • Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion. In SIGKDD, 2014. [PDF]
    • Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Kevin Murphy, Shaohua Sun, and Wei Zhang. From data fusion to knowledge fusion. In VLDB, 2014. [PDF][Presentation]
    • Ravali Pochampally, Anish Das Sarma, Xin Luna Dong, Alexandra Meliou, and Divesh Srivastava. Fusing data with correlations. In Sigmod, 2014. [PDF][Presentation][Poster]
  • Papers on data fusion
    • Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, and Divesh Srivastava. Scaling up Copy Detection. In ICDE, 2015. [PDF][Report]
    • Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, and Divesh Srivastava. Truth finding on the Deep Web: Is the problem solved? In VLDB, 2013. [PDF][Report][Presentation]
    • Xin Luna Dong and Divesh Srivastava. Compact explanation of data fusion decisions. In WWW, 2013. [PDF][Report][Presentation]
    • Xuan Liu, Xin Luna Dong, Beng Chin Ooi, and Divesh Srivastava: Online data fusion. In VLDB, 2011. [PDF][Presentation]
    • Anish Das Sarma, Xin Luna Dong, Alon Halevy. Data integration with dependent sources. In EDBT, 2011. [PDF][Presentation]
    • Xin Luna Dong, Laure Berti-Equille, Yifan Hu, and Divesh Srivastava. Global detection of complex copying relationships between sources. In VLDB, 2010. [PDF][Presentation]
    • Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava. Truth discovery and copying detection in a dynamic world. In VLDB, 2009. [PDF][Presentation]
    • Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava. Integrating conflicting data: the role of source dependence. In VLDB, 2009. [PDF][Presentation]
    • Laure Berti-Equille, Anish Das Sarma, Xin Luna Dong, Amelie Marian, and Divesh Srivastava. Sailing the information ocean with awareness of currents: discovery and application of source dependence. In CIDR, 2009. [PDF][Presentation]

 

 

Record linkage (Resolving instance heterogeneity)

 

  • Project: Chronos--Linking records with diverse values

In many data sets, records that refer to the same real-world entity can have very diverse values because of erroneous values, value evolution over time, or "local" properties for members of the same group. We study how to link records while tolerating fairly high diversity of values. [Talk][Demo]

 

  • Papers
    • Wenfei Fan, Zhe Fan, Chao Tian, and Xin Luna Dong. Keys for Graphs. In VLDB 2015. [PDF][Presentation]
    • Pei Li, Xin Luna Dong, Songtao Guo, Andrea Maurino, and Divesh Srivastava. Robust group linkage. In WWW 2015. [PDF][Presentation][Report]
    • Anja Gruenheid, Xin Luna Dong, and Divesh Srivastava. Incremental record linkage. In VLDB 2014. [PDF][Report][Presentation]
    • Pei Li, Xin Luna Dong, Andrea Maurino, and Divesh Srivastava. Linking Temporal Records. In VLDB 2011. [PDF][Presentation][JournalVersion]
    • Songtao Guo, Xin Luna Dong, Divesh Srivastava, and Remi Zajac. Record Linkage with Uniqueness Constraints and Erroneous Values. In VLDB, 2010. [PDF][Presentation]
    • Xin Dong, Alon Y. Halevy and Jayant Madhavan: Reference Reconciliation in Complex Information Spaces. In SIGMOD 2005. [PDF][Presentation]

 

 

Schema mapping and Dataspaces (Resolving structure heterogeneity)

 

  • Previous projects

 

 

  • Papers
    • Anish Das Sarma, Xin Dong, and Alon Y. Halevy: Bootstrapping Pay-as-you-go Data Integration Systems. In SIGMOD, 2008. [PDF]
    • Xin Dong, Alon Y. Halevy and Cong Yu: Data Integration with Uncertainties. In VLDB, 2007. [PDF][Presentation][DBClip][JournalVersion in "Best papers of VLDB 2007"]
    • Xin Dong and Alon Y. Halevy: Indexing Dataspaces. In SIGMOD, 2007. [PDF][Presentation]
    • Xin Dong and Alon Y. Halevy: A Platform for Personal Information Management and Integration. In CIDR 2005. [PDF][Presentation]
    • Xin Dong, Alon Y. Halevy, Jayant Madhavan, Ema Nemes and Jun Zhang: Similarity Search for Web Services. In VLDB 2004. [PDF][Presentation]
    • Xin Dong, Alon Y. Halevy and Igor Tatarinov: Containment of Nested XML Queries. In VLDB 2004. [PDF][Presentation][Tech-report]

 

 


 

 

Recent Talks

  • Knowledge Fusion and Knowledge-Based Trust. [PPT]
    • Invited talk at Quora, Palo Alto, CA, May 2015.
    • Invited talk at Stanford Computer Systems Colloquium (EE380), Palo Alto, CA, Apr 2015.
    • Invited talk at NorCal DB Day 2015, Santa Cruz, CA, Apr 2015.
  • From Data Fusion to Knowledge Fusion. [PPT]
    • Keynote at WISA (Web Information Systems and Applications Conference), Tianjin, China, Sep 2014.
    • Tutorial at APWeb, Changsha, China, Sep 2014.
    • Keynote at WACCK (Workshop on Automatic Construction and Curation of Knowledge-bases), Snowbird, Utah, Jun 2014.
    • Keynote at DEOS (Data Extraction and Object Search), Seoul, Korea, Apr 2014.

 

Previous Talks

  • Truth Finding on the Deep Web. WAIM'13 Distinguished Young Lecturer Series, DESWEB'13 Keynote. [PPT]
  • Linking Records w. Value Diversity. [PPT]
  • Develop Your Big Ideas. Sigmod new-researcher symposium'11. [PPT]
  • Large-Scale Copy Detection. Tutorial at DASFAA'12, ICDE'12, Sigmod'11. [PPT]
  • Solomon: Seeking the Truth Via Copying Detection. BEWEB'11 invited talk, QDB'10 keynote talk. [PPT][Video]
  • Sailing the information ocean with awareness of currents: discovery and application of source dependence. Invited talk at Person Validation and Entity Resolution Conference'11 (US Census Bureau), ISAT "What's Data Worth?" Workshop'10, NDBC'09, SKG'09. [PPT]
  • Data fusion--Resolving data conflicts for integration. Tutorial at VLDB'09, NDBC'09. [PPT]
  • Data integration with uncertainty. [PPT]
  • Managing a space of heterogeneous data. [PPT]
  • Semex: A platform for personal information management and integration. [PPT]

 

 


 

Patents

  • Method and Apparatus for Exploring and Selecting Data Sources. Xin Dong and Divesh Srivastava. United States Patent, filed 12/2011, to be issued.
  • Online Data Fusion. Xuan Liu, Xin Dong, Beng Chin Ooi and Divesh Srivastava. United States Patent, filed 12/2011, to be issued.
  • Update Certificates. Su Chen, Xin Dong, Laks Lakshmanan, and Divesh Srivastava. United States Patent, filed 9/2010, to be issued.
  • Detecting Dependence Between Sources in Truth Discovery. Xin Dong, Laure Berti-Equille, Divesh Srivastava. United States Patent 8190546, issued 5/29/2012.
  • Minimal difference query and view matching. Raghav Kaushik, Venkatesh Ganti and Xin Dong. United States Patent 7251646, issued 7/31/2007.
  • Method and apparatus for updating XML views of relational data. Philip L. Bohannon, Xin Dong, Henry F. Korth, Suryanarayan Perinkulam. United States Patent 20050165866, filed Jan 28, 2004, to be issued.

 

 


 

Recent Professional Activities

 

 


 

Resources

  • Here is a long and growing list of papers on databases, IR, and AI that I have collected during my research and reading.
  • Here is a collection of wisdom on career, research, life, etc.

 

 

 

 

Personal Life:

 

Xin Luna Dong 董欣 
lunadong@gmail.com
Tel: (201)650-3494


 

In my personal life, I am

 


 

Here is what I have learned about research from my life.

  • Challenge yourself.

"I want to prove P<>NP!"

"Got it! It's because of the 'N'!"

  • Work hard.

    

  • No matter how steep the slope is, you can take only one turn at a time.

 

















  • Don't offend reviewers.

 

 

 

 

 

 

 

 

 

 

 

Last update: 3/2013