Benchmarks, datasets, and patents.
Benchmarks for AI assistants on wearable devices, including WearVQA for visual QA and WearVox for voice assistant evaluation.
Comprehensive RAG Benchmark (CRAG) for question answering, used in KDD Cup 2024. Evaluates factual QA across five domains and eight question types.
Multi-modal, multi-turn extension of CRAG that powers KDD Cup 2025. Covers visual QA, multi-turn dialogue, and complex reasoning.
Benchmark for knowledge extraction from semi-structured websites, extended from the original SWDE dataset.
Benchmark for knowledge integration, covering entity matching, schema alignment, and knowledge graph completion.
Curated datasets for data fusion research: flight, stock, book, restaurant, and weather data with ground truth.