Resources

Reading List

History and General Background

  • Relational Database
    • E.F. Codd.  A Relational Model of Data for Large Shared Data Banks.  CACM 13(6), 1970, pp. 377-387. (Must read. First paper about RDB.)
    • Michael Stonebraker.  Operating System Support for Database Management. CACM 24(7), 1981, pp. 412-418. (Must read)
    • D. D. Chamberlin, M. M. Astrahan, M. W. Blasgen, J. Gray, W.F. King III, B. G. Lindsay, R.A. Lorie, J. W. Mehl, T. G. Price, G. R. Putzolu, P. G. Selinger, M. Schkolnick, D. R. Slutz, I. L. Traiger, B. W. Wade, R. A. Yost. A History and Evaluation of System R. CACM 24(10), 1981, pp. 632-646.
    • O. G. Tsatalos, M. H. Solomon, and Y. Ioannidis. The GMAP: A Versatile Tool for Physical Data Independence. VLDB 1994. (Must-read. Data independence using views)
  • Object-oriented
    • Sort-of-survey: M.J. Carey, D.J. DeWitt.  Of Objects and Databases: A Decade of Turmoil.  VLDB 1996. (Must read.)
    • M. Stonebraker, G. Kemnitz.  The Postgres Next Generation Database Management System. CACM 34(10), 1991, pp. 78-92. (Must read. Important paper about ORDB.)
    • L. M. Haas, W.  Chang, G. M. Lohman, J. McPherson, P. F. Wilms, G. Lapis, B. G. Lindsay, H. Pirahesh, M. J. Carey, E. J. Shekita: Starburst Mid-Flight: As the Dust Clears. TKDE 2(1), 1990, pp. 143-160.
    • S. Abiteboul and A. Bonner. Objects and Views. SIGMOD 1991.
    • M.J. Carey, D.J. DeWitt, J. Naughton.  The OO7 Benchmark.  SIGMOD 1993.
  • Temporal Database
    • Temporal DB: C. Zaniolo, S. Ceri, C. Faloutsos, R. T. Snodgrass, V.S. Subrahmanian, R. Zicari.  Advanced Database Systems.  Chapter 5 (Overview of Temporal Databases), pp. 99-121. (Must-read)
  • Network/Hierarchical Databases
    • E. H. Sibley: Development of Data-Base Technology. ACM Computing Surveys, Vol 8 No 1, 1976. 
    • Dennis Tsichritzis and Frederick Lochovsky: Hierarchical Data-Base Management: A Survey. ACM Computing Surveys, Vol 8 No 1, 1976.
    • Ann Michaels, Benjamin Mittman, C. Robert Carlson: A Comparison of the Relational and CODASYL Approaches to Data-Base Management. ACM Computing Surveys, Vol 8 No 1, 1976.
  • Document Databases
    • Arjan Loeffen: Text databases: a survey of text models and systems. SIGMOD Record, Vol 23 #1, 1994.
    • Mariano P. Consens and Tova Milo: Optimizing queries on files. SIGMOD 1994.

Theory of Databases

  • Halpern, Harper, Immerman, Kolaitis, Vardi and Vianu. On the Unusual Effectiveness of Logic in Computer Science. Bulletin of Symbolic Logic, July 2002.

  • Christos Papadimitriou. Database metatheory: asking the big queries. PODS 95.

Dependency, Constraints and Triggers

  • Normal forms and design theory
    • J. Ullman. Database and Knowledge Base Systems, vol. I. Chapter 7 (Design Theory).  (Must-read)
    • Marcelo Arenas, Leonid Libkin: Normalizing XML documents. PODS 2002. (Must-read)
    • Marcelo Arenas, Leonid Libkin: An Information-Theoretic Approach to Normal Forms for Relational and XML Data. PODS 2003. (PODS best paper).
  • Dependency and chase rule
    • J. Ullman. Database and Knowledge Base Systems, vol. I. Chapter 7.11 (Generalized Dependencies). (Must-read)
    • S. Abiteboul, R. Hull, V. Vianu.  Foundations of Databases. Chapter 8 (Functional and Join Dependency), Chapter 9 (Inclusion Dependency). (Must-read)
  • Constraints and triggers
    • S. Ceri, R. Cochrane, J. Widom. Practical Applications of Triggers and Constraints: Successes and Lingering Issues. VLDB 2000. (Must-read)
  • Discovering constraints
    • Y. Huhtala, J. Karkkainen, P. Porkka, and H. Toivonen. TANE: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal, 1999
    • I. F. Ilyas, V. Markl, P. J. Haas, P. G. Brown, and A. Aboulnaga. CORDS: Automatic generation of correlation statistics in DB2. In VLDB'04.

Query Evaluation

  • J. Ullman. Database and Knowledge Base Systems, vol. I.  Chapter 3 (Logic as a Data Model). (Must-read)
  • J. Ullman. Database and Knowledge Base Systems, vol. II.  Chapters 12 (Top-Down Evaluation), 13 (Magic Sets). (Must-read)
  • Christos H. Papadimitriou, Mihalis Yannakakis: On the Complexity of Database Queries. PODS'1997 (PODS best paper)
  • P. Buneman, S. A. Naqvi, V. Tannen, L. Wong: Principles of Programming with Complex Objects and Collection Types. TCS 149(1): 3–48 (1995).
  • Serge Abiteboul and Paris C. Kanellakis. Object Identity as a Query Language Primitive. JACM 1998.

Query Containment

  • Theoretical Results
    • Conjunctive queries
      • A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational databases. In Proc. of STOC, 1977.

      • A. Aho, Y. Sagiv, and J. D. Ullman. Equivalence of  relational expressions. SIAM Journal of computing, (8)2:218-246, 1979.

    • Acyclic queries: M. Yannakakis. Algorithms for acyclic database schemes. VLDB, 82-94, 1981.

    • Union: Y. Sagiv and M. Yannakakis. Equivalence among relational expressions with the union and difference operators. Journal of the ACM, 27(4):633-655, 1980.

    • Negation: A. Y. Levy and Y. Sagiv. Queries independent of updates. In Proc. of VLDB, 1993.

    • Arithmetic comparisons

      • A. Klug. On conjunctive queries containing inequalities. Journal of the ACM, 35(1):146-160, 1988.

      • R. van der Meyden. The complexity of querying indefinite data about linearly ordered domains. In Proc. of PODS, pages 331-345, San Diego, CA, 1992.

      • X. Zhang and M. Z. Ozsoyoglu. On efficient reasoning with implication constraints. In Proc. of DOOD, 1993.

      • A. Gupta, Y. Sagiv, J. D. Ullman, and J. Widom. Constraint checking with partial information. In Proc. of PODS, 45-55, Minneapolis, Minnesota, 1994.

    • Recursive queries

      • O. Shmueli. Equivalence of datalog queries is undecidable. Journal of Logic Programming, 15:231-241, 1993.

      • S. Chaudhuri and M. Vardi. On the equivalence of recursive and nonrecursive datalog programs. In Proc. of PODS, 55-66, San Diego, CA, 1992.

    • Queries over bags

      • S. Chaudhuri and M. Vardi. Optimizing real conjunctive queries. In Proc. of PODS, 1993.

      • Y. E. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: Beyond relations as sets. ACM Transactions on Database Systems, 20(3):288-324, 1995.

    • XPaths

      • G. Miklau and D. Suciu. Containment and equivalence for an XPath fragment. In Proc. of PODS, 2002.

      • S. Amer-Yahia, S. Cho, L. V. S. Lakshmanan, and D. Srivastava. Minimization of tree pattern queries. In Proc. of SIGMOD, 2001.

      • T. Milo and D. Suciu. Index structures for path expressions. In Proc. of ICDT, pages 277-295, 1999.

      • Alin Deutsch and Val Tannen. Containment and Integrity Constraints for XPath Fragment. Tech Report. 2001.

      • D. Florescu, A. Levy, and D. Suciu. Query containment for conjunctive queries with regular expressions. In Proc. of PODS, Seattle, WA, 1998.

    • Nested Queries

      • Xin Dong, Alon Halevy, Igor Tatarinov. Containment of Nested XML Queries. VLDB 2004.

      • A. Y. Levy and D. Suciu. Deciding containment for queries with complex objects and aggregations. In Proc. of PODS, Tucson, Arizona, 1997.

  • Application in Query Optimization
    • S. Abiteboul, R. Hull, V. Vianu.  Foundations of Databases. Chapter 6, Sections 6.2 (Global Optimizations) and 6.4 (Computing with Acyclic Joins).
    • J. Ullman. Database and Knowledge Base Systems, vol. II. Chapter 14 (Containment).

Answering Queries Using Views

  • Survey
    • A.Y. Halevy.  Answering Queries Using Views: A Survey.  VLDB Journal, 10(4). (Must-read)
    • Jeffrey D. Ullman. Information Integration Using Logical Views. ICDT 1997. (Must-read)
  • Theoretical Results

    • S. Abiteboul, O. M. Duschka.  Complexity of Answering Queries Using Materialized Views.  PODS 1998. (Must-read)
    • Alon Levy, Alberto Mendelzon, Yehoshua Sagiv and Divesh Srivastava. Answering queries using views. PODS 1995.
    • Oliver M. Duschka and Michael R. Genesereth. Answering recursive queries using views. PODS 1997.
  • Practical Algorithms

    • Buckets: A. Y. Levy, A. Rajaraman, J. J. Ordille.  Querying Heterogeneous Information Sources Using Source Descriptions. VLDB 1996.
    • Inverse Rules: Oliver M. Duschka, Michael R. Genesereth, and Alon Halevy. Recursive query plans for data integration. Journal of Logic Programming, 1999.
    • MiniCon: R. Pottinger, A.Y. Halevy.  MiniCon: A Scalable Algorithm for Answering Queries Using Views. VLDB Journal 10(2-3), pp 182-198, 2001.
  • Query Rewriting for XML Data
    • M. Y. Vardi: Constraint Satisfaction and Database Theory: A Tutorial. PODS 2000, pp. 76–85.
    • Yannis Papakonstantinou and Vasilis Vassalos. Query rewriting for semistructured data. Sigmod 1999.
    • Alin Deutsch and Val Tannen. Reformulation of XML queries and constraints. ICDT 2003.
    • Marcelo Arenas and Leonid Libkin. XML data exchange: consistency and query answering. PODS 2005. (PODS Best-paper)

Data Storage and Indexes

  • Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems (2nd ed.) Chpt 7-10 (Data storage and indexing). (Must-read).
  • C. Zaniolo, S. Ceri, C. Faloutsos, R. T. Snodgrass, V.S. Subrahmanian, R. Zicari.  Advanced Database Systems, Chapter 11 (Traditional Indexing Methods), pp. 269-294.
  • J. M. Hellerstein, J. F. Naughton, A. Pfeffer: Generalized Search Trees for Database Systems. VLDB 1995 (Must-read)
  • OO Indexing
    • Elisa Bertino and Won Kim: Indexing Techniques for Queries on Nested Objects. TKDE 1989.
    • Alfons Kemper, Guido Moerkotte: Access Support in Object Bases. SIGMOD 1990.
    • Paris Kanellakis, Sridhar Ramaswamy, Darren Vengroff, and Jeffrey Vitter: Indexing for Data Models with Constraints and Classes. PODS 1993.

XML Indexing

  • Structure Indexes
    • DataGuides:
      • Roy Goldman and Jennifer Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. VLDB 1997.
      • Roy Goldman and Jennifer Widom. Approximate DataGuides. Workshop on Query Processing for semistructured data and non-standard data formats. 1999.
      • Svetlozar Nestorov, Jeffrey Ullman, Janet Wiener and Sudarshan Chawathe. Representative objects: concise representations of semistructured, hierarchical data. ICDE 1997.
    • 1-index: Tova Milo and Dan Suciu. Index structures for path expressions. ICDT 1999.
    • Adaptive index: C. Chung, J. Min and K. Shim. APEX: An adaptive path index for XML data. Sigmod 2002.
    • A(k)-index: Raghav Kaushik, Pradeep Shenoy, Philip Bohannon and Ehud Gudes. Exploiting local similarity for indexing paths in graph-structured data. ICDE 2002.
    • D(k)-index: Qun Chen, Andrew Lim and Kian Win Ong. D(k)-index: An adaptive structural summary for graph-structured data. Sigmod 2003.
    • M(k)-index: Hao He and Jun Yang. Multiresolution Indexing of XML for Frequent Queries. ICDE 2004.
    • F&B-index for branching: Raghav Kaushik, Philip Bohannon, Jeffrey F. Naughton, and Henry F. Korth. Covering indexes for branching path queries. Sigmod 2002.
    • F&B-index: Wei Wang, Hongzhi Wang, Hongjun Lu, Haifeng Jiang, Xuemin Lin, and Jianzhong Li. Efficient Processing of XML Path Queries Using the Disk-based F&B Index. VLDB 2005.
    • A. Halverson et al. Mixed mode XML query processing. VLDB 2003.
  • Index for Structural Matching:
    • Structural joins
      • Flavio Rizzolo and Alberto Mendelzon. Indexing XML Data with ToXin. WebDB 2001.
      • Quanzhong Li and Bongki Moon. Indexing and querying XML data for regular path expressions. VLDB 2001.
      • Shurug Al-Khalifa, H.V. Jagadish, Nick Koudas, Jignesh M. Patel, Divesh Srivastava and Yuqing Wu. Structural Joins: A primitive for efficient XML query pattern matching. ICDE 2002.
      • Shu-yao Chien, Zografoula Vagena, Donghui Zhang, Vassilis J. Tsotras, Carlo Zaniolo. Efficient structural joins on indexed XML documents. VLDB 2002.
      • Haifeng Jiang, Hongjun Lu, Wei Wang, and Beng Chin Ooi. XR-Tree: Indexing XML data for efficient structural joins. ICDE 2003.
      • Wei Wang, Haifeng Jiang, Hongjun Lu, and Jeffrey Xu Yu. PBiTree Coding and Efficient Processing of Containment joins. ICDE 2003.
      • C. Zhang, J. Naughton, D. DeWitt, Q. Luo, and G. Lohman. On supporting containment queries in relational database management systems. Sigmod 2001.
    • Holistic twig joins
      • Nicolas Bruno, Nick Koudas, and Divesh Srivastava. Holistic Twig Joins: Optimal XML pattern matching. Sigmod 2002. (First paper on Holistic Twig Joins)
      • H. Jiang, W. Wang, H. Lu, J. Yu. Holistic Twig joins on indexed XML documents. VLDB 2003.
      • H. Jiang, W. Wang, and H. Lu. Efficient processing of XML Twig queries with or-predicates. Sigmod 2004.
      • Ting Chen, Jiaheng Lu and Tok Wang Ling. On Boosting Holism in XML Twig Pattern Matching Using Structural Indexing Techniques. VLDB 2005.
      • B. Yang, M. Fontoura, E. Shekita, S. Rajagopalan, and K. S. Beyer. Virtual cursors for XML joins. CIKM, 2004.
      • Marcus Fontoura, Vanja Josifovski, Eugene Shekita, and Beverly Yang. Optimizing cursor movement in holistic Twig joins. CIKM 05.
  • Indexing Structure and Values
    • B. Cooper, N. Sample, M. J. Franklin, G. R. Hjaltason, M. Shadmon: A Fast Index for Semistructured Data. VLDB 2001, pp. 341-350. (Must-read)
    • Haixun Wang, Sanghyun Park, Wei Fan, Philip S. Yu. ViST: A dynamic index method for querying XML data by tree structures. Sigmod 2003.
    • Raghav Kaushik, Rajasekar Krishnamurthy, Jeffrey F. Naughton, and Raghu Ramakrishnan. On the integration of structure indexes and inverted lists. Sigmod 2004.

Query Execution

  • Survey: G. Graefe.  Query Evaluation Techniques for Large Databases.  ACM Computing Surveys 25(2), 1993, pp. 73-170. (Must-read)

  • Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems (2nd ed.) Chpt 11 (sorting) Chpt 12 (Evaluation). (Must-read).

  • A. Ailamaki, D. J. DeWitt, M. D. Hill, D. A. Wood: DBMSs on a Modern Processor: Where Does Time Go? VLDB 1999. (Must-read)

  • J. Hellerstein, P. Haas, H. Wang: Online Aggregation. SIGMOD 1997.

  • Chris Nyberg, Tom Barclay, Zarka Cvetanovic, Jim Gray, and David Lomet. AlphaSort: A RISC Machine Sort. SIGMOD'1994 (SIGMOD best paper)
  • Anastassia Ailamaki, David DeWitt, Mark Hill, and Marios Skounakis. Weaving relations for cache performance. VLDB'2001 (VLDB best paper)

Query Optimization

  • Survey
    • S. Chaudhuri.  An Overview of Query Optimization in Relational Systems.  PODS 1998. (Must-read)
    • Y. Ioannidis.  Query Optimization.  Handbook for Computer Science (CRC Press), chapter 45.
  • Classical Systems
    • P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, T. Price.  Access Path Selection in a Relational Database Management System.  SIGMOD 1979.  (Must-read)
    • L. M. Haas, J. C. Freytag, G. M. Lohman, H. Pirahesh.  Extensible Query Processing in Starburst. SIGMOD 1989.  (Must-read)
    • G.  Graefe, W. J. McKenna: The Volcano Optimizer Generator: Extensibility and Efficient Search. ICDE 1993.  (Must-read)
  • Query optimization for data integration
    • Laura M. Haas, Donald Kossmann, Edward L. Wimmers, Jun Yang: Optimizing queries across diverse data sources. VLDB 1997.
    • Serge Abiteboul, Hector Garcia-Molina, Yannis Papakonstantinou, Ramana Yerneni: Fusion queries over Internet databases. EDBT 1998.
  • Distributed query optimization
    • D. Kossmann, K. Stocker: Iterative dynamic programming: a new class of query optimization algorithms. TODS 25(1): 43-82 (2000).
    • L. F. Mackert, G. M. Lohman: R* Optimizer Validation and Performance Evaluation for Distributed Queries. VLDB 1986.

Database Statistics

  • Survey

    • D. Barbara, W. DuMouchel, C. Faloutsos, P.J. Haas, J.M. Hellerstein, Y. Ioannidis, H.V. Jagadish, T. Johnson, R. Ng, V. Poosala, K.A. Ross, K.C. Sevcik.  The New Jersey Data Reduction Report.  Data Engineering Bulletin 20(4), 1997, pp. 3-45. (Must-read)

  • Statistical Models
    • Projection Estimation
      • Rafiul Ahad, K. V. Bapa Rao and Dennis McLeod: On estimating the cardinality of the projection of a database relation. ACM Transactions on Database Systems, 14(1), 28-40, 1989.
      • Jane Fedorowicz: Database evaluation using multiple regression techniques. 1984.
    • Selectivity Estimation
      • Volker Markl, Nimrod Megiddo, Marcel Kutsch, Tam Minh Tran, Peter J. Haas, Utkarsh Srivastava: Consistently Estimating the Selectivity of Conjuncts of Predicates. VLDB 05.
      • Lise Getoor, Benjamin Taskar, Daphne Koller: Selectivity Estimation using Probabilistic Models. SIGMOD 01.
      • Viswanath Poosala and Yannis E. Ioannidis: Selectivity estimation without the attribute value independence assumption. VLDB 1997.
    • Join-size Estimation
      • Y. E. Ioannidis, S. Christodoulakis. On the Propagation of Errors in the Size of Join Results.  SIGMOD 1991. (Must-read)
      • Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy: Join Synopses for Approximate Query Answering. SIGMOD 99.
      • Noga Alon, Phillip B. Gibbons, Yossi Matias, and Mario Szegedy: Tracking join and self-join sizes in limited storage. PODS 1999.
      • Wei Sun, Yibei Ling, Naphtali Rishe and Yi Deng: An instant and accurate size estimation method for joins and selection in a retrieval-intensive environment. Sigmod 1993.
      • Allen Van Gelder: Multiple join size estimation by virtual domains. PODS 1993.
      • Arun Swami and K. Bernhard Schiefer: On the estimation of join result sizes.
      • Stavros Christodoulakis: Estimating block transfers and join sizes. 1983.
  • Histogram

    • Yannis E. Ioannidis: The History of Histograms (abridged). VLDB, 03. (10-year award paper; Must-read).
      (The original paper from VLDB 93: Yannis E. Ioannidis: Universality of Serial Histograms. VLDB, 93.)
    • Amol Deshpande, Minos N. Garofalakis, Rajeev Rastogi: Independence is Good: Dependency-Based Histogram Synopses for High-Dimensional Data. SIGMOD 01.

    • Ashraf Aboulnaga and Surajit Chaudhuri: Self-tuning histograms: building histograms without looking at data. 1999.

    • Phillip B. Gibbons, Yossi Matias and Viswanath Poosala: Fast incremental maintenance of approximate histograms. VLDB 1997.

    • Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas and Eugene J. Shekita. Improved histograms for selectivity estimation of range predicates. 1996.

    • M. Muralikrishna and David J. DeWitt: Equi-depth histograms for estimating selectivity factors for multi-dimensional queries. 1988.

    • Clifford A. Lynch: Selectivity estimation and query optimization in large databases with highly skewed distributions of column values. VLDB 1988.

    • Gregory Piatetsky-Shapiro and Charles Connell: Accurate estimation of the number of tuples satisfying a condition. 1984.

  • Sampling

    • Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya: On Random Sampling over Joins. SIGMOD 99.

    • Frank Olken. Random Sampling from Databases. Ph.D. Thesis, 1993.
    • Sumit Ganguly, Phillip B. Gibbons, Yossi Matias and Avi Silberschatz: Bifocal sampling for skew-resistant join size estimation. 1996.

    • Peter J. Haas and Arun N. Swami: Sampling-based selectivity estimation for joins using augmented frequent value statistics. 1995.

    • Peter J. Haas, Jeffrey F. Naughton and Arun N. Swami. On the relative cost of sampling for join selectivity estimation. Sigmod 1994.

    • Peter J. Haas and Arun N. Swami: Sequential sampling procedures for query size estimation. Sigmod 1992.

    • Richard J. Lipton, Jeffrey F. Naughton and Donovan A. Schneider: Practical selectivity estimation through adaptive sampling. 1992.

    • Wen-chi Hou and Gultekin Ozsoyoglu: Statistical estimators for aggregate relational algebra queries. ACM Transactions on Database Systems, 16(4), 600-654, 1991.

  • Adaptive Estimation

    • Chung-Min Chen and Nick Roussopoulos. Adaptive selectivity estimation using query feedback. SIGMOD 94'. (Must-read)
    • J. M. Hellerstein, V. Raman, and B. Raman. Online Dynamic Reordering for Interactive Data Processing. VLDB 99'.
  • Modeling Uncertainty in Estimation
    • Brian Babcock, Surajit Chaudhuri: Towards a Robust Query Optimizer: A Principled and Practical Approach. SIGMOD Conference 2005: 119-130

Adaptive Query Optimization

  • Survey

    • J.M. Hellerstein, M.J. Franklin, S. Chandrasekaran, A. Deshpande, K. Hildrum, S.  Madden, V.  Raman, M.A. Shah. Adaptive Query Processing: Technology in Evolution. IEEE Data Engineering Bulletin 23(2), 2000, pp. 7-18. (Must-read)

  • Systems

    • Z.G. Ives, A.Y. Levy, D.S. Weld, D. Florescu, M. Friedman. Adaptive Query Processing for Internet Applications.  IEEE Data Engineering Bulletin 23(2), 2000. (Must-read)
    • J. Naughton, D. DeWitt, D. Maier, J. Chen, L. Galanis, K. Tufte, J. Kang, Q. Luo, N. Prakash, F. Tian, J. Shanmugasundaram, C. Zhang, R. Ramamurthy, B. Jackson, Y. Wang, A. Gupta, R. Chen. The Niagara Internet Query System.
  • Late Binding

    • Waqar Hasan and Hamid Pirahesh. Query Rewrite Optimization in Starburst. Research Report RJ 6367, IBM Almaden Research Center, August 1988.
    • Goetz Graefe and Karen Ward. Dynamic query evaluation plans. SIGMOD 89'.
    • Goetz Graefe: Dynamic Query Evaluation Plans: Some Course Corrections? IEEE Data Engineering Bulletin 23(2): 3-6 (2000)
    • Yannis E. Ioannidis, Raymond T. Ng, Kyuseok Shim, and Timos K. Sellis. Parametric query optimization. VLDB 92'.
    • Richard L. Cole, Goetz Graefe: Optimization of Dynamic Query Evaluation Plans. SIGMOD 94'.
    • S. Adali, K. Candan, Y. Papakonstantinou, and V. Subrahmanian. Query caching and optimization in distributed mediator systems. SIGMOD 96'.
    • L. Liu and C. Pu. A dynamic query scheduling framework for distributed and evolving information systems. In The IEEE Int. Conf. on Distributed Computing Systems (ICDCS-17), Baltimore, 1997.
  • Competition
    • Michael Stonebraker, Paul M. Aoki, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl Staelin, and Andrew Yu. Mariposa: A Wide-Area Distributed Database System. VLDB Journal, 5(1):48–63, January 1996.
    • Joseph M. Hellerstein, Michael Stonebraker, and Rick Caccia. Open, independent enterprise data integration. IEEE Data Engineering Bulletin, 22(1), March 1999.
    • Gennady Antoshenkov and Mohamed Ziauddin. Query Processing and Optimization in Oracle Rdb. VLDB 96'.
    • Gennady Antoshenkov. Dynamic query optimization in Rdb/VMS. In ICDE '93.
  • Dynamic Plans Based on Adaptive Size Estimation and Sampling
    • Michael Stonebraker, Eugene Wong, Peter Kreps, Gerald Held: The Design and Implementation of INGRES. TODS 1(3): 189-222, 1976.
    • M. A. Derr. Adaptive Query Optimization in a Deductive Database System. In Proc. of International Conference on Information and Knowledge Management. 1993.
    • Navin Kabra, David J. DeWitt: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans. SIGMOD 98'. (must-read)
    • K. W. Ng, Z. Wang, R. R. Muntz, and S. Nittel. Dynamic Query Re-Optimization. Proc. Conf. on Scientific and Statistical Database Management. 1999.
    • Richard L. Cole: A Decision Theoretic Cost Model for Dynamic Plans. IEEE Data Engineering Bulletin 23(2): 34-41, 2000
    • Zachary G. Ives, Alon Y. Levy, Daniel S. Weld: Convergent Query Processing. VLDB '02. (must-read)
  • Pipelined Operators
    • A. N. Wilschut and P. M. G. Apers. Dataflow Query Execution in a Parallel Main-Memory Environment. In Proc. First International Conference on Parallel and Distributed Info. Sys. (PDIS), pages 68–77, 1991.
    • Tolga Urhan and Michael Franklin. XJoin: A Reactively-Scheduled Pipelined Join Operator. IEEE Data Engineering Bulletin, 2000.
    • Joseph M. Hellerstein, Peter J. Haas, and Helen J. Wang. Online Aggregation. SIGMOD 97'. 
    • Peter J. Haas and Joseph M. Hellerstein. Ripple Joins for Online Aggregation. SIGMOD 99'.
    • Joseph M. Hellerstein, Ron Avnur, Andy Chou, Christian Hidber, Chris Olston, Vijayshankar Raman, Tali Roth, and Peter J. Haas. Interactive Data Analysis: The Control Project. IEEE Computer, 32(8):51–59, August 1999.
    • Zachary G. Ives, Daniela Florescu, Marc Friedman, Alon Y. Levy, Daniel S. Weld: An Adaptive Query Execution System for Data Integration. SIGMOD Conference 1999: 299-310
  • Scheduling
    • G. Thomas, G. Thompson, C. Chung, E. Barkmeyer, F. Carter, M. Templeton, S. Fox, and B. Hartman. Heterogeneous distributed database systems for product use.  ACM Computing Surveys, 22(3), 1990.
    • F. Ozcan, S. Nural, P. Koksal, C. Evrendilek, and A. Dogac. Dynamic query optimization on a distributed object management platform. In Conference on Information and Knowledge Management, Baltimore, Maryland, November 1996.
    • Laurent Amsaleg, Michael J. Franklin, Anthony Tomasic, and Tolga Urhan. Scrambling Query Plans to Cope With Unexpected Delays. In 4th International Conference on Parallel and Distributed Information Systems (PDIS), Miami Beach, December 1996.
    • Tolga Urhan, Michael J. Franklin, Laurent Amsaleg: Cost Based Query Scrambling for Initial Delays. SIGMOD '98.
    • Luc Bouganim, Francoise Fabret, C. Mohan, Patrick Valduriez: Dynamic Query Scheduling in Data Integration Systems. ICDE 00'.
    • Ron Avnur, Joseph M. Hellerstein: Eddies: Continuously Adaptive Query Processing. SIGMOD 00'.
  • Distributed System
    • Remzi H. Arpaci-Dusseau, Eric Anderson, Noah Treuhaft, David E. Culler, Joseph M. Hellerstein, David A. Patterson, and Katherine Yelick. Cluster I/O with River: Making the Fast Case Common. In Sixth Workshop on I/O in Parallel and Distributed Systems (IOPADS ’99), pages 10–22, Atlanta, May 1999.
  • Memory Adaptive Sorting and Hashing
    • Masaya Nakayama, Masaru Kitsuregawa, and Mikio Takagi. Hash-partitioned join method using dynamic destaging strategy. VLDB 88'.
    • Hansjörg Zeller and Jim Gray. An adaptive hash join algorithm for multiuser environments. VLDB 90'.
    • HweeHwa Pang, Michael J. Carey, and Miron Livny. Partially preemptive hash joins. SIGMOD 93'.
    • HweeHwa Pang, Michael J. Carey, and Miron Livny. Memory-adaptive external sorting. VLDB 93'.
    • Weiye Zhang and Per-Åke Larson. Dynamic memory adjustment for external mergesort. VLDB 97'.
  • Partial, incomplete, and approximate answers

    • Ling Liu, Calton Pu, Wei Tang: Continual Queries for Internet Scale Event-Driven Information Delivery. TKDE 11(4): 610-628 (1999).
    • Jayavel Shanmugasundaram, Kristin Tufte, David J. DeWitt, Jeffrey F. Naughton, David Maier: Architecting a Network Query Engine for Producing Partial Results. WebDB 00'.
    • Philippe Bonnet, Anthony Tomasic: Partial Answers for Unavailable Data Sources. FQAS 1998
  • XML

    • Mehmet Altinel, Michael J. Franklin: Efficient Filtering of XML Documents for Selective Dissemination of Information. VLDB 00.
    • Zachary G. Ives, Alon Y. Halevy, Daniel S. Weld. An XML Query Engine for Network-Bound Data. Submitted for publication, 2000.

View Selection

  • J. Ullman, V. Harinarayan, A. Rajaraman. Implementing Data Cubes Efficiently. SIGMOD 1996.
    (Sigmod best-paper award)
  • S. Agrawal, S. Chaudhuri, V. Narasayya. Automated Selection of Materialized Views and Indexes for SQL Databases. VLDB 2000.
  • R. Chirkova, A. Halevy, D. Suciu. A Formal Perspective on the View Selection Problem. VLDB 2001.
  • H. Mistry, P. Roy, S. Sudarshan, and K. Ramamritham. Materialized view selection and maintenance using multi-query optimization.

View Matching and Semantic Caching
(v.s. Theoretical Foundation: Query Containment and Answering Queries Using Views)

  • Semantic Caching
    • S. Dar, M. J. Franklin, B. T. Jonsson, D. Srivastava and M. Tan. Semantic data caching and replacement. VLDB 1996. (The first semantic caching paper, must-read)
    • P. Larson, J. Goldstein, J. Zhou. MTCache: Transparent mid-tier database caching in SQL Server. ICDE 2004.
  • View Matching
    • Matching with A Single View
      • First paper: P. Larson and H. Z. Yang. Computing queries from derived relations. VLDB 1985.
      • "Full-version" solution: J. Goldstein and P. Larson. Optimizing queries using materialized views: a practical scalable solution. Sigmod 2001. (Must-read)
      • Outer-join: P. Larson and J. Zhou. View matching for outer-join views. VLDB 2005.
      • Aggregation: A. Gupta, V. Harinarayan and D. Quass. Aggregate query processing in data warehousing environments. VLDB 1995.
      • Aggregation: D. Srivastava, S. Dar, H.V. Jagadish, A. Levy. Answering queries with aggregation using views. VLDB 1996. (Must-read)
      • Constraints for extra tables: J. Chang and S. Lee. Query reformulation using materialized views in data warehousing environment. DOLAP 1998.
      • Stacked view: D. DeHaan, P. Larson, and J. Zhou. Stacked indexed views in Microsoft SQL Server. Sigmod 2005.
    • Matching with Multiple Views
      • Rewriting to a conjunctive query: F. N. Afrati, C. Li, and J. D. Ullman. Generating efficient plans for queries using views. Sigmod 2001. (Must-read)
      • Rewriting to a disjunctive query: H. Z. Yang and P. Larson. Query transformation for PSJ queries. VLDB 1987.
      • Aggregation: C. Park, M. H. Kim and Y. J. Lee. Rewriting OLAP queries using materialized views and dimension hierarchies in data warehouses. ICDE 2001. (Must-read)
  • Semantic Caching in Industry
    • System-R: S. Chaudhuri, R. Krishnamurthy, S. Potamianos, K. Shim. Optimizing queries with materialized views. ICDE 1995.
    • Oracle: R.G. Bello, K. Dias, J. Feenan, J. Finnerty, W.D. Norcott, H. Sun, A. Witkowski, M. Ziauddin. Materialized views in Oracle. VLDB 1998, 659-664.
    • DB2: M. Zaharioudakis, R. Cochrane, G. Lapis, H. Pirahesh, M. Urata. Answering complex SQL queries using automatic summary tables. Sigmod 2000, 105-116.
  • View Matching for XML
    • Bhushan Mandhani and Dan Suciu. Query caching and view selection for XML databases. VLDB 2005.

View Maintenance

  • Kenneth Salem, Kevin S. Beyer, Bruce Lindsay, Roberta Cochrane. How To Roll a Join: Asynchronous Incremental View Maintenance. Sigmod 2000.
  • P. Mork. Managing change in large-scale data sharing systems. Tech report.

View Updates

  • Survey

    • A. Furtado and M. Casanova. Updating relational views. In Query Processing in Database Systems, pages 127-144, Springer-Verlag, New York, NY, 1985. (must-read)

    • S. J. Kaplan and J. Davidson. Interpreting natural language database updates. In Proc. of 19th Annual Meeting of the Association for Computational Linguistics, Stanford, California, June 1981. 

  • Relational View Update Theory

    • F. Bancilhon and N. Spyratos. Update semantics of relational views. ACM Transactions on Database Systems, 6(4):557-575, December 1981. (must-read)

    • S. S. Cosmadakis and C. H. Papadimitriou. Updates of relational views. J. ACM, 31(4):742-760, October 1984.

    • A. M. Keller and J. D. Ullman. On complementary and independent mappings on databases. In Proc. of the 3rd ACM SIGMOD Int. Conf. on Management of Data, Boston, June 1984.

    • S. J. Hegner. Canonical view update support through boolean algebras of components. In Proc. of the 3rd ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, pages 163-172, April 1984.

    • P. Buneman, S. Khanna, W. Tan. On propagation of Deletions and Annotations Through Views. In Proc. of the 21st ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, 2002.

  • Relational View Updates through Abstract data types

    • K. C. Sevcik and A. L. Furtado. Complete and compatible sets of update operations. In Intl. Conf. on Management of Data (ICMOD), Milan, Italy, 1978

    • L. A. Rowe and K. A. Shoens. Data abstraction, views and updates in RIGEL. In Proc. of ACM-SIGMOD Int’l Conf. on Management of Data, pages 214-225, 1979. 

    • A. Tomasic. View update translation via deduction and annotation. In ICDT'88 (Second International Conference on DataBase Theory), pages 338--352, 1988.

    • A. Tomasic. Correct view update translation via containment. Stanford University Computer Science Technical Note STAN-CS-TN-93-3, 1993.

  • Relational View Updates through Syntax and Semantics Analysis

    • U. Dayal and P. A. Bernstein. On the updatability of relational views. In Proc. of 4th Int’l Conf. on Very Large Data Bases, pages 368-377, 1978.

    • U. Dayal and P. A. Bernstein. On the correct translation of update operations on relational views. ACM Transactions on Database Systems, 7(3):381-416, September 1982. (must-read)

    • Y. Masunaga. A  relational database view update translation mechanism. In Proc. of 10th Int’l. conf. on Very Large Data Bases, pages 309-320, Singapore, 1984.

    • A. M. Keller. Algorithms for translating view updates to database updates. In Proc. of the 4th ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, March 1985. (must-read)

    • A. M. Keller. Choosing a view update translator by dialog at view definition times. In Proc. of the 12th International Conference on Very Large Data Bases, pages 467-474, Kyoto, Japan, 1986.

    • T. W. Ling and M. L. Lee. A theory for entity-relationship view updates.

  • Relational View Updates through constraint satisfaction

    • H. Shu. Using constraint satisfaction for view update translation. In 13th European Conference on Artificial Intelligence, 1998.

  • Object-oriented View Updates

    • T. Barsalou, N. Siambela, A. M. Keller, and G. Wiederhold. Updating relational databases through object-based views. In Proc. of the 10th ACM SIGACT-SIGMOD Symposium on Principles of Database  Systems, 1991. (must-read)
    • J. Chen. Update multidatabase through object views. MS thesis, Iowa State University, 1997. 
  • Updates on XML Data
    • S. Abiteboul. On views and XML. In Proc. ACM Symp. on the Principles of Database Systems, 1999.

    • A. Salminen and F. W. Tompa. Requirements for XML document database systems. In Proc. ACM DocEng, pages 85-94, 2001

    • P. Lehti. Design and implementation of a data manipulation processor for an XML Query Language. Technical Report, Technische Universität Darmstadt, 2001. Report KOM-D-149. (must-read)

    • P. Lehti and P. Fankhauser. Towards type safe updates in XQuery. Technical Notes, 2002. World Wide Web:

    • M. Rys. Proposal for an XML data modification language. Microsoft Report, 2002. (must-read)

    • I. Tatarinov, Z. G. Ives, A. Y. Halevy, D. S. Weld. Updating XML. In Proc. of the 20th ACM-SIGMOD Int’l Conf. on Management of Data, 2001. (must-read)

Data Warehousing and OLAP

  • Survey
    • S. Chaudhuri, U. Dayal.  An Overview of Data Warehousing and OLAP Technology. SIGMOD Record 26(1), 1997, pp. 65-74. (Must-read)
  • New SQL Constructs
    • J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, H. Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Data Mining and Knowledge Discovery 1997. (Must-read)
    • D. Chatziantoniou, K. Ross. Groupwise Processing of Relational Queries. Proceedings of the 1997 VLDB Conference.
    • Ralph Kimball. Why Decision Support Fails and How To Fix It. SIGMOD 1995.
  • Benchmark
    • M. Poess, C. Floyd.  New TPC Benchmarks for Decision Support and Web Commerce.  SIGMOD Record, 29(4).

Data Cleaning

  • H. Galhardas, D. Florescu, D. Shasha, E. Simon, C. Saita. Declarative data cleaning: language, model, and algorithms. VLDB 2001. Look under Session R12.
  • V. Raman, J. Hellerstein. Potter's wheel: an interactive data cleaning system. VLDB 2001. Session R12.

Data Mining

  • J. Han, M. Kamber.  Data Mining: Concepts and Techniques.  Chapter 1 (Introduction), Chapter 7 (Classification), Chapter 8 (Clustering).
  • Association Rules
    • Fast Algorithms for Mining Association Rules, Agrawal and Srikant; VLDB 94. (must-read)
    • R. Agrawal, T. Imielinski, A. N. Swami. Mining Association Rules between Sets of Items in Large Databases. SIGMOD 1993.
    • Query Flocks: A Generalization of Association-Rule Mining. Dick Tsur, Jeffrey D. Ullman, Serge Abiteboul, Chris Clifton, Rajeev Motwani, Svetlozar Nestorov, Arnon Rosenthal; SIGMOD 98.
  • Classification
    • PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning. Rastogi, Shim; VLDB 1998.
    • S. Chaudhuri, U. Fayyad, J. Bernhardt. Scalable Classification over SQL Databases. SIGMOD 1999.
  • Clustering: CURE: An Efficient Clustering Algorithm for Large Databases. Guha, Rastogi, Shim; SIGMOD 98.
  • Data Mining Methodology: Integrating Mining with Relational Database Systems: Alternatives and Implications. Sunita Sarawagi, Shiby Thomas, Rakesh Agrawal; SIGMOD 98.
  • Latent Semantic Indexing: Latent Semantic Indexing: A Probabilistic Analysis. Christos Papadimitriou, Prabhakar Raghavan, Hisao Tamaki; Preliminary version in PODS 98.
  • Extending naive Bayes classifiers using long itemsets. Meretakis and Wuthrich; KDD 99.
  • Squashing flat files flatter. DuMouchel et al; KDD 99.
  • SQL with Data Mining Primitives
    • A. Netz, S. Chaudhuri, U. Fayyad, J. Bernhardt. Integrating Data Mining with SQL Databases: OLE DB for Data Mining. IEEE ICDE 2001.
    • S. Chaudhuri, V. Narasayya, S. Sarawagi. Efficient Evaluation of Queries with Mining Predicates. IEEE ICDE 2002.

Semi-structured and XML Data

  • XML
  • XML Query
    • S. Boag, D. Chamberlin, M.F. Fernandez, D. Florescu, J. Robie, J. Simeon, M. Stefanescu.  XQuery:  An XML Query Language.  W3C working draft.
    • P. Wadler. A formal semantics of patterns in XSLT. Submitted to BSL
    • XQuery Tutorial
    • P. Wadler. Two semantics of XPath.  
  • XML Publishing
    • P. Bohannon, S. Ganguly, H. F. Korth, P. P. S. Narayan, and P. Shenoy. Optimizing view queries in ROLEX to support navigable result trees. In Proc. of the 28th Int’l. Conf. on Very Large Data Bases, 2002.
    • M. Fernández, Y. Kadiyska, D. Suciu, A. Morishima, and W. Tan. SilkRoute: a framework for publishing relational data in XML. TODS 27(4), 2002. (Must-read)
    • J. Shanmugasundaram, E. Shekita, R. Barr, M. Carey, B. Lindsay, H. Pirahesh, B. Reinwald. Efficiently Publishing Relational Data as XML Documents. VLDB 2000. (Must-read)
    • M. Fernandez, A. Morishima, D. Suciu. Efficient Evaluation of XML Middle-ware Queries. SIGMOD 2001.
  • XML Storage
    • Survey: Florescu, Kossmann, A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database, INRIA Technical Report
    • Use DTD to derive schema: J. Shanmugasundaram, K. Tufte, C. Zhang, G. He, D. J. DeWitt, J. F. Naughton.  Relational Databases for Querying XML Documents: Limitations and Opportunities. VLDB 1999. (Must-read)
    • Use Generic schema: D. Florescu, D. Kossmann. Storing and Querying XML Data Using an RDBMS. IEEE Data Engineering Bulletin, 1999.
    • Use the Path Table: M. Yoshikawa, T. Amagasa, T. Shimura, S. Uemura. XRel: A Path-based Approach to Storage and Retrieval of XML Documents Using Relational Databases. ACM Transactions on Internet Technology, 2001.
    • Use Data Mining to derive schema: A. Deutsch, M. Fernandez, D. Suciu. Storing Semistructured Data with STORED. SIGMOD 1999
  • DBMS for semistructured data
    • M. F. Fernandez, D. Florescu, J. Kang, A. Y. Levy, D. Suciu. Catching the Boat with Strudel: Experiences with a Web-Site Management System. SIGMOD 1998. (Must-read)
    • McHugh, Abiteboul, Goldman, Quass, and Widom: Lore: A Database Management System for Semistructured Data. SIGMOD Record, September 1997
    • Goldman, McHugh, Widom. From Semistructured Data to XML: Migrating the Lore Data Model and Query Language. WebDB '99
  • XML normal form
  • XML indexing
  • Typechecking and type inference
    • Milo, Suciu: Type Inference for Queries on Semistructured Data, PODS 2000.
  • XML data compression
    • Liefke, Suciu: XMill: an Efficient Compressor for XML Data. SIGMOD 2000.
  • XML database systems
    • Jeffrey Naughton et al. The Niagara Internet Query System. IEEE Data Engineering Bulletin, 2001.
    • H. V. Jagadish et al. TIMBER: A Native XML Database. VLDBJ 2003.

Transaction Processing

  • Overview:
    • P. A. Bernstein, E. Newcomer.  Principles of Transaction Processing, 2nd ed. Chapter 1 (Introduction), Chapter 2 (Transaction Processing Monitors).
    • R. Ramakrishnan and J. Gehrke.  Database Management Systems, 2nd ed., Chapter 18 (Transaction Management Overview).
  • Two-phase Commit (A--Atomicity):
    • P. A. Bernstein, E. Newcomer.  Principles of Transaction Processing, 2nd ed. Chapter 9 (Two-Phase Commit). (Must-read)
  • Locking and Concurrency Control (I--Isolation)
    • P. A. Bernstein, E. Newcomer.  Principles of Transaction Processing, 2nd ed. Chapter 6 (Locking). (Must-read)
    • R. Ramakrishnan and J. Gehrke.  Database Management Systems, 2nd ed., Chapter 19 (Concurrency Control).
    • Axel Moenkeberg and Gerhard Weikum: Performance Evaluation of an Adaptive and Robust Load Control Method for the Avoidance of Data-Contention Thrashing. VLDB 1992. (VLDB 10-year best paper)
    • Gerhard Weikum, Axel Moenkeberg, Christof Hasse, Peter Zabback: Self-tuning Database Technology and Information Services: from Wishful Thinking to Viable Engineering. 2002. (10-year retrospective paper, Must-read)
  • Recovery (A--Atomicity & D--Durability)
    • P. A. Bernstein, E. Newcomer.  Principles of Transaction Processing, 2nd ed. Chapter 8 (Database System Recovery). (Must read)
    • R. Ramakrishnan and J. Gehrke.  Database Management Systems, 2nd ed., Chapter 20 (Recovery).
    • David Lomet and Gerhard Weikum: Efficient Transparent Application Recovery in Client/Server Information Systems. SIGMOD'1998. (Sigmod best paper)

Security

  • Survey
    • Adam, Wortmann. Security-control methods for statistical databases: a comparative study. (A survey of the main techniques for protecting against disclosure of confidential information in a statistical database: conceptual, query restriction, data perturbation, output perturbation.)
    • Pinkas. Cryptographic Techniques for Privacy-Preserving Data Mining. SIGKDD Explorations. (Survey paper of results in secure multi-party computation and their relevance to data mining. )
  • Privacy
    • Alan Westin One-page article. Wall Street Journal, April 2000. (Brief article summarizing survey results on individual attitudes about privacy. )
    • Agrawal, Kiernan, Srikant, Xu. Hippocratic Databases. VLDB 2002. (A proposal for a database system that respects the privacy of individuals who contribute data to the database. Includes a list of key properties and challenges of a Hippocratic database system.)
    • L. Sweeney. Uniqueness of Simple Demographics in the U.S. Population, LIDAP-WP4. Carnegie Mellon University, Laboratory for International Data Privacy, Pittsburgh, PA: 2000. (Empirical study of census data attempting to extract information on individuals from aggregate values.)
    • Agrawal, Evfimievski, Srikant. Information Sharing Across Private Databases. SIGMOD 2003
    • Agrawal, Srikant. Privacy-Preserving Data Mining. SIGMOD 2000: 439-450.
    • Evfimievski, Srikant, Agrawal, Gehrke. Privacy preserving mining of association rules. KDD 2002.
    • Jon M. Kleinberg, Christos H. Papadimitriou, Prabhakar Raghavan: Auditing Boolean Attributes. PODS'2000.
  • Data Authenticity
    • Goodrich, Tamassia, Triandopoulos, Cohen. Authenticated Data Structures for Graph and Geometric Searching. Technical Report 2001
    • Prem Devanbu, Michael Gertz, Chip Martel, Stuart G. Stubblebine. Authentic Third-party Data Publication. IFIP Conference on Database Security, 2000.
  • Cryptography
    • Song, Wagner, Perrig: Practical Techniques for Searches on Encrypted Data. IEEE Symposium on Security and Privacy, 2000. (Cryptographic techniques for secure search over list of values stored on untrusted server.)
    • Hacigumus, Iyer, Li, Mehrotra. Executing SQL over encrypted data in the database-service-provider model. SIGMOD 2002. (Techniques for query evaluation over an encrypted database stored on an untrusted server.)
    • Miklau, Suciu. Controlling Access to Published Data Using Cryptography. VLDB 2003.
    • Martin Abadi and Phillip Rogaway. Reconciling two views of cryptography (The computational soundness of formal encryption). IFIP International Conference on Theoretical Computer Science. 2000. (The first paper presents techniques for enforcing access control over published documents. The resulting encrypted documents are difficult to analyze using cryptographic techniques. The second paper contains some techniques related to this difficulty.)
  • Access Control
    • Bertino, Jajodia, Samarati. Database Security: Research and Practice. IS 20 (7) 1995. (Survey of access control models for relational databases including discretionary and mandatory access control models.)
    • T. Yu, D. Srivastava, L. Lakshmanan, and H. Jagadish. Compressed Accessibility Map: Efficient Access Control for XML. VLDB 2002
  • Watermark
    • Agrawal, Kiernan: Watermarking Relational Databases. VLDBJ 2003.

Distributed and Parallel Databases

  • Replication
    • David A. Patterson, Garth A. Gibson, and Randy H. Katz: A Case for Redundant Arrays of Inexpensive Disks (RAID). SIGMOD 1988. (Sigmod 10-year best paper)
    • Survey: Peter M. Chen, Edward K. Lee, Garth A. Gibson, Randy H. Katz, David A. Patterson. RAID: High-Performance, Reliable Secondary Storage. ACM Computing Surveys, 26(2), 1994. (Must-read)
    • Textbook: P. A. Bernstein, E. Newcomer.  Principles of Transaction Processing, 2nd ed. Chapter 10 (Replication).
    • J. Gray, P. Helland, P. E. O'Neil, D. Shasha.  The Dangers of Replication and a Solution.  SIGMOD 1996.
  • Distributed Database
    • Textbook: M. T. Özsu, P. Valduriez. Principles of Distributed Database Systems, 2nd ed.  Chapter 4 (Distributed Database Systems), pp. 82-99; Chapter 5 (Distributed Database Design), pp. 107-154, skimming examples, algorithms, and Section 5.4.3.
    • Survey: D. Kossmann.  The State of the Art in Distributed Query Processing.  ACM Computing Surveys 32(4), 2000, pp. 418-469. (Must-read)
    • R. Braumandl, M. Keidl, A. Kemper, D. Kossmann, A. Kreutz, S. Seltzsam, K. Stocker: ObjectGlobe: Ubiquitous query processing on the Internet. VLDB Journal 10(1): 48-71 (2001).
    • M. Stonebraker, P. M. Aoki, R. Devine, W. Litwin, M. Olson. Mariposa: A New Architecture for Distributed Data. ICDE 1994.
  • Parallel Database
    • Textbook: M. T. Özsu, P. Valduriez. Principles of Distributed Database Systems, 2nd ed.  Chapter 13 (Parallel Database Systems), pp. 420-452.  (Must-read)
    • Survey: D. DeWitt and J. Gray.  Parallel Database Systems: The Future of High Performance Database Systems.  CACM 35(6), 85-98, 1992.
  • Distributed query optimization

Peer-to-peer System

  • Peer data management system
    • Alon Y. Halevy, Zachary G. Ives, Peter Mork, Igor Tatarinov: Piazza: data management infrastructure for semantic web applications. WWW 2003: 556-567. (Must-read)
    • Steven Gribble, Alon Halevy, Zachary Ives, Maya Rodrig, Dan Suciu. What can Databases do for Peer-to-Peer? WebDB 2001. (Must-read)
  • Content-based network routing
    • S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker. A Scalable Content-Addressable Network. SIGCOMM 2001.
    • B. Y. Zhao, J. D. Kubiatowicz, A. D. Joseph. Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing. Berkeley TR UCB//CSD-01-1141, 2000.
  • Data migration/lookup
    • I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. SIGCOMM 2001.
    • S. Q. Zhuang, B. Y. Zhao, A. D. Joseph, R. H. Katz, J. Kubiatowicz. Bayeux: An Architecture for Scalable and Fault-tolerant Wide-Area Data Dissemination. International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), 2001.
    • A. Rowstron, P. Druschel. Storage Management and Caching in PAST: A Large-scale, Persistent Peer-to-peer Storage Utility. SOSP 2001.
    • C. G. Plaxton, R. Rajaraman and A. W. Richa. Accessing nearby copies of replicated objects in a distributed environment. ACM Symposium on Parallel Algorithms and Architectures, 1997.
    • M. Stonebraker, R. Devine, M. Kornacker, W. Litwin, A. Pfeffer, A. Sah, C. Staelin. An Economic Paradigm for Query Processing and Data Migration in Mariposa. International Conference on Parallel and Distributed Information Systems, 1994.
    • J. Sidell, P.M. Aoki, A. Sah, C. Staelin, M. Stonebraker, A. Yu. Data Replication in Mariposa. ICDE 1996.
    • W. Litwin, M.A. Neimat, D. Schneider. LH* -- Linear hashing for Distributed Files. SIGMOD 1993.

Database and Information Retrieval

  • Surajit Chaudhuri, Raghu Ramakrishnan, Gerhard Weikum. Integrating DB and IR Technologies: What is the Sound of One Hand Clapping? CIDR 2005.

Information Retrieval

  • Survey:
    • A. Singhal. Modern information retrieval: a brief overview. IEEE data engineering bulletin, special issue on text and databases, 24(4), 2001.
    • Christos Faloutsos and Douglas W. Oard. A survey of information retrieval and filtering methods. 1995.
  • TF/IDF: Thorsten Joachims. A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. 1996. (Must-read)
  • LSI
    • Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 1990. (Must-read)
    • Christos Papadimitriou et al. Latent Semantic Indexing: A Probabilistic Analysis. PODS 98
  • Google and PageRank: Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Computer networks and ISDN systems, 30(1-7), 1998. (Must-read)
  • Query-Answering System: Eric Brill, Susan Dumais, and Michele Banko. An analysis of the AskMSR question-answering system. 2002.
  • C. Faloutsos. Access Methods for Text. ACM Computing Surveys 17(1), 1985, pp. 49-74.
  • J. M. Kleinberg. Authoritative Sources in a Hyperlinked Environment. JACM 46(5), 1999, pp. 604-632.
  • Indexing
    • Ricardo Baeza-Yates and Berthier Ribeiro-Neto, eds, Modern Information Retrieval, Addison-Wesley, 1999. Chapter 8 (Indexing and Searching).
    • Inverted List and PAT Tree: G. Gonnet, Ricardo Baeza-Yates, and T. Snider: Lexicographical Indices for Text: Inverted Files vs. PAT Trees. Technical Report TR-OED-91-01, University of Waterloo, 1991
    • Suffix Array: Udi Manber and Gene Myers: Suffix Arrays: A New Method for On-Line String Searches. ACM-SIAM SODA, 1990
    • Multigram Index: Junghoo Cho and Sridhar Rajagopalan: A Fast Regular Expression Indexing Engine. ICDE 2001.

Information Extraction

  • Oren Etzioni et al. Web-scale information extraction in KnowItAll. WWW, 2004.
  • G. Salton, editor. The SMART Retrieval System-Experiments in Automatic Document Retrieval. Prentice Hall Inc., Englewood Cliffs, NJ, 1971.

Approximate Queries in Databases

  • Keyword Search
    • Keyword search on RDBs
      • Qi Su and Jennifer Widom. Indexing relational database content offline for efficient keyword-based search. IDEAS 2005.
      • Vagelis Hristidis, Luis Gravano and Yannis Papakonstantinou. Efficient IR-style keyword search over relational databases. VLDB 2003.
      • Hristidis, Papakonstantinou. DISCOVER: Keyword Search in Relational Databases. VLDB, 2002.
      • Agrawal, Chaudhuri, Das. DBXplorer: A System for Keyword-Based Search over Relational Databases. ICDE, 2002.
      • G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti and S. Sudarshan. Keyword searching and browsing in databases using BANKS. ICDE 2002.
      • Goldman, Shivakumar, Venkatsubramanian, Garcia-Molina. Proximity Searches in Databases. VLDB, 1998.
    • Keyword search on XML
      • Yu Xu and Yannis Papakonstantinou. Efficient keyword search for smallest LCAs in XML databases. Sigmod 2005.
      • Vagelis Hristidis, Yannis Papakonstantinou and Andrey Balmin. Keyword proximity search on XML graphs. ICDE 2003.
      • Michael Barg and Raymond K. Wong. Structural proximity searching for large collections of semi-structured data. CIKM 2001.
      • Search XML Collection: J. Naughton et al. The Niagara Internet query system. IEEE Data Engineering Bulletin, 24(2):27-33, 2001.
  • Full-text Search
    • RDB
      • U. Masermann and G. Vossen. Schema independent database querying (on and off the web). IDEAS 2000.
      • U. Masermann and Gottfried Vossen. Design and implementation of a novel approach to keyword searching in relational databases. ADBIS-DASFAA Symposium, 2000.
    • XML
      • Tutorial: Sihem Amer-Yahia, Jayavel Shanmugasundaram. XML Full-Text Search: Challenges and Opportunities. VLDB 2005. (Must-read)
      • Holger Florke, Norbert Fuhr, Kenji Hatano, Borkur Sigurbjornsson, Andrew Trotman, and Masahiro Watanabe. Queries, INEX 2003 working group report. 2004.
      • Theory result: Yaron Kanza and Yehoshua Sagiv. Flexible queries over semistructured data. PODS 2001.
      • TAG+Keyword: S. Cohen, J. Mamou, Y. Kanza and Y. Sagiv. XSEarch: A semantic search engine for XML. VLDB 2003.
      • Path+Keyword: Norbert Fuhr and Kai Großjohann. XIRQL: A query language for information retrieval in XML documents. SIGIR 2001.
      • Full-text search:
        • Vincent Aguilera, Sophie Cluet, Pierangelo Veltri, Dan Vodislav, and Fanny Wattez. Querying XML Documents in Xyleme.
        • Daniela Florescu, Donald Kossmann and Ioana Manolescu. Integrating keyword search into XML query processing. WWW 2000.
        • Shurug Al-Khalifa, Cong Yu, and H. V. Jagadish. Querying structured text in an XML database. Sigmod 2003.
        • Sihem Amer-Yahia, Mary Fernandez, Divesh Srivastava, and Yu Xu. PIX: A system for phrase matching in XML documents: a demonstration. Sigmod 2003.
        • Chavdar Botev, Sihem Amer-Yahia, and Jayavel Shanmugasundaram. Expressiveness and performance of full-text search languages.
        • S. Amer-Yahia, C. Botev, J. Shanmugasundaram. TeXQuery: A Full-Text Search Extension to XQuery. WWW 2004.
        • Emiran Curtmola, Sihem Amer-Yahia, Philip Brown, and Mary Fernandez. GalaTex: A conformant implementation of the XQuery full-text language. Sigmod 2005.
  • Structural Proximity Search
    • RDB
      • Xiaoxin Yin, Jiawei Han and Jiong Yang. Searching for related objects in relational databases. SSDBM 2005.
    • XML
      • Relax label:
        • Anja Theobald and Gerhard Weikum. Adding relevance to XML. WebDB 2000.
        • Theobald, Weikum. The Index-Based XXL Search Engine for Querying XML Data with Relevance Ranking. EDBT, 2002. (Must-read)
        • Anja Theobald. An ontology for domain-oriented semantic similarity search on XML data. BTW 2003.
      • Relax structure:
        • Yunyao Li, Cong Yu and H.V.Jagadish. Schema-free XQuery. VLDB 2004.
        • David Carmel, Yoelle S. Maarek, Matan Mandelbrod, Yosi Mass, and Aya Soffer. JuruXML: Searching XML Documents via XML Fragments. SIGIR 2003.
          (Full report: David Carmel, Nadav Efraty, Gad M. Landau, Yoelle S. Maarek, and Yosi Mass. An extension of the vector space model for querying XML document via XML fragments.)
        • Sihem Amer-Yahia, Laks V. S. Lakshmanan, and Shashank Pandit. FleXPath: Flexible structure and full-text querying for XML. Sigmod 2004.
        • Sihem Amer-Yahia, Nick Koudas, Amelie Marian, Divesh Srivastava, David Toman. Structure and content scoring for XML. VLDB 2005.
      • Natural language query:
        • Yunyao Li, Huahai Yang, H. V. Jagadish. NaLIX: an interactive natural language interface for querying XML. Sigmod 2005 best demo.
  • Approximate Search
    • Liang Jin, Nick Koudas, Chen Li and Anthony K. H. Tung. Indexing mixed types for approximate retrieval. VLDB 2005.
    • Luis Gravano, Panagiotis Ipeirotis, H. V. Jagadish, Nick Koudas, S. Muthukrishnan, and Divesh Srivastava. Approximate String Joins in a Database (Almost) for Free. VLDB, 2001.
    • William Cohen. Data Integration using Similarity Joins and a Word-based Information Representation Language. In ACM Transactions on Information Systems 18(3): 288-321 (2000)
    • Fagin. Fuzzy Queries in Multimedia Database Systems. PODS, 1998.
  • TOP-K Query
    • Ronald Fagin, Amnon Lotem, Moni Naor. Optimal Aggregation Algorithms for Middleware. PODS, 2001. (Best-paper, must-read)
    • Ronald Fagin, Ravi Kumar, and D. Sivakumar. Efficient similarity search and classification via rank aggregation. SIGMOD 2003.
  • Ranking
    • RDB:
      • Lin Guo, Jayavel Shanmugasundaram, Kevin S. Beyer, Eugene J. Shekita: Efficient Inverted Lists and Query Algorithms for Structured Value Ranking in Update-Intensive Relational Databases. ICDE 2005: 298-309
      • Xiaoxin Yin, Jiawei Han and Jiong Yang. Searching for related objects in relational databases. SSDBM 2005.
      • Agrawal, Chaudhuri, Das, Gionis. Automated Ranking of Database Query Results. CIDR, 2003. (Must-read)
    • XML:
      • Chavdar Botev, Jayavel Shanmugasundaram: Context-Sensitive Keyword Search and Ranking for XML. WebDB 2005: 115-120
      • Guo, Shao, Botev, Shanmugasundaram. XRANK: Ranked Keyword Search over XML Documents. Sigmod 2003. (Must-read)
    • Object-level:
      • Li Ding, Rong Pan, Tim Finin, Anupam Joshi, Yun Peng, and Pranam Kolari. Finding and ranking knowledge on the semantic web. ISWC 2005.
      • Zaiqing Nie, Yuanzhi Zhang, Ji-rong Wen, Wei-Ying Ma. Object-level ranking: bringing order to web objects. WWW 2005.
      • Andrey Balmin, Vagelis Hristidis and Yannis Papakonstantinou. ObjectRank: Authority-based keyword search in databases. VLDB 2004. (Must read)
    • Rank associations:
      • Kemafor Anyanwu, Angela Maduko, and Amit Sheth. SemRank: ranking complex relationship search results on the semantic web. WWW 2005.
      • Boanerges Aleman-Meza, Chris Halaschek, I. Budak Arpinar, and Amit Sheth. Context-aware semantic association ranking. SWDB 2003.
    • Vagelis Hristidis, Nick Koudas and Yannis Papakonstantinou. PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries. SIGMOD 2001.
  • Modeling Imprecision
    • Probabilistic databases: Nilesh Dalvi and Dan Suciu. Foundations of probabilistic answers to queries. Sigmod 2005. (Tutorial, must-read)
    • Jennifer Widom. Trio: A System for Integrated Management of Data, Accuracy, and Lineage. CIDR 2005.

Web Services

  • Serge Abiteboul, Omar Benjelloun, Tova Milo. Positive active XML. PODS 2004.
  • Serge Abiteboul, Omar Benjelloun, Bogdan Cautis, Ioana Manolescu, Tova Milo, Nicoleta Preda. Lazy query evaluation for active XML. Sigmod 2004.
  • Semantics of Web Services
    • Anupriya Ankolekar, et al.: DAML-S: Web service description for the semantic web.
    • Massimo Paolucci, Takahiro Kawamura, Terry R. Payne, and Katia Sycara. Semantic matching of web services capabilities.
    • Andreas Heß and Nicholas Kushmerick. Machine learning for annotating semantic web services.
    • Andreas Heß and Nicholas Kushmerick. Learning to attach semantic metadata to web services. Semantic Web, 2003.
    • Andreas Heß, Eddie Johnston, and Nicholas Kushmerick. ASSAM: A tool for semi-automatically annotating semantic web services. 2004.
    • Andreas Heß, Nick Kushmerick. Iterative ensemble classification for relational data: a case study of semantic web services. 2004.
  • Web Service Discovery
    • Fatih Emekci, Ozgur D. Sahin, Divyakant Agrawal, Amr El Abbadi. A peer-to-peer framework for web service discovery with ranking.
    • Xin Dong, Alon Halevy, Jayant Madhavan, Ema Nemes, Jun Zhang. Similarity search for web services. VLDB 2004.
  • Web Service Composition
    • Ghandeharizadeh, S., Knoblock, C.A., Papadopoulos, C., Shahabi, C., Alwagait, E., Ambite, J.L., Cai, M., Chen, C.C., Pol, P., Schmidt, R., Song, S., Thakkar, S., Zhou, R.: Proteus: A system for dynamically composing and intelligently executing web services. ICWS 2003.
    • Thakkar, S., Knoblock, C.A., Ambite, J., Shahabi, C.: Dynamically composing web services from on-line sources. AAAI workshop on intelligent service integration, 2002.
    • Evren Sirin, James Hendler, and Bijan Parsia. Semi-automatic composition of web services using semantic descriptions.
    • Liangzhao Zeng, Boualem Benatallah, Marlon Dumas, Jayant Kalagnanam, Quan Z. Sheng. Quality driven web services composition. WWW, 2003.
  • Automatic Programming
    • Charles Rich and Richard C. Waters. Approach to automatic programming. Tech report 92-04, MIT, 1992.
    • Amy Moormann Zaremski and Jeannette M. Wing. Specification matching of software components. ACM TOSEM, 1997.

Books

  • Database Systems
    • J. Ullman and J. Widom.  A First Course in Database Systems.
    • Ramakrishnan.  Database Management Systems
    • Ramakrishnan & Gehrke.  Database Management Systems, 2nd ed.
    • Korth & Silberschatz.  Database System Concepts, 2nd ed. 
    • Stonebraker, ed.  Readings in Databases, 3rd ed.
    • Garcia-Molina, Ullman, Widom.  Database System Implementation.
    • Elmagarmid, et al.  Management of Heterogeneous and Autonomous Database Systems.
  • Object-Oriented and Object-Relational
    • Stonebraker & Moore.  Object-Relational DBMSs: The Next Great Wave.
  • XML
    • Abiteboul, et al.  Data on the Web.
    • Pitts-Moultis & Kirk.  XML Black Book.
  • Query Processing

    • Yu & Meng.  Principles of Database Query Processing for Advanced Applications
    • Garcia-Molina, Ullman, Widom.  Database System Implementation.
    • Ramakrishnan.  Database Management Systems
  • Transaction Processing

    • Bernstein & Newcomer.  Principles of Transaction Processing

    • Gray & Reuter.  Transaction Processing: Concepts and Techniques

  • Distributed/Parallel Databases
    • Özsu & Valduriez.  Principles of Distributed Database Systems
  • Other Database System Topics
    • Widom & Ceri.  Active Database Systems
    • Zaniolo, et al.  Advanced Database Systems.
  • Theory

    • Abiteboul, et al.  Foundations of Databases
    • Christos H. Papadimitriou. Computational Complexity.
  • Information Retrieval
    • Baeza-Yates, Ribeiro-Neto.  Modern Information Retrieval
    • Belew, R.K. Finding out about--A cognitive perspective on search engine technology and the www. Cambridge University Press, 2000
  • Artificial Intelligence
    • S. Russell & P. Norvig, Artificial Intelligence: A Modern Approach (2nd ed.), Prentice Hall, 2003
  • Data Mining
    • Tom Mitchell, Machine Learning, McGraw-Hill, 1997
    • David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT Press, 2001
    • Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000
    • Fayyad, et al.  Advances in Knowledge Discovery & Data Mining.
  • Statistics
    • D. Koller & N. Friedman, Bayesian Networks and Beyond: Probabilistic Models for Learning and Reasoning, MIT Press
    • M. DeGroot & M. Schervish, Probability and Statistics (3rd ed.), Addison-Wesley, 2002
  • Java
    • Joshua Bloch. Effective Java: Programming Language Guide.
    • Lemay & Perkins.  Teach Yourself Java 1.1 in 21 Days, 2nd ed.
    • Java in a Nutshell
    • Downing.  Java RMI
    • Maximum Java 1.1.
    • Orfali & Harkey.  Client/Server Programming with Java and CORBA
    • Van Haecke.  JDBC: Java Database Connectivity.
  • C++
    • Gregory.  Using Visual C++ 4.2, Special Edition.

    • Visual C++ 5 Unleashed.

    • Kruglinski, Inside Visual C++

  • SQL
    • Chamberlin.  A Complete Guide to DB2 Universal Database
  • LaTeX
    • Lamport.  LaTeX: A Document Preparation System.  
    • LaTeX Graphics Companion.
  • ASP
    • Using Active Server Pages, Special Edition.