Resources
Reading List
History and General Background
- Relational Database
- E.F. Codd. A Relational Model of Data for Large
Shared Data Banks. CACM 13(6), 1970, pp. 377-387.
(Must read. First paper about RDB.)
- Michael Stonebraker. Operating System Support
for Database Management. CACM 24(7), 1981, pp.
412-418. (Must read)
- D. D. Chamberlin, M. M. Astrahan, M. W. Blasgen, J.
Gray, W.F. King III, B. G. Lindsay, R.A. Lorie, J. W. Mehl,
T. G. Price, G. R. Putzolu, P. G. Selinger, M. Schkolnick,
D. R. Slutz, I. L. Traiger, B. W. Wade, R. A. Yost. A
History and Evaluation of System R. CACM 24(10),
1981, pp. 632-646.
- O. G. Tsatalos, M. H. Solomon, and Y. Ioannidis. The
GMAP: A versatile Tool for Physical Data Independence.
VLDB 1994. (Must-read. Dependence using views)
- Object-oriented
- Sort-of-survey: M.J. Carey, D.J. DeWitt.
Of Objects and Databases: A Decade of Turmoil.
VLDB 1996. (Must read.)
- M. Stonebraker, G. Kemnitz. The Postgres Next
Generation Database Management System. CACM 34(10),
1991, pp. 78-92. (Must read. Important paper about ORDB.)
- L. M. Haas, W. Chang, G. M. Lohman, J. McPherson,
P. F. Wilms, G. Lapis, B. G. Lindsay, H. Pirahesh, M. J.
Carey, E. J. Shekita: Starburst Mid-Flight: As the Dust
Clears. TKDE 2(1), 1990, pp. 143-160.
- S. Abiteboul and A. Bonner. Objects and Views.
SIGMOD 1991.
- M.J. Carey, D.J. DeWitt, J. Naughton. The OO7
Benchmark. SIGMOD 1993.
- Temporal Database
- Temporal DB: C. Zaniolo, S. Ceri, C. Faloutsos,
R. T. Snodgrass, V.S. Subrahmanian, R. Zicari.
Advanced Database Systems. Chapter 5 (Overview of
Temporal Databases), pp. 99-121. (Must-read)
- Network/Hierarchical Databases
- E. H. Sibley: Development of Data-Base Technology. ACM
Computing Surveys, Vol 8 No 1, 1976.
- Dennis Tsichritzis and Frederick Lochovsky: Hierarchical
Data-Base Management: A Survey. ACM Computing Surveys, Vol 8
No 1, 1976.
- Ann Michaels, Benjamin Mittman, C. Robert Carlson: A
Comparison of the Relational and CODASYL Approaches to
Data-Base Management. ACM Computing Surveys, Vol 8 No 1,
1976.
- Document Databases
- Arjan Loeffen: Text databases: a survey of text
models and systems. SIGMOD Record, Vol 23 #1, 1994.
- Mariano P. Consens and Tova Milo: Optimizing queries
on files. SIGMOD 1994.
Theory of Databases
- Halpern, Harper, Immerman, Kolaitis, Vardi and
Vianu. On the Unusual Effectiveness of Logic in Computer
Science. Bulletin of Symbolic Logic, July 2002.
- Christos Papadimitriou. Database metatheory:
asking the big queries. PODS 95.
Dependency, Constraints and Triggers
- Normal forms and design theory
- J. Ullman. Database and Knowledge Base Systems,
vol. I. Chapter 7 (Design Theory).
(Must-read)
- Marcelo Arenas, Leonid Libkin: Normalizing XML
documents. PODS 2002. (Must-read)
- Marcelo Arenas, Leonid Libkin: An
Information-Theoretic Approach to Normal Forms for
Relational and XML Data. PODS 2003. (PODS best paper).
- Dependency and chase rule
- J. Ullman. Database and Knowledge Base Systems,
vol. I. Chapter 7.11 (Generalized Dependencies).
(Must-read)
- S. Abiteboul, R. Hull, V. Vianu. Foundations
of Databases. Chapter 8 (Functional and Join
Dependency), Chapter 9 (Inclusion Dependency). (Must-read)
- Constraints and triggers
- S. Ceri, R. Cochrane, J. Widom. Practical
Applications of Triggers and Constraints: Successes and
Lingering Issues. VLDB 2000. (Must-read)
- Discovering constraints
- Y. Huhtala, J. Karkkainen, P. Porkka, and H. Toivonen.
TANE: An efficient algorithm for discovering functional and
approximate dependencies. The Computer Journal, 1999
- I. F. Ilyas, V. Markl, P. J. Haas, P. G. Brown, and A.
Aboulnaga. Cords: Automatic generation of correlation
statistics in DB2. In VLDB'04.
Query Evaluation
- J. Ullman. Database and Knowledge Base Systems,
vol. I. Chapter 3 (Logic as a Data Model). (Must-read)
- J. Ullman. Database and Knowledge Base Systems,
vol. II. Chapters 12 (Top-Down Evaluation), 13 (Magic
Sets). (Must-read)
- Christos H. Papadimitriou, Mihalis Yannakakis: On the
Complexity of Database Queries. PODS'1997 (PODS best
paper)
- P. Buneman, S. A. Naqvi, V. Tannen, L. Wong: Principles
of Programming with Complex Objects and Collection Types.
TCS 149(1): 3–48 (1995).
- Serge Abiteboul and Paris C. Kanellakis. Object Identity
as a Query Language Primitive. JACM 1998.
Query Containment
- Theoretical Results
- Conjunctive queries
- A. K. Chandra and P. M. Merlin.
Optimal implementation of conjunctive queries in
relational databases. In Proc. of STOC, 1977.
- A. Aho, Y. Sagiv, and J. D. Ullman.
Equivalence of relational expressions. SIAM Journal
on Computing, 8(2):218-246, 1979.
- Acyclic queries: M. Yannakakis.
Algorithms for acyclic database schemes. VLDB, 82-94,
1981.
- Union: Y. Sagiv and M. Yannakakis.
Equivalence among relational expressions with the union and
difference operators. Journal of the ACM, 27(4):633-655,
1980.
- Negation: A. Y. Levy and Y. Sagiv.
Queries independent of updates. In Proc. of VLDB, 1993.
- Arithmetic comparisons
- R. van der Meyden. The complexity of
querying indefinite data about linearly ordered domains.
In Proc. of PODS, pages 331-345, San Diego, CA, 1992.
- X. Zhang and M. Z. Ozsoyoglu. On
efficient reasoning with implication constraints. In
Proc. of DOOD, 1993.
- A. Gupta, Y. Sagiv, J. D. Ullman, and J.
Widom. Constraint checking with partial information.
In Proc. of PODS, 45-55, Minneapolis, Minnesota, 1994.
- Recursive queries
- O. Shmueli. Equivalence of datalog
queries is undecidable. Journal of Logic
Programming, 15:231-241, 1993.
- S. Chaudhuri and M. Vardi. On the
equivalence of recursive and nonrecursive datalog
programs. In Proc. of PODS, 55-66, San Diego, CA,
1992.
- Queries over bags
- S. Chaudhuri and M. Vardi. Optimizing
real conjunctive queries. In Proc. of PODS, 1993.
- Y. E. Ioannidis and R. Ramakrishnan.
Containment of conjunctive queries: Beyond relations as
sets. ACM Transactions on Database Systems,
20(3):288-324, 1995.
- XPaths
- G. Miklau and D. Suciu. Containment
and equivalence for an XPath fragment. In Proc. of
PODS, 2002.
- S. Amer-Yahia, S. Cho, L. V. S. Lakshmanan,
and D. Srivastava. Minimization of tree pattern
queries. In Proc. of SIGMOD, 2001.
- T. Milo and D. Suciu. Index
structures for path expressions. In Proc. of ICDT,
pages 277-295, 1999.
- Alin Deutsch and Val Tannen.
Containment and Integrity Constraints for XPath
Fragment. Tech Report. 2001.
- D. Florescu, A. Levy, and D. Suciu.
Query containment for conjunctive queries with regular
expressions. In Proc. of PODS, Seattle, WA, 1998.
- Nested Queries
- Xin Dong, Alon Halevy, Igor Tatarinov.
Containment of Nested XML Queries. VLDB 2004.
- A. Y. Levy and D. Suciu. Deciding
containment for queries with complex objects and
aggregations. In Proc. of PODS, Tucson, Arizona,
1997.
- Application in Query Optimization
- S. Abiteboul, R. Hull, V. Vianu. Foundations
of Databases. Chapter 6, Sections 6.2 (Global
Optimizations) and 6.4 (Computing with Acyclic Joins).
- J. Ullman. Database and Knowledge Base Systems,
vol. II. Chapter 14 (Containment).
Answering Queries Using Views
- Survey
- A.Y. Halevy. Answering Queries Using Views: A
Survey. VLDB Journal, 10(4). (Must-read)
- Jeffrey D. Ullman. Information Integration Using
Logical Views. ICDT 1997. (Must-read)
- Theoretical Results
- S. Abiteboul, O. M. Duschka. Complexity of
Answering Queries Using Materialized Views. PODS
1998. (Must-read)
- Alon Levy, Alberto Mendelzon, Yehoshua Sagiv and Divesh
Srivastava. Answering queries using views. PODS 1995.
- Oliver M. Duschka and Michael R. Genesereth.
Answering recursive queries using views. PODS 1997.
- Practical Algorithms
- Buckets: A. Y. Levy, A. Rajaraman, J. J. Ordille.
Querying Heterogeneous Information Sources Using Source
Descriptions. VLDB 1996.
- Inverse Rules: Oliver M. Duschka, Michael R.
Genesereth, and Alon Halevy. Recursive query plans for
data integration. Journal of Logic Programming, 1999.
- MiniCon: R. Pottinger, A.Y. Halevy.
MiniCon: A Scalable Algorithm for Answering Queries Using
Views. VLDB Journal 10(2-3), pp 182-198, 2001.
- Query Rewriting for XML Data
- M. Y. Vardi: Constraint Satisfaction and Database
Theory: A Tutorial. PODS 2000, pp. 76–85.
- Yannis Papakonstantinou and Vasilis Vassalos. Query
rewriting for semistructured data. Sigmod 1999.
- Alin Deutsch and Val Tannen. Reformulation of XML
queries and constraints. ICDT 2003.
- Marcelo Arenas and Leonid Libkin. XML data exchange:
consistency and query answering. PODS 2005. (PODS
Best-paper)
Data Storage and Indexes
- Raghu Ramakrishnan and Johannes Gehrke. Database
Management Systems (2nd ed.) Chpt 7-10 (Data storage and
indexing). (Must-read).
- C. Zaniolo, S. Ceri, C. Faloutsos, R. T. Snodgrass, V.S.
Subrahmanian, R. Zicari. Advanced Database Systems,
Chapter 11 (Traditional Indexing Methods), pp. 269-294.
- J. M. Hellerstein, J. F. Naughton, A. Pfeffer: Generalized
Search Trees for Database Systems. VLDB 1995 (Must-read)
- OO Indexing
- Elisa Bertino and Won Kim: Indexing Techniques for
Queries on Nested Objects. TKDE 1989.
- Alfons Kemper, Guido Moerkotte: Access Support in Object
Bases. SIGMOD 1990.
- Paris Kanellakis, Sridhar Ramaswamy, Darren Vengroff,
and Jeffrey Vitter: Indexing for Data Models with
Constraints and Classes. PODS 1993.
XML Indexing
- Structure Indexes
- DataGuides:
- Roy Goldman and Jennifer Widom. DataGuides:
Enabling query formulation and optimization in
semistructured databases. VLDB 1997.
- Roy Goldman and Jennifer Widom. Approximate
DataGuides. Workshop on Query Processing for
semistructured data and non-standard data formats. 1999.
- Svetlozar Nestorov, Jeffrey Ullman, Janet Wiener and
Sudarshan Chawathe. Representative objects: concise
representations of semistructured, hierarchical data.
ICDE 1997.
- 1-index: Tova Milo and Dan Suciu. Index
structures for path expressions. ICDT 1999.
- Adaptive index: C. Chung, J. Min and K. Shim.
APEX: An adaptive path index for XML data. Sigmod 2002.
- A(k)-index: Raghav Kaushik, Pradeep Shenoy,
Philip Bohannon and Ehud Gudes. Exploiting local
similarity for indexing paths in graph-structured data.
ICDE 2002.
- D(k)-index: Qun Chen, Andrew Lim and Kian Win
Ong. D(k)-index: An adaptive structural summary for
graph-structured data. Sigmod 2003.
- M(k)-index: Hao He and Jun Yang.
Multiresolution Indexing of XML for Frequent Queries.
ICDE 2004.
- F&B-index for branching: Raghav Kaushik, Philip
Bohannon, Jeffrey F. Naughton, and Henry F. Korth.
Covering indexes for branching path queries. Sigmod
2002.
- F&B-index: Wei Wang, Hongzhi Wang, Hongjun Lu,
Haifeng Jiang, Xuemin Lin, and Jianzhong Li. Efficient
Processing of XML Path Queries Using the Disk-based F&B
Index. VLDB 2005.
- A. Halverson et al. Mixed mode XML query processing.
VLDB 2003.
- Index for Structural Matching:
- Structural joins
- Flavio Rizzolo and Alberto Mendelzon. Indexing
XML Data with ToXin. WebDB 2001.
- Quanzhong Li and Bongki Moon. Indexing and
querying XML data for regular path expressions. VLDB
2001.
- Shurug Al-Khalifa, H.V. Jagadish, Nick Koudas,
Jignesh M. Patel, Divesh Srivastava and Yuqing Wu.
Structural Joins: A primitive for efficient XML query
pattern matching. ICDE 2002.
- Shu-yao Chien, Zografoula Vagena, Donghui Zhang,
Vassilis J. Tsotras, Carlo Zaniolo. Efficient
structural joins on indexed XML documents. VLDB
2002.
- Haifeng Jiang, Hongjun Lu, Wei Wang, and Beng Chin
Ooi. XR-Tree: Indexing XML data for efficient
structural joins. ICDE 2003.
- Wei Wang, Haifeng Jiang, Hongjun Lu, and Jeffrey Xu
Yu. PBiTree Coding and Efficient Processing of
Containment joins. ICDE 2003.
- C. Zhang, J. Naughton, D. DeWitt, Q. Luo, and G.
Lohman. On supporting containment queries in
relational database management systems. Sigmod 2001.
- Holistic twig joins
- Nicolas Bruno, Nick Koudas, and Divesh Srivastava.
Holistic Twig Joins: Optimal XML pattern matching.
Sigmod 2002. (First paper on Holistic Twig Joins)
- H. Jiang, W. Wang, H. Lu, J. Yu. Holistic Twig
joins on indexed XML documents. VLDB 2003.
- H. Jiang, W. Wang, and H. Lu. Efficient
processing of XML Twig queries with or-predicates.
Sigmod 2004.
- Ting Chen, Jiaheng Lu and Tok Wang Ling. On
Boosting Holism in XML Twig Pattern Matching Using
Structural Indexing Techniques. VLDB 2005.
- B. Yang, M. Fontoura, E. Shekita, S. Rajagopalan,
and K. S. Beyer. Virtual cursors for XML joins.
CIKM, 2004.
- Marcus Fontoura, Vanja Josifovski, Eugene Shekita,
and Beverly Yang. Optimizing cursor movement in
holistic Twig joins. CIKM 05.
- Indexing Structure and Values
- B. Cooper, N. Sample, M. J. Franklin, G. R. Hjaltason, M.
Shadmon: A Fast Index for Semistructured Data. VLDB
2001, pp. 341-350. (Must-read)
- Haixun Wang, Sanghyun Park, Wei Fan, Philip S. Yu.
ViST: A dynamic index method for querying XML data by tree
structures. Sigmod 2003.
- Raghav Kaushik, Rajasekar Krishnamurthy, Jeffrey F.
Naughton, and Raghu Ramakrishnan. On the integration of
structure indexes and inverted lists. Sigmod 2004.
Query Execution
- Survey: G. Graefe. Query
Evaluation Techniques for Large Databases. ACM
Computing Surveys 25(2), 1993, pp. 73-170.
(Must-read)
- Raghu Ramakrishnan and Johannes Gehrke.
Database Management Systems (2nd ed.) Chpt 11 (sorting) Chpt
12 (Evaluation). (Must-read).
- A. Ailamaki, D. J. DeWitt, M. D. Hill, D. A.
Wood: DBMSs on a Modern Processor: Where Does Time Go?
VLDB 1999. (Must-read)
- J. Hellerstein, P. Haas, H. Wang: Online
Aggregation. SIGMOD 1997.
- Chris Nyberg, Tom Barclay, Zarka Cvetanovic, Jim Gray, and
David Lomet. AlphaSort: A RISC Machine Sort. SIGMOD'1994
(SIGMOD best paper)
- Anastassia Ailamaki, David DeWitt, Mark Hill, and Marios
Skounakis. Weaving relations for cache performance.
VLDB'2001 (VLDB best paper)
Query Optimization
- Survey
- S. Chaudhuri. An Overview of Query Optimization
in Relational Systems. PODS 1998. (Must-read)
- Y. Ioannidis. Query Optimization.
Handbook for Computer Science (CRC Press), chapter 45.
- Classical Systems
- P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, T.
Price. Access Path Selection in a Relational
Database Management System. SIGMOD 1979. (Must-read)
- L. M. Haas, J. C. Freytag, G. M. Lohman, H. Pirahesh.
Extensible Query Processing in Starburst. SIGMOD
1989. (Must-read)
- G. Graefe, W. J. McKenna: The Volcano Optimizer
Generator: Extensibility and Efficient Search. ICDE
1993. (Must-read)
- Query optimization for data integration
- Laura M. Haas, Donald Kossmann, Edward L. Wimmers, Jun
Yang: Optimizing queries across diverse data sources.
VLDB 1997.
- Serge Abiteboul, Hector Garcia-Molina, Yannis
Papakonstantinou, Ramana Yerneni: Fusion queries over
Internet databases. EDBT 1998.
- Distributed query optimization
- D. Kossmann, K. Stocker: Iterative dynamic
programming: a new class of query optimization algorithms.
TODS 25(1): 43-82 (2000).
- L. F. Mackert, G. M. Lohman: R* Optimizer Validation
and Performance Evaluation for Distributed Queries. VLDB
1986.
Database Statistics
- Survey
- D. Barbara, W. DuMouchel, C. Faloutsos, P.J.
Haas, J.M. Hellerstein, Y. Ioannidis, H.V. Jagadish, T.
Johnson, R. Ng, V. Poosala, K.A. Ross, K.C. Sevcik.
The New Jersey Data Reduction Report. Data
Engineering Bulletin 20(4), 1997, pp. 3-45. (Must-read)
- Statistical Models
- Projection Estimation
- Rafiul Ahad, K. V. Bapa Rao and Dennis McLeod: On
estimating the cardinality of the projection of a
database relation. ACM Transactions on Database Systems,
14(1), 28-40, 1989.
- Jane Fedorowicz: Database evaluation using
multiple regression techniques. 1984.
- Selectivity Estimation
- Volker Markl, Nimrod Megiddo, Marcel Kutsch, Tam
Minh Tran, Peter J. Haas, Utkarsh Srivastava:
Consistently Estimating the Selectivity of Conjuncts of
Predicates. VLDB 05.
- Lise Getoor, Benjamin Taskar, Daphne Koller:
Selectivity Estimation using Probabilistic Models.
SIGMOD 01.
- Viswanath Poosala and Yannis E. Ioannidis:
Selectivity estimation without the attribute value
independence assumption. VLDB 1997.
- Join-size Estimation
- Y. E. Ioannidis, S. Christodoulakis. On the
Propagation of Errors in the Size of Join Results.
SIGMOD 1991. (Must-read)
- Swarup Acharya, Phillip B. Gibbons, Viswanath
Poosala, Sridhar Ramaswamy: Join Synopses for
Approximate Query Answering. SIGMOD 99.
- Noga Alon, Phillip B. Gibbons, Yossi Matias, and
Mario Szegedy: Tracking join and self-join sizes in
limited storage. PODS 1999.
- Wei Sun, Yibei Ling, Naphtali Rishe and Yi Deng:
An instant and accurate size estimation method for joins
and selection in a retrieval-intensive environment.
Sigmod 1993.
- Allen Van Gelder: Multiple join size estimation
by virtual domains. PODS 1993.
- Arun Swami and K. Bernhard Schiefer: On the
estimation of join result sizes.
- Stavros Christodoulakis: Estimating block
transfers and join sizes. 1983.
- Histogram
- Yannis E. Ioannidis: The History of Histograms
(abridged). VLDB, 03. (10-year award paper;
Must-read).
(The original paper from VLDB 93: Yannis E. Ioannidis:
Universality of Serial Histograms. VLDB, 93.)
- Amol Deshpande, Minos N. Garofalakis, Rajeev
Rastogi: Independence is Good: Dependency-Based Histogram
Synopses for High-Dimensional Data. SIGMOD 01.
- Ashraf Aboulnaga and Surajit Chaudhuri:
Self-tuning histograms: building histograms without looking
at data. 1999.
- Phillip B. Gibbons, Yossi Matias and
Viswanath Poosala: Fast incremental maintenance of
approximate histograms. VLDB 1997.
- Viswanath Poosala, Yannis E. Ioannidis,
Peter J. Haas and Eugene J. Shekita. Improved histograms
for selectivity estimation of range predicates. 1996.
- M. Muralikrishna and David J. DeWitt:
Equi-depth histograms for estimating selectivity factors for
multi-dimensional queries. 1988.
- Clifford A. Lynch: Selectivity estimation
and query optimization in large databases with highly skewed
distributions of column values. VLDB 1988.
- Gregory Piatetsky-Shapiro and Charles
Connell: Accurate estimation of the number of tuples
satisfying a condition. 1984.
- Sampling
- Surajit Chaudhuri, Rajeev Motwani, Vivek R.
Narasayya: On Random Sampling over Joins. SIGMOD 99.
- Frank Olken. Random Sampling from Databases.
Ph.D. Thesis, 1993.
- Sumit Ganguly, Phillip B. Gibbons, Yossi
Matias and Avi Silberschatz: Bifocal sampling for
skew-resistant join size estimation. 1996.
- Peter J. Haas and Arun N. Swami: Sampling-based
selectivity estimation for joins using augmented frequent
value statistics. 1995.
- Peter J. Haas, Jeffrey F. Naughton and Arun
N. Swami. On the relative cost of sampling for join
selectivity estimation. Sigmod 1994.
- Peter J. Haas and Arun N. Swami:
Sequential sampling procedures for query size estimation.
Sigmod 1992.
- Richard J. Lipton, Jeffrey F. Naughton and
Donovan A. Schneider: Practical selectivity estimation
through adaptive sampling. 1992.
- Wen-chi Hou and Gultekin Ozsoyoglu:
Statistical estimators for aggregate relational algebra
queries. ACM Transactions on Database Systems, 16(4),
600-654, 1991.
- Adaptive Estimation
- Chung-Min Chen and Nick Roussopoulos. Adaptive
selectivity estimation using query feedback. SIGMOD '94. (Must-read)
- J. M. Hellerstein, V. Raman, and B. Raman. Online
Dynamic Reordering for Interactive Data Processing. VLDB
'99.
- Modeling Uncertainty in Estimation
- Brian Babcock, Surajit Chaudhuri: Towards a Robust
Query Optimizer: A Principled and Practical Approach.
SIGMOD Conference 2005: 119-130
Adaptive Query Optimization
View Selection
- J. Ullman, V. Harinarayan, A. Rajaraman. Implementing Data
Cubes Efficiently. SIGMOD 1996.
(Sigmod best-paper award)
- S. Agrawal, S. Chaudhuri, V. Narasayya. Automated Selection
of Materialized Views and Indexes for SQL Databases. VLDB 2000.
- R. Chirkova, A. Halevy, D. Suciu. A Formal Perspective on
the View Selection Problem. VLDB 2001.
- H. Mistry, P. Roy, S. Sudarshan, and K. Ramamritham.
Materialized view selection and maintenance using multi-query
optimization.
- Semantic Caching
- S. Dar, M. J. Franklin, B. T. Jonsson, D. Srivastava and
M. Tan. Semantic data caching and replacement. VLDB 1996. (The
first semantic caching paper, must-read)
- P. Larson, J. Goldstein, J. Zhou. MTCache: Transparent
mid-tier database caching in SQL Server. ICDE 2004.
- View Matching
- Matching with A Single View
- First paper: P. Larson and H. Z. Yang.
Computing queries from derived relations. VLDB 1985.
- "Full-version" solution: J. Goldstein and P.
Larson. Optimizing queries using materialized views: a
practical scalable solution. Sigmod 2001. (Must-read)
- Outer-join: P. Larson and J. Zhou. View
matching for outer-join views. VLDB 2005.
- Aggregation: A. Gupta, V. Harinarayan
and D. Quass. Aggregate query processing in data
warehousing environments. VLDB 1995.
- Aggregation: D. Srivastava, S. Dar, H.V.
Jagadish, A. Levy. Answering queries with aggregation
using views. VLDB 1996. (Must-read)
- Constraints for extra tables: J. Chang and S.
Lee. Query reformulation using materialized views in
data warehousing environment. DOLAP 1998.
- Stacked view: D. DeHaan, P. Larson, and J.
Zhou. Stacked indexed views in Microsoft SQL Server.
Sigmod 2005.
- Matching with Multiple Views
- Rewriting to a conjunctive query: F. N.
Afrati, C. Li, and J. D. Ullman. Generating efficient
plans for queries using views. Sigmod 2001. (Must-read)
- Rewriting to a disjunctive query: H. Z. Yang
and P. Larson. Query transformation for PSJ queries.
VLDB 1987.
- Aggregation: C. Park, M. H. Kim and Y. J.
Lee. Rewriting OLAP queries using materialized views and
dimension hierarchies in data warehouses. ICDE 2001. (Must-read)
- Semantic Caching in Industry
- System-R: S. Chaudhuri, R. Krishnamurthy, S.
Potamianos, K. Shim. Optimizing queries with materialized
views. ICDE 1995.
- Oracle: R.G. Bello, K. Dias, J. Feenan, J.
Finnerty, W.D. Norcott, H. Sun, A. Witkowski, M. Ziauddin.
Materialized views in Oracle. VLDB 1998, 659-664.
- DB2: M. Zaharioudakis, R. Cochrane, G. Lapis, H.
Pirahesh, M. Urata. Answering complex SQL queries using
automatic summary tables. Sigmod 2000, 105-116.
- View Matching for XML
- Bhushan Mandhani and Dan Suciu. Query caching and view
selection for XML databases. VLDB 2005.
View Maintenance
- Kenneth Salem, Kevin S. Beyer, Bruce Lindsay, Roberta
Cochrane. How To Roll a Join: Asynchronous Incremental View
Maintenance. Sigmod 2000.
- P. Mork. Managing change in large-scale data sharing
systems. Tech report.
View Updates
- Survey
- A. Furtado and M. Casanova.
Updating relational views. In Query Processing in
Database Systems, pages 127-144, Springer-Verlag, New
York, NY, 1985. (must-read)
- S. J. Kaplan and J. Davidson.
Interpreting natural language database updates. In Proc.
of 19th Annual Meeting of the Association for Computational
Linguistics, Stanford, California, June 1981.
- Relational View Update Theory
- F. Bancilhon and N. Spyratos. Update semantics of relational
views. ACM Transactions on Database Systems,
6(4):557-575, December 1981. (must-read)
- S. S. Cosmadakis and C. H. Papadimitriou. Updates of relational
views. J. ACM, 31(4):742-760, October 1984.
- A. M. Keller and J. D. Ullman. On complementary and independent
mappings on databases. In Proc. of the 3rd ACM
SIGMOD Int. Conf. on Management of Data, Boston,
June 1984.
- S. J. Hegner. Canonical view update support through boolean
algebras of components. In Proc. of the 3rd
ACM SIGACT-SIGMOD Symposium on Principles of Database
Systems, pages 163-172, April 1984.
- P. Buneman, S. Khanna, W. Tan. On propagation of Deletions and
Annotations Through Views. In Proc. of the 21st
ACM SIGACT-SIGMOD Symposium on Principles of Database
Systems, 2002.
- Relational View Updates through Abstract data types
- K. C. Sevcik and A. L. Furtado. Complete and compatible sets of
update operations. In Intl. Conf. on Management of Data
(ICMOD), Milan, Italy, 1978
- L. A. Rowe and K. A. Shoens. Data abstraction, views and updates in
RIGEL. In Proc. of ACM-SIGMOD Int’l Conf. on Management
of Data, pages 214-225, 1979.
- A. Tomasic. View update translation via deduction and annotation. In
ICDT'88 (Second International Conference on DataBase Theory),
pages 338--352, 1988.
- A. Tomasic. Correct view update translation via containment. Stanford
University Computer Science Technical Note STAN-CS-TN-93-3,
1993.
- Relational View Updates through Syntax and Semantics Analysis
- U. Dayal and P. A. Bernstein. On the updatability of relational
views. In Proc. of 4th Int’l Conf. on Very Large Data
Bases, pages 368-377, 1978.
- U. Dayal and P. A. Bernstein. On the correct translation of update
operations on relational views. ACM Transactions on
Database Systems, 7(3):381-416, September 1982. (must-read)
- Y. Masunaga. A relational database view update
translation mechanism. In Proc. of 10th Int’l
Conf. on Very Large Data Bases, pages 309-320,
Singapore, 1984.
- A. M. Keller. Algorithms for translating view updates to database
updates. In Proc. of the 4th ACM SIGACT-SIGMOD
Symposium on Principles of Database Systems, March 1985.
(must-read)
- A. M. Keller. Choosing a view update translator by dialog at view
definition time. In Proc. of the 12th
International Conference on Very Large Data Bases, pages
467-474, Kyoto, Japan, 1986.
- T. W. Ling and M. L. Lee. A theory for entity-relationship view
updates.
- Relational View Updates through constraint satisfaction
- Object-oriented View Updates
- T. Barsalou,
N. Siambela, A. M. Keller, and G. Wiederhold. Updating
relational databases through object-based views. In
Proc. of the 10th ACM SIGACT-SIGMOD Symposium on
Principles of Database Systems, 1991. (must-read)
- J. Chen. Update multidatabase through object views. MS
thesis, Iowa State University, 1997.
- Updates on XML Data
- S. Abiteboul. On views and XML. In Proc. ACM Symp. on the
Principles of Database Systems, 1999.
- A. Salminen and F. W. Tompa. Requirements for XML document database
systems. In Proc. ACM DocEng, pages 85-94, 2001
- P. Lehti. Design and implementation of a data manipulation processor
for an XML Query Language. Technical Report, Technische
Universität Darmstadt, 2001. Report KOM-D-149. (must-read)
- P. Lehti and P. Fankhauser. Towards type safe updates in XQuery.
Technical Notes, 2002. World Wide Web:
- M. Rys. Proposal for an XML data modification language. Microsoft
Report, 2002. (must-read)
- I. Tatarinov, Z. G. Ives, A. Y. Halevy, D. S. Weld. Updating
XML. In Proc. of the 20th ACM-SIGMOD Int’l
Conf. on Management of Data, 2001. (must-read)
Data Warehousing and OLAP
- Survey
- S. Chaudhuri, U. Dayal. An Overview of Data
Warehousing and OLAP Technology. SIGMOD Record 26(1),
1997, pp. 65-74. (Must-read)
- New SQL Constructs
- J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D.
Reichart, M. Venkatrao, F. Pellow, H. Pirahesh. Data Cube: A
Relational Aggregation Operator Generalizing Group-By,
Cross-Tab, and Sub-Totals. Data Mining and Knowledge
Discovery 1997. (Must-read)
- D. Chatziantoniou, K. Ross. Groupwise Processing of
Relational Queries. Proceedings of the 1997 VLDB Conference.
- Ralph Kimball. Why Decision Support Fails and How To Fix
It. SIGMOD 1995.
- Benchmark
- M. Poess, C. Floyd. New TPC Benchmarks for
Decision Support and Web Commerce. SIGMOD Record,
29(4).
Data Cleaning
- H. Galhardas, D. Florescu, D. Shasha, E. Simon, C. Saita.
Declarative data cleaning: language, model, and algorithms. VLDB
2001. Look under Session R12.
- V. Raman, J. Hellerstein. Potter's wheel: an interactive
data cleaning system. VLDB 2001. Session R12.
Data Mining
- J. Han, M. Kamber. Data Mining: Concepts and
Techniques. Chapter 1 (Introduction), Chapter 7
(Classification), Chapter 8 (Clustering).
- Association Rules
- Fast Algorithms for Mining Association Rules, Agrawal
and Srikant; VLDB 94. (must-read)
- R. Agrawal, T. Imielinski, A. N. Swami. Mining
Association Rules between Sets of Items in Large Databases.
SIGMOD 1993.
- Query Flocks: A Generalization of Association-Rule
Mining Dick Tsur, Jeffrey D. Ullman, Serge Abiteboul, Chris
Clifton, Rajeev Motwani, Svetlozar Nestorov, Arnon
Rosenthal; SIGMOD 98.
- Classification
- PUBLIC: A Decision Tree Classifier that Integrates
Building and Pruning Rastogi, Shim; VLDB 1998
- S. Chaudhuri, U. Fayyad, J. Bernhardt. Scalable
Classification over SQL Databases. SIGMOD 1999.
- Clustering: CURE: An Efficient Clustering Algorithm
for Large Databases Guha, Rastogi, Shim; SIGMOD 98.
- Data Mining Methodology: Integrating Mining with
Relational Database Systems: Alternatives and Implications
Sunita Sarawagi, Shiby Thomas, Rakesh Agrawal; SIGMOD 98.
- Latent Semantic Indexing: Latent Semantic Indexing: A
Probabilistic Analysis Christos Papadimitriou, Prabhakar
Raghavan, Hisao Tamaki; Preliminary version in PODS 98
- Extending naive Bayes classifiers using long itemsets
Meretakis and Wuthrich; KDD 99.
- Squashing flat files flatter DuMouchel et al; KDD 99.
- SQL with Data Mining Primitives
- A. Netz, S. Chaudhuri, U. Fayyad, J. Bernhardt.
Integrating Data Mining with SQL Databases: OLE DB for Data
Mining. IEEE ICDE 2001.
- S. Chaudhuri, V. Narasayya, S. Sarawagi. Efficient
Evaluation of Queries with Mining Predicates. IEEE ICDE
2002.
Semi-structured and XML Data
- XML
- XML Query
- S. Boag, D. Chamberlin, M.F. Fernandez, D. Florescu, J.
Robie, J. Simeon, M. Stefanescu. XQuery: An
XML Query Language. W3C working draft.
- P. Wadler. A formal semantics of patterns in XSLT.
Submitted to BSL
- XQuery Tutorial
- P. Wadler. Two semantics of XPath.
- XML Publishing
- P. Bohannon, S. Ganguly, H. F. Korth, P. P. S. Narayan,
and P. Shenoy. Optimizing view queries in ROLEX to support
navigable result trees. In Proc. of the 28th Int’l. Conf. on
Very Large Data Bases, 2002.
- M. Fernández, Y. Kadiyska, D. Suciu, A. Morishima, and W.
Tan. SilkRoute: a framework for publishing relational data
in XML. TODS 27(4), 2002. (Must-read)
- J. Shanmugasundaram, E. Shekita, R. Barr, M. Carey, B.
Lindsay, H. Pirahesh, B. Reinwald. Efficiently Publishing
Relational Data as XML Documents. VLDB 2000. (Must-read)
- M. Fernandez, A. Morishima, D. Suciu. Efficient
Evaluation of XML Middle-ware Queries. SIGMOD 2001.
- XML Storage
- Survey: Florescu, Kossmann, A Performance
Evaluation of Alternative Mapping Schemes for Storing XML
Data in a Relational Database, INRIA Technical Report
- Use DTD to derive schema: J. Shanmugasundaram, K.
Tufte, C. Zhang, G. He, D. J. DeWitt, J. F. Naughton.
Relational Databases for Querying XML Documents:
Limitations and Opportunities. VLDB 1999. (Must-read)
- Use Generic schema: D. Florescu, D. Kossmann.
Storing and Querying XML Data Using an RDBMS. IEEE Data
Engineering Bulletin, 1999.
- Use the Path Table: M. Yoshikawa, T. Amagasa, T.
Shimura, S. Uemura. XRel: A Path-based Approach to Storage
and Retrieval of XML Documents Using Relational Databases.
ACM Transactions on Internet Technology, 2001.
- Use Data Mining to derive schema: A. Deutsch, M.
Fernandez, D. Suciu. Storing Semistructured Data with
STORED. SIGMOD 1999
- DBMS for semistructured data
- M. F. Fernandez, D. Florescu, J. Kang, A. Y. Levy, D.
Suciu. Catching the Boat with Strudel: Experiences with a
Web-Site Management System. SIGMOD 1998. (Must-read)
- McHugh, Abiteboul, Goldman, Quass, and Widom: Lore: A
Database Management System for Semistructured Data. SIGMOD
Record, September 1997
- Goldman, McHugh, Widom. From Semistructured Data to XML:
Migrating the Lore Data Model and Query Language. WebDB '99
- XML normal form
- XML indexing
- Typechecking and type inference
- Milo, Suciu: Type Inference for Queries on
Semistructured Data, PODS 2000.
- XML data compression
- Liefke, Suciu: XMill: an Efficient Compressor for XML
Data. SIGMOD 2000.
- XML database systems
- Jeffrey Naughton et al. The Niagara Internet Query
System. IEEE Bulletin 2001.
- H. V. Jagadish et al. TIMBER: A Native XML Database.
VLDBJ 2003.
Transaction Processing
- Overview:
- P. A. Bernstein, E. Newcomer. Principles of
Transaction Processing, 2nd ed. Chapter 1
(Introduction), Chapter 2 (Transaction Processing Monitors).
- R. Ramakrishnan and J. Gehrke. Database
Management Systems, 2nd ed., Chapter 18 (Transaction
Management Overview).
- Two-phase Commit (A--Atomicity):
- P. A. Bernstein, E. Newcomer. Principles of
Transaction Processing, 2nd ed. Chapter 9 (Two-Phase
Commit). (Must-read)
- Locking and Concurrency Control (I--Isolation)
- P. A. Bernstein, E. Newcomer. Principles of
Transaction Processing, 2nd ed.
Chapter 6 (Locking). (Must-read)
- R. Ramakrishnan and J. Gehrke. Database
Management Systems, 2nd ed., Chapter 19 (Concurrency
Control).
- Axel Moenkeberg and Gerhard Weikum: Performance
Evaluation of an Adaptive and Robust Load Control Method for
the Avoidance of Data-Contention Thrashing. VLDB 1992. (VLDB
10-year best paper)
- Gerhard Weikum, Axel Moenkeberg, Christof Hasse, Peter
Zabback: Self-tuning Database Technology and Information
Services: from Wishful Thinking to Viable Engineering.
2002. (10-year retrospective paper, Must-read)
- Recovery (A--Atomicity & D--Durability)
- P. A. Bernstein, E. Newcomer. Principles of
Transaction Processing, 2nd ed. Chapter 8 (Database
System Recovery). (Must read)
- R. Ramakrishnan and J. Gehrke. Database
Management Systems, 2nd ed., Chapter 20 (Recovery).
- David Lomet and Gerhard Weikum: Efficient Transparent
Application Recovery in Client/Server Information Systems.
SIGMOD'1998. (Sigmod best paper)
Security
- Survey
- Adam, Wortmann. Security-control methods for
statistical databases: a comparative study. (A survey
of the main techniques for protecting against disclosure of
confidential information in a statistical database:
conceptual, query restriction, data perturbation, output
perturbation.)
- Pinkas. Cryptographic Techniques for
Privacy-Preserving Data Mining. SIGKDD Explorations. (Survey
paper of results in secure multi-party computation and their
relevance to data mining. )
- Privacy
- Alan Westin. One-page article. Wall Street
Journal, April 2000. (Brief article summarizing survey
results on individual attitudes about privacy. )
- Agrawal, Kiernan, Srikant, Xu. Hippocratic Databases.
VLDB 2002. (A proposal for a database system that
respects the privacy of individuals who contribute data to
the database. Includes a list of key properties and
challenges of a Hippocratic database system. )
- L. Sweeney. Uniqueness of Simple Demographics in the
U.S. Population, LIDAP-WP4. Carnegie Mellon University,
Laboratory for International Data Privacy, Pittsburgh, PA:
2000. (Empirical study of census data attempting to
extract information on individuals from aggregate values.)
- Agrawal, Evfimievski, Srikant. Information Sharing
Across Private Databases. SIGMOD 2003
- Agrawal, Srikant. Privacy-Preserving Data Mining.
SIGMOD 2000 : 439-450
- Evfimievski, Srikant, Agrawal, Gehrke. Privacy
preserving mining of association rules. KDD 2002
- Jon M. Kleinberg, Christos H. Papadimitriou, Prabhakar
Raghavan: Auditing Boolean Attributes. PODS'2000.
- Data Authenticity
- Goodrich, Tamassia, Triandopoulos, Cohen.
Authenticated Data Structures for Graph and Geometric
Searching. Technical Report 2001
- Prem Devanbu, Michael Gertz, Chip Martel, Stuart G.
Stubblebine. Authentic Third-party Data Publication. IFIP
Conference on Database Security, 2000.
- Cryptography
- Song, Wagner, Perrig: Practical Techniques for
Searches on Encrypted Data. IEEE Symposium on Security
and Privacy, 2000. (Cryptographic techniques for secure
search over list of values stored on untrusted server.)
- Hacigumus, Iyer, Li, Mehrotra. Executing SQL over
encrypted data in the database-service-provider model.
SIGMOD 2002. (Techniques for query evaluation over an
encrypted database stored on an untrusted server. )
- Miklau, Suciu. Controlling Access to Published Data
Using Cryptography. VLDB 2003.
- Martin Abadi and Phillip Rogaway. Reconciling two views
of cryptography (The computational soundness of formal
encryption). IFIP International Conference on Theoretical
Computer Science. 2000. (The first paper presents
techniques for enforcing access control over published
documents. The resulting encrypted documents are difficult
to analyze using cryptographic techniques. The second paper
contains some techniques related to this difficulty.)
- Access Control
- Bertino, Jajodia, Samarati. Database Security:
Research and Practice. IS 20 (7) 1995. (Survey of
access control models for relational databases including
discretionary and mandatory access control models.)
- T. Yu, D. Srivastava, L. Lakshmanan, and H. Jagadish.
Compressed Accessibility Map: Efficient Access Control for
XML. VLDB 2002
- Watermark
- Agrawal, Kiernan: Watermarking Relational Databases.
VLDBJ 2003
Distributed and Parallel Databases
- Replication
- David A. Patterson, Garth A. Gibson, and Randy H. Katz:
A Case for Redundant Arrays of Inexpensive Disks (RAID).
SIGMOD 1988. (Sigmod 10-year best paper)
- Survey: Peter M. Chen, Edward K. Lee, Garth A.
Gibson, Randy H. Katz, David A. Patterson. RAID:
High-Performance, Reliable Secondary Storage. ACM
Computing Surveys, 26(2), 1994. (Must-read)
- Textbook: P. A. Bernstein, E. Newcomer.
Principles of Transaction Processing, 2nd ed. Chapter 10
(Replication).
- J. Gray, P. Helland, P. E. O'Neil, D. Shasha.
The Dangers of Replication and a Solution. SIGMOD
1996.
- Distributed Database
- Textbook: M. T. Özsu, P. Valduriez. Principles of
Distributed Database Systems, 2nd ed. Chapter 4
(Distributed Database Systems), pp. 82-99; Chapter 5
(Distributed Database Design), pp. 107-154, skimming
examples, algorithms, and Section 5.4.3.
- Survey: D. Kossmann. The State of the Art
in Distributed Query Processing. ACM Computing
Surveys 32(4), 2000, pp. 418-469. (Must-read)
- R. Braumandl, M. Keidl, A. Kemper, D. Kossmann, A.
Kreutz, S. Seltzsam, K. Stocker: ObjectGlobe: Ubiquitous
query processing on the Internet. VLDB Journal 10(1): 48-71
(2001).
- M. Stonebraker, P. M. Aoki, R. Devine, W. Litwin, M.
Olson. Mariposa: A New Architecture for Distributed Data.
ICDE 1994.
- Parallel Database
- Textbook: M. T. Özsu, P. Valduriez. Principles of
Distributed Database Systems, 2nd ed. Chapter 13
(Parallel Database Systems), pp. 420-452. (Must-read)
- Survey: D. DeWitt and J. Gray. Parallel
Database Systems: The Future of High Performance Database
Systems. CACM 35(6), 85-98, 1992.
- Distributed query optimization
Peer-to-peer System
- Peer data management system
- Alon Y. Halevy, Zachary G. Ives, Peter Mork, Igor
Tatarinov: Piazza: data management infrastructure for
semantic web applications. WWW 2003: 556-567. (Must-read)
- Steven Gribble, Alon Halevy, Zachary Ives, Maya Rodrig,
Dan Suciu. What can Databases do for Peer-to-Peer?
WebDB 2001. (Must-read)
- Content-based network routing
- S. Ratnasamy, P. Francis, M. Handley, R. Karp, S.
Shenker. A Scalable Content-Addressable Network.
SIGCOMM 2001.
- B. Y. Zhao, J. D. Kubiatowicz, A. D. Joseph.
Tapestry: An Infrastructure for Fault-tolerant Wide-area
Location and Routing. Berkeley TR UCB//CSD-01-1141,
2000.
- Data migration/lookup
- I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, H.
Balakrishnan. Chord: A scalable peer-to-peer lookup
service for Internet applications. SIGCOMM 2001.
- S. Q. Zhuang, B. Y. Zhao, A. D. Joseph, R. H. Katz, J.
Kubiatowicz. Bayeux: An Architecture for Scalable and
Fault-tolerant Wide-Area Data Dissemination.
International Workshop on Network and Operating System
Support for Digital Audio and Video (NOSSDAV), 2001.
- A. Rowstron, P. Druschel. Storage Management and
Caching in PAST: A Large-scale, Persistent Peer-to-peer
Storage Utility. SOSP 2001.
- C. G. Plaxton, R. Rajaraman and A. W. Richa.
Accessing nearby copies of replicated objects in a
distributed environment. ACM Symposium on Parallel
Algorithms and Architectures, 1997.
- M. Stonebraker, R. Devine, M. Kornacker, W. Litwin, A.
Pfeffer, A. Sah, C. Staelin. An Economic Paradigm for
Query Processing and Data Migration in Mariposa.
International Conference on Parallel and Distributed
Information Systems, 1994.
- J. Sidell, P.M. Aoki, A. Sah, C. Staelin, M.
Stonebraker, A. Yu. Data Replication in Mariposa.
ICDE 1996.
- W. Litwin, M.A. Neimat, D. Schneider. LH* -- Linear
hashing for Distributed Files. SIGMOD 1993.
Database and Information Retrieval
- Surajit Chaudhuri, Raghu Ramakrishnan, Gerhard Weikum.
Integrating DB and IR
Technologies: What is the Sound of One Hand Clapping? CIDR
2005.
Information Retrieval
- Survey:
- A. Singhal. Modern information retrieval: a brief
overview. IEEE data engineering bulletin, special issue
on text and databases, 24(4), 2001.
- Christos Faloutsos and Douglas W. Oard. A survey of
information retrieval and filtering methods. 1995.
- TF/IDF: Thorsten Joachims. A probabilistic
analysis of the Rocchio algorithm with TFIDF for text
categorization. 1996. (Must-read)
- LSI
- Scott Deerwester, Susan T. Dumais, George W. Furnas,
Thomas K. Landauer, and Richard Harshman. Indexing by
latent semantic analysis. Journal of the American
Society of Information Science, 1990. (Must-read)
- Christos Papadimitriou et al. Latent Semantic
Indexing: A Probabilistic Analysis. PODS 98
- Google and PageRank: Sergey Brin and Lawrence Page.
The anatomy of a large-scale hypertextual web search engine.
Computer networks and ISDN systems, 30(1-7), 1998. (Must-read)
- Query-Answering System: Eric Brill, Susan Dumais, and
Michele Banko. An analysis of the AskMSR question-answering
system. 2002.
- C. Faloutsos. Access Methods for Text. ACM Computing
Surveys 17(1), 1985, pp. 49-74.
- J. M. Kleinberg. Authoritative Sources in a Hyperlinked
Environment. JACM 46(5), 1999, pp. 604-632.
- Indexing
- Ricardo Baeza-Yates and Berthier Ribeiro-Neto, eds,
Modern Information Retrieval, Addison-Wesley, 1999.
Chapter 8 (Indexing and Searching).
- Inverted List and PAT Tree: G. Gonnet, Ricardo
Baeza-Yates, and T. Snider: Lexicographical Indices for
Text: Inverted Files vs. PAT Trees. Technical Report
TR-OED-91-01, University of Waterloo, 1991
- Suffix Array: Udi Manber and Gene Myers:
Suffix Arrays: A New Method for On-Line String Searches.
ACM-SIAM SODA, 1990
- Multigram Index: Junghoo Cho and Sridhar
Rajagopalan: A Fast Regular Expression Indexing Engine.
ICDE 2001.
Information Extraction
- Oren Etzioni et al. Web-scale information extraction in
KnowItAll. WWW, 2004.
- G. Salton, editor. The SMART Retrieval System-Experiments
in Automatic Document Retrieval. Prentice Hall Inc.,
Englewood Cliffs, NJ, 1971.
Approximate Queries in Databases
- Keyword Search
- Qi Su and Jennifer Widom. Indexing relational
database content offline for efficient keyword-based
search. IDEAS 2005.
- Vagelis Hristidis, Luis Gravano and Yannis
Papakonstantinou. Efficient IR-style keyword search
over relational databases. VLDB 2003.
- Hristidis, Papakonstantinou. DISCOVER: Keyword
Search in Relational Databases. VLDB, 2002.
- Agrawal, Chaudhuri, Das. DBXplorer: A System for
Keyword-Based Search over Relational Databases.
ICDE, 2002.
- G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti
and S. Sudarshan. Keyword searching and browsing in
databases using BANKS. ICDE 2002.
- Goldman, Shivakumar, Venkatasubramanian,
Garcia-Molina. Proximity Searches in Databases.
VLDB, 1998.
- Keyword search on XML
- Yu Xu and Yannis Papakonstantinou. Efficient
keyword search for smallest LCAs in XML databases.
Sigmod 2005.
- Vagelis Hristidis, Yannis Papakonstantinou and
Andrey Balmin. Keyword proximity search on XML
graphs. ICDE 2003.
- Michael Barg and Raymond K. Wong. Structural
proximity searching for large collections of
semi-structured data. CIKM 2001.
- Search XML Collection: J. Naughton et al.
The Niagara Internet query system. IEEE Data
Engineering Bulletin, 24(2):27-33, 2001.
- Full-text Search
- RDB
- U. Masermann and G. Vossen. Schema independent
database querying (on and off the web). IDEAS 2000.
- U. Masermann and Gottfried Vossen. Design and
implementation of a novel approach to keyword searching
in relational databases. ADBIS-DASFAA Symposium,
2000.
- XML
- Tutorial: Sihem Amer-Yahia, Jayavel
Shanmugasundaram. XML Full-Text Search:
Challenges and Opportunities. VLDB 2005. (Must-read)
- Holger Florke, Norbert Fuhr, Kenji Hatano, Borkur
Sigurbjornsson, Andrew Trotman, and Masahiro Watanabe.
Queries, INEX 2003 working group report. 2004.
- Theory result: Yaron Kanza and Yehoshua
Sagiv. Flexible queries over semistructured data.
PODS 2001.
- TAG+Keyword: S. Cohen, J. Mamou, Y. Kanza and
Y. Sagiv. XSEarch: A semantic search engine for XML.
VLDB 2003.
- Path+Keyword: Norbert Fuhr and Kai
Großjohann. XIRQL: A query language for information
retrieval in XML documents. SIGIR 2001.
- Full-text search:
- Vincent Aguilera, Sophie Cluet, Pierangelo
Veltri, Dan Vodislav, and Fanny Wattez. Querying
XML Documents in Xyleme.
- Daniela Florescu, Donald Kossmann and Ioana
Manolescu. Integrating keyword search into XML
query processing. WWW 2000.
- Shurug Al-Khalifa, Cong Yu, and H. V. Jagadish.
Querying structured text in an XML database.
Sigmod 2003.
- Sihem Amer-Yahia, Mary Fernandez, Divesh
Srivastava, and Yu Xu. PIX: A system for phrase
matching in XML documents: a demonstration.
Sigmod 2003.
- Chavdar Botev, Sihem Amer-Yahia, and Jayavel
Shanmugasundaram. Expressiveness and performance
of full-text search languages.
- Amer-Yahia, C. Botev, J. Shanmugasundaram.
TeXQuery: A Full-Text Search Extension to XQuery.
WWW 2004
- Emiran Curtmola, Sihem Amer-Yahia, Philip Brown,
and Mary Fernandez. GalaTex: A conformant
implementation of the XQuery full-text language.
Sigmod 2005.
- Structural Proximity Search
- RDB
- Xiaoxin Yin, Jiawei Han and Jiong Yang. Searching
for related objects in relational databases. SSDBM
2005.
- XML
- Relax label:
- Anja Theobald and Gerhard Weikum. Adding
relevance to XML. WebDB 2000.
- Theobald, Weikum. The Index-Based XXL Search
Engine for Querying XML Data with Relevance Ranking.
EDBT, 2002. (Must-read)
- Anja Theobald. An ontology for
domain-oriented semantic similarity search on XML
data. BTW 2003.
- Relax structure:
- Yunyao Li, Cong Yu and H.V.Jagadish.
Schema-free XQuery. VLDB 2004.
- David Carmel, Yoelle S. Maarek, Matan
Mandelbrod, Yosi Mass, and Aya Soffer. JuruXML:
Searching XML Documents via XML Fragments. SIGIR
2003.
(Full report: David Carmel, Nadav Efraty, Gad M.
Landau, Yoelle S. Maarek, and Yosi Mass. An
extension of the vector space model for querying XML
document via XML fragments.)
- Sihem Amer-Yahia, Laks V. S. Lakshmanan, and
Shashank Pandit. FleXPath: Flexible structure and
full-text querying for XML. Sigmod 2004.
- Sihem Amer-Yahia, Nick Koudas, Amelie Marian,
Divesh Srivastava, David Toman. Structure and
content scoring for XML. VLDB 2005.
- Natural language query:
- Yunyao Li, Huahai Yang, H. V. Jagadish.
NaLIX: an interactive natural language interface for
querying XML. Sigmod 2005 best demo.
- Approximate Search
- Liang Jin, Nick Koudas, Chen Li and Anthony K. H.
Tung. Indexing mixed types for approximate retrieval.
VLDB 2005.
- Luis Gravano, Panagiotis Ipeirotis, H. V. Jagadish,
Nick Koudas, S. Muthukrishnan, and Divesh Srivastava.
Approximate String Joins in a Database (Almost) for Free.
VLDB, 2001.
- William Cohen. Data Integration using
Similarity Joins and a Word-based Information Representation
Language. In ACM Transactions on Information Systems
18(3): 288-321 (2000)
- Fagin. Fuzzy Queries in Multimedia Database Systems.
PODS, 1998.
- TOP-K Query
- Ronald Fagin, Amnon Lotem, Moni Naor. Optimal
Aggregation Algorithms for Middleware. PODS, 2001 (Best-paper,
must-read)
- Ronald Fagin, Ravi Kumar, and D. Sivakumar. Efficient
similarity search and classification via rank aggregation.
SIGMOD 2003.
- Ranking
- RDB:
- Lin Guo, Jayavel Shanmugasundaram, Kevin S. Beyer,
Eugene J. Shekita: Efficient Inverted Lists and Query
Algorithms for Structured Value Ranking in
Update-Intensive Relational Databases. ICDE 2005:
298-309
- Xiaoxin Yin, Jiawei Han and Jiong Yang. Searching
for related objects in relational databases. SSDBM
2005.
- Agrawal, Chaudhuri, Das, Gionis. Automated
Ranking of Database Query results. CIDR, 2003 (Must-read)
- XML:
- Chavdar Botev, Jayavel Shanmugasundaram:
Context-Sensitive Keyword Search and Ranking for XML.
WebDB 2005: 115-120
- Guo, Shao, Botev, Shanmugasundaram. XRANK: Ranked
Keyword Search over XML Documents. Sigmod 2003. (Must-read)
- Object-level:
- Li Ding, Rong Pan, Tim Finin, Anupam Joshi, Yun
Peng, and Pranam Kolari. Finding and ranking
knowledge on the semantic web. ISWC 2005.
- Zaiqing Nie, Yuanzhi Zhang, Ji-rong Wen, Wei-Ying
Ma. Object-level ranking: bringing order to web
objects. WWW 2005.
- Andrey Balmin, Vagelis Hristidis and Yannis
Papakonstantinou. ObjectRank: Authority-based keyword
search in databases. VLDB 2004. (Must read)
- Rank associations:
- Kemafor Anyanwu, Angela Maduko, and Amit Sheth.
SemRank: ranking complex relationship search results on
the semantic web. WWW 2005.
- Boanerges Aleman-Meza, Chris Halaschek, I. Budak
Arpinar, and Amit Sheth. Context-aware semantic
association ranking. SWDB 2003.
- Vagelis Hristidis, Nick Koudas and Yannis
Papakonstantinou. PREFER: A System for the Efficient
Execution of Multi-parametric Ranked Queries. SIGMOD
2001.
- Modeling Imprecision
- Probabilistic databases: Nilesh Dalvi and Dan
Suciu. Foundations of probabilistic answers to queries.
Sigmod 2005. (Tutorial, must-read)
- Jennifer Widom.
Trio: A System for Integrated Management of Data, Accuracy,
and Lineage. CIDR 2005.
Web Services
- Serge Abiteboul, Omar Benjelloun, Tova Milo. Positive
active XML. PODS 2004.
- Serge Abiteboul, Omar Benjelloun, Bogdan Cautis, Ioana
Manolescu, Tova Milo, Nicoleta Preda. Lazy query evaluation
for active XML. Sigmod 2004.
- Semantics of Web Services
- Anupriya Ankolekar, et al.: DAML-S: Web service
description for the semantic web.
- Massimo Paolucci, Takahiro Kawamura, Terry R. Payne, and
Katia Sycara. Semantic matching of web services
capabilities.
- Andreas Heß and Nicholas Kushmerick. Machine learning
for annotating semantic web services.
- Andreas Heß and Nicholas Kushmerick. Learning to
attach semantic metadata to web services. Semantic Web,
2003.
- Andreas Heß, Eddie Johnston, and Nicholas Kushmerick.
ASSAM: A tool for semi-automatically annotating semantic web
services. 2004.
- Andreas Heß, Nick Kushmerick. Iterative ensemble
classification for relational data: a case study of
semantic web services. 2004.
- Web Service Discovery
- Fatih Emekci, Ozgur D. Sahin, Divyakant Agrawal, Amr El
Abbadi. A peer-to-peer framework for web service
discovery with ranking.
- Xin Dong, Alon Halevy, Jayant Madhavan, Ema Nemes, Jun
Zhang. Similarity search for web services. VLDB 2004.
- Web Service Composition
- Ghandeharizadeh, S., Knoblock, C.A., Papadopoulos, C.,
Shahabi, C., Alwagait, E., Ambite, J.L., Cai, M., Chen,
C.C., Pol, P., Schmidt, R., Song, S., Thakkar, S., Zhou, R.:
Proteus: A system for dynamically composing and
intelligently executing web services. ICWS 2003.
- Thakkar, S., Knoblock, C.A., Ambite, J., Shahabi, C.:
Dynamically composing web services from on-line sources.
AAAI workshop on intelligent service integration, 2002.
- Evren Sirin, James Hendler, and Bijan Parsia.
Semi-automatic composition of web services using semantic
descriptions.
- Liangzhao Zeng, Boualem Benatallah, Marlon Dumas,
Jayant Kalagnanam, Quan Z. Sheng. Quality driven web
services composition. WWW, 2003.
- Automatic Programming
- Charles Rich and Richard C. Waters. Approach to
automatic programming. Tech report 92-04, MIT, 1992.
- Amy Moormann Zaremski and Jeannette M. Wing.
Specification matching of software components. ACM
TOSEM, 1997.
Books
- Database Systems
- J. Ullman and J. Widom. A First Course in
Database Systems.
- Ramakrishnan. Database Management Systems.
- Ramakrishnan & Gehrke. Database Management
Systems, 2nd ed.
- Korth & Silberschatz. Database System
Concepts, 2nd ed.
- Stonebraker, ed. Readings in Databases, 3rd ed.
- Garcia-Molina, Ullman, Widom. Database System
Implementation.
- Elmagarmid, et al. Management of Heterogeneous
and Autonomous Database Systems.
- Object-Oriented and Object-Relational
- Stonebraker & Moore. Object-Relational DBMSs:
The Next Great Wave.
- XML
- Abiteboul, et al. Data on the Web.
- Pitts-Moultis & Kirk. XML Black Book.
- Query Processing
- Yu & Meng. Principles of Database Query
Processing for Advanced Applications.
- Garcia-Molina, Ullman, Widom. Database System
Implementation.
- Ramakrishnan. Database Management Systems.
- Transaction Processing
- Distributed/Parallel Databases
- Özsu & Valduriez. Principles of Distributed
Database Systems.
- Other Database System Topics
- Widom & Ceri. Active Database Systems.
- Zaniolo, et al. Advanced Database Systems.
- Theory
- Abiteboul, et al. Foundations of Databases.
- Christos H. Papadimitriou. Computational Complexity.
- Information Retrieval
- Baeza-Yates, Ribeiro-Neto. Modern Information
Retrieval.
- Belew, R.K. Finding out about--A cognitive
perspective on search engine technology and the www.
Cambridge University Press, 2000
- Artificial Intelligence
- S. Russell & P. Norvig, Artificial Intelligence: A
Modern Approach (2nd ed.), Prentice Hall, 2003
- Data Mining
- Tom Mitchell, Machine Learning, McGraw-Hill, 1997
- David Hand, Heikki Mannila and Padhraic Smyth,
Principles of Data Mining, MIT Press, 2001
- Jiawei Han and Micheline Kamber, Data Mining:
Concepts and Techniques, Morgan Kaufmann, 2000
- Fayyad, et al. Advances in Knowledge Discovery
& Data Mining.
- Statistics
- D. Koller & N. Friedman, Bayesian Networks and
Beyond: Probabilistic Models for Learning and Reasoning,
MIT Press
- M. DeGroot & M. Schervish, Probability and
Statistics (3rd ed.), Addison-Wesley, 2002
- Java
- Joshua Bloch.
Effective Java: Programming Language Guide.
- Lemay & Perkins. Teach Yourself Java 1.1 in 21
Days, 2nd ed.
- Java in a Nutshell.
- Downing. Java RMI.
- Maximum Java 1.1.
- Orfali & Harkey. Client/Server Programming
with Java and CORBA.
- Van Haecke. JDBC: Java Database Connectivity.
- C++
- Gregory. Using Visual C++ 4.2,
Special Edition.
- Visual C++ 5 Unleashed.
- Kruglinski, Inside Visual C++.
- SQL
- Chamberlin. A Complete Guide to DB2 Universal
Database.
- LaTeX
- Lamport. LaTeX: A Document Preparation System.
- LaTeX Graphics Companion.
- ASP
- Using Active Server Pages, Special Edition.