Database Tuning from Relational Database to Big Data

© 2023 by IJETT Journal
Volume-71 Issue-11
Year of Publication : 2023
Author : Bery Leouro MBAIOSSOUM, Ladjel BELLATRECHE, Narkoy BATOUMA, Ahmat Mahamat DAOUDA
DOI : 10.14445/22315381/IJETT-V71I11P209

How to Cite?

Bery Leouro MBAIOSSOUM, Ladjel BELLATRECHE, Narkoy BATOUMA, Ahmat Mahamat DAOUDA, "Database Tuning from Relational Database to Big Data," International Journal of Engineering Trends and Technology, vol. 71, no. 11, pp. 90-99, 2023. Crossref, https://doi.org/10.14445/22315381/IJETT-V71I11P209

Abstract
This work surveys database tuning from relational databases to Big Data. It revisits physical design tools and examines their applicability to the main types of databases: legacy (hierarchical or network), relational, object-oriented, and NoSQL databases, as well as Big Data. A literature review of database tuning tools is conducted to examine how they can be brought closer to the database (DB) lifecycle. The review notes that modern physical design techniques consider all phases of the DB lifecycle and that database tuning has evolved as new database types have emerged. This evolution is both vertical, with new phases added when a new database type appears, and horizontal, with each phase of the DB lifecycle enriched by new tools. For Big Data, the vertical evolution appears as multiple types of Big Data schemas, and the horizontal evolution as the advent of the MapReduce technique. Database administrators have to take these evolutions into account in their tuning work.

Keywords
Database tuning, Query optimization, Big Data, Database evolution, Database lifecycle, NoSQL.
