Ensemble Machine Learning-Based Real Estate Price Prediction with Explainable Artificial Intelligence Methods for Determinant Analysis

© 2025 by IJETT Journal
Volume-73 Issue-10
Year of Publication : 2025
Author : Matthew C. Okoronkwo, Ugochukwu E. Orji, Chikodili H. Ugwuishiwu, Caroline N. Asogwa, Emenike C. Ugwuagbo, Bande S. Ponsak
DOI : 10.14445/22315381/IJETT-V73I10P106

How to Cite?
Matthew C. Okoronkwo, Ugochukwu E. Orji, Chikodili H. Ugwuishiwu, Caroline N. Asogwa, Emenike C. Ugwuagbo, Bande S. Ponsak, "Ensemble Machine Learning-Based Real Estate Price Prediction with Explainable Artificial Intelligence Methods for Determinant Analysis", International Journal of Engineering Trends and Technology, vol. 73, no. 10, pp. 79-94, 2025. Crossref, https://doi.org/10.14445/22315381/IJETT-V73I10P106

Abstract
The Real Estate (RE) industry is an essential part of many countries’ economies, and accurate forecasting of housing prices benefits buyers, real estate agents, and the government. However, RE prices are influenced by multiple factors that are difficult to measure, and the relationship between housing prices and housing characteristics is complex and nonlinear, requiring flexible algorithms and tools. In this study, three regression-based models were developed using Neural Network (NN), Random Forest (RF), and Extreme Gradient Boosting (XGB) algorithms to predict house prices, and Explainable Artificial Intelligence (XAI) methods were deployed to explain the key factors influencing RE prices. The dataset, available on Kaggle, contains 923,159 records. The models were evaluated on four zip codes, and house size was found to influence the price predictions of the RF model. Model performance was assessed using four metrics: R-squared (R2), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). The results show that the XGB model performed best, with R2 = 0.817011. For the XGB model, the SHapley Additive exPlanations (SHAP) plot showed that acre lot and bath are the most influential determinants in predicting house prices, while the Individual Conditional Expectation (ICE) plots provided instance-level insight into the effect of bath that is not captured by traditional evaluation methods. These results promise better decision support for potential RE buyers in selecting houses that meet their specific needs.
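
The following Python sketch (not part of the published paper) illustrates the XGB branch of the pipeline summarized above: fitting an XGBoost regressor, computing the four reported metrics (R2, MAE, RMSE, MAPE), and producing SHAP and ICE explanations. The column names (bed, bath, acre_lot, house_size, zip_code, price) are assumptions based on the abstract, and synthetic data stands in for the 923,159 Kaggle records; hyperparameters are illustrative only.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import shap
import xgboost as xgb
from sklearn.inspection import PartialDependenceDisplay
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error,
                             mean_squared_error, r2_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Kaggle records; column names are assumed
# from the abstract (bed, bath, acre_lot, house_size, zip_code, price).
rng = np.random.default_rng(42)
n = 5_000
df = pd.DataFrame({
    "bed": rng.integers(1, 6, n),
    "bath": rng.integers(1, 4, n),
    "acre_lot": rng.gamma(2.0, 0.5, n),
    "house_size": rng.normal(1800, 600, n).clip(400),
    "zip_code": rng.choice([601, 795, 2740, 6010], n),  # treated as numeric here
})
# Nonlinear synthetic price signal so the example has something to learn.
df["price"] = (120 * df["house_size"] + 40_000 * df["bath"]
               + 25_000 * np.log1p(df["acre_lot"])
               + rng.normal(0, 30_000, n))

X, y = df.drop(columns="price"), df["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit the XGBoost regressor on the training split.
model = xgb.XGBRegressor(n_estimators=400, learning_rate=0.05,
                         max_depth=6, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# The four evaluation metrics reported in the paper.
print("R2  :", r2_score(y_test, pred))
print("MAE :", mean_absolute_error(y_test, pred))
print("RMSE:", np.sqrt(mean_squared_error(y_test, pred)))
print("MAPE:", mean_absolute_percentage_error(y_test, pred))

# Global SHAP feature attribution (summary/beeswarm plot).
shap_values = shap.TreeExplainer(model).shap_values(X_test)
shap.summary_plot(shap_values, X_test)

# ICE curves: one line per test instance for the selected determinants.
PartialDependenceDisplay.from_estimator(
    model, X_test, features=["bath", "acre_lot"], kind="individual")
plt.show()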

Keywords
Explainable AI, Ensemble Learning, Housing Price Prediction, Random Forest, Gradient Boosting, Determinant Analysis.
