A Novel Indexing based Decision tree model on high Dimension Data Stream

International Journal of Engineering Trends and Technology (IJETT)
© 2015 by IJETT Journal
Volume-30 Number-5
Year of Publication : 2015
Authors : G.Geetha, Dr. M Nagabhushana Rao


G. Geetha, Dr. M Nagabhushana Rao, "A Novel Indexing based Decision tree model on high Dimension Data Stream", International Journal of Engineering Trends and Technology (IJETT), V30(5), 246-250, December 2015. ISSN: 2231-5381. www.ijettjournal.org. Published by Seventh Sense Research Group.

As the dimensionality of the data increases, so does the cost of finding the nearest neighbor. In one dimension the nearest point can be found very efficiently, but in higher dimensions the search efficiency depends on how many points must be examined. In this work, we propose a simple and efficient technique for organizing points in a high-dimensional space so that the nearest point can be found efficiently. We use a memory-efficient database organization for the high-dimensional points, and the search algorithm limits the search to particular indexes of the database. A comparative analysis shows that our database organization is faster than Etree and R-tree, and that the search algorithm is also very efficient.
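
The abstract does not spell out the database organization, so the following is only an illustrative sketch of what "limiting the search to particular indexes" can look like, not the authors' method. It assumes a single hypothetical index key per point (its Euclidean distance to the origin), keeps the points sorted by that key, and scans outward from the query's position, stopping once the triangle inequality guarantees that no unvisited point can be closer. The names build_index and nearest are made up for this example.

```python
import bisect
import math


def build_index(points):
    """Sort points by a 1-D key: their Euclidean distance to the origin.

    Assumption for illustration only; the paper's actual index layout
    is not described in the abstract.
    """
    keyed = sorted((math.hypot(*p), p) for p in points)
    keys = [k for k, _ in keyed]
    pts = [p for _, p in keyed]
    return keys, pts


def nearest(index, query):
    """Return (point, distance, visited) for the exact nearest neighbor.

    Scans outward from the query's key position. By the triangle
    inequality, |key(p) - key(q)| <= dist(p, q), so once the key gap on
    both sides is at least the best distance found, the scan can stop.
    """
    keys, pts = index
    if not pts:
        raise ValueError("index is empty")
    qkey = math.hypot(*query)
    right = bisect.bisect_left(keys, qkey)  # first key >= qkey
    left = right - 1                        # last key < qkey
    best_dist, best_point, visited = math.inf, None, 0
    while left >= 0 or right < len(pts):
        left_gap = qkey - keys[left] if left >= 0 else math.inf
        right_gap = keys[right] - qkey if right < len(pts) else math.inf
        if min(left_gap, right_gap) >= best_dist:
            break  # no remaining point can beat the current best
        if left_gap <= right_gap:
            candidate = pts[left]
            left -= 1
        else:
            candidate = pts[right]
            right += 1
        visited += 1
        d = math.dist(candidate, query)
        if d < best_dist:
            best_dist, best_point = d, candidate
    return best_point, best_dist, visited


if __name__ == "__main__":
    index = build_index([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])
    point, dist, visited = nearest(index, (3.0, 5.0))
    print(point, dist, visited)
```

The visited count shows how many points the pruned scan actually examined. A single distance-based key prunes less and less as the dimension grows, which is the difficulty the abstract describes; practical schemes usually combine several reference points or partitions.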



Watermark, Security, Encode and Decode.