A Novel Indexing based Decision tree model on high Dimension Data Stream

	International Journal of Engineering Trends and Technology (IJETT)
	© 2015 by IJETT Journal
	Volume-30 Number-5
	Year of Publication : 2015
	Authors : G.Geetha, Dr. M Nagabhushana Rao

Citation

G.Geetha, Dr. M Nagabhushana Rao, "A Novel Indexing based Decision tree model on high Dimension Data Stream", International Journal of Engineering Trends and Technology (IJETT), V30(5), 246-250, December 2015. ISSN: 2231-5381. www.ijettjournal.org. Published by Seventh Sense Research Group.

Abstract
As the number of dimensions increases, so does the complexity of finding the nearest neighbor. In one dimension the nearest point can be found very efficiently, but in higher dimensions search efficiency depends on how many points must be examined. In this work, we propose a simple and efficient technique for organizing points in high-dimensional space that also allows the nearest point to be found efficiently. We use a memory-efficient database organization of the high-dimensional points, and the search algorithm limits the search to particular indexes of the database. A comparative analysis shows that our database organization is faster than Etree and R-tree, and that the search algorithm is also very efficient.
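
The contrast the abstract draws between one dimension and many can be illustrated with a minimal Python sketch. It is not the authors' indexing scheme, which the abstract does not specify; the function names `nearest_1d` and `nearest_brute_force` are hypothetical and chosen only for illustration. In one dimension, a sorted list and a binary search leave at most two candidates to compare. Without an index in higher dimensions, every point has to be examined.

```python
from bisect import bisect_left
import math


def nearest_1d(sorted_values, query):
    """Return the value in an ascending list that is closest to query.

    Binary search narrows the candidates to the two values that
    bracket the query, so only O(log n) comparisons are needed.
    Ties are resolved in favor of the smaller value.
    """
    if not sorted_values:
        raise ValueError("sorted_values must not be empty")
    i = bisect_left(sorted_values, query)
    # Candidates: the value just below the insertion point and the one at it.
    candidates = sorted_values[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda v: abs(v - query))


def nearest_brute_force(points, query):
    """Return the point closest to query by scanning every point.

    This is the unindexed baseline: the cost grows linearly with the
    number of points examined (Euclidean distance is assumed here).
    """
    if not points:
        raise ValueError("points must not be empty")
    return min(points, key=lambda p: math.dist(p, query))
```

An index that restricts the search to a few candidate partitions aims to bring the high-dimensional case closer to the one-dimensional one by reducing how many points the scan has to touch.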

Keywords
Watermark, Security, Encode and Decode.
