Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics

Dharmendra Agawane, Rohit Pawar, Pavankumar Purohit, Gangadhar Agre


Big data is a very important resource in today’s world if it is utilized properly. To utilize it, however, the data must be analysed to extract interesting facts from it. Growth in the five V’s of data, i.e. Velocity, Volume, Variety, Veracity and Value, has made data processing increasingly complex and is bringing more and more challenges to the field.
In such a case, to get useful information out of such a large variety and volume of data, we need to use concepts like data mining and clustering. In our project we find different insights from the US Census dataset, which help us to understand the dataset more easily and to draw inferences from it. For this purpose, i.e. for finding insights in a large dataset, we work with the framework called “Hadoop”, a tool for processing big data in less time and with greater accuracy.
We then analyse the performance of the Hadoop cluster, both single-node and multi-node, while varying the number of nodes, the block size and the replication factor.
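To make the roles of block size and replication factor concrete, the following is a minimal sketch of how the two parameters determine the number of HDFS blocks a file occupies and the raw storage it consumes. The function name `hdfs_layout` and the example file size are hypothetical and are not taken from the experiments; the sketch assumes the usual HDFS behaviour that a file is split into fixed-size blocks, that the last block is not padded, and that every block is stored `replication` times.

```python
MIB = 1024 * 1024


def hdfs_layout(file_size_bytes, block_size_bytes, replication):
    """Return (blocks, block_replicas, raw_bytes) for one file.

    Hypothetical helper for illustration only; it does not talk to a cluster.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication >= 1")
    # Ceiling division in pure integers: a partial last block still counts.
    blocks = -(-file_size_bytes // block_size_bytes)
    # Every block is stored once per replica.
    block_replicas = blocks * replication
    # The last block is not padded, so raw storage scales with the file size.
    raw_bytes = file_size_bytes * replication
    return blocks, block_replicas, raw_bytes


if __name__ == "__main__":
    # Assumed example: a 1 GiB input file with replication factor 3.
    for block_mib in (64, 128):
        blocks, replicas, raw = hdfs_layout(1024 * MIB, block_mib * MIB, 3)
        print(block_mib, blocks, replicas, raw // MIB)
```

With default input-split settings a MapReduce job typically launches about one map task per block, so halving the block size roughly doubles the number of map tasks, while the replication factor changes the storage cost and the number of nodes that hold a local copy of each block.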


Hadoop, Map-Reduce, Insights, Single node Hadoop Cluster, Multi node Hadoop Cluster, Replication, Block size.











This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.