An Efficient Partition-Repetition Approach in Clustering of Big Data

Karmakar, Bikram; Mukhopadhayay, Indranil

Bikram Karmakar and Indranil Mukhopadhayay
Additional contact information
Bikram Karmakar: University of Pennsylvania, Department of Statistics, The Wharton School
Indranil Mukhopadhayay: Indian Statistical Institute, Human Genetics Unit

A chapter in Big Data Analytics, 2016, pp 75-93 from Springer

Abstract: Clustering, i.e. splitting data into homogeneous groups in an unsupervised way, is one of the major challenges in big data analytics. The volume, variety and velocity associated with big data make this problem even more complex. Standard clustering techniques may fail because of this inherent complexity of the data cloud. Either existing methods must be adapted or novel methods developed to reach a reasonable solution without compromising performance, at least beyond a certain limit. In this article we discuss the salient features, major challenges and prospective solution paths for clustering big data. A discussion of the current state of the art reveals the existing problems and some solutions. The current paradigm and research specific to the complexities in this area are outlined, with special emphasis on the characteristics of big data in this context. We develop an adaptation of a standard method that is more suitable for big data clustering when the data cloud is relatively regular with respect to its inherent features. We also discuss a novel method for special types of data in which it is more plausible and realistic to leave some data points as noise, scattered across the domain of the whole data cloud, while the major portion forms distinct clusters. Our simulations demonstrate the strength and feasibility of applying the proposed algorithm in practice with very low computation time.

Keywords: Cluster Algorithm; Localize Algorithm; Data Cloud; Rand Index; Tight Cluster
Date: 2016

Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-81-322-3628-3_5

Ordering information: This item can be ordered from
http://www.springer.com/9788132236283

DOI: 10.1007/978-81-322-3628-3_5
