Data Mining: Concepts and Techniques (3rd ed.)
Chapter 10
Jiawei Han, Micheline Kamber, and Jian Pei
University of Illinois at Urbana-Champaign & Simon Fraser University
©2011 Han, Kamber & Pei. All rights reserved.

Chapter 10. Cluster Analysis: Basic Concepts and Methods
- Cluster Analysis: Basic Concepts
- Partitioning Methods
- Hierarchical Methods
- Density-Based Methods
- Grid-Based Methods
- Evaluation of Clustering
- Summary

What is Cluster Analysis?
- Cluster: a collection of data objects
  - similar (or related) to one another within the same group
  - dissimilar (or unrelated) to the objects in other groups
- Cluster analysis (or clustering, data segmentation, …)
  - Finding similarities between data according to the characteristics found in the data, and grouping similar data objects into clusters
- Unsupervised learning: no predefined classes (i.e., learning by observation, as opposed to learning by examples, which is supervised)
- Typical applications
  - As a stand-alone tool to gain insight into the data distribution
  - As a preprocessing step for other algorithms

Clustering for Data Understanding and Applications
- Biology: taxonomy of living things: kingdom, phylum, class, order, family, genus, and species
- Information retrieval: document clustering
- Land use: identification of areas of similar land use in an earth observation database
- Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
- City planning: identifying groups of houses according to their house type, value, and geographical location
- Earthquake studies: observed earthquake epicenters should be clustered along continental faults
- Climate: understanding the earth's climate, finding patterns in the atmosphere and ocean
- Economic science: market research

Clustering as a Preprocessing Tool (Utility)
- Summarization: preprocessing for regression, PCA, classification, and association analysis
- Compression: image processing, e.g., vector quantization
- Finding k-nearest neighbors: localizing the search to one or a small number of clusters
- Outlier detection: outliers are often viewed as those “far away” from any cluster

Quality: What Is Good Clustering?
- A good clustering method will produce high-quality clusters
  - high intra-...
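The definition of a cluster above (objects similar within a group, dissimilar to objects in other groups) and the outlier-detection use case (objects “far away” from any cluster) can be made concrete with a small sketch. It assumes Euclidean distance as the dissimilarity measure and a clustering that is already given as lists of points. The function names, the nearest-centroid criterion for “far away,” and the `threshold` cutoff are hypothetical choices made for illustration, not a method prescribed by the slides.

```python
import math
from itertools import combinations


def mean_intra_distance(clusters):
    """Average pairwise Euclidean distance between objects in the same cluster."""
    total, count = 0.0, 0
    for members in clusters:
        for a, b in combinations(members, 2):
            total += math.dist(a, b)
            count += 1
    return total / count if count else 0.0


def mean_inter_distance(clusters):
    """Average Euclidean distance between objects placed in different clusters."""
    total, count = 0.0, 0
    for ca, cb in combinations(clusters, 2):
        for a in ca:
            for b in cb:
                total += math.dist(a, b)
                count += 1
    return total / count if count else 0.0


def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    dims = len(members[0])
    return tuple(sum(p[d] for p in members) / len(members) for d in range(dims))


def flag_outliers(points, clusters, threshold):
    """Return points whose distance to the nearest cluster centroid exceeds
    `threshold` (a hypothetical, user-chosen cutoff)."""
    centers = [centroid(c) for c in clusters]
    return [p for p in points if min(math.dist(p, c) for c in centers) > threshold]


# Two well-separated groups of 2-D points.
clusters = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
print(mean_intra_distance(clusters) < mean_inter_distance(clusters))  # True
print(flag_outliers([(0, 0), (5, 5), (10, 10)], clusters, threshold=3.0))  # [(5, 5)]
```

For a well-separated grouping, the average within-cluster distance is much smaller than the average between-cluster distance. The point (5, 5), which lies midway between the two groups, is the only one flagged as far from every cluster. `math.dist` requires Python 3.8 or later.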