Data Mining: Concepts and Techniques (3rd ed.)
Chapter 11
Jiawei Han, Micheline Kamber, and Jian Pei
University of Illinois at Urbana-Champaign & Simon Fraser University
©2012 Han, Kamber & Pei. All rights reserved.
January 26, 2025

Review: Basic Cluster Analysis Methods (Chap. 10)
- Cluster Analysis: Basic Concepts
  - Group data so that object similarity is high within clusters but low across clusters
- Partitioning Methods
  - K-means and k-medoids algorithms and their refinements
- Hierarchical Methods
  - Agglomerative and divisive methods, BIRCH, Chameleon
- Density-Based Methods
  - DBSCAN, OPTICS, and DENCLUE
- Grid-Based Methods
  - STING and CLIQUE (subspace clustering)
- Evaluation of Clustering
  - Assess clustering tendency, determine the number of clusters, and measure clustering quality

K-Means Clustering
- Partition objects into k nonempty subsets
- Repeat
  - Compute the centroid (i.e., mean point) of each partition
  - Assign each object to the cluster of its nearest centroid
- Until no change
[Figure: k-means with K = 2. Panels: the initial data set; arbitrarily partition objects into k groups; update the cluster centroids; reassign objects; loop if needed.]

Hierarchical Clustering
- Use the distance matrix as the clustering criterion. This method does not require the number of clusters k as an input, but it needs a termination condition.
[Figure: objects a, b, c, d, e merged into ab, de, cde, and abcde. Agglomerative clustering (AGNES) runs from Step 0 to Step 4; divisive clustering (DIANA) runs from Step 4 back to Step 0.]

Distance between Clusters
- Single link: smallest distance between an element in one cluster and an element in the other, i.e., dist(K_i, K_j) = min(t_{ip}, t_{jq})
- Complete link: largest distance between an element in one cluster and an element in the other, i.e., dist(K_i, K_j) = max(t_{ip}, t_{jq})
- Average: average distance between an element in one cluster and an element in the other, i.e., dist(K_i, K_j) = avg(t_{ip}, t_{jq})
- Centroid: distance between the centroids of two clusters, i.e., dist(K_i, K_j) = dist(C_i, C_j)
- Medoid: distance between the medoids of two clusters, i.e., dist(K_i, K_j) = dist(M_i, M_j)
  - A medoid is a chosen, centrally located object in the cluster

BIRCH and the Clustering Feature (CF) Tree Structure
[Figure: CF tree. Node entries CF1, CF2, CF3, ... are paired with child pointers child1, child2, child3, ...; entries CF1, CF2, ... in other nodes are linked by prev and next pointers. B = 7, L = 6, R...]
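
As an illustration of the k-means loop reviewed above (partition the objects, compute each partition's centroid, reassign each object to its nearest centroid, repeat until no change), here is a minimal Python sketch. It is not the textbook's implementation. The function names `centroid` and `k_means`, the round-robin initial partition, the `max_iter` cap, and the rule that an emptied cluster keeps its previous centroid are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Mean point of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100):
    """Return (labels, centroids) for a list of numeric tuples.

    Assumption: the initial partition is round-robin (object i starts in
    group i % k), standing in for an arbitrary initial partition.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Partition objects into k nonempty subsets.
    labels = [i % k for i in range(len(points))]
    centroids = [None] * k

    for _ in range(max_iter):
        # Compute the centroid (mean point) of each partition.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
            # Assumption: an emptied cluster keeps its previous centroid.

        # Assign each object to the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]

        # Until no change.
        if new_labels == labels:
            break
        labels = new_labels

    return labels, centroids


if __name__ == "__main__":
    # Hypothetical data: two well-separated groups of three points each.
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
            (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    labels, centers = k_means(data, k=2)
    print(labels)
    print(centers)
```

On the six hypothetical points in the usage example, the first pass moves every point to the nearer of the two initial centroids, and the second pass detects no change and stops, leaving the first three points in one cluster and the last three in the other. Different initial partitions can lead to different final clusterings, so the round-robin start here is only a convenience for reproducibility.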