Paper by Erik D. Demaine

Reference:

Erik D. Demaine, Dotan Emanuel, Amos Fiat, and Nicole Immorlica, “Correlation Clustering in General Weighted Graphs”, Theoretical Computer Science, volume 361, number 2–3, September 2006, pages 172–187. Special issue on approximation and online algorithms.

BibTeX

@Article{Clustering_TCS,
  AUTHOR        = {Erik D. Demaine and Dotan Emanuel and Amos Fiat and
                   Nicole Immorlica},
  TITLE         = {Correlation Clustering in General Weighted Graphs},
  JOURNAL       = {Theoretical Computer Science},
  journalurl    = {https://www.journals.elsevier.com/theoretical-computer-science},
  VOLUME        = 361,
  NUMBER        = {2--3},
  MONTH         = {September},
  YEAR          = 2006,
  PAGES         = {172--187},
  NOTE          = {Special issue on approximation and online algorithms.},

  length        = {20 pages},
  papers        = {Clustering_APPROX2003},
  replaces      = {Clustering_APPROX2003},
  withstudent   = 1,
  doi           = {https://dx.doi.org/10.1016/J.TCS.2006.05.008},
  dblp          = {https://dblp.org/rec/journals/tcs/DemaineEFI06},
  comments      = {This paper is also available from
                   <A HREF="http://dx.doi.org/10.1016/j.tcs.2006.05.008">ScienceDirect</A>.},
}

Abstract:

We consider the following general correlation-clustering problem [1]: given a graph with real nonnegative edge weights and a ⟨+⟩/⟨−⟩ edge labeling, partition the vertices into clusters to minimize the total weight of cut ⟨+⟩ edges and uncut ⟨−⟩ edges. Thus, ⟨+⟩ edges with large weights (representing strong correlations between endpoints) encourage those endpoints to belong to a common cluster while ⟨−⟩ edges with large weights encourage the endpoints to belong to different clusters. In contrast to most clustering problems, correlation clustering specifies neither the desired number of clusters nor a distance threshold for clustering; both of these parameters are effectively chosen to be the best possible by the problem definition.

Correlation clustering was introduced by Bansal, Blum, and Chawla [1], motivated by both document clustering and agnostic learning. They proved NP-hardness and gave constant-factor approximation algorithms for the special case in which the graph is complete (full information) and every edge has the same weight. We give an O(log n)-approximation algorithm for the general case based on a linear-programming rounding and the “region-growing” technique. We also prove that this linear program has a gap of Ω(log n), and therefore our approximation is tight under this approach. We also give an O(r³)-approximation algorithm for K_r, r-minor-free graphs. On the other hand, we show that the problem is equivalent to minimum multicut, and therefore APX-hard and difficult to approximate better than Θ(log n).

Comments:

This paper is also available from ScienceDirect.

Length:

The paper is 20 pages.

Availability:

The paper is available in PostScript (522k), gzipped PostScript (209k), and PDF (239k).

See information on file formats.

[Google Scholar search]

Related papers:

Clustering_APPROX2003 (Correlation Clustering with Partial Information)