**Reference**:- Erik D. Demaine, Dotan Emanuel, Amos Fiat, and Nicole Immorlica, “Correlation Clustering in General Weighted Graphs”,
*Theoretical Computer Science*, volume 361, number 2–3, September 2006, pages 172–187. Special issue on approximation and online algorithms. **Abstract**:-
We consider the following general
*correlation-clustering problem*[1]: given a graph with real nonnegative edge weights and a ⟨+⟩/⟨−⟩ edge labeling, partition the vertices into clusters to minimize the total weight of cut ⟨+⟩ edges and uncut ⟨−⟩ edges. Thus, ⟨+⟩ edges with large weights (representing strong correlations between endpoints) encourage those endpoints to belong to a common cluster while ⟨−⟩ edges with large weights encourage the endpoints to belong to different clusters. In contrast to most clustering problems, correlation clustering specifies neither the desired number of clusters nor a distance threshold for clustering; both of these parameters are effectively chosen to be the best possible by the problem definition.Correlation clustering was introduced by Bansal, Blum, and Chawla [1], motivated by both document clustering and agnostic learning. They proved NP-hardness and gave constant-factor approximation algorithms for the special case in which the graph is complete (full information) and every edge has the same weight. We give an

*O*(log*n*)-approximation algorithm for the general case based on a linear-programming rounding and the “region-growing” technique. We also prove that this linear program has a gap of Ω(log*n*), and therefore our approximation is tight under this approach. We also give an*O*(*r*^{3})-approximation algorithm for*K*_{r, r}-minor-free graphs. On the other hand, we show that the problem is equivalent to minimum multicut, and therefore APX-hard and difficult to approximate better than Θ(log*n*). **Comments**:- This paper is also available from ScienceDirect.
**Length**:- The paper is 20 pages.
**Availability**:- The paper is available in PostScript (522k), gzipped PostScript (209k), and PDF (239k).
- See information on file formats.
- [Google Scholar search]
**Related papers**:- Clustering_APPROX2003 (Correlation Clustering with Partial Information)

See also other papers by Erik Demaine.

Last updated October 19, 2020 by Erik Demaine.