Fair Colorful k-Center Clustering

An instance of colorful k-center consists of points in a metric space that are colored red or blue, along with an integer k and a coverage requirement for each color. The goal is to find the smallest radius ρ such that there exist balls of radius ρ around k of the points that meet the coverage requirements. The motivation behind this problem is twofold. First, from fairness considerations: each color/group should receive a similar service guarantee. Second, from the algorithmic challenges it poses: this problem combines the difficulties of clustering with those of the subset-sum problem. In particular, we show that this combination results in strong integrality gap lower bounds for several natural linear programming relaxations. Our main result is an efficient approximation algorithm that overcomes these difficulties to achieve an approximation guarantee of 3, nearly matching the tight approximation guarantee of 2 for the classical k-center problem, which this problem generalizes.


Introduction
In the colorful k-center problem introduced in [5], we are given a set of n points P in a metric space partitioned into a set R of red points and a set B of blue points, along with parameters k, r, and b. The goal is to find a set of k centers C ⊆ P that minimizes ρ so that balls of radius ρ around each point in C cover at least r red points and at least b blue points. More generally, the points can be partitioned into ω color classes C_1, ..., C_ω, with coverage requirements p_1, ..., p_ω. To keep the exposition of our ideas as clean as possible, we concentrate the bulk of our discussion on the version with two colors. In Section 3 we show how our algorithm can be generalized in a rather straightforward way to ω color classes, with an exponential dependence on ω in the running time, thus giving a polynomial-time algorithm for constant ω.
This generalization of the classic k-center problem has applications in situations where fairness is a concern. For example, if a telecommunications company is required to provide service to at least 90% of the people in a country, it would be cost effective to only provide service in densely populated areas. This is at odds with the ideal that at least some people in every community should receive service. In the absence of color classes, an approximation algorithm could be "unfair" to some groups by completely considering them as outliers. The inception of fairness in clustering can be found in the recent paper [8] (see also [1,4]), which uses a related but incomparable notion of fairness. Their notion of fairness requires each individual cluster to have a balanced number of points from each color class, which leads to very different algorithmic considerations and is motivated by other applications, such as "feature engineering".
The other motive for studying the colorful k-center problem derives from the algorithmic challenges it poses. One can observe that it generalizes the k-center problem with outliers, which is equivalent to only having red points and needing to cover at least r of them. This outlier version is already more challenging than the classic k-center problem: only recent results give tight 2-approximation algorithms [6,12], improving upon the 3-approximation guarantee of [7]. In contrast, such algorithms for the classic k-center problem have been known since the '80s [10,13]. That the approximation guarantee of 2 is tight, even for classic k-center, was proved in [14].
At the same time, a subset-sum problem with polynomial-sized numbers is embedded within the colorful k-center problem. To see this, consider n numbers a_1, ..., a_n and let A = ∑_{i=1}^{n} a_i. Construct an instance of the colorful k-center problem with r = k · A + A/2, b = k · A − A/2, and, for every i ∈ {1, ..., n}, a ball of radius one containing A + a_i red points and A − a_i blue points. These balls are assumed to be far apart, so that any single ball covering two of them must have a very large radius. It is easy to see that the constructed colorful k-center instance has a solution of radius one if and only if there is a size-k subset of the n numbers whose sum equals A/2.
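The equivalence can be checked by brute force on small inputs. The sketch below is illustrative only: the function names are hypothetical, each far-apart ball is treated as an atomic unit that a radius-one center either covers entirely or not at all, and the coverage requirements are doubled so that the arithmetic stays in integers when A is odd.

```python
from itertools import combinations


def build_instance(numbers, k):
    """Return the (red, blue) point counts per ball and the doubled requirements."""
    A = sum(numbers)
    balls = [(A + a, A - a) for a in numbers]  # ball i: A + a_i red, A - a_i blue
    r2 = 2 * k * A + A  # 2 * (k*A + A/2)
    b2 = 2 * k * A - A  # 2 * (k*A - A/2)
    return balls, r2, b2


def radius_one_feasible(numbers, k):
    """Brute force: do some k balls of radius one meet both requirements?"""
    balls, r2, b2 = build_instance(numbers, k)
    for chosen in combinations(balls, k):
        red = sum(r for r, _ in chosen)
        blue = sum(b for _, b in chosen)
        if 2 * red >= r2 and 2 * blue >= b2:
            return True
    return False


def subset_sum_feasible(numbers, k):
    """Brute force: is there a size-k subset summing to exactly A/2?"""
    A = sum(numbers)
    return any(2 * sum(s) == A for s in combinations(numbers, k))


numbers = [3, 1, 4, 2]  # A = 10, so the target sum is 5
for k in range(1, len(numbers) + 1):
    assert radius_one_feasible(numbers, k) == subset_sum_feasible(numbers, k)
```

Selecting k balls indexed by S covers k · A + ∑_{i∈S} a_i red points and k · A − ∑_{i∈S} a_i blue points, so both requirements hold exactly when the selected numbers sum to A/2.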
We use this connection to subset-sum to show that the standard linear programming (LP) relaxation of the colorful k-center problem has an unbounded integrality gap even after a linear number of rounds of the powerful Lasserre/Sum-of-Squares hierarchy (see Section 4.1). We remark that the standard linear programming relaxation gives a 2-approximation algorithm for the outliers version even without applying lift-and-project methods. Another natural approach for strengthening the standard linear programming relaxation is to add flow-based inequalities specially designed to solve subset-sum problems. However, in Section 4.2, we prove that they do not improve the integrality gap due to the clustering feature of the problem. This shows that clustering and the subset-sum problem are intricately related in colorful k-center. This interplay makes the problem more complex, and prior to our work only a randomized constant-factor approximation algorithm was known, for points in ℝ² and with an approximation guarantee greater than 6 [5].
Our main result overcomes these difficulties and gives a nearly tight approximation guarantee:
Theorem 1. There is a 3-approximation algorithm for the colorful k-center problem.
As mentioned above, our techniques can be easily extended to a constant number of color classes, but we restrict the discussion here to two colors.
On a very high level, our algorithm manages to decouple the clustering and the subset-sum aspects. First, our algorithm guesses certain centers of the optimal solution that it then uses to partition the point set into a "dense" part P d and a "sparse" part P s . The dense part is clustered using a subset-sum instance while the sparse set is clustered using the techniques of Bandyapadhyay, Inamdar, Pai, and Varadarajan [5] (see Section 2.1). Specifically, we use the pseudo-approximation of [5] that satisfies the coverage requirements using k + 1 balls of at most twice the optimal radius. While our approximation guarantee is nearly tight, it remains an interesting open problem to give a 2-approximation algorithm or to show that the ratio 3 is tight. One possible direction is to understand the strength of the relaxation obtained by combining the Lasserre/Sum-of-Squares hierarchy with the flow constraints. While we show that individually they do not improve the integrality gap, we believe that their combination can lead to a strong relaxation.
Independent work. Independently and concurrently to our work, the authors of [2] obtained a 4-approximation algorithm for the colorful k-center problem with ω = O(1), using different techniques than the ones described in this work. Furthermore, they show that, assuming P ≠ NP, if ω is allowed to be unbounded then the colorful k-center problem admits no algorithm guaranteeing a finite approximation. They also show that, assuming the Exponential Time Hypothesis, colorful k-center is inapproximable if ω grows faster than log n.
Organization. We begin by giving some notation and definitions and describing the pseudo-approximation algorithm of [5]. We then describe a 2-approximation algorithm on a certain class of instances that are well-separated, from which the 3-approximation follows almost immediately. This 2-approximation proceeds in two phases: the first is dedicated to the guessing of certain centers, while the second processes the dense and sparse sets.
Section 3 explains the generalization to ω color classes. In Section 4 we present our integrality gaps under the Sum-of-Squares hierarchy and under additional constraints deriving from a flow network to solve subset-sums.

A 3-Approximation Algorithm
In this section we present our 3-approximation algorithm. We briefly describe the pseudo-approximation algorithm of Bandyapadhyay et al. [5] since we use it as a subroutine in our algorithm.
Notation: We assume that our problem instance is normalized to have an optimal radius of one and we refer to the set of centers in an optimal solution as OPT. The set of all points at distance at most ρ from a point j is denoted by B(j, ρ) and we refer to this set as a ball of radius ρ at j. We write B(j) for B(j, 1). By a ball of OPT we mean B(j) for some j ∈ OPT.

The Pseudo-Approximation Algorithm
The algorithm of Bandyapadhyay et al. [5] first guesses the optimal radius for the instance (there are at most O(n²) distinct values the optimal radius can take), which we assume by normalization to be one, and considers the natural LP relaxation LP1 depicted on the left in Figure 1. The variable x_i indicates how much point i is fractionally opened as a center and z_i indicates the amount that i is covered by centers.
Given a fractional solution to LP1, the algorithm of [5] finds a clustering of the points. The clusters that are produced are of radius two and, with a simple modification (details can be found in Appendix B), can be made to have a special structure that we call a flower:
Definition 2.1. For j ∈ P, a flower centered at j is the set F(j) = ∪_{i∈B(j)} B(i).
More specifically, given a fractional solution (x, z) to LP1, the clustering algorithm in [5] produces a set of points S ⊆ P and a cluster C_j ⊆ P for every j ∈ S such that:
1. The set S is a subset of the points {j ∈ P : z_j > 0} with positive z-values.
2. For each j ∈ S, we have C_j ⊆ F(j), and the clusters {C_j}_{j∈S} are pairwise disjoint.
3. If we let r_j = |C_j ∩ R| and b_j = |C_j ∩ B| for j ∈ S, then the linear program LP2 (depicted on the right in Figure 1) has a feasible solution y of value at least r.
As LP2 has only two non-trivial constraints, any extreme point will have at most two variables attaining strictly fractional values. So at most k + 1 variables of y are non-zero. The pseudo-approximation of [5] now simply takes those non-zero points as centers. Since each flower is of radius two, this gives a 2-approximation algorithm that opens at most k + 1 centers. (Note that, as the clusters {C_j}_{j∈S} are pairwise disjoint, at least b blue points are covered, and at least r red points are covered since the value of the solution is at least r.)
Obtaining a constant-factor approximation algorithm that only opens k centers turns out to be significantly more challenging. Nevertheless, the above techniques form an important subroutine in our algorithm. Given a fractional solution (x, z) to LP1, we proceed as above to find S and an extreme point of LP2 of value at least r. However, instead of selecting all points with positive y-value, we, in the case of two fractional values, only select the one whose cluster covers more blue points. This gives us a solution of at most k centers whose clusters cover at least b blue points. Furthermore, the number of red points that are covered is at least r − max_{j∈S} r_j since we disregarded at most one center. As S ⊆ {j : z_j > 0} (see the first property above) and C_j ⊆ F(j) (see the second property above), we have max_{j∈S} r_j ≤ max_{j : z_j > 0} |F(j) ∩ R|. We summarize the obtained properties in the following lemma.
Lemma 2.2. Given a fractional solution (x, z) to LP1, there is a polynomial-time algorithm that outputs at most k clusters of radius two that cover at least b blue points and at least r − max_{j : z_j > 0} |F(j) ∩ R| red points.
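The rounding step behind Lemma 2.2 can be sketched as follows. This is a minimal illustration, not the implementation of [5]: the function name, the tolerance `eps`, and the hand-made extreme point are assumptions, and the extreme point of LP2 is taken as given rather than computed by an LP solver.

```python
def round_extreme_point(y, blue, eps=1e-9):
    """Select centers from an extreme point y of LP2.

    y[j] is the fractional opening of cluster j and blue[j] its number of
    blue points. All integral clusters are kept; if two values are strictly
    fractional, only the cluster with more blue points is kept.
    """
    integral = [j for j, v in enumerate(y) if v > 1 - eps]
    fractional = [j for j, v in enumerate(y) if eps < v < 1 - eps]
    # An extreme point of LP2 has at most two strictly fractional values.
    assert len(fractional) <= 2
    if len(fractional) == 2:
        fractional = [max(fractional, key=lambda j: blue[j])]
    return sorted(integral + fractional)


# Hypothetical extreme point for k = 2: one integral and two fractional values.
y = [1.0, 0.5, 0.5, 0.0]
blue = [4, 2, 6, 1]
centers = round_extreme_point(y, blue)

fractional_blue = sum(v * b for v, b in zip(y, blue))  # 4 + 1 + 3 = 8
rounded_blue = sum(blue[j] for j in centers)           # 4 + 6 = 10
assert centers == [0, 2]
assert rounded_blue >= fractional_blue
```

When two values are fractional they sum to one at an extreme point, so their combined fractional blue coverage is at most the larger of the two blue counts; this is why keeping the bluer cluster preserves the blue requirement while the red coverage may drop by one cluster.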
We can thus find a 2-approximate solution that covers sufficiently many blue points but may cover fewer red points than necessary. The idea now is that, if the number of red points in any cluster is not too large, i.e., max j:z j >0 |F(j) ∩ R| is "small", then we can hope to meet the coverage requirements for the red points by increasing the radius around some opened centers. Our algorithm builds on this intuition to get a 2-approximation algorithm using at most k centers for well-separated instances as defined below.

Definition 2.3. An instance of colorful k-center is well-separated if there does not exist a ball of radius three that covers at least two balls of OPT.
Our main result of this section can now be stated as follows:
Theorem 2. There is a 2-approximation algorithm for well-separated instances.
The above theorem immediately implies Theorem 1, i.e., the 3-approximation algorithm for general instances. Indeed, if the instance is not well-separated, we can find a ball of radius three that covers at least two balls of OPT by trying all n points and running the pseudo-approximation of [5] on the remaining uncovered points with k − 2 centers. In the correct iteration, this gives us at most k − 1 centers of radius two, which, when combined with the ball of radius three that covers two balls of OPT, is a 3-approximation.
Our algorithm for well-separated instances now proceeds in two phases with the objective of finding a subset of P on which the pseudo-approximation algorithm produces subsets of flowers containing not too many red points. In addition, we maintain a partial solution set of centers (some guessed in the first phase), so that we can expand the radius around these centers to recover the deficit of red points from closing one of the fractional centers.

Phase I
In this phase we will guess some balls of OPT that can be used to construct a bound on max_{j : z_j > 0} |R ∩ F(j)|. To achieve this, we define the notion of Gain(p, q) for any point p ∈ P and q ∈ B(p).
Definition 2.4. For any p ∈ P and q ∈ B(p), let Gain(p, q) := R ∩ (F(q) \ B(p)) be the set of red points added to B(p) by forming a flower centered at q.
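To make balls, flowers, and Gain concrete, the sketch below evaluates them on five points on a line. It is an illustration under stated assumptions: the helper names are hypothetical, the metric is the absolute difference of coordinates, and Gain(p, q) is read as the red points of F(q) \ B(p), which is how the definition is used later in the analysis.

```python
def ball(coords, j, radius=1.0):
    """B(j, radius): indices of all points within the given distance of point j."""
    return {i for i, c in enumerate(coords) if abs(c - coords[j]) <= radius}


def flower(coords, j):
    """F(j): union of the unit balls around every point of B(j)."""
    return set().union(*(ball(coords, i) for i in ball(coords, j)))


def gain(coords, red, p, q):
    """Gain(p, q): red points of F(q) that are not already in B(p)."""
    assert q in ball(coords, p), "q must lie in B(p)"
    return red & (flower(coords, q) - ball(coords, p))


coords = [0.0, 1.0, 2.0, 3.0, 5.0]  # hypothetical points on a line
red = {0, 2, 3}                     # indices of the red points

assert ball(coords, 1) == {0, 1, 2}
assert flower(coords, 1) == {0, 1, 2, 3}  # a flower has radius two
assert gain(coords, red, 1, 1) == {3}
assert gain(coords, red, 0, 1) == {2, 3}
```

Since every point of F(q) is within distance two of q, a flower is always contained in the ball of radius two around its center, which is what makes the clusters of the pseudo-approximation 2-approximate.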
Our algorithm in this phase proceeds by guessing three centers c_1, c_2, c_3 of the optimal solution OPT:
For i = 1, 2, 3, guess the center c_i in OPT and calculate the point q_i ∈ B(c_i) such that the number of red points in Gain(c_i, q_i) ∩ P_i is maximized over all possible c_i, where P_1 = P and P_i = P_{i−1} \ F(q_{i−1}) for 2 ≤ i ≤ 4.
The time it takes to guess c_1, c_2, and c_3 is O(n³), and for each c_i we find the q_i ∈ B(c_i) such that |Gain(c_i, q_i) ∩ P_i| is maximized by trying all points in B(c_i) (at most n many).
For notation, define Guess := ∪_{i=1}^{3} B(c_i) and τ := min_{1≤i≤3} |Gain(c_i, q_i) ∩ P_i|. The important properties guaranteed by the first phase are summarized in the following lemma.
Lemma 2.5. Assuming that c_1, c_2, and c_3 are guessed correctly, we have that
1. the k − 3 balls of radius one in OPT \ {c_i}_{i=1}^{3} are contained in P_4 and cover b − |B ∩ Guess| blue points and r − |R ∩ Guess| red points; and
2. the three clusters F(q_1), F(q_2), and F(q_3) are contained in P \ P_4 and cover at least |B ∩ Guess| blue points and at least |R ∩ Guess| + 3 · τ red points.

Proof. 1) We claim that the intersection of any ball of OP T \ {c
Hence, a ball of radius three around q ′ covers both B(p) and B(c i ) as c i ∈ B(q i ), which contradicts that the instance is well-separated.
2) Note that for , and that B(c i ) and Gain(c i , q i ) are disjoint. The balls B(c i ) cover at least |B ∩ Guess| blue points and |R ∩ Guess| red points, while

Phase II
Throughout this section we assume c_1, c_2, and c_3 have been guessed correctly in Phase I so that the properties of Lemma 2.5 hold. Furthermore, by the selection and the definition of τ, we also have
|Gain(p, q) ∩ P_4| ≤ τ for any p ∈ P_4 ∩ OPT and q ∈ B(p). (1)
This implies that F(p) \ B(p) contains at most τ red points of P_4. However, to apply Lemma 2.2 we need that the number of red points of P_4 in the whole flower F(p) is bounded. To deal with balls with many more than τ red points, we will iteratively remove dense sets from P_4 to obtain a subset P_s of sparse points.
Definition 2.6. When considering a subset of the points P s ⊆ P , we say that a point j ∈ P s is dense if the ball B(j) contains strictly more than 2 · τ red points of P s . For a dense point j, we also let I j ⊆ P s contain those points i ∈ P s whose intersection B(i) ∩ B(j) contains strictly more than τ red points of P s .
We remark that in the above definition, we have in particular that j ∈ I_j for a dense point j ∈ P_s. Our iterative procedure now works as follows:
Initially, let I = ∅ and P_s = P_4. While there is a dense point j ∈ P_s:
• Add I_j to I and update P_s by removing the points D_j = (∪_{i∈I_j} B(i)) ∩ P_s.
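The iterative removal of dense sets can be sketched in a few lines. This is an illustrative version under assumptions not in the text: points live on a line, `coords` maps hypothetical point identifiers to coordinates, the starting set plays the role of P_4, and the dense point to process next is chosen as the smallest identifier.

```python
def split_dense_sparse(coords, red, tau):
    """Return the sparse set P_s and the list of removed pairs (I_j, D_j)."""
    sparse = set(coords)  # plays the role of P_4 initially
    removed = []

    def ball(i):
        # Unit ball around i, restricted to the current sparse set.
        return {p for p in sparse if abs(coords[p] - coords[i]) <= 1.0}

    while True:
        # A point is dense if its ball holds strictly more than 2*tau red points.
        dense = next(
            (j for j in sorted(sparse) if len(ball(j) & red) > 2 * tau), None
        )
        if dense is None:
            return sparse, removed
        b_j = ball(dense)
        # I_j: points whose ball shares strictly more than tau red points with B(j).
        I_j = {i for i in sparse if len(ball(i) & b_j & red) > tau}
        # D_j: everything covered by the balls around I_j.
        D_j = set().union(*(ball(i) for i in I_j))
        removed.append((I_j, D_j))
        sparse -= D_j


coords = {0: 0.0, 1: 0.2, 2: 0.4, 3: 0.6, 4: 5.0, 5: 5.5, 6: 10.0, 7: 10.3}
red = {0, 1, 2, 3, 4, 6}
sparse, removed = split_dense_sparse(coords, red, tau=1)

assert sparse == {4, 5, 6, 7}
assert removed == [({0, 1, 2, 3}, {0, 1, 2, 3})]
```

Each iteration removes at least the dense point itself, since j ∈ I_j, so the loop terminates after at most n iterations; on termination no ball in the sparse set contains more than 2 · τ red points.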
Let P_d = P_4 \ P_s denote those points that were removed from P_4. We will cluster the two sets P_s and P_d of points separately. Indeed, the following lemma says that a center in OPT \ {c_i}_{i=1}^{3} either covers points in P_s or P_d but not points from both sets. Recall that D_j denotes the set of points that are removed from P_s in the iteration when j was selected and so P_d = ∪_j D_j.
Then, since c ∉ I_j, the intersection B(c) ∩ B(j) contains at most τ red points from D_j (recall that D_j contains the points of B(j) in P_s at the time j was selected). But by the definition of dense points, B(j) ∩ D_j has more than 2 · τ red points, so (B(j) \ B(c)) ∩ D_j has more than τ red points. This region is a subset of Gain(c, p) ∩ P_4, which contradicts (1). This is shown in Figure 2(a). Now consider the second case when B(c) ∩ B(j) = ∅ and there is a point p in the intersection B(c) ∩ B(i) for some i ∈ I_j with i ≠ j. Then, by the definition of I_j, B(i) ∩ B(j) has more than τ red points of D_j. However, this is also a subset of Gain(c, p) ∩ P_4 so we reach the same contradiction. See Figure 2.
Our algorithm now proceeds by guessing the number k_d of balls of OPT \ {c_i}_{i=1}^{3} that cover points in P_d. We also guess the numbers r_d and b_d of red and blue points, respectively, that these balls cover in P_d. Note that after guessing k_d, we know that the number of balls in OPT \ {c_i}_{i=1}^{3} that cover points in P_s equals k_s = k − 3 − k_d, and that these balls cover at least b_s = b − |B ∩ Guess| − b_d blue points and at least r_s = r − |R ∩ Guess| − r_d red points of P_s. Henceforth, we therefore assume that we have guessed those parameters correctly. In that case, we show that we can recover an equally good solution for P_d and a solution for P_s that covers b_s blue points and almost r_s red points:
Lemma 2.8. There exist two polynomial-time algorithms A_d and A_s such that, if k_d, r_d, and b_d are guessed correctly, then
• A_d returns k_d balls of radius one that cover b_d blue points of P_d and r_d red points of P_d;
• A_s returns k_s balls of radius two that cover at least b_s blue points of P_s and at least r_s − 3 · τ red points of P_s.
Proof. We first describe and analyze the algorithm A d followed by A s .
The algorithm A_d for the dense point set P_d. By Lemma 2.7, we have that all k_d balls in OPT \ {c_i}_{i=1}^{3} that cover points in P_d are centered at points in ∪_j I_j. Furthermore, we have that each I_j contains at most one center of OPT. This is because every i ∈ I_j is such that B(i) ∩ B(j) ≠ ∅ and so, by the triangle inequality, B(j, 3) contains all balls {B(i)}_{i∈I_j}. Hence, by the assumption that the instance is well-separated, the set I_j contains at most one center of OPT.
We now reduce our problem to a 3-dimensional subset-sum problem. For each I_j ∈ I, form a group consisting of an item for each p ∈ I_j. The item corresponding to p ∈ I_j has the 3-dimensional value vector (1, |B(p) ∩ D_j ∩ B|, |B(p) ∩ D_j ∩ R|). Our goal is to find k_d items such that at most one item per group is selected and their 3-dimensional vectors sum up to (k_d, b_d, r_d). Such a solution, if it exists, can be found by standard dynamic programming that has a table of size O(n⁴). For completeness, we provide the recurrence and precise details of this standard technique in Appendix A. Furthermore, since the D_j's are disjoint by definition, this gives k_d centers that cover b_d blue points and r_d red points in P_d, as required in the statement of the lemma.
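The grouped subset-sum problem can be solved by a reachability-style dynamic program. The sketch below is one possible formulation and not necessarily the recurrence of Appendix A: the function name and the data layout (each item given as a pair of blue and red counts, with the leading coordinate 1 implicit) are assumptions made for illustration.

```python
def pick_one_per_group(groups, k_d, b_d, r_d):
    """Choose at most one item per group so the vectors sum to (k_d, b_d, r_d).

    groups[g] is a list of (blue, red) pairs, one per item of group g.
    Returns a list of (group index, item index) pairs, or None if no exact
    solution exists.
    """
    # Map each reachable state (count, blue, red) to one selection reaching it.
    reachable = {(0, 0, 0): []}
    for g, items in enumerate(groups):
        nxt = dict(reachable)  # option: select nothing from group g
        for (cnt, blue, red), choice in reachable.items():
            for idx, (ib, ir) in enumerate(items):
                state = (cnt + 1, blue + ib, red + ir)
                within = state[0] <= k_d and state[1] <= b_d and state[2] <= r_d
                if within and state not in nxt:
                    nxt[state] = choice + [(g, idx)]
        reachable = nxt
    return reachable.get((k_d, b_d, r_d))


# Hypothetical groups: each inner list holds the items of one set I_j.
groups = [[(2, 1), (0, 3)], [(1, 1)], [(4, 0), (1, 2)]]

assert pick_one_per_group(groups, 2, 3, 3) == [(0, 0), (2, 1)]
assert pick_one_per_group(groups, 2, 9, 9) is None
```

States exceeding any target coordinate are pruned, so at most (k_d + 1)(b_d + 1)(r_d + 1) states are stored per group, which keeps the table polynomial in n.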
It remains to show that such a solution exists. Let o_1, o_2, ..., o_{k_d} denote the centers of the balls in OPT \ {c_i}_{i=1}^{3} that cover points in P_d. Furthermore, let I_{j_1}, ..., I_{j_{k_d}} be the sets in I such that o_i ∈ I_{j_i} for i ∈ {1, ..., k_d}. Notice that by Lemma 2.7 we have that B(o_i) ∩ P_d is disjoint from P_d \ D_{j_i} and contained in D_{j_i}. It follows that the 3-dimensional vector corresponding to an OPT center o_i equals (1, |B(o_i) ∩ P_d ∩ B|, |B(o_i) ∩ P_d ∩ R|). Therefore, the sum of these vectors corresponding to o_1, ..., o_{k_d} results in the vector (k_d, b_d, r_d), where we used that our guesses of k_d, b_d, and r_d were correct.
The algorithm A_s for the sparse point set P_s. Assuming that the guesses are correct, we have that OPT \ {c_i}_{i=1}^{3} contains k_s balls that cover b_s blue points of P_s and r_s red points of P_s. Hence, LP1 has a feasible solution (x, z) to the instance defined by the point set P_s, the number of balls k_s, and the constraints b_s and r_s on the number of blue and red points to be covered, respectively. Lemma 2.2 then says that we can in polynomial time find k_s balls of radius two such that at least b_s blue points of P_s are covered and at least r_s − max_{j : z_j > 0} |F(j) ∩ R| red points of P_s are covered. Here, F(j) refers to the flower restricted to the point set P_s.
To prove the second part of Lemma 2.8, it is thus sufficient to show that LP1 has a feasible solution where z_j = 0 for all j ∈ P_s such that |F(j) ∩ R| > 3 · τ. In turn, this follows by showing that, for any such j ∈ P_s with |F(j) ∩ R| > 3 · τ, no point in B(j) is in OPT (since then z_j = 0 in the integral solution corresponding to OPT). Such a feasible solution can be found by adding the constraints x_i = 0 for all i ∈ B(j), for all such j, to LP1.
To see why this holds, suppose towards a contradiction that there is a c ∈ OP T such that c ∈ B(j). First, since there are no dense points in P s , we have that the number of red points in B(c) ∩ P s is at most 2 · τ . Therefore the number of red points of P s in F(j) \ B(c) is strictly more than τ . In other words, we have τ < |Gain(c, j) ∩ P s | ≤ |Gain(c, j) ∩ P 4 | which contradicts (1).
Equipped with the above lemma we are now ready to finalize the proof of Theorem 2.
Proof of Theorem 2. Our algorithm guesses the optimal radius and the centers c_1, c_2, c_3 in Phase I, and k_d, r_d, b_d in Phase II. There are at most n² choices of the optimal radius, n choices for each c_i, and n + 1 choices for each of k_d, r_d, b_d (ranging from 0 to n). We can thus try all these possibilities in polynomial time and, since all other steps in our algorithm run in polynomial time, the total running time will be polynomial. The algorithm tries all these guesses and outputs the best solution found over all choices. For the correct guesses, we output a solution with 3 + k_d + k_s = k balls of radius at most two. Furthermore, by the second property of Lemma 2.5 and the two properties of Lemma 2.8, we have that
• the number of blue points covered is at least |B ∩ Guess| + b_d + b_s = b; and
• the number of red points covered is at least |R ∩ Guess| + 3τ + r_d + r_s − 3τ = r.
We have thus given a polynomial-time algorithm that returns a solution where the balls are of radius at most twice the optimal radius.

Constant Number of Colors
Our algorithm extends easily to a constant number ω of color classes C_1, ..., C_ω with coverage requirements p_1, ..., p_ω. We use the LPs in Fig. 3 for a general number of colors, where p_{j,i} in LP2(ω) indicates the number of points of color class i in cluster j ∈ S. Here S is the set of cluster centers obtained by applying the modified clustering algorithm of Appendix B to instances with ω color classes. LP2(ω) has only ω non-trivial constraints, so any extreme point has at most ω variables attaining strictly fractional values, and a feasible solution attaining objective value at least p_1 will have at most k + ω − 1 positive values. By rounding up to 1 the fractional value of the center that contains the largest number of points of C_ω, we can cover p_ω points of C_ω. We would like to be able to close the remaining fractional centers, so we apply a procedure analogous to the case with just two colors.
We can guess 3(ω − 1) centers of OPT for each of the ω − 1 colors whose coverage requirements are to be satisfied. Then we bound the number of points of each color that may be found in a cluster, by removing dense sets that contain too many points of any one color and running a dynamic program on the removed sets. The final step is to run the clustering algorithm of [5] on the remaining points, rounding to one the fractional center with the largest number of points of C_1 and closing all other fractional centers.
In particular, we get a running time with a factor of n^{O(ω²)}. The remainder of this section gives a formal description of the algorithm for ω color classes.

Formal Algorithm for ω colors
The following is a natural generalization of Lemma 2.2 and summarizes the main properties of the clustering algorithm of Appendix B for instances with ω color classes.

Lemma 1 ′ . Given a fractional solution (x, z) to LP1(ω), there is a polynomial-time algorithm that outputs at most k clusters of radius two that cover at least p ω points of C ω , and at least
Since we may not meet the coverage requirements for ω − 1 color classes, it is necessary to guess some balls of OP T for each of those colors, and for each fractional center. In total we guess 3(ω − 1) 2 points of OP T as follows: For j = 2, . . . , ω, for i = 1, 2, . . . , 3(ω − 1) guess the center c j,i in OP T and calculate the point q j,i ∈ B(c j,i ) such that |C j ∩ Gain(c j,i , q j,i ) ∩ P j,i | is maximized over all possible c j,i , where This guessing takes O(n 3(ω−1) 2 ) rounds. It is possible that some c j,i coincide, but this does not affect the correctness of the algorithm. In fact, this can only improve the solution, in the sense that the coverage requirements will be met with fewer than k centers. Let k c denote the number of distinct c j,i obtained in the correct guess. For notation, define To be consistent with previous notation, let The important properties guaranteed by the first phase can be summarized in the following lemma whose proof is the natural extension of Lemma 2.5.
Lemma 2 ′ . Assuming that c j,i are guessed correctly, we have that {c j,i } are contained in P 4 and cover p ω − |C ω ∩ Guess| of points in C ω and p j − |C j ∩ Guess| points of C j for j = 2, . . . , ω; and 2. the clusters F(q j,i ) are contained in P \ P 3(ω−1)+1 and cover at least |C ω ∩ Guess| points of C ω and at least |C j ∩ Guess| + 3(ω − 1) · τ j points of C j .
Now we need to remove points which contain many points from any one of the color classes to partition the instance into dense and sparse parts which leads to the following generalized definition of dense points.
Definition 4′. When considering a subset of the points P_s ⊆ P, we say that a point p ∈ P_s is j-dense if |C_j ∩ B(p) ∩ P_s| > 2τ_j. For a j-dense point p, we also let I_p ⊆ P_s contain those points i ∈ P_s such that |C_j ∩ B(i) ∩ B(p) ∩ P_s| > τ_j, for every 2 ≤ j ≤ ω.
Now we perform a similar iterative procedure as for two colors: Initially, let I = ∅ and P_s = P_{3(ω−1)}. While there is a j-dense point p ∈ P_s for any 2 ≤ j ≤ ω:
• Add I_p to I and update P_s by removing the points D_p = (∪_{i∈I_p} B(i)) ∩ P_s.
As in the case of two colors, set P d = P 3(ω−1) \ P s . By naturally extending Lemma 2.7 and its proof, we can ensure that any ball of OP T \ ∪ ω j=2 ∪ {c j,i } is completely contained in either P d or P s . We guess the number k d of such balls of OP T contained in P d , and guess the numbers {c j,i } contained in P s is given by k s = k − k c − k d and these balls cover at least s j = p j − |C j ∩ Guess all | − d j points of C j in P s , 1 ≤ j ≤ ω.
Assuming that the parameters are guessed correctly we can show, similar to Lemma 2.8, that the following holds. • A ′ s returns k s balls of radius two that cover at least s 1 points of C 1 of P s and at least s j − 3(ω − 1) · τ j points of C j of P s , 2 ≤ j ≤ ω.
The algorithm A′_d proceeds as did A_d, with the modification that the dynamic program is now (ω + 1)-dimensional. Algorithm A′_s is also similar to A_s, because LP1 has a feasible solution where z_p = 0 for all p ∈ P_s such that |F(p) ∩ C_j| > 3τ_j holds for any 2 ≤ j ≤ ω. Hence, we output a solution with k_c + k_d + k_s = k balls of radius at most two, and
• the number of points of C_1 covered is at least |C_1 ∩ Guess| + d_1 + s_1 = p_1; and
• the number of points of C_j covered is at least |C_j ∩ Guess| + 3(ω − 1)τ_j + d_j + s_j − 3(ω − 1)τ_j = p_j, for all j = 2, ..., ω.
This is a polynomial-time algorithm for colorful k-center with a constant number of color classes.

LP Integrality Gaps
In this section, we present two natural ways to strengthen LP1 and show that they both fail to close the integrality gap, providing evidence that clustering and knapsack feasibility cannot be decoupled in the colorful k-center problem. On one hand, the Sum-of-Squares hierarchy is ineffective for knapsack problems, while on the other hand, adding knapsack constraints to LP1 is also insufficient due to the clustering aspect of this problem.

Sum-of-Squares Integrality Gap
The Sum-of-Squares hierarchy (equivalently Lasserre [16,17]) is a method of strengthening linear programs that has been used in constraint satisfaction problems, set-cover, and graph coloring, to name just a few examples [3,9,18]. We use the same notation for the Sum-of-Squares hierarchy, abbreviated as SoS, as in Karlin et al. [15]. For a set V of variables, P(V) is the power set of V and P_t(V) is the collection of subsets of V of size at most t. Their succinct definition of the hierarchy makes use of the shift operator: for two vectors x, y ∈ ℝ^{P(V)}, the shift operator gives the vector x * y ∈ ℝ^{P(V)} such that
(x * y)_I = ∑_{J⊆V} x_J y_{I∪J}.
Analogously, for a polynomial g(x) = ∑_{I⊆V} a_I ∏_{i∈I} x_i we have (g * y)_I = ∑_{J⊆V} a_J y_{I∪J}. In particular, we work with the linear inequalities g_1, ..., g_m so that the polytope to be lifted is
K = {x ∈ [0, 1]^n : g_ℓ(x) ≥ 0 for ℓ = 1, ..., m}.
Let T be a collection of subsets of V and y a vector in ℝ^T. The matrix M_T(y) is indexed by elements of T such that (M_T(y))_{I,J} = y_{I∪J}.
We can now define the t-th SoS lifted polytope.
Definition 4.1. For any 1 ≤ t ≤ n, the t-th SoS lifted polytope SoS_t(K) is the set of vectors y ∈ [0, 1]^{P_{2t}(V)} such that y_∅ = 1, M_{P_t(V)}(y) ⪰ 0, and M_{P_{t−1}(V)}(g_ℓ * y) ⪰ 0 for all ℓ.
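As a sanity check on the definition of M_T(y), the sketch below builds the matrix for the lift of an integral point, where y_I = ∏_{i∈I} x_i. The helper names are hypothetical and the example only covers the integral case: there the matrix is the outer product of a 0/1 vector with itself and hence positive semidefinite, which illustrates why integral points survive every round of the hierarchy.

```python
from itertools import combinations


def subsets_up_to(variables, t):
    """P_t(V): all subsets of the variables of size at most t."""
    return [
        frozenset(c)
        for size in range(t + 1)
        for c in combinations(variables, size)
    ]


def integral_moments(x, t):
    """Lift an integral point x: y_I = prod_{i in I} x_i for all |I| <= 2t."""
    return {
        I: int(all(x[i] for i in I))
        for I in subsets_up_to(range(len(x)), 2 * t)
    }


def moment_matrix(y, index_sets):
    """M_T(y), with rows and columns indexed by T and entry (I, J) = y_{I ∪ J}."""
    return [[y[I | J] for J in index_sets] for I in index_sets]


x = [1, 0, 1]  # hypothetical integral point
t = 1
y = integral_moments(x, t)
T = subsets_up_to(range(len(x)), t)  # the sets ∅, {0}, {1}, {2}
M = moment_matrix(y, T)

v = [y[I] for I in T]
outer = [[a * b for b in v] for a in v]
assert y[frozenset()] == 1
assert M == outer  # rank one, hence positive semidefinite
```

For fractional points no such product structure is available, and the semidefinite constraints of Definition 4.1 are exactly what restricts the admissible vectors y.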
A point x ∈ [0, 1]^n belongs to the t-th SoS polytope SoS_t(K) if there exists y ∈ SoS_t(K) such that y_{{i}} = x_i for all i ∈ V.
We use a reduction from Grigoriev's SoS lower bound for knapsack [11] to show that the following instance has a fractional solution with small radius that is valid for a linear number of rounds of SoS.
Theorem 3 (Grigoriev). At least min{2⌊min{k/2, n − k/2}⌋ + 3, n} rounds of SoS are required to recognize that the following polytope contains no integral solution for k ∈ ℤ odd:
∑_{i=1}^{n} 2w_i = k, w ∈ [0, 1]^n.
Consider an instance of colorful k-center with two colors, 8n points, k = n, and r = b = 2n, where n is odd. For every i ∈ [2n], the points {4i − 3, 4i − 2, 4i − 1, 4i} belong to a cluster C_i of radius one. For odd i, C_i has three red points and one blue point, and for even i, C_i has one red point and three blue points. A picture is shown in Figure 4. In an optimal integer solution, one center needs to cover at least two of these clusters, while a fractional solution satisfying LP1 can open a center to the extent 1/2 around each cluster of radius one. Hence, LP1 has an unbounded integrality gap since the clusters can be arbitrarily far apart. This instance takes an odd number of copies of the integrality gap example given in [5].
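The gap instance is small enough to verify by enumeration. The following sketch uses hypothetical function names and assumes that the clusters are so far apart that a ball of radius one covers exactly one cluster; it checks that no n integral balls of radius one meet both requirements when n is odd, while opening every cluster to the extent 1/2 does.

```python
from itertools import combinations


def gap_instance(n):
    """(red, blue) counts of the 2n clusters: odd i -> (3, 1), even i -> (1, 3)."""
    return [(3, 1) if i % 2 == 1 else (1, 3) for i in range(1, 2 * n + 1)]


def integral_feasible(n):
    """Can n balls of radius one cover at least 2n red and 2n blue points?"""
    clusters = gap_instance(n)
    need = 2 * n
    return any(
        sum(r for r, _ in chosen) >= need and sum(b for _, b in chosen) >= need
        for chosen in combinations(clusters, n)
    )


def fractional_coverage(n):
    """Centers opened, red coverage, and blue coverage when each cluster is opened by 1/2."""
    clusters = gap_instance(n)
    opened = 0.5 * len(clusters)
    red = sum(0.5 * r for r, _ in clusters)
    blue = sum(0.5 * b for _, b in clusters)
    return opened, red, blue


assert integral_feasible(3) is False  # n odd: radius one is infeasible
assert integral_feasible(2) is True   # n even: one cluster of each type works
assert fractional_coverage(3) == (3.0, 6.0, 6.0)  # k = 3, r = b = 6 are met
```

Selecting a clusters of the first type and n − a of the second covers n + 2a red and 3n − 2a blue points, so both requirements force a = n/2, which is impossible for odd n.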
We can do a simple mapping from a feasible solution for the t-th round of SoS on the system of equations in Theorem 3 to our variables in the t-th round of SoS on LP1 for this instance, to demonstrate that the infeasibility of balls of radius one is not recognized. More precisely, we assign a variable w_i to each pair of clusters of radius one as shown in Figure 4, corresponding to opening each cluster in the pair by a w_i amount. Then a fractional opening of balls of radius one can be mapped to variables that satisfy the polytope in Theorem 3.
The remainder of this subsection is dedicated to formally describing the reduction from Theorem 3. Let W denote the set of variables used in the polytope defined in Theorem 3. Let w be in the t-th round of SoS applied to the system in Theorem 3, so that w is indexed by subsets of W of size at most t. Let V = V_x ∪ V_z, where V_x = {x_1, ..., x_{8n}} and V_z = {z_1, ..., z_{8n}}, be the set of variables used in LP1 for the instance shown in Figure 4. We define a vector y with entries indexed by subsets of V, and show that y is in the t-th SoS lifting of LP1. In each ball we pick a representative x_i, i ≡ 1 mod 4, to indicate how much the ball is opened, so we set y_I = 0 if x_j ∈ I for some j ≢ 1 mod 4. Otherwise, we set y_I = w_{π(I)}, where π(I) ⊆ W is the set of variables of W corresponding to the pairs of clusters that I touches.
We have M_{P_t(W)}(w) ⪰ 0, and for g_1 = −n + ∑_{i=1}^{n} 2w_i and g_2 = n − ∑_{i=1}^{n} 2w_i, M_{P_{t−1}(W)}(g_ℓ * w) ⪰ 0 for ℓ = 1, 2, since w satisfies the t-th round of SoS. As g_2 = −g_1, this implies that M_{P_{t−1}(W)}(g_ℓ * w) is the zero matrix.
To show that $M_{P_t(V)}(y) \succeq 0$, we start with $M_{P_t(W)}(w)$ and construct a sequence of matrices such that the semidefiniteness of one implies the semidefiniteness of the next, until we arrive at a matrix that is $M_{P_t(V)}(y)$ with rows and columns permuted, i.e. $M_{P_t(V)}(y)$ multiplied on the left and right by a permutation matrix and its transpose. Since the eigenvalues of a matrix are invariant under this operation, $M_{P_t(W)}(w) \succeq 0$ implies that $M_{P_t(V)}(y) \succeq 0$.

Proof. We claim that this sequence of matrices exists with the following description. Firstly, the matrix $M_{i+1}$ has one more row and column than $M_i$, and agrees with $M_i$ on the leading principal submatrix of the size of $M_i$. Then there are two possibilities: (a) the last row and column of $M_{i+1}$ are all zeroes, or (b) for some $j$, the last row of $M_{i+1}$ is a copy of the $j$th row of $M_i$, the last column is a copy of the $j$th column of $M_i$, and the last entry is $(M_i)_{j,j}$.
Either way, the rank of $M_{i+1}$ is the same as the rank of $M_i$. To prove this claim, it suffices to consider a sequence of indices of the matrix $M_{P_t(V)}(y)$. The matrix $M_0$ in our sequence will be the submatrix of $M_{P_t(V)}(y)$ indexed by the first $k$ indices, where $k$ is the dimension of $M_{P_t(W)}(w)$, i.e. the number of subsets of $W$ of size at most $t$. Each subsequent matrix $M_i$ will be the submatrix of $M_{P_t(V)}(y)$ indexed by the first $k + i$ indices. Note that the rows/columns of $M_{P_t(V)}(y)$ can be considered to be indexed by all the subsets of $V$ of size at most $t$. With this in mind, consider a sequence of subsets of $V$ of size at most $t$ with the following properties:
1. All subsets of $\{x_{8i-7} : i \in [n]\}$ of size at most $t$ form a prefix of our sequence.
2. Each set index after the first has exactly one more element than some set index that came earlier in the sequence.
It is clear that it is possible to arrange all the subsets of $V$ of size at most $t$ in a sequence that satisfies these properties. It only remains to show that this sequence produces the desired construction for $M_0, M_1, \dots, M_p$. We have $(M_{P_t(V)}(y))_{I,J} = y_{I \cup J} = w_{\pi(I \cup J)} = (M_{P_t(W)}(w))_{\pi(I),\pi(J)}$, so property (1) guarantees that we begin with $M_0$ being $M_{P_t(W)}(w)$, up to the correct permutation of subsets of $\{x_{8i-7} : i \in [n]\}$. Now consider the $k'$-th index in the sequence, $k' > k$, where $k$ is the dimension of $M_{P_t(W)}(w)$. By property (2), it is of the form $J \cup \{x\}$, where $J$ is one of the first $k' - 1$ indices and $x \in V$. There are two cases:
• If $x$ is some $x_i$ with $i \not\equiv 1 \bmod 4$, then $y_{I_\ell \cup J \cup \{x\}} = 0$ for all $\ell \le k'$.
In the first case, the matrix constructed from the first k ′ indices will have property (a), and in the second, property (b). Finally, it is clear that at each step the dimension of the matrices increases by one, and that it is the leading principal submatrix of the following matrix in the sequence, until we end up with M Pt(V ) (y) (up to some permutation of its rows and columns).
By the rank-nullity theorem, M i+1 has one more 0 eigenvalue than M i , so we can apply the following theorem.
With $M_{i+1} = A$ and $M_i = B$ as in Theorem 4 we have that $\alpha_n = 0$ (since $M_{i+1}$ and $M_i$ have the same rank, but the dimension of the zero eigenspace of $M_{i+1}$ is one greater than that of $M_i$). Hence, $M_{i+1}$ has no negative eigenvalues if $M_i$ has no negative eigenvalues. This is sufficient to show that each matrix in the sequence constructed is positive semidefinite, and concludes the proof that $M_{P_t(V)}(y) \succeq 0$.
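The two extension steps (a) and (b) can be checked on small matrices with exact arithmetic. The sketch below is illustrative only and uses hypothetical helper names; instead of the eigenvalue interlacing argument of the proof, it tests positive semidefiniteness by symmetric elimination (repeated Schur complements), which is a swapped-in technique chosen because it needs no numerical library.

```python
from fractions import Fraction


def psd_rank(matrix):
    """Return the rank of a symmetric matrix if it is PSD, otherwise None."""
    a = [[Fraction(v) for v in row] for row in matrix]
    size = len(a)
    rank = 0
    for p in range(size):
        pivot = a[p][p]
        if pivot < 0:
            return None
        if pivot == 0:
            # A PSD matrix with a zero diagonal entry has a zero row there.
            if any(a[p][q] != 0 for q in range(p + 1, size)):
                return None
            continue
        rank += 1
        # Replace the trailing block by its Schur complement w.r.t. the pivot.
        for i in range(p + 1, size):
            factor = a[i][p] / pivot
            for q in range(p + 1, size):
                a[i][q] -= factor * a[p][q]
    return rank


def extend_zero(m):
    """Case (a): append an all-zero last row and column."""
    return [row + [0] for row in m] + [[0] * (len(m) + 1)]


def extend_copy(m, j):
    """Case (b): append a copy of row/column j, with corner entry m[j][j]."""
    return [row + [row[j]] for row in m] + [m[j] + [m[j][j]]]


m0 = [[2, 1], [1, 2]]          # a small PSD matrix of rank 2
m_a = extend_zero(m0)          # case (a)
m_b = extend_copy(m0, 0)       # case (b) with j = 0
```

On this example both extensions remain positive semidefinite with unchanged rank, matching the claim that each step of the sequence adds one zero eigenvalue and no negative ones.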
It remains to show that the matrices arising from the shift operator between $y$ and the linear constraints of our polytope are positive semidefinite. Let $h_i$ denote the linear inequalities in LP1. In essence, the corresponding moment matrices $M_{P_{t-1}(V)}(h_i * y)$ are zero matrices since all $h_i$ are tight for the example in Figure 4. Formally, we have

Proof. Let $h_{1,j}$ be the linear polynomial that corresponds to the first inequality of LP1 for $j \in P$. First, if $i \not\equiv 1 \bmod 4$, then $y_{I \cup \{x_i\}} = 0$ for any $I \subseteq V$. Otherwise, we have

For the remaining inequalities of LP1, $h_2$, $h_3$, and $h_4$, we have that $M_{P_{t-1}(V)}(h_\ell * y)$ is the zero matrix because of how we defined the projection onto $w$: $n\, w_{\pi(I \cup J)} - \sum_{j=1}^{n} 2\, w_{\pi(I \cup J \cup \{w_j\})} = (M_{P_{t-1}(W)}(g_2 * w))_{\pi(I),\pi(J)} = 0$.

This concludes the formal proof of the following theorem.

Flow Constraints
In this section we add further constraints to LP1 based on standard techniques. These incorporate knapsack constraints for the fractional centers in the hope of obtaining a better clustering; we show that this fails to reduce the integrality gap.
We define an instance of a knapsack problem with multiple objectives. Each point $p \in P$ corresponds to an item with three dimensions: a dimension of size one to restrict the number of centers, $|B \cap B(p)|$, and $|R \cap B(p)|$. We set up a flow network with an $(n + 1) \times n \times n \times k$ grid of nodes, and we name each node by the coordinate $(w, x, y, z)$ of its position. The source $s$ is located at $(0, 0, 0, 0)$ and we add an extra node $t$ for the sink. Assign an arbitrary order to the points in $P$. For the item corresponding to $i \in P$, for each $x \in [n]$, $y \in [n]$, $z \in [k]$:
1. Add an edge from $(i, x, y, z)$ to $(i + 1, x, y, z)$ with flow variable $e_{i,x,y,z}$.
2. With $b_i := |B \cap B(i)|$ and $r_i := |R \cap B(i)|$, if $z < k$ add an edge from $(i, x, y, z)$ to $(i + 1, \min\{x + b_i, n\}, \min\{y + r_i, n\}, z + 1)$ with flow variable $f_{i,x,y,z}$.
For each $x \in [b, n]$, $y \in [r, n]$:
3. Add an edge from $(n + 1, x, y, k)$ to $t$ with flow variable $g_{x,y}$.
Set the capacities of all edges to one. In addition to the usual flow constraints, add to LP1 the constraints e i,x,y,z for all i ∈ P.
We refer to the resulting linear program as LP3. Notice that an integral solution to LP1 defines a path from $s$ to $t$ through which one unit of flow can be sent; hence LP3 is a valid relaxation. On the other hand, any path $P$ from $s$ to $t$ defines a set $C_P$ of at most $k$ centers by taking those points $c$ for which the edge $f_{c,x,y,z}$ lies on $P$ for some $x$, $y$, and $z$. Moreover, as $t$ can only be reached from a coordinate with $x \ge b$ and $y \ge r$, we have that $\sum_{c \in C_P} |B(c) \cap B| \ge b$ and $\sum_{c \in C_P} |B(c) \cap R| \ge r$. It follows that $C_P$ forms a solution to the problem of radius one if the balls are disjoint. In particular, our integrality gap instances for the Sum-of-Squares hierarchy do not fool LP3.
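The existence of an integral $s$-$t$ path in this network can be decided by a layered reachability search over the states $(x, y, z)$. The sketch below is a simplified illustration under stated assumptions, not the linear program itself: the names `has_st_path` and `gap_items` are hypothetical, items are given as (blue, red) counts, the cap on both count coordinates is passed explicitly as `cap`, and one item per radius-one cluster is used for the Sum-of-Squares gap instance.

```python
def has_st_path(items, k, b, r, cap):
    """Decide whether the layered network admits an integral s-t path.

    items: list of (blue_i, red_i) pairs, one per item, in a fixed order.
    A state (x, y, z) stores capped blue count, capped red count, and the
    number of centers opened so far.
    """
    layer = {(0, 0, 0)}
    for blue_i, red_i in items:
        next_layer = set()
        for x, y, z in layer:
            # Edge e: skip the item, the state is unchanged.
            next_layer.add((x, y, z))
            # Edge f: open the item as a center if fewer than k are open.
            if z < k:
                next_layer.add(
                    (min(x + blue_i, cap), min(y + red_i, cap), z + 1)
                )
        layer = next_layer
    # Edge g: the sink is reachable only with x >= b, y >= r and z = k.
    return any(x >= b and y >= r and z == k for x, y, z in layer)


def gap_items(n):
    """One (blue, red) item per cluster of the Sum-of-Squares gap instance."""
    # Odd clusters: one blue, three red; even clusters: three blue, one red.
    return [(1, 3) if i % 2 == 1 else (3, 1) for i in range(1, 2 * n + 1)]
```

For odd $n$, `has_st_path(gap_items(n), n, 2 * n, 2 * n, 8 * n)` finds no path, which is consistent with the observation that these instances do not fool LP3. The search only covers integral paths, so it says nothing about fractional flows such as the one used in the overlapping example.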
The example in Figure 5 shows that in an instance where balls overlap, the integrality gap remains large. Here, the fractional assignment of open centers is 1/2 for each of the six balls, and this gives a fractional covering of 8 red and 8 blue points as required. This assignment also satisfies the flow constraints because the three balls at the top of the diagram define a path disjoint from the three at the bottom. By double counting the five points in the intersection of two balls we cover 8 red and 8 blue points with each set of three balls. Hence, we can send flow along each path. However, this does not give a feasible integral solution with three centers, since no set of three clusters contains enough points. In fact, the four clusters can be placed arbitrarily far from each other, and in this way we have an unbounded integrality gap since one ball needs to cover two clusters.

$\sum_{j \in S} r_j y_j \ge r$. To see this,
\[
\begin{aligned}
\sum_{j \in S} r_j y_j &= \sum_{j \in S} |R_j|\, y_j = \sum_{j \in S} \sum_{j' \in R_j} z_j && (y_j = z_j \text{ for any } j \in S) \\
&\ge \sum_{j \in S} \sum_{j' \in R_j} z_{j'} && (\text{from the second observation, } z_j \ge z_{j'} \text{ for any } j' \in C_j) \\
&= \sum_{j' \in R:\, z_{j'} > 0} z_{j'} && (\text{since the } C_j\text{'s are disjoint and contain all } j \text{ s.t. } z_j > 0) \\
&= \sum_{j' \in R} z_{j'} \ge r && (\text{since } z \text{ satisfies LP1}).
\end{aligned}
\]
Similarly, $\sum_{j \in S} b_j y_j \ge b$. Finally we will show that $\sum_{j \in S} y_j \le k$: $\sum_{j \in S} y_j \le \sum_{j \in S} \sum_{j' \in B(j)}$ This concludes the proof of the claim that $y$ is a feasible solution to LP2 with objective value at least $r$.