Monotone circuit lower bounds from resolution

For any unsatisfiable CNF formula F that is hard to refute in the Resolution proof system, we show that a gadget-composed version of F is hard to refute in any proof system whose lines are computed by efficient communication protocols—or, equivalently, that a monotone function associated with F has large monotone circuit complexity. Our result extends to monotone real circuits, which yields new lower bounds for the Cutting Planes proof system.

lower bounds for any propositional proof system whose lines are computed by efficient communication protocols can be proved via dag-like protocols. Namely, a lower bound is given by the least size of a dag-like protocol that solves a certain CNF search problem associated with F.
In this paper, we prove a query-to-communication lifting theorem that escalates lower bounds for a dag-like query model (essentially Resolution) to lower bounds for dag-like communication protocols. In particular, this yields a new technique to prove size lower bounds for monotone circuits and several types of proof systems (including Cutting Planes).
The result can be interpreted as a converse to monotone feasible interpolation [10,32], which is a popular method to prove refutation size lower bounds for proof systems (such as Resolution and Cutting Planes) by reductions to monotone circuit lower bounds. A theorem of this type was conjectured by Beame, Pitassi, and Huynh [5, §6]. We also note that lifting theory for deterministic tree-like protocols (with applications to monotone formula size, tree-like refutation size, and size-space tradeoffs) has been developed in quite some detail [11,13,19,20,27,40,52]. We import techniques from this line of work into the dag-like setting.
We formalize our result in Section 3 after we have carefully defined our dag-like models in Section 2.

DAG-LIKE MODELS
We define all computational models as solving search problems, defined by a relation S ⊆ I × O for some finite input and output sets I and O. On input x ∈ I the search problem is to find some output in S(x) := {o ∈ O : (x, o) ∈ S}. We always assume S is total so that S(x) ≠ ∅ for all x ∈ I. We also define S^{-1}(o) := {x ∈ I : (x, o) ∈ S}.
mKW search problem S_f:
  input: a pair (x, y) ∈ f^{-1}(1) × f^{-1}(0)
  output: a coordinate i ∈ [n] such that x_i > y_i
CNF search problem S_F:
  input: an n-variable truth assignment z ∈ {0, 1}^n
  output: clause D of F unsatisfied by z, i.e., D(z) = 0
Figure 1: Two equivalent ways to view a Resolution refutation (Resolution proof: bottom-up definition; conjunction-dag: top-down definition), illustrated in the tree-like case (see [30, §18.2] for more discussion of the tree-like case).

Abstract Dags
We work with a top-down definition of dag-like models. A version of the following definition (with a specialized F) was introduced by [44] and subsequently simplified in [38,50].
Top-down definition. Let F be a family of functions I → {0, 1}. An F-dag solving S ⊆ I × O is a directed acyclic graph of fan-out ≤ 2 where each node v is associated with a function f_v ∈ F (we call f_v^{-1}(1) the feasible set for v) satisfying the following:
(1) Root: There is a distinguished root node r (fan-in 0), and f_r ≡ 1 is the constant 1 function.
(2) Non-leaves: For each non-leaf node v with children u, u′ (perhaps u = u′), we have f_v^{-1}(1) ⊆ f_u^{-1}(1) ∪ f_{u′}^{-1}(1).
(3) Leaves: Each leaf node v is labeled with an output o_v ∈ O that is valid on the whole feasible set of v, that is, o_v ∈ S(x) for every x ∈ f_v^{-1}(1).
The size of an F-dag is its number of nodes. If we specialize S to be a CNF search problem S_F, the above specializes to the familiar definition of refutations in a proof system whose lines are negations of functions in F. Here is that dual definition, specialized to S = S_F.
Bottom-up definition. Let G be a family of functions {0, 1}^n → {0, 1}. (To match up with the top-down definition, one should take G := {¬f : f ∈ F}.) A (semantic) G-refutation of an n-variable CNF contradiction F is a directed acyclic graph of fan-out ≤ 2 where each node (or line) v is associated with a function g_v ∈ G satisfying the following:
(1) Root: There is a distinguished root node r (fan-in 0), and g_r ≡ 0 is the constant 0 function.
(2) Non-leaves: For each non-leaf node v with children u, u′ (perhaps u = u′), we have g_v^{-1}(1) ⊇ g_u^{-1}(1) ∩ g_{u′}^{-1}(1).
(3) Leaves: Each leaf node v is labeled with a clause D of F such that g_v^{-1}(1) ⊇ D^{-1}(1).
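The top-down conditions can be checked mechanically on small examples. The following is a minimal sketch, assuming the usual reading of the definition: the root's feasible set is the whole input domain, each non-leaf's feasible set is covered by those of its children, and each leaf carries a label that is a valid output on its entire feasible set. The dictionary encoding, the name `check_dag`, and the toy formula are illustrative choices, not taken from the source.

```python
from itertools import product

def check_dag(nodes, root, inputs, solutions):
    """Check the root, non-leaf, and leaf conditions of a top-down dag.

    nodes maps a node name to (feasible_set, children, label), where
    label is None for non-leaves. Acyclicity is not checked here.
    """
    if set(nodes[root][0]) != set(inputs):        # (1) root accepts everything
        return False
    for feasible, children, label in nodes.values():
        if len(children) > 2:                     # fan-out at most 2
            return False
        if children:                              # (2) covered by the children
            covered = set().union(*(nodes[c][0] for c in children))
            if not set(feasible) <= covered:
                return False
        elif any(label not in solutions(x) for x in feasible):
            return False                          # (3) leaf label must be valid
    return True

# Toy contradiction F = (x1) AND (NOT x1 OR x2) AND (NOT x2).
def solutions(z):
    out = set()
    if z[0] == 0:
        out.add("x1")
    if z[0] == 1 and z[1] == 0:
        out.add("~x1|x2")
    if z[1] == 1:
        out.add("~x2")
    return out

inputs = set(product((0, 1), repeat=2))
dag = {
    "root": (inputs, ["a", "b"], None),
    "a": ({z for z in inputs if z[0] == 0}, [], "x1"),
    "b": ({z for z in inputs if z[0] == 1}, ["c", "d"], None),
    "c": ({(1, 0)}, [], "~x1|x2"),
    "d": ({(1, 1)}, [], "~x2"),
}
ok = check_dag(dag, "root", inputs, solutions)
```

Every feasible set in this toy dag is the accepting set of a conjunction of literals, so it is an instance of the conjunction-dags introduced in the next subsection.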

Concrete Dags
We now instantiate the abstract model for the purposes of communication and query complexity.
Rectangle-dags (dag-like protocols). Consider a bipartite input domain I := X × Y so that Alice holds x ∈ X, Bob holds y ∈ Y, and let F be the set of all indicator functions of (combinatorial) rectangles over X × Y (sets of the form A × B with A ⊆ X, B ⊆ Y). Call such F-dags simply rectangle-dags. For a search problem S ⊆ X × Y × O we define its rectangle-dag complexity by
  rect-dag(S) := least size of a rectangle-dag that solves S.
In circuit complexity, a straightforward generalization of the Karchmer-Wigderson depth characterization [31] shows that the monotone circuit complexity of any monotone function f equals rect-dag(S f ); see [38,50].
In proof complexity, a useful-to-study semantic proof system is captured by F_c-dags solving CNF search problems S_F where F_c is the family of all functions X × Y → {0, 1} (where X × Y = {0, 1}^n corresponds to a bipartition of the n input variables of S_F) that can be computed by tree-like protocols of communication cost c, say for c = polylog(n). Such a proof system can simulate other systems (such as Resolution and Cutting Planes with bounded coefficients), and hence lower bounds against F_c-dags imply lower bounds for other concrete proof systems. Moreover, any F_c-dag can be simulated by a rectangle-dag with at most a factor 2^c blow-up in size, and hence we do not lose much generality by studying only rectangle-dags.
Conjunction-dags (essentially Resolution). Consider the n-bit input domain I := {0, 1}^n and let F be the set of all conjunctions of literals over the n input variables. Call such F-dags simply conjunction-dags. We define the width of a conjunction-dag Π as the maximum width of a conjunction associated with a node of Π. For a search problem S ⊆ {0, 1}^n × O we define
  conj-dag(S) := least size of a conjunction-dag that solves S,
  w(S) := least width of a conjunction-dag that solves S.
In the context of CNF search problems S = S_F, conjunction-dags are equivalent to Resolution refutations; see also Figure 1. Indeed, conj-dag(S_F) is just the Resolution refutation size complexity of F, and w(S_F) is the Resolution width complexity of F [8].
Figure 2: We show lifting theorems for dags whose feasible sets are (a) rectangles or (b) triangles. It remains open (see Section 9) to prove any lower bounds for explicit mKW/CNF search problems when the feasible sets are (c) block-diagonal, which is a special case of (d) intersections of 2 triangles.
The complexity measures introduced so far are related as follows; here S′ is any two-party version of S obtained by choosing some bipartition X × Y = {0, 1}^n of the input domain of S:
  rect-dag(S′) ≤ conj-dag(S) ≤ n^{O(w(S))}.
The first inequality holds because each conjunction can be simulated by a rectangle. The second inequality holds since there are at most n^{O(w)} many distinct width-w conjunctions, and we may assume wlog that any f ∈ F is associated with at most one node in an F-dag (any incoming edge to a node v can be rewired to the lowest node u, in topological order, such that f_v = f_u).
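The counting step behind the second inequality can be spelled out. The following is a sketch under the deduplication assumption stated above; the explicit estimate (2n+1)^w is one convenient bound chosen for illustration and is not taken from the source.

```latex
\[
  \#\{\text{conjunctions of width} \le w \text{ over } n \text{ variables}\}
  \;=\; \sum_{j=0}^{w} \binom{n}{j}\, 2^{j}
  \;\le\; (2n+1)^{w}
  \;=\; n^{O(w)} \qquad (n \ge 2),
\]
\[
  \text{hence}\quad
  \mathrm{conj\text{-}dag}(S) \;\le\; (2n+1)^{w(S)} \;=\; n^{O(w(S))}.
\]
```

The middle inequality holds because every conjunction of width at most w can be written by filling w slots, each with one of the 2n literals or a blank.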

OUR RESULTS
Our first theorem is a characterization of the rectangle-dag complexity for composed search problems of the form S ∘ g^n. Here S ⊆ {0, 1}^n × O is an arbitrary n-bit search problem, and g : X × Y → {0, 1} is some carefully chosen two-party gadget that helps to distribute each input bit of S between the two parties. More precisely, S ∘ g^n ⊆ X^n × Y^n × O is the search problem where Alice holds x ∈ X^n, Bob holds y ∈ Y^n, and their goal is to find some o ∈ S(z) for z := g^n(x, y) = (g(x_1, y_1), . . . , g(x_n, y_n)).
Implications. The primary advantage of such a lifting theorem is that we obtain, in a generic fashion, a large class of hard (explicit) monotone functions and CNF contradictions. Indeed, let us see an example of how to apply our theorem. We can start with any n-variable k-CNF contradiction F of Resolution width w, and conclude from Theorem 1 that the composed problem S′ := S_F ∘ Ind_m^n has rectangle-dag complexity n^{Θ(w)}. Then we can use known reductions to translate S′ back to a mKW/CNF search problem. We recall such reductions in Section 8, but the upshot will be that:
A disadvantage, stemming from the large gadget size m = poly(n), is that we get at best (using w = Θ(n)) a monotone circuit lower bound of exp(N^ε) for a small constant ε > 0. This falls especially short of the current best record of exp(N^{1/3−o(1)}) shown for an explicit monotone function by Harnik and Raz [24]. For this reason (and others), it is an important open problem to develop a lifting theory for gadgets of size m = O(1). In particular, an optimal 2^{Ω(N)} lower bound would follow from an appropriate constant-size-gadget version of Theorem 1; see Section 8 for details.
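The composed input z = g^n(x, y) for the index gadget can be illustrated concretely. The following is a minimal sketch, assuming 0-indexed pointers and tuples of bits; the function names are illustrative and not from the source.

```python
def ind(x, y):
    """Index gadget Ind_m: the pointer x in range(m) selects a bit of the m-bit string y."""
    return y[x]

def lift_input(xs, ys):
    """Compute z = g^n(x, y) = (g(x_1, y_1), ..., g(x_n, y_n)) for g = Ind_m."""
    return tuple(ind(x, y) for x, y in zip(xs, ys))

# n = 3 gadgets of size m = 3: Alice holds three pointers, Bob holds three 3-bit blocks.
xs = (2, 0, 1)
ys = ((0, 0, 1), (1, 0, 0), (0, 0, 0))
z = lift_input(xs, ys)  # the n-bit input on which S is then to be solved
```

Neither party alone knows any bit of z: Alice knows where to look in each block and Bob knows the block contents.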
Techniques. We use tools developed in the context of tree-like lifting theorems, specifically from [18,21]. These tools allow us to relate large rectangles in the input domain of S ∘ Ind_m^n with large subcubes in the input domain of S; see Section 4. Given these tools, the proof of Theorem 1 is relatively short (two pages). The proof is extremely direct: from any rectangle-dag of size n^d solving S ∘ Ind_m^n we extract a width-O(d) conjunction-dag solving S.
Classical works on monotone circuit lower bounds have typically focused on specific monotone functions [1,3,22,42,48] and more generally on studying the power of the underlying proof methods [2,9,43,45,49,51]. A notable exception is Jukna's criterion [29], recently applied in [14,26], which is a general sufficient condition for a monotone function to require large monotone circuit complexity. Our perspective is seemingly even more abstract, as our result is phrased for arbitrary search problems (not just of mKW/CNF type). However, it remains unclear exactly how the power of our methods compares with the classical techniques; for example, can our result be rephrased in the language of Razborov's method of approximations?

Extension: Monotone Real Circuits
Triangle-dags. Consider a bipartite input domain I := X × Y and let F be the set of all indicator functions of (combinatorial) triangles over X × Y; here a triangle T ⊆ X × Y is a set that can be written as T = {(x, y) ∈ X × Y : a_T(x) < b_T(y)} for some labeling of the rows a_T : X → R and columns b_T : Y → R by real numbers; see Figure 2b. In particular, every rectangle is a triangle. Call such F-dags simply triangle-dags. For a search problem S ⊆ X × Y × O we define
  tri-dag(S) := least size of a triangle-dag that solves S.
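The remark that every rectangle is a triangle can be verified with an explicit labeling. The sketch below uses one possible choice of row and column labels (0/1 values); this particular labeling is an illustration and is not taken from the source.

```python
def rectangle_as_triangle(A, B):
    """Return labelings (a, b) with a(x) < b(y) exactly when x is in A and y is in B."""
    def a(x):
        return 0.0 if x in A else 1.0   # rows of the rectangle get the small label
    def b(y):
        return 1.0 if y in B else 0.0   # columns of the rectangle get the large label
    return a, b

X, Y = range(4), range(5)
A, B = {0, 2}, {1, 3, 4}
a, b = rectangle_as_triangle(A, B)
triangle = {(x, y) for x in X for y in Y if a(x) < b(y)}
rectangle = {(x, y) for x in A for y in B}
```

The strict inequality a(x) < b(y) fails in all three cases where x is outside A or y is outside B, so the triangle coincides with A × B.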
Hrubeš and Pudlák [25] showed recently that the monotone real circuit complexity of an f equals tri-dag(S_f). Monotone real circuits [23,36] generalize monotone circuits by allowing the wires to carry arbitrary real numbers and the binary gates to compute arbitrary monotone functions R × R → R. The original motivation to study such circuits, and what interests us here, is that lower bounds for monotone real circuits imply lower bounds for the Cutting Planes proof system [12]. In our language, semantic Cutting Planes refutations are equivalent to L-dags solving CNF search problems, where L is the family of linear threshold functions.
Our second theorem states that Theorem 1 holds more generally with rectangle-dags replaced with triangle-dags. The proof is however more involved than the proof for Theorem 1. A pithy corollary is that if we start with any CNF contradiction F that is hard for Resolution and compose F with a gadget (as described in Section 8), the formula becomes hard for Cutting Planes. Previously, only a few examples of hard contradictions were known for Cutting Planes, all proved via feasible interpolation [14,23,26,36]. A widely-asked question has been to improve this state-of-the-art by developing alternative lower bound methods; see the surveys [6, §4] and [47, §5]. In particular, Jukna [30, Research Problem 19.17] asked to find a more intuitive "combinatorial" proof method "explicitly showing what properties of [contradictions] force long derivations." It is unclear how "combinatorial" our method is, but at least it does afford a simple intuition: the hardness is simply borrowed from the realm of Resolution (where we understand very well what makes formulas hard).

SUBCUBES FROM RECTANGLES
In this section, as preparation, we recall some technical notions from [18,21] concerning the index gadget g := Ind_m. Namely, writing G := g^n : [m]^n × {0, 1}^{mn} → {0, 1}^n for n copies of g, we explain how large rectangles in G's domain are related with large subcubes in G's codomain.

Structured Rectangles
For a partial assignment ρ ∈ {0, 1, *}^n we let free ρ := ρ^{-1}(*) denote its free coordinates, and fix ρ := [n] ∖ free ρ denote its fixed coordinates. The number of fixed coordinates |fix ρ| is the width of ρ. Width-d partial assignments are naturally in 1-to-1 correspondence with width-d conjunctions: for any ρ we define C_ρ : {0, 1}^n → {0, 1} as the width-|fix ρ| conjunction that accepts an x ∈ {0, 1}^n iff x is consistent with ρ.
For a random variable x we let H_∞(x) := min_{x′} log(1/Pr[x = x′]) denote the usual min-entropy of x. When x ∈ [m]^J for some index set J, we write x_I ∈ [m]^I for the marginal distribution of x on a subset I ⊆ J of coordinates. For a set X we use the boldface X to denote a random variable uniformly distributed over X.
(2) X_{free ρ} is 0.9-dense: for every nonempty I ⊆ free ρ, X_I has min-entropy rate ≥ 0.9, that is, H_∞(X_I) ≥ 0.9 · |I| log m.
In this work we need a slight strengthening of Lemma 3: for a ρ-structured R, there is a single row of R that is already ρ-like. The proof is given in the full version [16].
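The density condition can be tested by brute force on small sets of rows. The following is a minimal sketch, assuming X is given as a list of distinct tuples in [m]^n and that the random variable is uniform over X; the function names and the floating-point tolerance are illustrative choices.

```python
from collections import Counter
from itertools import combinations, product
from math import log2

def min_entropy_marginal(X, I):
    """Min-entropy of the marginal on coordinates I of the uniform distribution over X."""
    counts = Counter(tuple(x[i] for i in I) for x in X)
    return log2(len(X) / max(counts.values()))

def is_dense(X, coords, m, delta):
    """Check that every nonempty I within coords has min-entropy rate at least delta."""
    for r in range(1, len(coords) + 1):
        for I in combinations(coords, r):
            if min_entropy_marginal(X, I) < delta * len(I) * log2(m) - 1e-9:
                return False
    return True

m = 4
X_full = list(product(range(m), repeat=2))   # all of [4]^2
X_fixed = [(0, j) for j in range(m)]         # first coordinate fixed to 0

full_dense = is_dense(X_full, (0, 1), m, 0.9)
fixed_dense_all = is_dense(X_fixed, (0, 1), m, 0.9)
fixed_dense_free = is_dense(X_fixed, (1,), m, 0.9)
```

The second set of rows is not dense on both coordinates, because its first coordinate carries no entropy, but it is dense on the remaining free coordinate.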

Rectangle Partition Scheme
We claim that, given any rectangle R := X × Y ⊆ [m]^n × {0, 1}^{mn}, we can partition most of X × Y into ρ-structured subrectangles with |fix ρ| bounded in terms of the size of X × Y. Indeed, we describe a simple 2-round partitioning scheme from [21] below; see also Figure 3. In the 1st round of the algorithm, we partition the rows as X = ⊔_i X^i where each X^i will be fixed on some blocks I_i ⊆ [n] and 0.95-dense on the remaining blocks [n] ∖ I_i. In the 2nd round, each X^i × Y is further partitioned along columns so as to fix the outputs of the gadgets on coordinates I_i.
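The two rounds described above can be prototyped by brute force for tiny parameters. The sketch below is an illustration under explicit assumptions, not the scheme as specified in the source: it assumes that each part X^i consists of the remaining rows that agree with a heavy outcome α_i on a maximal low-entropy block set I_i, and that the columns are grouped by the gadget outputs γ they produce on I_i. All names are hypothetical, pointers are 0-indexed, and the search over subsets is exponential in n.

```python
from collections import Counter
from itertools import combinations, product

def heavy_outcome(X, I, m, rate=0.95):
    """Return alpha with Pr[X_I = alpha] > m^(-rate*|I|), or None if X_I has rate >= rate."""
    counts = Counter(tuple(x[i] for i in I) for x in X)
    alpha, c = counts.most_common(1)[0]
    return alpha if c / len(X) > m ** (-rate * len(I)) else None

def maximal_violating_subset(X, n, m):
    """A largest (hence inclusion-maximal) I whose marginal has rate < 0.95; possibly empty."""
    for r in range(n, 0, -1):
        for J in combinations(range(n), r):
            alpha = heavy_outcome(X, J, m)
            if alpha is not None:
                return J, alpha
    return (), ()

def rectangle_scheme(X, Y, n, m):
    """Partition X x Y into rectangles (rows, columns, rho), with rho fixing gadget outputs."""
    X, parts = set(X), []
    while X:                                   # 1st round: peel off row parts
        I, alpha = maximal_violating_subset(X, n, m)
        Xi = {x for x in X if tuple(x[i] for i in I) == alpha}
        parts.append((I, alpha, Xi))
        X -= Xi
    rects = []
    for I, alpha, Xi in parts:                 # 2nd round: split columns by gadget outputs
        groups = {}
        for y in Y:
            gamma = tuple(y[i][a] for i, a in zip(I, alpha))
            groups.setdefault(gamma, set()).add(y)
        for gamma, Yg in groups.items():
            rects.append((Xi, Yg, dict(zip(I, gamma))))
    return rects

m, n = 4, 2
rows = list(product(range(m), repeat=n))
cols = list(product(product((0, 1), repeat=m), repeat=n))
whole_domain = rectangle_scheme(rows, cols, n, m)
thin_rows = [(0, j) for j in range(m)]
thin = rectangle_scheme(thin_rows, cols, n, m)
```

On the whole domain the sketch returns a single part with no fixed coordinates, while on a thin set of rows it fixes blocks and splits the columns accordingly. In every returned rectangle, the gadget outputs on the fixed coordinates are constant.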

Rectangle Scheme
Output: A partition of R into subrectangles.
1: 1st round: Iterate the following for i = 1, 2, . . . , until X becomes empty: let I_i ⊆ [n] be a maximal subset (possibly I_i = ∅) such that X_{I_i} has min-entropy rate < 0.95, and let α_i ∈ [m]^{I_i} be an outcome witnessing this: Pr[X_{I_i} = α_i] > m^{−0.95|I_i|}.

All the properties of Rectangle Scheme that we will subsequently need are formalized below; see also Figure 3. For terminology, given a subset A′ ⊆ A we define its density (inside A) as |A′|/|A|. The proof of the following lemma is postponed to Section 7.
Rectangle Lemma. Fix any parameter k ≤ n log n. Given a rec-

LIFTING FOR RECTANGLE-DAGS
In this section we prove the nontrivial direction of Theorem 1: Let Π be a rectangle-dag solving S ∘ G of size n^d for some d. Our goal is to show that w(S) ≤ O(d).

Game Semantics for Dags
For convenience (and fun), we use the language of two-player competitive games, introduced in [4,37], which provide an alternative way of thinking about conjunction-dags solving S ⊆ {0, 1} n × O.
The game involves two competing players, Explorer and Adversary, and proceeds in rounds. The state of the game in each round is modeled as a partial assignment ρ ∈ {0, 1, *}^n. At the start of the game, ρ := *^n. In each round, Explorer makes one of two moves:
− Query a bit: Explorer specifies an i ∈ free ρ, and Adversary responds with a bit b ∈ {0, 1}. The state ρ is updated by ρ_i ← b.
− Forget a bit: Explorer specifies an i ∈ fix ρ, and the state is updated by ρ_i ← *.
An important detail is that Adversary is allowed to choose b ∈ {0, 1} freely even if the i-th bit was queried (with response different from b) and subsequently forgotten during past play. The game ends when a solution to S can be inferred from ρ, that is, when some o ∈ O is a valid output for every input consistent with ρ.
Explorer's goal is to end the game while keeping the width of the game state ρ as small as possible. Indeed, Atserias and Dalmau [4] prove that w(S) is characterized (up to an additive ±1) as the least w such that the Explorer has a strategy for ending the game that keeps the width of the game state at most w throughout the game. (A similar characterization exists for dag size [37].) Hence our goal becomes to describe an Explorer-strategy for S such that the width of the game state never exceeds O(d) regardless of how the Adversary plays.
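The game can be simulated directly on a small CNF search problem. The following is a minimal sketch, assuming the game ends as soon as a single output is valid for every input consistent with the current state; the toy formula, the hand-written Explorer strategy, and all function names are illustrative and not from the source.

```python
from itertools import product

# Toy contradiction: (x0) AND (NOT x0 OR x1) AND (NOT x1 OR x2) AND (NOT x2).
# A literal (v, s) is satisfied when z[v] == s.
CLAUSES = [[(0, 1)], [(0, 0), (1, 1)], [(1, 0), (2, 1)], [(2, 0)]]

def solutions(z):
    """Indices of the clauses falsified by the assignment z."""
    return {j for j, D in enumerate(CLAUSES) if all(z[v] != s for v, s in D)}

def inferred_solution(rho, n):
    """An output valid for every input consistent with rho, or None if there is none."""
    free = [i for i in range(n) if i not in rho]
    common = None
    for bits in product((0, 1), repeat=len(free)):
        z = dict(rho)
        z.update(zip(free, bits))
        sols = solutions(tuple(z[i] for i in range(n)))
        common = sols if common is None else common & sols
        if not common:
            return None
    return min(common)

def play(n, explorer, adversary, max_rounds=100):
    """Run the game; return (output, maximum width of the state seen during play)."""
    rho, width = {}, 0
    for _ in range(max_rounds):
        out = inferred_solution(rho, n)
        if out is not None:
            return out, width
        move, i = explorer(rho)
        if move == "query":
            rho[i] = adversary(rho, i)
        else:                       # "forget"
            del rho[i]
        width = max(width, len(rho))
    raise RuntimeError("game did not end")

def explorer(rho):
    """Walk along the implication chain while remembering at most two bits."""
    if 0 not in rho and 1 not in rho:
        return ("query", 0)
    if 1 not in rho:
        return ("query", 1)
    if 0 in rho:
        return ("forget", 0)
    return ("query", 2)

always_one = play(3, explorer, lambda rho, i: 1)
always_zero = play(3, explorer, lambda rho, i: 0)
```

Against an Adversary that always answers 1, this Explorer ends the game at the last clause with the state never exceeding width 2, which matches the width of a Resolution refutation of the toy formula.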

Simplified Proof
To explain the basic idea, we first give a simplified version of the proof: We assume that all rectangles R involved in Π (call them the original rectangles) can be partitioned errorlessly into ρ-structured subrectangles for ρ of width O(d). That is, invoking Rectangle Scheme for each original R, we assume that
(*) Assumption: All subrectangles in the partition R = ⊔_i R^i output by Rectangle Scheme satisfy the "structured" case of Rectangle Lemma for k := 2d log n.
In Section 5.3 we remove this assumption by explaining how the proof can be modified to work with some error rows/columns.
Overview. We extract a width-O(d) Explorer-strategy for S by walking down the rectangle-dag Π, starting at the root. For each original rectangle R that is reached in the walk, we maintain a ρ-structured subrectangle R ′ ⊆ R chosen from the partition of R.
Note that ρ will have width O(d) by our choice of k. The intention is that ρ will record the current state of the game. There are three issues to address:
(1) Why is the starting condition of the game met?
(2) How do we take a step from a node of Π to one of its children?
(3) Why are we done once we reach a leaf?
(1) Root case. At start, the root of Π is associated with the original rectangle R = [m]^n × {0, 1}^{mn} comprising the whole domain. The partition of R computed by Rectangle Scheme is trivial: it contains a single part, the *^n-structured R itself. Hence we simply maintain the *^n-structured R ⊆ R, which meets the starting condition for the game.
(2) Internal step. This is the crux of the argument: Supposing the game has reached state ρ_{R′} and we are maintaining some ρ_{R′}-structured subrectangle R′ ⊆ R associated with an internal node v, we want to move to some ρ_{L′}-structured subrectangle L′ ⊆ L associated with a child of v. Moreover, we must keep the width of the game state at most O(d) during this move.
Since R′ := X′ × Y′ is ρ_{R′}-structured, we have from Lemma 4 that there exists some x* ∈ X′ such that {x*} × Y′ is ρ_{R′}-like. Let the two original rectangles associated with the children of v be L_0 and L_1. Let ⊔_i L_b^i be the partition of L_b output by Rectangle Scheme. By query alignment in Rectangle Lemma, there is some I*_b ⊆ [n] with |I*_b| ≤ O(d) such that every part L_b^i intersecting the x*-th row is ρ-structured with fix ρ ⊆ I*_b. As Explorer, we now query the input bits in coordinates J := (I*_0 ∪ I*_1) ∖ fix ρ_{R′} (in any order) obtaining some response string z_J ∈ {0, 1}^J from the Adversary. As a result, the state of the game becomes the extension of ρ_{R′} by z_J, call it ρ*, which has width |fix ρ*| = |fix ρ_{R′} ∪ J| ≤ O(d).
Note that there is some y* ∈ Y′ (and hence (x*, y*) ∈ R′ ⊆ L_0 ∪ L_1) such that G(x*, y*) is consistent with ρ*; indeed, the whole row {x*} × Y′ is ρ_{R′}-like and ρ* extends ρ_{R′}. Suppose (x*, y*) ∈ L_0; the case of L_1 is analogous. In the partition of L_0, let L′ be the unique part such that (x*, y*) ∈ L′. Note that L′ is ρ_{L′}-like for some ρ_{L′} that is consistent with G(x*, y*) and fix ρ_{L′} ⊆ I*_0 (by query alignment). Hence ρ* extends ρ_{L′}. As Explorer, we now forget all queried bits in ρ* except those queried in ρ_{L′}.
We have recovered our invariant: the game state is ρ L ′ and we maintain a ρ L ′ -structured subrectangle L ′ of an original rectangle L 0 . Moreover, the width of the game state remained O(d).
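(3) Leaf case. Suppose the game state is ρ and we are maintaining an associated ρ-structured subrectangle R′ ⊆ R corresponding to a leaf node. The leaf node is labeled with some solution o ∈ O that is valid for every input in R, and hence for every input in R′ ⊆ R. Since R′ is ρ-structured (and hence ρ-like), the solution o can be inferred from ρ. Therefore the game ends. This concludes the (simplified) proof.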

Accounting for Error
Next, we explain how to get rid of the assumption (*) by accounting for the rows and columns that are classified as error in Rectangle Lemma for k := 2d log n. The partitioning of Π's rectangles is done more carefully: We sort all original rectangles in reverse topological order R_1, R_2, . . . , R_{n^d} from leaves to root, that is, if R_i is a descendant of R_j then R_i comes before R_j in the order. Then we process the rectangles in this order:
Initialize cumulative error sets X*_err = Y*_err := ∅. Iterate for i = 1, 2, . . . , n^d rounds:
(1) Remove from R_i the rows/columns X*_err, Y*_err. That is, update R_i ← R_i ∖ (X*_err × {0, 1}^{mn} ∪ [m]^n × Y*_err).
(2) Apply the Rectangle Scheme for R_i. Output all resulting subrectangles that satisfy the "structured" case of Rectangle Lemma for k := 2d log n. (All non-structured subrectangles are omitted.) Call the resulting error rows/columns X_err and Y_err.
(3) Update X*_err ← X*_err ∪ X_err and Y*_err ← Y*_err ∪ Y_err.
In words, an original rectangle R i is processed only after all of its descendants are partitioned. Each descendant may contribute some error rows/columns, accumulated into sets X * err , Y * err , which are deleted from R i before it is partitioned. The partitioning of R i will in turn contribute its error rows/columns to its ancestors.
We may now repeat the proof of Section 5.2 verbatim using only the structured subrectangles output by the above process. That is, we still maintain the same invariant: when the game state is ρ, we maintain a ρ-structured R ′ (output by the above process) of an original R. We highlight only the key points below.
(1) Root case. The cumulative error at the end of the process is tiny: X*_err, Y*_err have density at most n^d · n^{−2d} ≤ 1/4 by a union bound over all rounds. In particular, the root rectangle R_{n^d} (with errors removed) still has density ≥ 1/2 inside [m]^n × {0, 1}^{mn}, and so the partition output by Rectangle Scheme is trivial, containing only the *^n-structured R_{n^d} itself. This meets the starting condition for the game.
(2) Internal step. By construction, the cumulative error sets shrink when we take a step from a node to one of its children. This means that our error handling does not interfere with the internal step: each structured subrectangle R ′ of an original rectangle R is wholly covered by the structured subrectangles of R's children.
(3) Leaf case. This case is unchanged.
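The arithmetic behind the root case can be written out. The following is a short worked computation under the parameter choice k = 2d log n; the mild side condition n^d ≥ 4 is made explicit here for the sake of the illustration.

```latex
\[
  2^{-k} \;=\; 2^{-2d\log n} \;=\; n^{-2d},
  \qquad
  \underbrace{n^{d}}_{\text{rounds}} \cdot\, n^{-2d} \;=\; n^{-d} \;\le\; \tfrac14
  \quad\text{whenever } n^{d} \ge 4 .
\]
\[
  \text{density of the root after removing error rows and columns}
  \;\ge\; \bigl(1-\tfrac14\bigr)^{2} \;=\; \tfrac{9}{16} \;\ge\; \tfrac12 .
\]
```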

LIFTING FOR TRIANGLE-DAGS
In this section we prove the nontrivial direction of Theorem 2: Let Π be a triangle-dag solving S ∘ G of size n^d for some d. Our goal is to show that w(S) ≤ O(d).
The proof is conceptually the same as for rectangle-dags. The only difference is that we need to replace Rectangle Scheme (and the associated Rectangle Lemma) with an algorithm that partitions a given triangle T ⊆ [m]^n × {0, 1}^{mn} into subtriangles that behave like conjunctions.

Triangle Partition Scheme
We introduce a triangle partitioning algorithm, Triangle Scheme. Its definition is given in the full version [16]. For now, we only need its high-level description: On input a triangle T, Triangle Scheme outputs a disjoint cover ⊔_i R_i ⊇ T where R_i are rectangles. This induces a partition of T into subtriangles T ∩ R_i. Each (non-error) rectangle R_i is ρ_i-structured (for low-width ρ_i) and is associated with a ρ_i-structured "inner" subrectangle L_i ⊆ T ∩ R_i; see Figure 4. Hence T ∩ R_i is ρ_i-like, as it is sandwiched between two ρ_i-like rectangles.
Figure 4: Structured case of Triangle Lemma: The subtriangle T ∩ R_i is sandwiched between two ρ_i-structured rectangles L_i and R_i.
More formally, all the properties of Triangle Scheme that we will subsequently need are formalized below (note the similarity with Rectangle Lemma); see the full version [16] for the proof.
Triangle Lemma. Fix any parameter k ≤ n log n. Given a triangle T ⊆ [m]^n × {0, 1}^{mn}, let ⊔_i R_i be the output of Triangle Scheme. Then there exist "error" sets X_err ⊆ [m]^n and Y_err ⊆ {0, 1}^{mn}, both of density ≤ 2^{−k} (inside their respective sets), such that for each i, one of the following holds:
• Structured case: R_i is ρ_i-structured for some ρ_i of width at most O(k/log n). Moreover, there exists an "inner" rectangle L_i ⊆ T ∩ R_i such that L_i is also ρ_i-structured.
• Error case: R_i is covered by error rows/columns, i.e., R_i ⊆ X_err × {0, 1}^{mn} ∪ [m]^n × Y_err.
Finally, a query alignment property holds: for every x ∈ [m]^n ∖ X_err, there exists a subset I_x ⊆ [n] with |I_x| ≤ O(k/log n) such that every "structured" R_i intersecting {x} × {0, 1}^{mn} has fix ρ_i ⊆ I_x.

Simplified Proof
As in the rectangle case, we give a simplified proof assuming no errors. That is, invoking Triangle Scheme for each triangle T involved in Π, we assume that
(†) Assumption: All rectangles in the cover ⊔_i R_i ⊇ T output by Triangle Scheme satisfy the "structured" case of Triangle Lemma for k := 2d log n.
The argument for getting rid of the assumption (†) is the same as in the rectangle case, and hence we omit that step; one only needs to observe that removing cumulative error rows/columns from a triangle still leaves us with a triangle.
Overview. As before, we extract a width-O(d) Explorer-strategy for S by walking down the triangle-dag Π, starting at the root. For each triangle T of Π that is reached in the walk, we maintain a ρ-structured inner rectangle L ⊆ T . Here ρ (of width O(d) by the choice of k) will record the current state of the game. There are the three steps (1)-(3) to address, of which (1) and (3) remain exactly the same as in the rectangle case. So we only explain step (2), which requires us to replace the use of Rectangle Lemma with the new Triangle Lemma.
(2) Internal step. Supposing the game has reached state ρ L and we are maintaining some ρ L -structured inner rectangle L ⊆ T associated with an internal node v, we want to move to some ρ L -structured inner rectangle L ⊆ T associated with a child of v.
Moreover, we must keep the width of the game state at most O(d) during this move.
Since L := X′ × Y′ is ρ_L-structured, we have from Lemma 4 that there exists some x* ∈ X′ such that {x*} × Y′ is ρ_L-like. Let the two triangles associated with the children of v be T_0 and T_1, so that L ⊆ T_0 ∪ T_1.
Let ⊔_i R_b^i be the rectangle cover of T_b output by Triangle Scheme. By query alignment in Triangle Lemma, there is some I*_b ⊆ [n] with |I*_b| ≤ O(d) such that every "structured" R_b^i intersecting the x*-th row has its fixed coordinates contained in I*_b. As Explorer, we now query the input bits in coordinates J := (I*_0 ∪ I*_1) ∖ fix ρ_L (in any order) obtaining some response string z_J ∈ {0, 1}^J from the Adversary. As a result, the state of the game becomes the extension of ρ_L by z_J, call it ρ*, which has width |fix ρ*| = |fix ρ_L ∪ J| ≤ O(d).
Note that there is some y* ∈ Y′ (and hence (x*, y*) ∈ L ⊆ T_0 ∪ T_1) such that G(x*, y*) is consistent with ρ*; indeed, the whole row {x*} × Y′ is ρ_L-like and ρ* extends ρ_L. Suppose (x*, y*) ∈ T_0; the case of T_1 is analogous. In the rectangle covering of T_0, let R be the unique part such that (x*, y*) ∈ R. Note that R is ρ_R-like for some ρ_R that is consistent with G(x*, y*) and fix ρ_R ⊆ I*_0 (by query alignment). Hence ρ* extends ρ_R. As Explorer, we now forget all queried bits in ρ* except those queried in ρ_R. Also we move to the inner rectangle L ⊆ R promised by Triangle Lemma that satisfies L ⊆ T_0 and is ρ_L-structured for ρ_L = ρ_R.
We have recovered our invariant: the game state is ρ L and we maintain a ρ L -structured subrectangle L of a triangle T 0 . Moreover, the width of the game state remained O(d).

PARTITIONING RECTANGLES
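In this section, we prove Rectangle Lemma. We use repeatedly the following simple fact about min-entropy. The proof is more-or-less implicit in [18,21].
We start by recording a key property of the 1st round of Rectangle Scheme. Below, X^{≥i} := ∪_{j≥i} X^j denotes the set of rows that remain at the start of the i-th iteration.
Claim 6. Each part X^i obtained in 1st round of Rectangle Scheme satisfies:
− Blockwise-density: X^i_{[n]∖I_i} is 0.95-dense.
− Relative size: |X^{≥i}| ≤ m^{n−0.05|I_i|}.
Proof. By definition, X^i consists of those rows x ∈ X^{≥i} with x_{I_i} = α_i. Suppose for contradiction that X^i_{[n]∖I_i} is not 0.95-dense: there is a nonempty K ⊆ [n] ∖ I_i and an outcome β ∈ [m]^K with Pr[X^i_K = β] > m^{−0.95|K|}. Then Pr[X^{≥i}_{I_i ∪ K} = α_i β] = Pr[X^{≥i}_{I_i} = α_i] · Pr[X^i_K = β] > m^{−0.95|I_i ∪ K|}, contradicting the maximality of I_i. This shows the first property. For the second property, apply Fact 5 to X^{≥i} conditioned on the event x_{I_i} = α_i, which has probability > m^{−0.95|I_i|}, to find H_∞(X^i) ≥ H_∞(X^{≥i}) − 0.95|I_i| log m. On the other hand, since X^i is fixed on I_i, we have H_∞(X^i) ≤ (n − |I_i|) log m. Combining these two inequalities we get H_∞(X^{≥i}) ≤ (n − 0.05|I_i|) log m, which yields the second property.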
We define Y_err as the union of all column sets Y^{i,γ} with |Y^{i,γ}| < 2^{mn−n^2}. To bound the size of Y_err, we claim that there are at most (4m)^n possible choices of i, γ. Indeed, each X^i is associated with a unique pair (I_i ⊆ [n], α_i ∈ [m]^{I_i}), and there are at most 2^n choices of I_i and at most m^n choices of corresponding α_i. Also, for each X^i, there are at most 2^n possible assignments to γ ∈ {0, 1}^{I_i}. For each i, γ, we add at most 2^{mn−n^2} columns to Y_err. Thus, Y_err has density at most (4m)^n · 2^{−n^2} < 2^{−k} inside {0, 1}^{mn}.
We define X_err := ∪_i X^i subject to |I_i| > 20k/log m. Let i be the least index with |I_i| > 20k/log m so that X_err ⊆ X^{≥i} := ∪_{j≥i} X^j. By Claim 6, |X^{≥i}| ≤ m^{n−0.05|I_i|} < m^n · 2^{−k} since |I_i| > 20k/log m. In other words, X^{≥i}, and hence X_err, has density at most 2^{−k} inside [m]^n.
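The two density estimates for the error sets reduce to short exponent calculations. The following worked computation is a sketch that assumes m = poly(n), as in Section 3, and n large enough for the stated side condition.

```latex
\[
  |I_i| > \frac{20k}{\log m}
  \;\Longrightarrow\;
  m^{-0.05\,|I_i|} \;<\; m^{-k/\log m} \;=\; 2^{-k},
\]
\[
  (4m)^{n}\cdot 2^{-n^{2}} \;=\; 2^{\,n\log(4m)-n^{2}}
  \;<\; 2^{-n\log n} \;\le\; 2^{-k}
  \quad\text{once } n > \log(4m) + \log n ,
\]
```

where the last step uses the assumption k ≤ n log n of Rectangle Lemma.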
Let X^i × Y^{i,γ} be a rectangle not contained in the error rows/columns. By definition of X_err, Y_err, this means |Y^{i,γ}| ≥ 2^{mn−n^2} (so that H_∞(Y^{i,γ}) ≥ mn − n^2) and |I_i| ≤ 20k/log m. We have from Claim 6 that X^i_{[n]∖I_i} is 0.95-dense.
Query alignment. For each x ∈ [m]^n ∖ X_err, we define I_x := I_i where X^i is the unique part that contains x. It follows that any ρ-structured rectangle that intersects the x-th row is of the form X^i × Y^{i,γ} and hence has fix ρ = I_i. Since X^i ⊄ X_err, we have |I_i| ≤ O(k/log n).

TRANSLATING BETWEEN mKW/CNF
In this section, for exposition, we recall some known reductions between mKW and CNF search problems. These reductions can be combined with our main theorems to yield applications in proof and monotone circuit complexity (as outlined in Section 3).
Certificates. The key property of an n-bit search problem S ⊆ {0, 1}^n × O that facilitates an efficient reduction to a mKW/CNF search problem is having a low certificate (aka nondeterministic) complexity. A certificate for (x, o) ∈ S is a partial assignment ρ ∈ {0, 1, *}^n such that x is consistent with ρ and o is a valid output for every input consistent with ρ; in short, x ∈ C_ρ^{-1}(1) ⊆ S^{-1}(o). A certificate for x is a certificate for (x, o) ∈ S for some o ∈ S(x). The certificate complexity of x is the least width of a certificate for x. The certificate complexity of S is the maximum over all x ∈ {0, 1}^n of the certificate complexity of x.
For any search problem S one can associate a "certification" search problem S cert : on input x to S, output a certificate for x in S. Algorithmically speaking, such an S cert is clearly at least as hard as S: if we solve S cert by finding a certificate for (x, o) ∈ S, we can solve S by outputting o.
CNF search ⇔ low certificate complexity. For any k-CNF contradiction F, the associated CNF search problem S_F has certificate complexity at most k. Conversely [35], for any total search problem S ⊆ {0, 1}^n × O, we can construct a k-CNF contradiction F, where k is the certificate complexity of S, such that S_F is a type of certification problem for S (and hence at least as hard as S). Namely, we can pick a collection C of width-k certificates, one for each x ∈ {0, 1}^n. The k-CNF formula F is then defined as F := ⋀_{ρ ∈ C} ¬C_ρ.
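Certificate complexity can be computed by brute force for small search problems, which makes the bound "at most k for a k-CNF" easy to check on examples. The following is a minimal sketch on a toy 2-CNF contradiction; the formula and the function names are illustrative and not from the source.

```python
from itertools import combinations, product

# Toy 2-CNF contradiction: (x0) AND (NOT x0 OR x1) AND (NOT x1 OR x2) AND (NOT x2).
# A literal (v, s) is satisfied when z[v] == s.
CLAUSES = [[(0, 1)], [(0, 0), (1, 1)], [(1, 0), (2, 1)], [(2, 0)]]

def solutions(z):
    """Indices of the clauses falsified by the assignment z."""
    return {j for j, D in enumerate(CLAUSES) if all(z[v] != s for v, s in D)}

def certificate_width(x, n):
    """Least number of coordinates of x that must be fixed to certify some output."""
    for w in range(n + 1):
        for I in combinations(range(n), w):
            free = [i for i in range(n) if i not in I]
            valid = set(solutions(x))
            for bits in product((0, 1), repeat=len(free)):
                z = list(x)
                for i, b in zip(free, bits):
                    z[i] = b
                valid &= solutions(tuple(z))   # outputs valid on all inputs so far
                if not valid:
                    break
            if valid:
                return w
    raise ValueError("search problem is not total")

def certificate_complexity(n):
    """Maximum certificate width over all inputs."""
    return max(certificate_width(x, n) for x in product((0, 1), repeat=n))

cc = certificate_complexity(3)
```

For this formula the computed value equals the clause width 2, with the input (1, 0, 0) as a witness that width 1 does not suffice.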
Gadget composition. For the purposes of query complexity, there are two ways to represent the first argument x ∈ [m] to the index function Ind_m : [m] × {0, 1}^m → {0, 1} as a binary string. The simplest is to write x as a log m-bit string. Under this convention, Ind_m has certificate complexity log m + 1. If S ⊆ {0, 1}^n × O has certificate complexity k, the composed problem S ∘ Ind_m^n has certificate complexity k(log m + 1) (by composing certificates). For applications, this means that if we start with a k-CNF contradiction F, we may reduce S_F ∘ Ind_m^n to solving S_{F′} where F′ is a k(log m + 1)-CNF contradiction over O(mn) variables.
A better representation [5,13], which does not blow up the certificate complexity (or CNF width), is to write x as an m-bit string of Hamming weight 1 (the index of the unique 1-entry encodes x ∈ [m]). Under this convention, Ind_m : {0, 1}^m × {0, 1}^m → {0, 1} becomes a partial function of certificate complexity 2. Hence, if S has certificate complexity k, the partial composed problem S′ := S ∘ Ind_m^n has certificate complexity 2k. Moreover, the partial problem S′ can be extended into a total problem S_tot without making it any easier to solve for rectangle-dags. Indeed, we introduce new variables/certificates allowing us to say that an input (x, y) to S′ is trivially solved with output ⊥ ∉ O, if for some i ∈ [n], x_i ∈ {0, 1}^m is not of Hamming weight 1. Specifically, Alice will receive new input bits x′ ∈ ({0, 1}^m)^n (in addition to the original x ∈ ({0, 1}^m)^n) and we say that an Alice input xx′ is good if for each i ∈ [n], the string x′_i ∈ {0, 1}^m describes a non-decreasing sequence 1 (the last value being hardcoded by convention), and moreover . Note that if xx′ is not good, there is a width-3 certificate witnessing this. Our total search problem is defined by all these width-3 certificates (for output ⊥) together with all the original certificates of S′. To see that S_tot is at least as hard as S′ for rectangle-dags, we note that for any input (x, y) to S′, Alice can compute a unique x′ so that xx′ is good. Now any output o ∈ S_tot(xx′, y) is also such that o ∈ S′(x, y).
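The claim that the weight-1 encoding gives the index gadget certificate complexity 2 can be checked exhaustively for small m. The following is a minimal sketch, assuming the partial function is undefined (represented here by `None`) on pointers that are not of Hamming weight 1; the function names are illustrative.

```python
from itertools import product

def ind_onehot(x_bits, y_bits):
    """Index gadget with the pointer given as an m-bit string of Hamming weight 1.

    Returns None outside the domain of the partial function.
    """
    if sum(x_bits) != 1:
        return None
    return y_bits[x_bits.index(1)]

def is_certificate(i, b, m):
    """Check that fixing x_i = 1 and y_i = b forces output b on every valid input."""
    for x_bits in product((0, 1), repeat=m):
        if sum(x_bits) != 1 or x_bits[i] != 1:
            continue                      # outside the domain or inconsistent
        for y_bits in product((0, 1), repeat=m):
            if y_bits[i] == b and ind_onehot(x_bits, y_bits) != b:
                return False
    return True

m = 3
all_width_two = all(is_certificate(i, b, m) for i in range(m) for b in (0, 1))
```

Each certificate reads one bit of Alice's string and one bit of Bob's string, so composing with a width-k certificate for S reads 2k bits in total.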
In summary, we can reduce (in the context of rectangle-dags) $S_F \circ \mathrm{Ind}_m^n$ to solving $S_{F'}$, where $F'$ is a $2k$-CNF contradiction over $O(mn)$ variables.

Consider a composed search problem $S_F \circ g^n$ obtained from a $k$-CNF contradiction with $\ell$ clauses. Its nondeterministic communication complexity is at most $\log \ell + k \cdot (\log m + 1)$; intuitively, it takes $\log \ell$ bits to specify an unsatisfied clause $C$ and $\log m + 1$ bits to verify the output of a single gadget, and there are $k$ gadgets relevant to $C$. Suppose for a moment that a version of Theorem 1, proving a $2^{\Omega(w)}$ lower bound, held for a gadget of constant size $m = O(1)$. Then we could lift any of the known CNF contradictions with parameters $k = O(1)$, $\ell = O(n)$, $w = \Omega(n)$ to obtain an explicit monotone function on $N = \Theta(n)$ variables with essentially maximal monotone circuit complexity $2^{\Omega(N)}$. This gives some motivation to further develop lifting tools for small gadgets.
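The parameter count in this paragraph can be checked with a short calculation. The sketch below uses hypothetical function names and assumes that a search problem of nondeterministic communication complexity $c$ gives rise to a monotone function on $2^c$ variables, which is how the figure $N = \Theta(n)$ is read here. Logarithms are rounded up to whole bits.

```python
from math import ceil, log2


def nondeterministic_cc_bound(num_clauses, k, m):
    """Upper bound log(l) + k * (log(m) + 1) on the nondeterministic communication cost."""
    clause_bits = ceil(log2(num_clauses))  # name an unsatisfied clause C
    gadget_bits = ceil(log2(m)) + 1        # verify the output of one gadget
    return clause_bits + k * gadget_bits


def variable_count_bound(num_clauses, k, m):
    """Assumed number of variables N = 2^c of the associated monotone function."""
    return 2 ** nondeterministic_cc_bound(num_clauses, k, m)
```

With $k$ and $m$ held constant, doubling the number of clauses doubles this bound, so $N$ grows linearly in $\ell$ and hence in $n$ when $\ell = O(n)$; for example, $\ell = 1024$, $k = 3$, $m = 4$ gives $c \le 19$ and $N \le 2^{19}$.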

OPEN PROBLEMS
If the long line of work on tree-like lifting theory is any indication, there should be much to explore in the dag-like setting as well. We propose a few concrete directions.
Can our methods be extended to prove lower bounds for dags whose feasible sets are intersections of $k$ triangles for $k \ge 2$? See Figure 2. This would imply lower bounds for proof systems such as width-$k$ Resolution over Cutting Planes [33] and Resolution over linear equations [28,41].
Question 1. Prove a lifting theorem for $\mathcal{F}$-dags where $\mathcal{F} := \{\text{intersections of } k \text{ triangles}\}$.
One of the most important open problems (e.g., [47, §5]) regarding semi-algebraic proof systems that manipulate low-degree polynomials (where $\mathcal{F}$ is, say, the class of degree-$d$ polynomial threshold functions) is to prove lower bounds on their dag-like refutation length (tree-like lower bounds are known [7,19]). Since degree-$d$ polynomials can be efficiently evaluated by $(d + 1)$-party number-on-forehead (NOF) protocols, one might hope to prove a dag-like NOF lifting theorem. However, we currently lack a good understanding of NOF lifting even in the tree-like case. We believe the first necessary step should be to settle the following (a two-party analogue of which was proved in [18]).
Question 2. Prove a nondeterministic lifting theorem for NOF protocols.
The proof of Theorem 1, which extracts a width-$O(d)$ conjunction-dag from a size-$n^d$ rectangle-dag, has the additional property of preserving the dag depth (up to an $O(d)$ factor). This raises the question of whether one could investigate size-depth tradeoffs for monotone circuits via lifting. Razborov [46] has recently obtained related results for Resolution, but the parameters in his construction do not seem good enough for a direct application of Theorem 1.