Finding an injective mapping with restrictions for values is NP-Complete

Finding an injective mapping with restrictions for values is NP-Complete

This post is about an interesting problem I recently tried to solve efficiently but ended up proving its NP-completeness. Informally, the problem is to find an injective mapping between two finite sets with certain, statically defined properties for values. For example, if A={a,b,c,d}A = \{a, b, c, d\} and B={1,2,3,4,5}B = \{1, 2, 3, 4, 5\}, we may want to find an injective mapping f:ABf : A \rightarrow B with restrictions f(a){1,3,4,5}f(a) \in \{1, 3, 4, 5\}, f(b){3,4,5}f(b) \in \{3, 4, 5\}, f(c){1,2,4,5}f(c) \in \{1, 2, 4, 5\}, f(d){1,2,5}f(d) \in \{1, 2, 5\}, f(e){1,2,3,4}f(e) \in \{1, 2, 3, 4\}. I find this problem of particular interest, because its nature is in my opinion very different compared to well-known NP-complete problems proven to be so by Karp. In this post we will visualize and formalize this problem, prove its NP-completeness and, finally, derive some interesting corollaries.

The problem

The above intuitive description of the problem can be equivalently formalized as follows:

Name: Restricted injective mapping

Input: A set AA as well as a collection S={A1,,An}S = \{A_1, \dots, A_n\} of subsets of AA, i.e. AiAA_i \subseteq A for all 1in1 \le i \le n.

Question: (a1,,an)A1××An\exists (a_1, \dots, a_n) \in A_1 \times \dots \times A_n such that aiaja_i \neq a_j for all ij,1i,jni \neq j, 1 \le i,j \le n?

This definition is equivalent to the brief description of the problem at the beginning of this post, since we can only focus on the restrictions in the target set (assuming that the elements in the source set are numbered) and it is possible to define such an injective mapping ff​ if and only if (a1,,an)A1××An\exists (a_1, \dots, a_n) \in A_1 \times \dots \times A_n​ such that aiaja_i \neq a_j​ for all iji \neq j​.

Example

Consider the following instance of the problem: A:={1,2,3,4,5,6}A1:={1,3,4,5}A2:={3,4,6}A3:={1,2,4,5}A4:={1,2,5}A5:={1,2,3,4}S:={A1,A2,A3,A4,A5}\begin{aligned} A &:= \{1, 2, 3, 4, 5, 6\} \\ A_1 &:= \{1, 3, 4, 5\} \\ A_2 &:= \{3, 4, 6\} \\ A_3 &:= \{1, 2, 4, 5\} \\ A_4 &:= \{1, 2, 5\} \\ A_5 &:= \{1, 2, 3, 4\} \\ S &:= \{A_1, A_2, A_3, A_4, A_5\} \end{aligned}

This example instance can be visualized with a table

 123456
A1A_1E EEE 
A2A_2  EE E
A3A_3EE EE 
A4A_4EE  E 
A5A_5EEEE  

where the elements of the subsets are marked with E. In other words, we essentially want to find 5 E-cells in the above table such that no two cells that we pick have the same row or column. This instance of the problem has a solution, since we can pick, for example, the following cells marked with S:

 123456
A1A_1S EEE 
A2A_2  SE E
A3A_3ES EE 
A4A_4EE  S 
A5A_5EEES  

This visualization makes it also very clear why the formalization of the problem above is really equivalent to finding an injective mapping with restrictions for values — we can interpret the above table as a mapping f:S{1,2,3,4,5,6}f : S \rightarrow \{1, 2, 3, 4, 5, 6\}​ where f(A1)=1f(A_1) = 1​, f(A2)=3f(A_2) = 3​, f(A3)=2f(A_3) = 2​, f(A4)=5f(A_4) = 5​, f(A5)=4f(A_5) = 4​. Obviously,

  • if some two cells that we choose are in the same column, then ff is not injective,
  • by definition of a mapping, no two cells can have the same row

A special case of the problem

Every injective mapping f:ABf : A \rightarrow B with A=BN0\st A \st = \st B \st \in \mathbb{N}_0 is also surjective and bijective. This leads us to an interesting special case of Restricted injective mapping, where we just add an additional constraint, that the amount of subsets in the collection SS must be equal to the cardinality of AA:

Name: Restricted bijective mapping

Input: A set AA of cardinality A=n\st A \st = n and a collection of nn subsets S={A1,,An}S = \{A_1, \dots, A_n\} of AA, i.e. AiAA_i \subseteq A for all 1in1 \le i \le n.

Question: (a1,,an)A1××An\exists (a_1, \dots, a_n) \in A_1 \times \dots \times A_n such that aiaja_i \neq a_j for all ij,1i,jni \neq j, 1 \le i,j \le n?

In other words and in light of the above visualization of an example instance of Restricted injective mapping with a table, this special case has an additional requirement that there are exactly as many rows as columns.

Proof idea

In order to prove NP\mathcal{NP}​​​-hardness of the Restricted bijective mapping problem above, we need to construct a reduction from an already-proven NP\mathcal{NP}​​​-complete problem. This reduction must somehow express the constraints of the original problem with the collection of subsets SS​​​. Many NP\mathcal{NP}​​​​-complete problems are graph-related and in graphs, a path with disjoint vertices seems to have very similar constraints compared to our problem. It may therefore be a good idea to let AA be the set of nodes VV in an undirected graph G=(V,E)G = (V, E) where, without loss of generality, V={1,,n}V = \{1, \dots, n\} and define AiA_i for 1in1 \le i \le n as follows: Ai:={v{i,v}E}A_i := \{v \mid \{i, v\} \in E\}

For example, consider the following graph with n=8n = 8​ vertices:

TODO

If there exists a cycle in GG​ that visits each node exactly once, or, in other words, a Hamiltonian circuit, like in this example where such a cycle is marked red, then, by definition of AiA_i, it holds that (a1,,an)A1××An:1i,jn:ijaiaj\exists (a_1, \dots, a_n) \in A_1 \times \dots \times A_n : \forall 1 \le i, j \le n : i \neq j \Rightarrow a_i \neq a_j

which means that such a reduction will lead to a yes-instance of Restricted bijective mapping. However, the converse is not true — a graph without a Hamiltonian circuit can still be reduced to a yes-instance of our problem. For example, the graph

TODO

gets reduced to A:={1,2,3,4,5,6}S:={A1,A2,A3,A4,A5,A6}A1:={3,2}A2:={1,3}A3:={1,2,6}A4:={5,6}A5:={4,6}A6:={3,4,5}\begin{aligned} A &:= \{1, 2, 3, 4, 5, 6\} \\ S &:= \{A_1, A_2, A_3, A_4, A_5, A_6\} \\ A_1 &:= \{3, 2\} \\ A_2 &:= \{1, 3\} \\ A_3 &:= \{1, 2, 6\} \\ A_4 &:= \{5, 6\} \\ A_5 &:= \{4, 6\} \\ A_6 &:= \{3, 4, 5\} \end{aligned}

where it is possible to choose (a1,a2,a3,a4,a5,a6)=(2,3,1,5,6,4)(a_1, a_2, a_3, a_4, a_5, a_6) = (2, 3, 1, 5, 6, 4)

such that aiaja_i \neq a_j for iji \neq j​​. It is clear that this problem only arises when the graph contains multiple vertex-disjoint cycles such that every node is part of exactly one such cycle.

The Hamiltonian circuit problem is a well-known NP\mathcal{NP}​-complete problem, which is formally defined as follows:

Name: HC

Input: Undirected graph G=(V,E)G = (V, E)

Question: Does there exist a hamiltonian circuit, i.e. a cycle that visits every vertex exactly once?

If we prove NP\mathcal{NP}-hardness of HC with a reduction ff that cannot produce graphs with two or more vertex-disjoint cycles such that every node is in exactly one such cycle, and then prove NP\mathcal{NP}-hardness of Restricted bijective mapping with a reduction gg that assumes that the input graph cannot be partitioned in vertex-disjoint cycles as mentioned above, then the composite reduction gfg \circ f will prove NP\mathcal{NP}-hardness of our problem. For a formal proof I will therefore need to reiterate a variant of the well-known proof of NP\mathcal{NP}​-completeness of HC emphasizing the fact that only graphs with the above mentioned constraint are being produced.

NP-completeness of HC

This problem is obviously in NP\mathcal{NP}, because we can use a sequence of vertices as a certificate that can be verified in polynomial time by making sure that:

  • this sequence of vertices is a path with disjoint vertices;
  • the last vertex is connected with the first one;
  • every node of the graph is in the sequence;

In order to prove NP\mathcal{NP}​​​-hardness, we will construct a polynomial-time reduction from the 3-SAT problem. Let G(i,k)G(i, k)​ be a graph with the following vertices and edges:

TODO

This gadget contains exactly the following two hamiltonian paths between s(i)s(i)​​​​ and e(i)e(i)​​​​​​ (we will drop the (i)(i)​​ in node names​​ if the graph G(i,k)G(i, k)​​​ containing that vertex can be inferred from the context): (s,s,t0,m0,f0,t1,m1,f1,t2,m2,f2,,tk,mk,fk,e,e)(s,s,f0,m0,t0,f1,m1,t1,f2,m2,t2,,fk,mk,tk,e,e)(s, s', t_0, m_0, f_0, t_1, m_1, f_1, t_2, m_2, f_2, \dots, t_k, m_k, f_k, e', e) \\ (s, s', f_0, m_0, t_0, f_1, m_1, t_1, f_2, m_2, t_2, \dots, f_k, m_k, t_k, e', e)

This is the case because after choosing t0t_0 or f0f_0 after ss', the rest of the hamiltonian path is uniquely determined — (s,t0,f1)(s', t_0, f_1), (s,f0,t1)(s', f_0, t_1), (tj,fj+1,tj+2)(t_j, f_{j+1}, t_{j+2}), (fj,tj+1,fj+2)(f_j, t_{j+1}, f_{j+2}), (tk1,fk,e)(t_{k-1}, f_k, e') and (fk1,tk,e)(f_{k-1}, t_k, e') cannot be parts of the path since then there is no way to visit

  • f0f_0 or m0m_0,
  • t0t_0 or m0m_0,
  • tj+1t_{j+1} or mj+1m_{j+1},
  • fj+1f_{j+1} or mj+1m_{j+1},
  • tkt_k or mkm_k,
  • fkf_k or mkm_k,

respectively, and then reach ee.

Let φ\varphi be a 3-SAT formula with kk clauses and, without loss of generality, nn variables x1,,xnx_1, \dots, x_n such that no variable appears positively and negatively in the same clause (otherwise remove such redundant clauses). We can construct a graph G=(V,E)G = (V, E) as follows:

  • For each variable xix_i​, add the gadget G(i,k)G(i, k) to GG. If i2i \ge 2, connect the current gadget with the gadget for the previous variable by creating an edge {e(i1),s(i)}E\{e(i - 1), s(i)\} \in E.
  • For each clause 1yk1 \le y \le k, create a node c(y)Vc(y) \in V.
  • If a positive literal xix_i appears in clause 1yk1 \le y \le k, create edges {c(y),fy1(i)}E\{c(y), f_{y-1}(i)\} \in E and {ty(i),c(y)}E\{t_y(i), c(y)\} \in E.
  • If a negative literal ¬xi\neg x_i appears in clause 1yk1 \le y \le k, create edges {c(y),ty1(i)}E\{c(y), t_{y-1}(i)\} \in E and {fy(i),c(y)}E\{f_y(i), c(y)\} \in E.
  • Create an edge {e(n),s(1)}E\{e(n), s(1)\} \in E.

Intuitively, a hamiltonian circuit in GG containing (s(i),t0(i))(s'(i), t_0(i)) corresponds to a variable assignment where xi=x_i = \top. Analogously, (s(i),f0(i))(s'(i), f_0(i)) in the hamiltonian circuit means that it corresponds to a variable assignment with xi=x_i = \bot. By construction of GG and because of the fact that no variable in φ\varphi appears in a single clause both positively and negatively, node c(y)c(y) can be part of a hamiltonian circuit if an only if the hamiltonian circuit “sets” at least one literal in clause yy to true, as described above. Moreover, if {c(y),fy1(i)}E\{c(y), f_{y-1}(i)\} \in E is part of the hamiltonian circuit, then ty(i)t_y(i) must be the next node after c(y)c(y), because otherwise it is impossible to visit all nodes because of the way gadgets are connected. The same holds for the case when {c(y),ty1(i)}E\{c(y), t_{y-1}(i)\} \in E is part of the hamiltonian circuit — then fy(i)f_y(i) must be the next node after c(y)c(y). It follows from the above, that there exists a hamiltonian circuit iff φ\varphi is satisfiable.

We now need to argue why GG cannot be partitioned into at least two vertex-disjoint cycles. Assume this were not the case. Then there would exist a clause yy such that c(y)Vc(y) \in V​ is part of a cycle that doesn’t visit every vVv \in V. If this cycle contains only c(y)c(y)​ and nodes within a single gadget G(i,k)G(i, k), then there exists at least one mj(i)m_j(i)​ in that gadget, which cannot be visited by any vertex-disjoint cycle because of the above argument that a gadget has exactly two hamiltonian paths, and we therefore get a contradiction. Otherwise the cycle must pass through at least two gadgets G(l,k)G(l, k) and G(m,k)G(m, k)​​ and must visit the “bridge” between them, i.e either e(l)e(l) and s(m)s(m) or s(l)s(l)​ and e(m)e(m)​​, because otherwise at least one node out of e(l),s(m),s(l),e(m)e(l), s(m), s(l), e(m)​​ would not be part of a vertex-disjoint cycle. But this means that there exists no path between s(1)s(1) and e(n)e(n) apart from (s(1),e(n))(s(1), e(n)) meaning that those nodes are not part of a vertex-disjoint cycle. We therefore get a contradiction.

The reduction described above is obviously a polynomial-time reduction. This completes the proof of NP\mathcal{NP}-completeness of HC.

NP-completeness of finding a bijective mapping

Restricted bijective mapping is in NP\mathcal{NP}, because (a1,,an)(a_1, \dots, a_n) can be used as a certificate. It can be verified in polynomial time that (a1,,an)A1××An(a_1, \dots, a_n) \in A_1 \times \dots \times A_n and that aiaja_i \neq a_j for all ij,1i,jni \neq j, 1 \le i,j \le n.

In order to prove NP\mathcal{NP}​-hardness, we will use the idea described in the proof idea section and construct the following polynomial-time reduction from the 3-SAT problem:

  • Given a 3-SAT formula φ\varphi, construct a graph G=(V,E)G = (V, E) as described in the proof of NP\mathcal{NP}-completeness of HC above. We can assume without loss of generality that V={1,,n}V = \{1, \dots, n\} as it is possible to rename vertices of GG efficiently.
  • Let A:={1,,n}A := \{1, \dots, n\}.
  • For each 1in1 \le i \le n, compute Ai:={v{i,v}E}A_i := \{v \mid \{i, v\} \in E\}.
  • Finally, construct the collection of subsets S:={A1,,An}S := \{A_1, \dots, A_n\}.

It follows from the correctness of the reduction in the proof of NP\mathcal{NP}-completeness of HC, that φ\varphi is satisfiable iff GG has a hamiltonian circuit. In order to prove the correctness of the present reduction, we therefore need to argue why GG has a hamiltonian circuit if and only if: (a1,,an)A1××An:1i,jn:ijaiaj\exists (a_1, \dots, a_n) \in A_1 \times \dots \times A_n : \forall 1 \le i, j \le n : i \neq j \Rightarrow a_i \neq a_j

(\Rightarrow​​): Assuming there exists a hamiltonian circuit (p1,,pn,p1)Vn+1(p_1, \dots, p_n, p_1) \in V^{n+1}, we can choose (apn,ap1,,apn1)=(p1,p2,,pn)Apn×Ap1××Apn1(a_{p_n}, a_{p_1}, \dots, a_{p_{n-1}}) = (p_1, p_2, \dots, p_n) \in A_{p_n} \times A_{p_1} \times \dots \times A_{p_{n-1}}

nn distinct values from A1××AnA_1 \times \dots \times A_n.

(\Leftarrow​​): Let (a1,,an)A1××An(a_1, \dots, a_n) \in A_1 \times \dots \times A_n​​ such that aiaja_i \neq a_j​​ for iji \neq j​​. We can define the following mapping: ψ:VVxax\begin{aligned} \psi : V &\rightarrow V \\ x &\mapsto a_x \end{aligned}

It follows from the fact that aia_i​ values are pairwise distinct, that ψ\psi​ is injective. Moreover, ψ\psi​ is bijective as the set of vertices VV​​ is finite. As we argued in the np-completeness of hc section, GG cannot be partitioned into more than one vertex-disjoint cycle. This means that: vV,kN0:k<nψk(v)v\forall v \in V, k \in \mathbb{N}_0 : k < n \Rightarrow\psi^k(v) \neq v

However, because ψ\psi​ is bijective and each application of ψ\psi leads to a new value in VV, it holds that: vVkN:1knψk(v)=v\forall v \in V \exists k \in \mathbb{N} : 1 \le k \le n \wedge \psi^k(v) = v

It follows from the above two statements that vV:ψn(v)=v\forall v \in V : \psi^n(v) = v

and nn is the minimal number of times ψ\psi must be applied to reach vv starting in vv​. GG therefore contains {1,ψ(1)}E{ψ(1),ψ(ψ(1))}E{ψ(ψ(1)),ψ(ψ(ψ(1)))}E{ψn2(1),ψn1(1)}E{ψn1(1),ψn(1)1}E\begin{aligned} \{1, \psi(1)\} &\in E \\ \{\psi(1), \psi(\psi(1))\} &\in E \\ \{\psi(\psi(1)), \psi(\psi(\psi(1)))\} &\in E \\ \vdots \\ \{\psi^{n-2}(1), \psi^{n-1}(1)\} &\in E \\ \{\psi^{n-1}(1), \underbrace{\psi^n(1)}_{1}\} &\in E \end{aligned}

meaning that there exists a hamiltonian circuit (1,ψ(1),,ψn1(1),1)(1, \psi(1), \dots, \psi^{n-1}(1), 1)

This completes the proof.

Corollaries

NP-completeness of finding an injective mapping

Restricted injective mapping is obviously in NP\mathcal{NP}​.

The NP\mathcal{NP}​-hardness of the present problem follows directly from the NP\mathcal{NP}-hardness of Restricted bijective mapping as we can construct a trivial reduction from the latter problem that just passes the instance without any changes.

Counting the amount of injective mappings is NP-hard

Consider an instance of Restricted injective mapping, concretely a set AA and a collection S={A1,,An}S = \{A_1, \dots, A_n\} of subsets of AA. It may be interesting not only to determine whether we can choose (a1,,an)A1××An(a_1, \dots, a_n) \in A_1 \times \dots \times A_n such that aiaja_i \neq a_j for all ij,1i,jni \neq j, 1 \le i,j \le n, but also to count the amount of such possibilities and to assess the hardness of this combinational problem. With the inclusion–exclusion principle we can count the amount of possibilities of choosing (a1,,an)A1××An(a_1, \dots, a_n) \notin A_1 \times \dots \times A_n such that aiaja_i \neq a_j for all ij,1i,jni \neq j, 1 \le i,j \le n​​. Let Ai:=AAi\overline{A_i} := A \setminus A_i. To do that, we first define: Mi,u:={(a1,,an)A1××Anai=u}M_{i,u} := \{(a_1, \dots, a_n) \notin A_1 \times \dots \times A_n \mid a_i = u\}

The above amount of possibilities of choosing the wrong tuples that still satisfy the condition aiaja_i \neq a_j for iji \neq j is equal to: γ:=(uA1M1,u)(uAnMn,u)\gamma := \Bigg\vert \Bigg(\bigcup_{u \in \overline{A_1}} M_{1, u}\Bigg) \cup \dots \cup \Bigg(\bigcup_{u \in \overline{A_n}} M_{n, u}\Bigg)\Bigg\vert

This formula has O(nA)O(n \cdot \st A \st)​ terms and, given the value of γ\gamma​​, we can calculate the amount of possibilities we are interested in efficiently using the following formula: (i=0n1(Ai))γ\Bigg(\prod_{i=0}^{n - 1} (\st A \st - i)\Bigg) - \gamma

For these reasons, the calculation of γ\gamma​​​​ is the performance bottleneck of this problem. We can calculate γ\gamma ​using γ=k=1I(1)k1TIT=ktTMt\gamma = \sum_{k=1}^{\st I \st} (-1)^{k-1}\sum_{\substack{T \subseteq I \\ \st T \st = k}} \Big\vert \bigcap_{t \in T} M_t \Big\vert

where II​​​​ is the set of indices for the set MM​​​​ that come up in the formula for γ\gamma​​​​​, but this approach would unfortunately cause exponential blowup. The key question I now want to address is whether it may still be somehow possible to calculate γ\gamma​​​ more efficiently.

The problem

Name: Restricted injective mapping count

Input: A set AA, collection S={A1,,An}S = \{A_1, \dots, A_n\} of subsets of AA and kN0k \in \mathbb{N}_0.

Question: Is the amount of possibilities of choosing (a1,,an)A1××An(a_1, \dots, a_n) \in A_1 \times \dots \times A_n such that aiaja_i \neq a_j for all ij,1i,jni \neq j, 1 \le i,j \le n​ at least kk?

is NP\mathcal{NP}-hard, since Restricted injective mapping can be reduced to it (with a reduction that simply sets k:=1k := 1 and passes the instance of the problem without any changes). We can therefore conclude, that γ\gamma cannot be computed efficiently unless P=NP\mathcal{P} = \mathcal{NP}​. Intuitively, the present combinational problem uses the full power of the inclusion-exclusion principle (see this post for similar reasoning).


Copyright © 2019 — 2023 Alexander Mayorov. All rights reserved.