Resultant-based methods for solving bivariate polynomial equations

H. Khalil

This section describes an algorithm, which uses generalized eigenvalues and eigenvectors, for solving a system of two bivariate polynomials $p (x, y) = q (x, y) = 0$, with $p, q \in \mathbb{R} [x, y]$. It treats multiple roots, which are usually considered as obstacles for numerical linear algebra techniques. The numerical difficulties are handled with the help of singular value decomposition (SVD) of the matrix of eigenvectors associated to each eigenvalue. The extension to $n$ equations could be treated similarly.

Generalized eigenvalues and eigenvectors

Let $A, B$ be two square matrices of size $n \times n$. A {generalized eigenvalue} of $A$ and $B$ is a value in the set

\[ \lambda (A, B) : =\{\lambda \in \mathbb{C} ; \det (A - \lambda B) = 0\}. \]

A vector $x \neq 0$ is called a {generalized eigenvector} associated to the eigenvalue $\lambda \in \lambda (A, B)$ if $Ax = \lambda Bx$. The matrices $A$ and $B$ have $n$ generalized eigenvalues if and only if $\tmop{rank} (B) = n$. If $\tmop{rank} (B) < n$, then $\lambda (A, B)$ can be finite, empty, or infinite. Note that if $0 \neq \mu \in \lambda (A, B)$ then $1 / \mu \in \lambda (B, A)$. Moreover, if $B$ is invertible then $\lambda (A, B) = \lambda (B^{- 1} A, I) = \lambda (B^{- 1} A)$, which is the ordinary spectrum of $B^{- 1} A$.\

Recall that an $n \times n$ matrix $T (x)$ with polynomial coefficients can be equivalently written as a polynomial with $n \times n$ matrix coefficients. If $d = \max_{i, j} \{\deg (T_{ij} (x))\}$, we obtain $T (x) = T_d x^d + T_{d - 1} x^{d - 1} + \cdots + T_0$, where $T_i$ are $n \times n$ matrices.

Definition: The companion matrices of $T (x)$ are the two matrices $A$ and $B$ given by:

\[ \begin{array}{c} A = \left( \begin{array}{cccc} 0 & I & \cdots & 0\\ \vdots & \ddots & \ddots & \vdots\\ 0 & \cdots & 0 & I\\ T_0^t & T_1^t & \cdots & T_{d - 1}^t \end{array} \right), \hspace{0.25em} B = \left( \begin{array}{cccc} I & 0 & \cdots & 0\\ 0 & \ddots & & \vdots\\ \vdots & & I & 0\\ 0 & \cdots & 0 & - T_d^t \end{array} \right) \end{array} . \]

And we have the following interesting property :

Proposition: With the above notation, the following equivalence holds:

\[ \hspace{1em} T^t (x) v = 0 \Leftrightarrow (A - xB) \left( \begin{array}{c} v\\ xv\\ \vdots\\ x^{d - 1} v \end{array} \right) = 0. \]

Intersection of planar curves

From now on we will assume that $\mathbbm{K}$ is an algebraically closed field and suppose given two bivariate polynomials $p (x, y)$ and $q (x, y)$ in $\mathbbm{K} [x, y]$. Our aim will be to study and compute their common roots. This problem can be interpreted more geometrically. Polynomials $p$ and $q$ actually define two algebraic curves in the affine plane $\mathbb{A}^2$ (having coordinates $(x, y)$) and we would like to know their intersection points.

Hereafter we will consider systems $p (x, y) = q (x, y) = 0$ having only a {finite} number of roots. This condition is not quite restrictive since it is sufficient (and necessary) to require that polynomials $p$ and $q$ are coprime in $\mathbbm{K} [x, y]$ (otherwise one can divide them by their $\gcd$).

In the case where one of the curve is a line, i.e. one of the polynomial has degree 1, the problem is reduced to solving a univariate polynomial. Indeed, one may assume e.g. that $q (x, y) = y$ and thus we are looking for the roots of $p (x, 0) = 0$. Let us denote them by $z_1, \ldots, z_s$. Then we know that

\[ p (x, 0) = c (x - z_1)^{\mu_1} (x - z_2)^{\mu_2} \cdots (x - z_s)^{\mu_s}, \]

where the $z_i$'s are assumed to be distinct and $c$ is a non-zero constant in $\mathbbm{K}$. The integer $\mu_i$, for all $i = 1, \ldots, s$, is called the {multiplicity} of the root $z_i$, and it turns out that $\sum_{i = 1}^s \mu_i = \deg (p)$ if $p (x, y)$ do not vanish at infinity in the direction of the $y$-axis (in other words, if the homogeneous part of $p$ of highest degree do not vanish when $y = 0$). This later condition can be avoid in the projective setting: let $p^h (x, y, t)$ be the homogeneous polynomial obtained by homogenizing $p (x, y)$ with the new variable $t$, then we have

\[ p (x, 0, t) = c (x - z_1 t)^{\mu_1} (x - z_2 t)^{\mu_2} \cdots (x - z_s t)^{\mu_s} t^{\mu_{\infty}}, \]

where $\mu_{\infty}$ is an integer corresponding to the multiplicity of the root at infinity, and $\mu_{\infty} + \sum_{i = 1}^s \mu_i = \deg (p)$. Moreover, it turns out that the roots $z_1, \ldots, z_s$ and their corresponding multiplicities can be computed by eigenvalues and eigenvectors computation.

In the sequel we will generalize this approach to the case where $p$ and $q$ are bivariate polynomials of arbitrary degree. For this we first need to recall the notion of multiplicity in this context. Then we will show how to recover the roots from multiplication maps.

Intersection multiplicity

Let $p (x, y), q (x, y)$ be two coprime polynomials in $\mathbbm{K} [x, y]$, $\mathcal{C}_p$ and $\mathcal{C}_g$ the corresponding algebraic plane curves, $I : = (p, q)$ the ideal they generate in $\mathbbm{K} [x, y]$ and $\mathcal{A} : = \mathbbm{K} [x, y] / I$ the associated quotient ring. We denote by $z_1 = (x_1, y_1), \ldots, z_s = (x_s, y_s)$ the distinct intersection points in $\mathbb{A}^2$ of $\mathcal{C}_p$ and $\mathcal{C}_q$ (i.e. the distinct roots of the system $p (x, y) = q (x, y) = 0$).

A modern definition of the intersection multiplicity of $\mathcal{C}_p$ and $\mathcal{C}_q$ at a point $z_i$ is (see [Ful84])

\[ i (z_i, \mathcal{C}_p \cap \mathcal{C}_q) : = \dim_{\mathbbm{K}} \mathcal{A}_{z_i} < \infty, \]

where the point $z_i \in \mathbb{A}^2$ is here abusively (but usually) identified with its corresponding prime ideal in $\mathbbm{K} [x, y]$, $\mathcal{A}_{z_i}$ denoting the ordinary localization of the ring $\mathcal{A}$ by this prime ideal. As a result, the finite $\mathbbm{K}$-algebra $\mathcal{A}$ (which is actually finite if and only if $p (x, y)$ and $q (x, y)$ are coprime in $\mathbbm{K} [x, y]$) can be decomposed as the direct sum

\[ \mathcal{A} = \mathcal{A}_{z_1} \oplus \mathcal{A}_{z_2} \oplus \cdots \oplus \mathcal{A}_{z_s} \]

and consequently $\dim_{\mathbbm{K}} \mathcal{A} = \sum_{i = 1}^s i (z_i, \mathcal{C}_p \cap \mathcal{C}_q) .$

The intersection multiplicities can be computed using a resultant. The main idea is to project ``algebraically'' the intersection points on the $x$-axis. To do this let us see both polynomials $p$ and $q$ in $A [y]$ where $A : = \mathbbm{K} [x]$, that is to say as univariate polynomials in $y$ with coefficients in the ring $A$ which is a domain; we can rewrite

\begin{equation} \label{A[x]} p (x, y) = \sum_{i = 0}^{d_1} a_i (x) y^i, \hspace{.2cm} q (x, y) = \sum_{i = 0}^{d_2} b_i (x) y^i . \end{equation}

Their resultant (with respect to $y$) is an element in $A$ which is non-zero, since $p$ and $q$ are assumed to be coprime in $A [y]$, and which can be factorized, assuming that $a_{d_1} (x)$ and $b_{d_2} (x)$ do not vanish simultaneously, as

\begin{equation} \anchor BivariateResresform \tmop{Res} (p, q) = c \prod_{i = 1}^r (x - \alpha_i)^{\beta_i} \end{equation}

where the $\alpha_i$'s are distinct elements in $\mathbbm{K}$ and $\{\alpha_1, \ldots, \alpha_r \}=\{x_1, \ldots, x_s \}$ (as sets). For instance, if all the $x_i' s$ are distinct then we have $r = s$ and $\alpha_i = x_i$ for all $i = 1, \ldots, s$. Moreover, we have (see e.g. [Ful84] (S)):

Proposition: For all $i \in \{1, \ldots, r\}$, the integer $\beta_i$ equals the sum of all the intersection multiplicity of $\mathcal{C}_p$ and $\mathcal{C}_q$ at the points $z_j = (x_j, y_j)$ such that $x_j = \alpha_i$.

As a corollary, if all the $x_i$'s are distinct (this can be easily obtained by a linear change of coordinates $(x, y)$) then $i (z_i, \mathcal{C}_p \cap \mathcal{C}_q)$ is nothing but the valuation of $\tmop{Res} (p, q)$ at $x = x_i$. Another corollary of this result is the well-know Bezout theorem for algebraic plane curves, which is better stated in the projective context. Let us denote by $p^h (x, y, t)$ and $q^h (x, y, t)$ the homogeneous polynomials in $\mathbbm{K} [x, y, t]$ obtained from $p$ and $q$ by homogenization with the new variable $t$.

Proposition: [Bezout theorem] If $p$ and $q$ are coprime then the algebraic projective plane curves associated to $p^h (x, y, t)$ and $q^h (x, y, t)$ intersect in $\deg (p) \deg (q)$ points in $\mathbb{P}^2$, counted with multiplicities.

{proof} It follows straightforwardly from the previous proposition and the well-known fact that $\tmop{Res} (p^h, q^h)$ is a homogeneous polynomial in $\mathbbm{K} [x, t]$ of degree $\deg (p) \deg (q)$ (see e.g. [Lang02], [VWe48II]). {proof}

Intersection points

We take again the notation of A[x]} and [*] . We denote by $S (x)$ the Sylvester matrix of $p (x, y)$ and $q (x, y)$ seen in $A [y]$ (assuming that both as degree at least one in $y$); thus $\det (S (x)) = \tmop{Res} (p, q)$. From [*] , we deduce immediately that $\det (S (x))$ vanishes at a point $x_0 \in \mathbbm{K}$ if and only if either there exists $y_0$ such that $p (x_0, y_0) = q (x_0, y_0) = 0$, or either $a_{d_1} (x_0) = b_{d_2} (x_0) = 0$ (in which case the intersection point is at infinity). Therefore one may ask the following question : being given a point $x_0$ such that $\det (S (x_0)) = 0$, how to recover all the possible point $y_0$ such that $p (x_0, y_0) = q (x_0, y_0) = 0$?

Suppose given such a point $x_0$ and assume that $a_{d_1} (x_0)$ and $b_{d_2} (x_0)$ are not both zero. If $\tmop{kerS}^t (x_0)$ is of dimension one, then it is easy to check that there is only one point $y_0$ such that $p (x_0, y_0) = q (x_0, y_0) = 0$ and moreover any element in $\ker (S (x_0)^t)$ is a multiple of the vector $[1, y_0, \cdots, y_0^{d_1 + d_2 - 1}]$. Therefore to compute $y_0$, we only need to compute a basis of $\ker (S (x_0)^t)$ and compute the quotient of its second coordinate by its first one. In case of multiple roots, this construction can be generalized as follows:

Proposition: Let $\Lambda_1, \cdots, \Lambda_d$ be any basis of the kernel $\ker S (x_0)^t$, $\mathbf{\Lambda}$ be the matrix whose $i^{\text{{\footnotesize th}}}$ row is the vector $\Lambda_i$, $\Delta_0$ be the $d \times d$-submatrix of $\Lambda$ corresponding to the first $d$ columns and $\Delta_1$ be the $d \times d$-submatrix of $\Lambda$ corresponding to the columns number $2, 3, \ldots, d + 1$. Then $\lambda (\Delta_1, \Delta_0)$ is the set of roots of the equations $p (x_0, y) = q (x_0, y) = 0$ (i.e. the set of $y$-coordinates of the intersection points of $\mathcal{C}_p$ and $\mathcal{C}_q$ above $x = x_0$.

In this theorem, observe that the construction of $\Delta_0$ and $\Delta_1$ is always possible because we assumed that both polynomials $p$ and $q$ depend on the variable $y$, which implies that the number of rows in the matrix $S (x_0)$ is always strictly greater than the $\deg (\gcd (p (x_0, y), q (x_0, y)))$.

The previous theorem shows how to recover the $y$-coordinates of the roots of the system $p = q = 0$ from the Sylvester matrix. However, instead of the Sylvester matrix we can use the Bezout matrix to compute $\tmop{Res} (p, q) \in A [x]$ (see section [*] ). This matrix have all the required properties to replace the Sylvester matrix in all the previous results, except in proposition [*] . Indeed, the Bezout matrix being smaller than the Sylvester matrix, the condition given in remark [*] is not always fulfilled; using Bezout matrix in proposition [*] requires that for all roots $x_0$ of $\tmop{Res} (p, q)$,

\[ \max (\deg (p (x_0, y), \deg (q (x_0, y)) > \deg (\gcd (p (x_0, y), q (x_0, y)) . \]

This condition may failed: take for instance $x_0 = - 1$ in the system

\[ \left\{ \begin{array}{l} p (x, y) = x^2 y^2 - 2 y^2 + xy - y + x + 1\\ q (x, y) = y + xy \end{array} \right. \]

Note that the use of the Bezout matrix gives, in practice, a faster method because it is a smaller matrix than the Sylvester matrix, even if its computation takes more time (the save in the eigen-computations is greater).

The algorithm

We are now ready to describe our resultant-based solver. According to proposition [*] , we replace the computation of the resultant, its zeroes, and the kernel of $S^t (x_0)$ (or the Bezout matrix) for each root $x_0$ by the computation of generalized eigenvalues and eigenvectors of the associated companion matrices $A$ and $B$. This computation can be achieved with the QZ algorithm [GLVL96]. Here are some numerical problems which might occur: {enumeratealpha} For difficulty ``b'', we choose possibly a different $\epsilon$ and gather the points which lay in an interval of size $\epsilon$. According to the following proposition, the algorithm remains effective.

Proposition: Assume that $\{\xi_1 \cdots \xi_n \} \subseteq \lambda (A, B)$ with corresponding generalized eigenvectors $v_1^{(1)}, \cdots, v_{k_1}^{(1)}, \cdots, v_1^{(n)},$ $\cdots, v_{k_n}^{(n)}$ with $\sum k_i < d_1 + d_2$. We define $\mathbf{\Lambda}$ the matrix whose its rows are $v_1^{(1)}, \cdots, v_{k_1}^{(1)}, \cdots, v_1^{(n)}, \cdots, v_{k_n}^{(n)} .$ Then, if $\Delta_0$ is the first left $(k_1 + \cdots + k_n) \times (k_1 + \cdots + k_n)$ block of $\mathbf{\Lambda}$ and $\Delta_1$ the second $(k_1 + \cdots + k_n) \times (k_1 + \cdots + k_n)$ block, then the generalized eigenvalues of $\Delta_1$ and $\Delta_0$ give the set of ordinates of the intersections above $\xi_1 \cdots \xi_n$; i.e. this algorithm can be used to gather clusters of $x$-coordinates and regarding them as the projection of only one point.

For difficulties ``c'', it is necessary to use the singular value decomposition: If $x$ a generalized eigenvalue of $A$ and $B$, and $E$ the matrix of the associated eigenvectors, then the singular value decomposition of $E$ is

\[ E = U \Sigma V^t, \tmop{with} \Sigma = \left( \begin{array}{cccccc} \sigma_1 & & & & & \\ & \ddots & & & & \\ & & \sigma_r & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \end{array} \right) \hspace{1em} \]

and $r$ is the rank of $E$. Let

\[ E' = E \cdot V = U \cdot \Sigma = [\sigma_1 u_1 \cdots \sigma_r u_r \hspace{0.25em} 0 \cdots 0] = [e'_1 \cdots e'_r \hspace{0.25em} 0 \cdots 0] . \]

The matrix $E'$ is also a matrix of eigenvectors, because the changes are done on the columns, and V is invertible. Hence $E'$ and $E$ have the same rank $r$, and $E'' = [e'_1 \cdots e'_r]$ is a matrix of linearly independent eigenvectors. Consequently $E''$ can be used to build $\Delta_0$ and $\Delta_1$.

This leads to the following complete algorithm:

{Algorithm { {Numerical approximation of the root of a square bivariate system} {{-0.3cm}}

{Input:} Two bivariate polynomials $p (x, y), q (x, y) \in \mathbbm{R} [x, y] .$

{Output:} an approximation of the real roots of $p (x, y) = 0$, $q (x, y) = 0$. }}{}


The function which implement this algorithm is
template<class C> Seq<VectDse<C> > solve(p,q,EigenRes<C>());

See also:



#include <synaps/msolve/projection/EigenRes.h>

using std::cout;
using std::cin;
using std::endl;

typedef MPol<double> mpol_t;

int main(int argc, char **argv)

  mpol_t p1("16*x0^4-160*x0^2*x1^2+400*x1^4-32*x0^3+160*x0*x1^2+6*x0^2-50*x1^2+10*x0+1.5625"),

  Seq<VectDse<double> > res;
  Seq<int> mux,muy;
  res= solve(p1,p2, EigenRes<double>(1e-3));
  res= solve(p1,p2,mux, EigenRes<double>(1e-6));
  res= solve(p1,p2,mux,muy, EigenRes<double>(1e-7));

  cout<<"Intersection points:"<<endl;
  for(unsigned i=0;i<res.size();++i)
    std::cout<<"("<<res[i][0]<<","<<res[i][1]<<") of multiplicity "<<mux[i]<<" "<<muy[i]<<std::endl;
#include <iostream>
int main()
  std::cout <<"  LAPACK is not connected"<<std::endl;