5164
|
1 \documentclass[11pt]{article} |
|
2 |
|
3 \newcommand{\m}[1]{{\bf{#1}}} % for matrices and vectors |
|
4 \newcommand{\tr}{^{\sf T}} % transpose |
|
5 |
|
6 \topmargin 0in |
|
7 \textheight 9in |
|
8 \oddsidemargin 0pt |
|
9 \evensidemargin 0pt |
|
10 \textwidth 6.5in |
|
11 |
|
12 %------------------------------------------------------------------------------ |
|
13 \begin{document} |
|
14 %------------------------------------------------------------------------------ |
|
15 |
|
16 \title{AMD Version 1.1 User Guide} |
|
17 \author{Patrick R. Amestoy\thanks{ENSEEIHT-IRIT, |
|
18 2 rue Camichel 31017 Toulouse, France. |
|
19 email: amestoy@enseeiht.fr. http://www.enseeiht.fr/$\sim$amestoy.} |
|
20 \and Timothy A. Davis\thanks{ |
|
21 Dept.~of Computer and Information Science and Engineering, |
|
22 Univ.~of Florida, Gainesville, FL, USA. |
|
23 email: davis@cise.ufl.edu. |
|
24 http://www.cise.ufl.edu/$\sim$davis. |
|
25 This work was supported by the National |
|
26 Science Foundation, under grants ASC-9111263, DMS-9223088, and CCR-0203270. |
|
27 Portions of the work were done while on sabbatical at Stanford University |
|
28 and Lawrence Berkeley National Laboratory (with funding from Stanford |
|
29 University and the SciDAC program). |
|
30 } |
|
31 \and Iain S. Duff\thanks{Rutherford Appleton Laboratory, Chilton, Didcot, |
|
32 Oxon OX11 0QX, England. email: i.s.duff@rl.ac.uk. |
|
33 http://www.numerical.rl.ac.uk/people/isd/isd.html. |
|
34 This work was supported by the EPSRC under grant GR/R46441. |
|
35 }} |
|
36 |
|
37 \date{January 29, 2004} |
|
38 \maketitle |
|
39 |
|
40 %------------------------------------------------------------------------------ |
|
41 \begin{abstract} |
|
42 AMD is a set of routines that implements the approximate minimum degree ordering |
|
43 algorithm to permute sparse matrices prior to |
|
44 numerical factorization. |
|
45 There are versions written in both C and Fortran 77. |
|
46 A MATLAB interface is included. |
|
47 \end{abstract} |
|
48 %------------------------------------------------------------------------------ |
|
49 |
|
50 Technical report TR-04-002, CISE Department, University of Florida, |
|
51 Gainesville, FL, 2004. |
|
52 |
|
53 AMD Version 1.1 (Jan. 21, 2004), Copyright\copyright 2004 by Timothy A. |
|
54 Davis, Patrick R. Amestoy, and Iain S. Duff. All Rights Reserved. |
|
55 |
|
56 {\bf AMD License:} |
|
57 Your use or distribution of AMD or any modified version of |
|
58 AMD implies that you agree to this License. |
|
59 |
|
60 THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY |
|
61 EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK. |
|
62 |
|
63 Permission is hereby granted to use or copy this program, provided |
|
64 that the Copyright, this License, and the Availability of the original |
|
65 version is retained on all copies. User documentation of any code that |
|
66 uses AMD or any modified version of AMD code must cite the |
|
67 Copyright, this License, the Availability note, and ``Used by permission.'' |
|
68 Permission to modify the code and to distribute modified code is granted, |
|
69 provided the Copyright, this License, and the Availability note are |
|
70 retained, and a notice that the code was modified is included. This |
|
71 software was developed with support from the National Science Foundation, |
|
72 and is provided to you free of charge. |
|
73 |
|
74 {\bf Availability:} |
|
75 http://www.cise.ufl.edu/research/sparse/amd |
|
76 |
|
77 {\bf Acknowledgments:} |
|
78 |
|
79 This work was supported by the National Science Foundation, under |
|
80 grants ASC-9111263 and DMS-9223088 and CCR-0203270. |
|
81 The conversion to C, the addition of the elimination tree |
|
82 post-ordering, and the handling of dense rows and columns |
|
83 were done while Davis was on sabbatical at |
|
84 Stanford University and Lawrence Berkeley National Laboratory. |
|
85 |
|
86 %------------------------------------------------------------------------------ |
|
87 \newpage |
|
88 \section{Overview} |
|
89 %------------------------------------------------------------------------------ |
|
90 |
|
91 AMD is a set of routines for preordering a sparse matrix prior to |
|
92 numerical factorization. It uses an approximate minimum degree ordering |
|
93 algorithm \cite{AmestoyDavisDuff96} to find a permutation matrix $\m{P}$ |
|
94 so that the Cholesky factorization $\m{PAP}\tr=\m{LL}\tr$ has fewer |
|
95 (often much fewer) nonzero entries than the Cholesky factorization of $\m{A}$. |
|
96 The algorithm is typically much faster than other ordering methods |
|
97 and minimum degree ordering |
|
98 algorithms that compute an exact degree \cite{GeorgeLiu89}. |
|
99 Some methods, such as approximate deficiency |
|
100 \cite{RothbergEisenstat98} and graph-partitioning based methods |
|
101 \cite{Chaco,KarypisKumar98e,PellegriniRomanAmestoy00,schu:01} |
|
102 can produce better orderings, depending on the matrix. |
|
103 |
|
104 The algorithm starts with an undirected graph representation of a |
|
105 symmetric sparse matrix $\m{A}$. Node $i$ in the graph corresponds to row |
|
106 and column $i$ of the matrix, and there is an edge $(i,j)$ in the graph if |
|
107 $a_{ij}$ is nonzero. |
|
108 The degree of a node is initialized to the number of off-diagonal nonzeros |
|
109 in row $i$, which is the size of the set of nodes |
|
110 adjacent to $i$ in the graph. |
|
111 |
|
112 The selection of a pivot $a_{ii}$ from the diagonal of $\m{A}$ and the first |
|
113 step of Gaussian elimination corresponds to one step of graph elimination. |
|
114 Numerical fill-in causes new nonzero entries in the matrix |
|
115 (fill-in refers to |
|
116 nonzeros in $\m{L}$ that are not in $\m{A}$). |
|
117 Node $i$ is eliminated and edges are added to its neighbors |
|
118 so that they form a clique (or {\em element}). To reduce fill-in, |
|
119 node $i$ is selected as the node of least degree in the graph. |
|
120 This process repeats until the graph is eliminated. |
|
121 |
|
122 The clique is represented implicitly. Rather than listing all the |
|
123 new edges in the graph, a single list of nodes is kept which represents |
|
124 the clique. This list corresponds to the nonzero pattern of the first |
|
125 column of $\m{L}$. As the elimination proceeds, some of these cliques |
|
126 become subsets of subsequent cliques, and are removed. This graph |
|
127 can be stored in place, that is |
|
128 using the same amount of memory as the original graph. |
|
129 |
|
130 The most costly part of the minimum degree algorithm is the recomputation |
|
131 of the degrees of nodes adjacent to the current pivot element. |
|
132 Rather than keep track of the exact degree, the approximate minimum degree |
|
133 algorithm finds an upper bound on the degree that is easier to compute. |
|
134 For nodes of least degree, this bound tends to be tight. Using the |
|
135 approximate degree instead of the exact degree leads to a substantial savings |
|
136 in run time, particularly for very irregularly structured matrices. |
|
137 It has no effect on the quality of the ordering. |
|
138 |
|
139 In the C version of AMD, the elimination phase is followed by an |
|
140 elimination tree post-ordering. This has no effect on fill-in, but |
|
141 reorganizes the ordering so that the subsequent numerical factorization is |
|
142 more efficient. It also includes a pre-processing phase in which nodes of |
|
143 very high degree are removed (without causing fill-in), and placed last in the |
|
144 permutation $\m{P}$. This reduces the run time substantially if the matrix |
|
145 has a few rows with many nonzero entries, and has little effect on the quality |
|
146 of the ordering. |
|
147 The C version operates on the |
|
148 symmetric nonzero pattern of $\m{A}+\m{A}\tr$, so it can be given |
|
149 an unsymmetric matrix, or either the lower or upper triangular part of |
|
150 a symmetric matrix. |
|
151 |
|
152 The two Fortran versions of AMD are essentially identical to two versions of |
|
153 the AMD algorithm discussed in an earlier paper \cite{AmestoyDavisDuff96} |
|
154 (approximate minimum external degree, both with and without aggressive |
|
155 absorption). |
|
156 For a discussion of the long history of the minimum degree algorithm, |
|
157 see \cite{GeorgeLiu89}. |
|
158 |
|
159 %------------------------------------------------------------------------------ |
|
160 \section{Availability} |
|
161 %------------------------------------------------------------------------------ |
|
162 |
|
163 In addition to appearing as a Collected Algorithm of the ACM, |
|
164 AMD Version 1.1 is available at http://www.cise.ufl.edu/research/sparse. |
|
165 The Fortran version is available as the routine {\tt MC47} in HSL |
|
166 (formerly the Harwell Subroutine Library) \cite{hsl:2002}. |
|
167 |
|
168 %------------------------------------------------------------------------------ |
|
169 \section{Using AMD in MATLAB} |
|
170 %------------------------------------------------------------------------------ |
|
171 |
|
172 To use AMD in MATLAB, you must first compile the AMD mexFunction. |
|
173 Just type {\tt make} in the Unix system shell, while in the {\tt AMD} |
|
174 directory. You can also type {\tt amd\_make} in MATLAB, while in the |
|
175 {\tt AMD/MATLAB} directory. Place the {\tt AMD/MATLAB} directory in your |
|
176 MATLAB path. This can be done on any system with MATLAB, including Windows. |
|
177 See Section~\ref{Install} for more details on how to install AMD. |
|
178 |
|
179 The MATLAB statement {\tt p=amd(A)} finds a permutation vector {\tt p} such |
|
180 that the Cholesky factorization {\tt chol(A(p,p))} is typically sparser than |
|
181 {\tt chol(A)}. |
|
182 If {\tt A} is unsymmetric, {\tt amd(A)} is identical to {\tt amd(A+A')} |
|
183 (ignoring numerical cancellation). |
|
184 If {\tt A} is not symmetric positive definite, |
|
185 but has substantial diagonal entries and a mostly symmetric nonzero pattern, |
|
186 then this ordering is also suitable for LU factorization. A partial pivoting |
|
187 threshold may be required to prevent pivots from being selected off the |
|
188 diagonal, such as the statement {\tt [L,U,P] = lu (A (p,p), 0.1)}. |
|
189 Type {\tt help lu} for more details. |
|
190 The statement {\tt [L,U,P,Q] = lu (A (p,p))} in MATLAB 6.5 is |
|
191 not suitable, however, because it uses UMFPACK Version 4.0 and thus |
|
192 does not attempt to select pivots from the diagonal. UMFPACK Version 4.1 |
|
193 uses several strategies, including a symmetric pivoting strategy, and |
|
194 will give you better results if you want to factorize an unsymmetric matrix |
|
195 of this type. Refer to the UMFPACK User Guide for more details, at |
|
196 http://www.cise.ufl.edu/research/sparse/umfpack. |
|
197 |
|
198 The AMD mexFunction is much faster than the built-in MATLAB symmetric minimum |
|
199 degree ordering methods, SYMAMD and SYMMMD. Its ordering quality is |
|
200 comparable to SYMAMD, and better than SYMMMD |
|
201 \cite{DavisGilbertLarimoreNg00pending}. |
|
202 |
|
203 An optional input argument can be used to modify the control parameters for |
|
204 AMD (aggressive absorption, dense row/column handling, and printing of |
|
205 statistics). An optional output |
|
206 argument provides statistics on the ordering, including an analysis of the |
|
207 fill-in and the floating-point operation count for a subsequent factorization. |
|
208 For more details (once AMD is installed), |
|
209 type {\tt help amd} in the MATLAB command window. |
|
210 |
|
211 %------------------------------------------------------------------------------ |
|
212 \section{Using AMD in a C program} |
|
213 \label{Cversion} |
|
214 %------------------------------------------------------------------------------ |
|
215 |
|
216 The C-callable AMD library consists of five user-callable routines and one |
|
217 include file. There are two versions of each of the routines, with |
|
218 {\tt int} and {\tt long} integers. |
|
219 The routines with prefix |
|
220 {\tt amd\_l\_} use {\tt long} integer arguments; the others use |
|
221 {\tt int} integer arguments. If you compile AMD in the standard |
|
222 ILP32 mode (32-bit {\tt int}'s, {\tt long}'s, and pointers) then the versions |
|
223 are essentially identical. You will be able to solve problems using up to 2GB |
|
224 of memory. If you compile AMD in the standard LP64 mode, the size of an |
|
225 {\tt int} remains 32-bits, but the size of a {\tt long} and a pointer both get |
|
226 promoted to 64-bits. |
|
227 |
|
228 The following routines are fully described in Section~\ref{Primary}: |
|
229 |
|
230 \begin{itemize} |
|
231 \item {\tt amd\_order} |
|
232 ({\tt long} version: {\tt amd\_l\_order}) |
|
233 {\footnotesize |
|
234 \begin{verbatim} |
|
235 #include "amd.h" |
|
236 int n, Ap [n+1], Ai [nz], P [n] ; |
|
237 double Control [AMD_CONTROL], Info [AMD_INFO] ; |
|
238 int result = amd_order (n, Ap, Ai, P, Control, Info) ; |
|
239 \end{verbatim} |
|
240 } |
|
241 Computes the approximate minimum degree ordering of an $n$-by-$n$ matrix |
|
242 $\m{A}$. Returns a permutation vector {\tt P} of size {\tt n}, where |
|
243 {\tt P[k] = i} if row and column {\tt i} are the {\tt k}th row and |
|
244 column in the permuted matrix. |
|
245 This routine allocates its own memory of size $1.2e+9n$ integers, |
|
246 where $e$ is the number of nonzeros in $\m{A}+\m{A}\tr$. |
|
247 It computes statistics about the matrix $\m{A}$, such as the symmetry of |
|
248 its nonzero pattern, the number of nonzeros in $\m{L}$, |
|
249 and the number of floating-point operations required for Cholesky and LU |
|
250 factorizations (which are returned in the {\tt Info} array). |
|
251 The user's input matrix is not modified. |
|
252 It returns {\tt AMD\_OK} if successful, {\tt AMD\_INVALID} if |
|
253 the matrix is invalid, or {\tt AMD\_OUT\_OF\_MEMORY} if out of memory. |
|
254 |
|
255 \item {\tt amd\_defaults} |
|
256 ({\tt long} version: {\tt amd\_l\_defaults}) |
|
257 {\footnotesize |
|
258 \begin{verbatim} |
|
259 #include "amd.h" |
|
260 double Control [AMD_CONTROL] ; |
|
261 amd_defaults (Control) ; |
|
262 \end{verbatim} |
|
263 } |
|
264 Sets the default control parameters in the {\tt Control} array. These can |
|
265 then be modified as desired before passing the array to the other AMD |
|
266 routines. |
|
267 |
|
268 \item {\tt amd\_control} |
|
269 ({\tt long} version: {\tt amd\_l\_control}) |
|
270 {\footnotesize |
|
271 \begin{verbatim} |
|
272 #include "amd.h" |
|
273 double Control [AMD_CONTROL] ; |
|
274 amd_control (Control) ; |
|
275 \end{verbatim} |
|
276 } |
|
277 Prints a description of the control parameters, and their values. |
|
278 |
|
279 \item {\tt amd\_info} |
|
280 ({\tt long} version: {\tt amd\_l\_info}) |
|
281 {\footnotesize |
|
282 \begin{verbatim} |
|
283 #include "amd.h" |
|
284 double Info [AMD_INFO] ; |
|
285 amd_info (Info) ; |
|
286 \end{verbatim} |
|
287 } |
|
288 Prints a description of the statistics computed by AMD, and their values. |
|
289 |
|
290 \item {\tt amd\_preprocess} |
|
291 ({\tt long} version: {\tt amd\_l\_info}) |
|
292 {\footnotesize |
|
293 \begin{verbatim} |
|
294 #include "amd.h" |
|
295 int n, Ap [n+1], Ai [nz], Rp [n+1], Ri [nz] ; |
|
296 int result = amd_preprocess (n, Ap, Ai, Rp, Ri) ; |
|
297 \end{verbatim} |
|
298 } |
|
299 Removes duplicate entries and sorts each column of its input $\m{A}$, |
|
300 and returns the nonzero pattern of the transpose, $\m{R}=\m{A}\tr$. |
|
301 It returns the transpose because this is the simplest way to sort |
|
302 a matrix and remove duplicate entries. Either $\m{A}$ or $\m{A}\tr$ |
|
303 can be passed to {\tt amd\_order} with little effect on the |
|
304 ordering (except for minor tie-breaking changes). |
|
305 |
|
306 \end{itemize} |
|
307 |
|
308 The nonzero pattern of the matrix $\m{A}$ is represented in compressed column |
|
309 form. |
|
310 For an $n$-by-$n$ matrix $\m{A}$ with {\tt nz} nonzero entries, the |
|
311 representation consists of two arrays: {\tt Ap} of size {\tt n+1} and {\tt Ai} |
|
312 of size {\tt nz}. The row indices of entries in column {\tt j} are stored in |
|
313 {\tt Ai[Ap[j]} $\ldots$ {\tt Ap[j+1]-1]}. |
|
314 For {\tt amd\_order}, |
|
315 no duplicate row indices may be present, and the row indices in any given |
|
316 column must be sorted in ascending order. |
|
317 The matrix is 0-based, and thus |
|
318 row indices must be in the range {\tt 0} to {\tt n-1}. |
|
319 The first entry {\tt Ap[0]} must be zero. |
|
320 The total number of entries in the matrix is thus {\tt nz = Ap[n]}. |
|
321 |
|
322 The matrix must be square, but it does not need to be symmetric. |
|
323 The {\tt amd\_order} routine constructs the nonzero pattern of |
|
324 $\m{B} = \m{A}+\m{A}\tr$ (without forming $\m{A}\tr$ explicitly), |
|
325 and then orders the matrix $\m{B}$. Thus, either the |
|
326 lower triangular part of $\m{A}$, the upper triangular part, |
|
327 or any combination may be passed. The transpose $\m{A}\tr$ may also be |
|
328 passed to {\tt amd\_order}. |
|
329 The diagonal entries may be present, but are ignored. |
|
330 |
|
331 The input to {\tt amd\_order} must have sorted columns because it uses |
|
332 an in-place algorithm to construct $\m{A}+\m{A}\tr$ without first constructing |
|
333 $\m{A}\tr$. This saves memory, but places an additional restriction on |
|
334 the input. If the input matrix has columns with unsorted and/or duplicate |
|
335 row indices, it is not valid as input to {\tt amd\_order}. To handle this |
|
336 case, the {\tt amd\_preprocess} routine is provided. It sorts, transposes, |
|
337 and removes duplicate entries from its input matrix, returning its result |
|
338 as another compressed-column matrix $\m{R}$ which can then be passed to |
|
339 {\tt amd\_order}. |
|
340 |
|
341 %------------------------------------------------------------------------------ |
|
342 \subsection{Control parameters} |
|
343 \label{control_param} |
|
344 %------------------------------------------------------------------------------ |
|
345 |
|
346 Control parameters are set an optional {\tt Control} array. |
|
347 It is optional in the sense that if |
|
348 a {\tt NULL} pointer is passed for the {\tt Control} input argument, |
|
349 then default control parameters are used. |
|
350 % |
|
351 \begin{itemize} |
|
352 \item {\tt Control[AMD\_DENSE]} (or {\tt Control(1)} in MATLAB): |
|
353 controls the threshold for ``dense'' |
|
354 rows/columns. A dense row/column in $\m{A}+\m{A}\tr$ |
|
355 can cause AMD to spend significant time |
|
356 in ordering the matrix. If {\tt Control[AMD\_DENSE]} $\ge 0$, |
|
357 rows/columns with |
|
358 more than {\tt Control[AMD\_DENSE]} $\sqrt{n}$ entries are ignored during |
|
359 the ordering, and placed last in the output order. The default |
|
360 value of {\tt Control[AMD\_DENSE]} is 10. If negative, no rows/columns |
|
361 are treated as ``dense.'' Rows/columns with 16 or fewer off-diagonal |
|
362 entries are never considered ``dense.'' |
|
363 % |
|
364 \item {\tt Control[AMD\_AGGRESSIVE]} (or {\tt Control(2)} in MATLAB): |
|
365 controls whether or not to use |
|
366 aggressive absorption, in which a prior element is absorbed into the current |
|
367 element if it is a subset of the current element, even if it is not |
|
368 adjacent to the current pivot element (refer to \cite{AmestoyDavisDuff96} |
|
369 for more details). The default value is nonzero, |
|
370 which means that aggressive absorption will be performed. This nearly always |
|
371 leads to a better ordering (because the approximate degrees are more |
|
372 accurate) and a lower execution time. There are cases where it can |
|
373 lead to a slightly worse ordering, however. To turn it off, set |
|
374 {\tt Control[AMD\_AGGRESSIVE]} to 0. |
|
375 % |
|
376 \end{itemize} |
|
377 |
|
378 Statistics are returned in the {\tt Info} array |
|
379 (if {\tt Info} is {\tt NULL}, then no statistics are returned). |
|
380 Refer to {\tt amd.h} file, for more details |
|
381 (14 different statistics are returned, so the list is not included here). |
|
382 |
|
383 %------------------------------------------------------------------------------ |
|
384 \subsection{Sample C program} |
|
385 %------------------------------------------------------------------------------ |
|
386 |
|
387 The following program, {\tt amd\_demo.c}, illustrates the basic use of AMD. |
|
388 See Section~\ref{Synopsis} for a short description |
|
389 of each calling sequence. |
|
390 |
|
391 {\footnotesize |
|
392 \begin{verbatim} |
|
393 #include <stdio.h> |
|
394 #include "amd.h" |
|
395 |
|
396 int n = 5 ; |
|
397 int Ap [ ] = { 0, 2, 6, 10, 12, 14} ; |
|
398 int Ai [ ] = { 0,1, 0,1,2,4, 1,2,3,4, 2,3, 1,4 } ; |
|
399 int P [5] ; |
|
400 |
|
401 int main (void) |
|
402 { |
|
403 int k ; |
|
404 (void) amd_order (n, Ap, Ai, P, (double *) NULL, (double *) NULL) ; |
|
405 for (k = 0 ; k < n ; k++) printf ("P [%d] = %d\n", k, P [k]) ; |
|
406 return (0) ; |
|
407 } |
|
408 \end{verbatim} |
|
409 } |
|
410 |
|
411 The {\tt Ap} and {\tt Ai} arrays represent the binary matrix |
|
412 \[ |
|
413 \m{A} = \left[ |
|
414 \begin{array}{rrrrr} |
|
415 1 & 1 & 0 & 0 & 0 \\ |
|
416 1 & 1 & 1 & 0 & 1 \\ |
|
417 0 & 1 & 1 & 1 & 0 \\ |
|
418 0 & 0 & 1 & 1 & 0 \\ |
|
419 0 & 1 & 1 & 0 & 1 \\ |
|
420 \end{array} |
|
421 \right]. |
|
422 \] |
|
423 The diagonal entries are ignored. |
|
424 % |
|
425 AMD constructs the pattern of $\m{A}+\m{A}\tr$, |
|
426 and returns a permutation vector of $(0, 3, 1, 4, 2)$. |
|
427 % |
|
428 Since the matrix is unsymmetric but with a mostly symmetric nonzero |
|
429 pattern, this would be a suitable permutation for an LU factorization of a |
|
430 matrix with this nonzero pattern and whose diagonal entries are not too small. |
|
431 The program uses default control settings and does not return any statistics |
|
432 about the ordering, factorization, or solution ({\tt Control} and {\tt Info} |
|
433 are both {\tt (double *) NULL}). It also ignores the status value returned by |
|
434 {\tt amd\_order}. |
|
435 |
|
436 More example programs are included with the AMD package. |
|
437 The {\tt amd\_demo.c} program provides a more detailed demo of AMD. |
|
438 Another example is the AMD mexFunction, {\tt amd\_mex.c}. |
|
439 |
|
440 %------------------------------------------------------------------------------ |
|
441 \subsection{A note about zero-sized arrays} |
|
442 %------------------------------------------------------------------------------ |
|
443 |
|
444 AMD uses several user-provided arrays of size {\tt n} or {\tt nz}. |
|
445 Either {\tt n} or {\tt nz} can be zero. |
|
446 If you attempt to {\tt malloc} an array of size zero, |
|
447 however, {\tt malloc} will return a null pointer which AMD will report |
|
448 as invalid. If you {\tt malloc} an array of |
|
449 size {\tt n} or {\tt nz} to pass to AMD, make sure that you handle the |
|
450 {\tt n} = 0 and {\tt nz = 0} cases correctly. |
|
451 |
|
452 %------------------------------------------------------------------------------ |
|
453 \section{Synopsis of C-callable routines} |
|
454 \label{Synopsis} |
|
455 %------------------------------------------------------------------------------ |
|
456 |
|
457 The matrix $\m{A}$ is {\tt n}-by-{\tt n} with {\tt nz} entries. |
|
458 |
|
459 {\footnotesize |
|
460 \begin{verbatim} |
|
461 #include "amd.h" |
|
462 int n, status, Ap [n+1], Ai [nz], P [n], Rp [n+1], Ri [nz] ; |
|
463 double Control [AMD_CONTROL], Info [AMD_INFO] ; |
|
464 amd_defaults (Control) ; |
|
465 status = amd_order (n, Ap, Ai, P, Control, Info) ; |
|
466 amd_control (Control) ; |
|
467 amd_info (Info) ; |
|
468 amd_preprocess (n, Ap, Ai, Rp, Ri) ; |
|
469 \end{verbatim} |
|
470 } |
|
471 |
|
472 The {\tt amd\_l\_*} routines are identical, except that all {\tt int} |
|
473 arguments become {\tt long}: |
|
474 |
|
475 {\footnotesize |
|
476 \begin{verbatim} |
|
477 #include "amd.h" |
|
478 long n, status, Ap [n+1], Ai [nz], P [n], Rp [n+1], Ri [nz] ; |
|
479 double Control [AMD_CONTROL], Info [AMD_INFO] ; |
|
480 amd_l_defaults (Control) ; |
|
481 status = amd_l_order (n, Ap, Ai, P, Control, Info) ; |
|
482 amd_l_control (Control) ; |
|
483 amd_l_info (Info) ; |
|
484 amd_l_preprocess (n, Ap, Ai, Rp, Ri) ; |
|
485 \end{verbatim} |
|
486 } |
|
487 |
|
488 %------------------------------------------------------------------------------ |
|
489 \section{Using AMD in a Fortran program} |
|
490 %------------------------------------------------------------------------------ |
|
491 |
|
492 Two Fortran versions of AMD are provided. The {\tt AMD} routine computes the |
|
493 approximate minimum degree ordering, using aggressive absorption. The |
|
494 {\tt AMDBAR} routine is identical, except that it does not perform aggressive |
|
495 absorption. The {\tt AMD} routine is essentially identical to the HSL |
|
496 routine {\tt MC47B/BD}. |
|
497 Note that earlier versions of the Fortran |
|
498 {\tt AMD} and {\tt AMDBAR} routines included an {\tt IOVFLO} argument, |
|
499 which is no longer present. |
|
500 |
|
501 In contrast to the C version, the Fortran routines require a symmetric |
|
502 nonzero pattern, with no diagonal entries present although the {\tt MC47A/AD} |
|
503 wrapper in HSL allows duplicates, ignores out-of-range entries, and only |
|
504 uses entries from the upper triangular part of the matrix. Although we |
|
505 have an experimental Fortran code for treating ``dense'' rows, the Fortran |
|
506 codes in this release do not treat |
|
507 ``dense'' rows and columns of $\m{A}$ differently, and thus their run time |
|
508 can be high if there are a few dense rows and columns in the matrix. |
|
509 They do not perform a post-ordering of the elimination tree, |
|
510 compute statistics on the ordering, or check the validity of their input |
|
511 arguments. These facilities are provided by {\tt MC47A/AD} and other |
|
512 subroutines from HSL. |
|
513 Only one {\tt integer} |
|
514 version of each Fortran routine is provided. |
|
515 Both Fortran routines overwrite the user's input |
|
516 matrix, in contrast to the C version. |
|
517 % |
|
518 The C version does not return the elimination or assembly tree. |
|
519 The Fortran version returns an assembly tree; |
|
520 refer to the User Guide for details. |
|
521 The following is the syntax of the {\tt AMD} Fortran routine. |
|
522 The {\tt AMDBAR} routine is identical except for the routine name. |
|
523 |
|
524 {\footnotesize |
|
525 \begin{verbatim} |
|
526 INTEGER N, IWLEN, PFREE, NCMPA, IW (IWLEN), PE (N), DEGREE (N), NV (N), |
|
527 $ NEXT (N), LAST (N), HEAD (N), ELEN (N), W (N), LEN (N) |
|
528 CALL AMD (N, PE, IW, LEN, IWLEN, PFREE, NV, NEXT, |
|
529 $ LAST, HEAD, ELEN, DEGREE, NCMPA, W) |
|
530 CALL AMDBAR (N, PE, IW, LEN, IWLEN, PFREE, NV, NEXT, |
|
531 $ LAST, HEAD, ELEN, DEGREE, NCMPA, W) |
|
532 \end{verbatim} |
|
533 } |
|
534 |
|
535 The input matrix is provided to {\tt AMD} and {\tt AMDBAR} |
|
536 in three arrays, {\tt PE}, of size {\tt N}, |
|
537 {\tt LEN}, of size {\tt N}, and {\tt IW}, of size {\tt IWLEN}. The size of |
|
538 {\tt IW} must be at least {\tt NZ+N}. The recommended size is |
|
539 {\tt 1.2*NZ + N}. |
|
540 On input, the indices of nonzero entries in row {\tt I} are stored in {\tt IW}. |
|
541 {\tt PE(I)} is the index in {\tt IW} of the start of row {\tt I}. |
|
542 {\tt LEN(I)} is the number of entries in row {\tt I}. |
|
543 The matrix is 1-based, with row and column indices in the range 1 to {\tt N}. |
|
544 Row {\tt I} is contained in |
|
545 {\tt IW (PE(I)} $\ldots \:$ {\tt PE(I) + LEN(I) - 1)}. |
|
546 The diagonal entries must not be present. The indices within each row must |
|
547 not contain any duplicates, but they need not be sorted. The rows |
|
548 themselves need not be in any particular order, and there may be empty space |
|
549 between the rows. If {\tt LEN(I)} is zero, then there are no off-diagonal |
|
550 entries in row {\tt I}, and {\tt PE(I)} is ignored. The integer |
|
551 {\tt PFREE} defines what part of {\tt IW} contains the user's input matrix, |
|
552 which is held in {\tt IW(1}~$\ldots~\:${\tt PFREE-1)}. |
|
553 The contents of {\tt IW} and {\tt LEN} are undefined on output, |
|
554 and {\tt PE} is modified to contain information about the ordering. |
|
555 |
|
556 As the algorithm proceeds, it modifies the {\tt IW} array, placing the |
|
557 pattern of the partially eliminated matrix in |
|
558 {\tt IW(PFREE} $\ldots \:${\tt IWLEN)}. |
|
559 If this space is exhausted, the space is compressed. |
|
560 The number of compressions performed on the {\tt IW} array is |
|
561 returned in the scalar {\tt NCMPA}. The value of {\tt PFREE} on output is the |
|
562 length of {\tt IW} required for no compressions to be needed. |
|
563 |
|
564 The output permutation is returned in the array {\tt LAST}, of size {\tt N}. |
|
565 If {\tt I=LAST(K)}, then {\tt I} is the {\tt K}th row in the permuted |
|
566 matrix. The inverse permutation is returned in the array {\tt ELEN}, where |
|
567 {\tt K=ELEN(I)} if {\tt I} is the {\tt K}th row in the permuted matrix. |
|
568 On output, the {\tt PE} and {\tt NV} arrays hold the assembly tree, |
|
569 a supernodal elimination tree that represents the relationship between |
|
570 columns of the Cholesky factor $\m{L}$. |
|
571 If {\tt NV(I)} $> 0$, then {\tt I} is a node in the assembly |
|
572 tree, and the parent of {\tt I} is {\tt -PE(I)}. If {\tt I} is a root of |
|
573 the tree, then {\tt PE(I)} is zero. The value of {\tt NV(I)} is the |
|
574 number of entries in the corresponding column of $\m{L}$, including the |
|
575 diagonal. |
|
576 If {\tt NV(I)} is zero, then {\tt I} is a non-principal node that is |
|
577 not in the assembly tree. Node {\tt -PE(I)} is the parent of node {\tt I} |
|
578 in a subtree, the root of which is a node in the assembly tree. All nodes |
|
579 in one subtree belong to the same supernode in the assembly tree. |
|
580 The other size {\tt N} arrays |
|
581 ({\tt DEGREE}, {\tt HEAD}, {\tt NEXT}, and {\tt W}) are used as workspace, |
|
582 and are not defined on input or output. |
|
583 |
|
584 If you want to use a simpler user-interface and compute the elimination |
|
585 tree post-ordering, you should be able to call the C routines {\tt amd\_order} |
|
586 or {\tt amd\_l\_order} from a Fortran program. Just be sure to take into |
|
587 account the 0-based indexing in the {\tt P}, {\tt Ap}, and {\tt Ai} arguments |
|
588 to {\tt amd\_order} and {\tt amd\_l\_order}. A sample interface is provided |
|
589 in the files {\tt AMD/Demo/amd\_f77cross.f} and |
|
590 {\tt AMD/Demo/amd\_f77wrapper.c}. To compile the {\tt amd\_f77cross} program, |
|
591 type {\tt make cross} in the {\tt AMD/Demo} directory. The |
|
592 Fortran-to-C calling conventions are highly non-portable, so this example |
|
593 is not guaranteed to work with your compiler C and Fortran compilers. |
|
594 The output of {\tt amd\_f77cross} is in {\tt amd\_f77cross.out}. |
|
595 |
|
596 %------------------------------------------------------------------------------ |
|
597 \section{Sample Fortran main program} |
|
598 %------------------------------------------------------------------------------ |
|
599 |
|
600 The following program illustrates the basic usage of the Fortran version of AMD. |
|
601 The {\tt AP} and {\tt AI} arrays represent the binary matrix |
|
602 \[ |
|
603 \m{A} = \left[ |
|
604 \begin{array}{rrrrr} |
|
605 1 & 1 & 0 & 0 & 0 \\ |
|
606 1 & 1 & 1 & 0 & 1 \\ |
|
607 0 & 1 & 1 & 1 & 1 \\ |
|
608 0 & 0 & 1 & 1 & 0 \\ |
|
609 0 & 1 & 1 & 0 & 1 \\ |
|
610 \end{array} |
|
611 \right] |
|
612 \] |
|
613 in a conventional 1-based column-oriented form, |
|
614 except that the diagonal entries are not present. |
|
615 The matrix has the same as nonzero pattern of $\m{A}+\m{A}\tr$ in the C |
|
616 program, in Section~\ref{Cversion}. |
|
617 The output permutation is $(4, 1, 3, 5, 2)$. |
|
618 It differs from the permutation returned by the C routine {\tt amd\_order} |
|
619 because a post-order of the elimination tree has not yet been performed. |
|
620 |
|
621 {\footnotesize |
|
622 \begin{verbatim} |
|
623 INTEGER N, NZ, J, K, P, IWLEN, PFREE, NCMPA |
|
624 PARAMETER (N = 5, NZ = 10, IWLEN = 17) |
|
625 INTEGER AP (N+1), AI (NZ), LAST (N), PE (N), LEN (N), ELEN (N), |
|
626 $ IW (IWLEN), DEGREE (N), NV (N), NEXT (N), HEAD (N), W (N) |
|
627 DATA AP / 1, 2, 5, 8, 9, 11/ |
|
628 DATA AI / 2, 1,3,5, 2,4,5, 3, 2,3 / |
|
629 C load the matrix into the AMD workspace |
|
630 DO 10 J = 1,N |
|
631 PE (J) = AP (J) |
|
632 LEN (J) = AP (J+1) - AP (J) |
|
633 10 CONTINUE |
|
634 DO 20 P = 1,NZ |
|
635 IW (P) = AI (P) |
|
636 20 CONTINUE |
|
637 PFREE = NZ + 1 |
|
638 C order the matrix (destroys the copy of A in IW, PE, and LEN) |
|
639 CALL AMD (N, PE, IW, LEN, IWLEN, PFREE, NV, NEXT, LAST, HEAD, |
|
640 $ ELEN, DEGREE, NCMPA, W) |
|
641 DO 60 K = 1, N |
|
642 PRINT 50, K, LAST (K) |
|
643 50 FORMAT ('P (',I2,') = ', I2) |
|
644 60 CONTINUE |
|
645 END |
|
646 \end{verbatim} |
|
647 } |
|
648 |
|
649 The {\tt Demo} directory contains an example of how the C version |
|
650 may be called from a Fortran program, but this is highly non-portable. |
|
651 For this reason, it is placed in the {\tt Demo} directory, not in the |
|
652 primary {\tt Source} directory. |
|
653 |
|
654 %------------------------------------------------------------------------------ |
|
655 \section{Installation} |
|
656 \label{Install} |
|
657 %------------------------------------------------------------------------------ |
|
658 |
|
659 The following discussion assumes you have the {\tt make} program, either in |
|
660 Unix, or in Windows with Cygwin. |
|
661 |
|
662 System-dependent configurations are in the {\tt AMD/Make} |
|
663 directory. You can edit the {\tt Make.include} |
|
664 file in that directory to customize the compilation. The default |
|
665 settings will work on most systems. |
|
666 Sample configuration files are provided |
|
667 for Linux, Sun Solaris, SGI IRIX, IBM AIX, and the DEC/Compaq Alpha. |
|
668 |
|
669 To compile and install the C-callable AMD library, |
|
670 go to the {\tt AMD} directory and type {\tt make}. |
|
671 The library will be placed in {\tt AMD/Lib/libamd.a}. |
|
672 Two demo programs of the AMD ordering routine will be compiled and tested in |
|
673 the {\tt AMD/Demo} directory. |
|
674 The outputs of these demo programs will then be compared with output |
|
675 files in the distribution. The AMD mexFunction for |
|
676 use in MATLAB will also be compiled. If you do not have MATLAB |
|
677 type {\tt make lib} instead. |
|
678 |
|
679 To compile and install the Fortran-callable AMD library, |
|
680 go to the {\tt AMD} directory and type {\tt make fortran}. |
|
681 The library will be placed in {\tt AMD/Lib/libamdf77.a}. |
|
682 A demo program will be compiled and tested in the {\tt AMD/Demo} directory. |
|
683 The output will be compared with an output file in the distribution. |
|
684 |
|
685 Typing {\tt make clean} will remove all but the final compiled libraries |
|
686 and demo programs. Typing {\tt make purge} removes all files not in the |
|
687 original distribution. |
|
688 If you compile AMD and then later change the {\tt Make.include} |
|
689 file or your system-specific configuration file such as {\tt Make.linux}, |
|
690 then you should type {\tt make purge} and then {\tt make} to recompile. |
|
691 |
|
692 Here are the various parameters that you can control in your |
|
693 {\tt Make.include} file: |
|
694 |
|
695 \begin{itemize} |
|
696 \item {\tt CC = } your C compiler, such as {\tt cc}. |
|
697 \item {\tt RANLIB = } your system's {\tt ranlib} program, if needed. |
|
698 \item {\tt CFLAGS = } optimization flags, such as {\tt -O}. |
|
699 \item {\tt LIB = } your libraries, such as {\tt -lm} or {\tt -lblas}. |
|
700 \item {\tt RM =} the command to delete a file. |
|
701 \item {\tt MV =} the command to rename a file. |
|
702 \item {\tt MEX =} the command to compile a MATLAB mexFunction. |
|
703 \item {\tt F77 =} the command to compile a Fortran program (optional). |
|
704 \item {\tt F77FLAGS =} the Fortran compiler flags (optional). |
|
705 \item {\tt F77LIB =} the Fortran libraries (optional). |
|
706 \end{itemize} |
|
707 |
|
708 The {\tt Make.include} includes some definitions regarding the BLAS. |
|
709 This is so that AMD and UMFPACK (which requires AMD) can share |
|
710 the same configuration files. If you wish to use AMD only, then |
|
711 you can ignore any references to the BLAS (the -DNBLAS compile flag). |
|
712 |
|
713 When you compile your program that uses the C-callable AMD library, |
|
714 you need to add the {\tt AMD/Lib/libamd.a} library |
|
715 and you need to tell your compiler to look in the |
|
716 {\tt AMD/Include} directory for include |
|
717 files. To compile a Fortran program that calls the Fortran AMD library, |
|
718 you need to add the {\tt AMD/Lib/libamdf77.a} library. |
|
719 See {\tt AMD/Demo/Makefile} for an example. |
|
720 |
|
721 If all you want to use is the AMD mexFunction in MATLAB, you can skip |
|
722 the use of the {\tt make} command entirely. Simply type |
|
723 {\tt amd\_make} in MATLAB while in the {\tt AMD/MATLAB} directory. |
|
724 This works on any system with MATLAB, including Windows. |
|
725 |
|
726 If you are including AMD as a subset of a larger library and do not want |
|
727 to link the C standard I/O library, or if you simply do not need to use |
|
728 them, you can safely remove the {\tt amd\_control.c} and {\tt amd\_info.c} |
|
729 files. Similarly, if you use default parameters (or define your |
|
730 own {\tt Control} array), then you can exclude the {\tt amd\_defaults.c} |
|
731 file. The {\tt amd\_preprocess.c} file is optional as well, if you |
|
732 can ensure that the input matrix to {\tt amd\_order} is always sorted |
|
733 and has no duplicate entries. |
|
734 Each of these files contains the user-callable routines of the same |
|
735 name. None of these auxiliary routines are directly called by |
|
736 {\tt amd\_order}. |
|
737 The {\tt amd\_dump.c} file contains debugging routines |
|
738 that are neither used nor compiled unless debugging is enabled. |
|
739 The {\tt amd\_internal.h} file must be edited to enable debugging; |
|
740 refer to the instructions in that file. Thus, it too can be excluded |
|
741 if compiled into a larger production program or library. |
|
742 The bare minimum files required to use just {\tt amd\_order} are |
|
743 {\tt amd.h} in the {\tt Include} directory, |
|
744 and |
|
745 {\tt amd\_1.c}, |
|
746 {\tt amd\_2.c}, |
|
747 {\tt amd\_aat.c}, |
|
748 {\tt and\_order.c}, |
|
749 {\tt amd\_postorder.c}, |
|
750 {\tt amd\_post\_tree.c}, |
|
751 {\tt amd\_valid.c}, |
|
752 and |
|
753 {\tt amd\_internal.h}, |
|
754 in the {\tt Source} directory. |
|
755 |
|
756 %------------------------------------------------------------------------------ |
|
757 \newpage |
|
758 \section{The AMD routines} |
|
759 \label{Primary} |
|
760 %------------------------------------------------------------------------------ |
|
761 |
|
762 The file {\tt AMD/Include/amd.h} listed below |
|
763 describes each user-callable routine in the C version of AMD, |
|
764 and gives details on their use. |
|
765 |
|
766 {\footnotesize |
|
767 \begin{verbatim} |
|
768 /* ========================================================================= */ |
|
769 /* === AMD: approximate minimum degree ordering =========================== */ |
|
770 /* ========================================================================= */ |
|
771 |
|
772 /* ------------------------------------------------------------------------- */ |
|
773 /* AMD Version 1.1 (Jan. 21, 2004), Copyright (c) 2004 by Timothy A. Davis, */ |
|
774 /* Patrick R. Amestoy, and Iain S. Duff. See ../README for License. */ |
|
775 /* email: davis@cise.ufl.edu CISE Department, Univ. of Florida. */ |
|
776 /* web: http://www.cise.ufl.edu/research/sparse/amd */ |
|
777 /* ------------------------------------------------------------------------- */ |
|
778 |
|
779 /* AMD finds a symmetric ordering P of a matrix A so that the Cholesky |
|
780 * factorization of P*A*P' has fewer nonzeros and takes less work than the |
|
781 * Cholesky factorization of A. If A is not symmetric, then it performs its |
|
782 * ordering on the matrix A+A'. Two sets of user-callable routines are |
|
783 * provided, one for "int" integers and the other for "long" integers. |
|
784 * |
|
785 * The method is based on the approximate minimum degree algorithm, discussed |
|
786 * in Amestoy, Davis, and Duff, "An approximate degree ordering algorithm", |
|
787 * SIAM Journal of Matrix Analysis and Applications, vol. 17, no. 4, pp. |
|
788 * 886-905, 1996. This package can perform both the AMD ordering (with |
|
789 * aggressive absorption), and the AMDBAR ordering (without aggressive |
|
790 * absorption) discussed in the above paper. This package differs from the |
|
791 * Fortran codes discussed in the paper: |
|
792 * |
|
793 * (1) it can ignore "dense" rows and columns, leading to faster run times |
|
794 * (2) it computes the ordering of A+A' if A is not symmetric |
|
795 * (3) it is followed by a depth-first post-ordering of the assembly tree |
|
796 * (or supernodal elimination tree) |
|
797 * |
|
798 * For historical reasons, the Fortran versions, amd.f and amdbar.f, have |
|
799 * been left (nearly) unchanged. They compute the identical ordering as |
|
800 * described in the above paper. |
|
801 */ |
|
802 |
|
803 #ifndef AMD_H |
|
804 #define AMD_H |
|
805 |
|
806 int amd_order ( /* returns 0 if OK, negative value if error */ |
|
807 int n, /* A is n-by-n. n must be >= 0. */ |
|
808 const int Ap [ ], /* column pointers for A, of size n+1 */ |
|
809 const int Ai [ ], /* row indices of A, of size nz = Ap [n] */ |
|
810 int P [ ], /* output permutation, of size n */ |
|
811 double Control [ ], /* input Control settings, of size AMD_CONTROL */ |
|
812 double Info [ ] /* output Info statistics, of size AMD_INFO */ |
|
813 ) ; |
|
814 |
|
815 long amd_l_order ( /* see above for description of arguments */ |
|
816 long n, |
|
817 const long Ap [ ], |
|
818 const long Ai [ ], |
|
819 long P [ ], |
|
820 double Control [ ], |
|
821 double Info [ ] |
|
822 ) ; |
|
823 |
|
824 /* Input arguments (not modified): |
|
825 * |
|
826 * n: the matrix A is n-by-n. |
|
827 * Ap: an int/long array of size n+1, containing the column pointers of A. |
|
828 * Ai: an int/long array of size nz, containing the row indices of A, |
|
829 * where nz = Ap [n]. |
|
830 * Control: a double array of size AMD_CONTROL, containing control |
|
831 * parameters. Defaults are used if Control is NULL. |
|
832 * |
|
833 * Output arguments (not defined on input): |
|
834 * |
|
835 * P: an int/long array of size n, containing the output permutation. If |
|
836 * row i is the kth pivot row, then P [k] = i. In MATLAB notation, |
|
837 * the reordered matrix is A (P,P). |
|
838 * Info: a double array of size AMD_INFO, containing statistical |
|
839 * information. Ignored if Info is NULL. |
|
840 * |
|
841 * On input, the matrix A is stored in column-oriented form. The row indices |
|
842 * of nonzero entries in column j are stored in Ai [Ap [j] ... Ap [j+1]-1]. |
|
843 * The row indices must appear in ascending order in each column, and there |
|
844 * must not be any duplicate entries. Row indices must be in the range 0 to |
|
845 * n-1. Ap [0] must be zero, and thus nz = Ap [n] is the number of nonzeros |
|
846 * in A. The array Ap is of size n+1, and the array Ai is of size nz = Ap [n]. |
|
847 * The matrix does not need to be symmetric, and the diagonal does not need to |
|
848 * be present (if diagonal entries are present, they are ignored except for |
|
849 * the output statistic Info [AMD_NZDIAG]). The arrays Ai and Ap are not |
|
850 * modified. This form of the Ap and Ai arrays to represent the nonzero |
|
851 * pattern of the matrix A is the same as that used internally by MATLAB. |
|
852 * If you wish to use a more flexible input structure, please see the |
|
853 * umfpack_*_triplet_to_col routines in the UMFPACK package, at |
|
854 * http://www.cise.ufl.edu/research/sparse/umfpack, or use the amd_preprocess |
|
855 * routine discussed below. |
|
856 * |
|
857 * Restrictions: n >= 0. Ap [0] = 0. Ap [j] <= Ap [j+1] for all j in the |
|
858 * range 0 to n-1. nz = Ap [n] >= 0. For all j in the range 0 to n-1, |
|
859 * and for all p in the range Ap [j] to Ap [j+1]-2, Ai [p] < Ai [p+1] must |
|
860 * hold. Ai [0..nz-1] must be in the range 0 to n-1. To avoid integer |
|
861 * overflow, (2.4*nz + 8*n) < INT_MAX / sizeof (int) for must hold for the |
|
862 * "int" version. (2.4*nz + 8*n) < LONG_MAX / sizeof (long) must hold |
|
863 * for the "long" version. Finally, Ai, Ap, and P must not be NULL. If |
|
864 * any of these restrictions are not met, AMD returns AMD_INVALID. |
|
865 * |
|
866 * AMD returns: |
|
867 * |
|
868 * AMD_OK if the matrix is valid and sufficient memory can be allocated to |
|
869 * perform the ordering. |
|
870 * |
|
871 * AMD_OUT_OF_MEMORY if not enough memory can be allocated. |
|
872 * |
|
873 * AMD_INVALID if the input arguments n, Ap, Ai are invalid, or if P is |
|
874 * NULL. |
|
875 * |
|
876 * The AMD routine first forms the pattern of the matrix A+A', and then |
|
877 * computes a fill-reducing ordering, P. If P [k] = i, then row/column i of |
|
878 * the original is the kth pivotal row. In MATLAB notation, the permuted |
|
879 * matrix is A (P,P), except that 0-based indexing is used instead of the |
|
880 * 1-based indexing in MATLAB. |
|
881 * |
|
882 * The Control array is used to set various parameters for AMD. If a NULL |
|
883 * pointer is passed, default values are used. The Control array is not |
|
884 * modified. |
|
885 * |
|
886 * Control [AMD_DENSE]: controls the threshold for "dense" rows/columns. |
|
887 * A dense row/column in A+A' can cause AMD to spend a lot of time in |
|
888 * ordering the matrix. If Control [AMD_DENSE] >= 0, rows/columns |
|
889 * with more than Control [AMD_DENSE] * sqrt (n) entries are ignored |
|
890 * during the ordering, and placed last in the output order. The |
|
891 * default value of Control [AMD_DENSE] is 10. If negative, no |
|
892 * rows/columns are treated as "dense". Rows/columns with 16 or |
|
893 * fewer off-diagonal entries are never considered "dense". |
|
894 * |
|
895 * Control [AMD_AGGRESSIVE]: controls whether or not to use aggressive |
|
896 * absorption, in which a prior element is absorbed into the current |
|
897 * element if is a subset of the current element, even if it is not |
|
898 * adjacent to the current pivot element (refer to Amestoy, Davis, |
|
899 * & Duff, 1996, for more details). The default value is nonzero, |
|
900 * which means to perform aggressive absorption. This nearly always |
|
901 * leads to a better ordering (because the approximate degrees are |
|
902 * more accurate) and a lower execution time. There are cases where |
|
903 * it can lead to a slightly worse ordering, however. To turn it off, |
|
904 * set Control [AMD_AGGRESSIVE] to 0. |
|
905 * |
|
906 * Control [2..4] are not used in the current version, but may be used in |
|
907 * future versions. |
|
908 * |
|
909 * The Info array provides statistics about the ordering on output. If it is |
|
910 * not present, the statistics are not returned. This is not an error |
|
911 * condition. |
|
912 * |
|
913 * Info [AMD_STATUS]: the return value of AMD, either AMD_OK, |
|
914 * AMD_OUT_OF_MEMORY, or AMD_INVALID. |
|
915 * |
|
916 * Info [AMD_N]: n, the size of the input matrix |
|
917 * |
|
918 * Info [AMD_NZ]: the number of nonzeros in A, nz = Ap [n] |
|
919 * |
|
920 * Info [AMD_SYMMETRY]: the symmetry of the matrix A. It is the number |
|
921 * of "matched" off-diagonal entries divided by the total number of |
|
922 * off-diagonal entries. An entry A(i,j) is matched if A(j,i) is also |
|
923 * an entry, for any pair (i,j) for which i != j. In MATLAB notation, |
|
924 * S = spones (A) ; |
|
925 * B = tril (S, -1) + triu (S, 1) ; |
|
926 * symmetry = nnz (B & B') / nnz (B) ; |
|
927 * |
|
928 * Info [AMD_NZDIAG]: the number of entries on the diagonal of A. |
|
929 * |
|
930 * Info [AMD_NZ_A_PLUS_AT]: the number of nonzeros in A+A', excluding the |
|
931 * diagonal. If A is perfectly symmetric (Info [AMD_SYMMETRY] = 1) |
|
932 * with a fully nonzero diagonal, then Info [AMD_NZ_A_PLUS_AT] = nz-n |
|
933 * (the smallest possible value). If A is perfectly unsymmetric |
|
934 * (Info [AMD_SYMMETRY] = 0, for an upper triangular matrix, for |
|
935 * example) with no diagonal, then Info [AMD_NZ_A_PLUS_AT] = 2*nz |
|
936 * (the largest possible value). |
|
937 * |
|
938 * Info [AMD_NDENSE]: the number of "dense" rows/columns of A+A' that were |
|
939 * removed from A prior to ordering. These are placed last in the |
|
940 * output order P. |
|
941 * |
|
942 * Info [AMD_MEMORY]: the amount of memory used by AMD, in bytes. In the |
|
943 * current version, this is 1.2 * Info [AMD_NZ_A_PLUS_AT] + 9*n |
|
944 * times the size of an integer. This is at most 2.4nz + 9n. This |
|
945 * excludes the size of the input arguments Ai, Ap, and P, which have |
|
946 * a total size of nz + 2*n + 1 integers. |
|
947 * |
|
948 * Info [AMD_NCMPA]: the number of garbage collections performed. |
|
949 * |
|
950 * Info [AMD_LNZ]: the number of nonzeros in L (excluding the diagonal). |
|
951 * This is a slight upper bound because mass elimination is combined |
|
952 * with the approximate degree update. It is a rough upper bound if |
|
953 * there are many "dense" rows/columns. The rest of the statistics, |
|
954 * below, are also slight or rough upper bounds, for the same reasons. |
|
955 * The post-ordering of the assembly tree might also not exactly |
|
956 * correspond to a true elimination tree postordering. |
|
957 * |
|
958 * Info [AMD_NDIV]: the number of divide operations for a subsequent LDL' |
|
959 * or LU factorization of the permuted matrix A (P,P). |
|
960 * |
|
961 * Info [AMD_NMULTSUBS_LDL]: the number of multiply-subtract pairs for a |
|
962 * subsequent LDL' factorization of A (P,P). |
|
963 * |
|
964 * Info [AMD_NMULTSUBS_LU]: the number of multiply-subtract pairs for a |
|
965 * subsequent LU factorization of A (P,P), assuming that no numerical |
|
966 * pivoting is required. |
|
967 * |
|
968 * Info [AMD_DMAX]: the maximum number of nonzeros in any column of L, |
|
969 * including the diagonal. |
|
970 * |
|
971 * Info [14..19] are not used in the current version, but may be used in |
|
972 * future versions. |
|
973 */ |
|
974 |
|
975 /* ------------------------------------------------------------------------- */ |
|
976 /* AMD preprocess */ |
|
977 /* ------------------------------------------------------------------------- */ |
|
978 |
|
979 /* amd_preprocess: sorts, removes duplicate entries, and transposes the |
|
980 * nonzero pattern of a column-form matrix A, to obtain the matrix R. |
|
981 * |
|
982 * Alternatively, you can consider this routine as constructing a row-form |
|
983 * matrix from a column-form matrix. Duplicate entries are allowed in A (and |
|
984 * removed in R). The columns of R are sorted. Checks its input A for errors. |
|
985 * |
|
986 * On input, A can have unsorted columns, and can have duplicate entries. |
|
987 * Ap [0] must still be zero, and Ap must be monotonically nondecreasing. |
|
988 * Row indices must be in the range 0 to n-1. |
|
989 * |
|
990 * On output, if this routine returns AMD_OK, then the matrix R is a valid |
|
991 * input matrix for AMD_order. It has sorted columns, with no duplicate |
|
992 * entries in each column. Since AMD_order operates on the matrix A+A', it |
|
993 * can just as easily use A or A', so the transpose has no significant effect |
|
994 * (except for minor tie-breaking, which can lead to a minor effect in the |
|
995 * quality of the ordering). As an example, compare the output of amd_demo.c |
|
996 * and amd_demo2.c. |
|
997 * |
|
998 * This routine transposes A to get R because that's the simplest way to |
|
999 * sort and remove duplicate entries from a matrix. |
|
1000 * |
|
1001 * Allocates 2*n integer work arrays, and free's them when done. |
|
1002 * |
|
1003 * If you wish to call amd_order, but do not know if your matrix has unsorted |
|
1004 * columns or duplicate entries, then you can use the following code, which is |
|
1005 * fairly efficient. amd_order will not allocate any internal matrix until |
|
1006 * it checks that the input matrix is valid, so the method below is memory- |
|
1007 * efficient as well. This code snippet assumes that Rp and Ri are already |
|
1008 * allocated, and are the same size as Ap and Ai respectively. |
|
1009 |
|
1010 result = amd_order (n, p, Ap, Ai, Control, Info) ; |
|
1011 if (result == AMD_INVALID) |
|
1012 { |
|
1013 if (amd_preprocess (n, Ap, Ai, Rp, Ri) == AMD_OK) |
|
1014 { |
|
1015 result = amd_order (n, p, Rp, Ri, Control, Info) ; |
|
1016 } |
|
1017 } |
|
1018 |
|
1019 * amd_preprocess will still return AMD_INVALID if any row index in Ai is out |
|
1020 * of range or if the Ap array is invalid. These errors are not corrected by |
|
1021 * amd_preprocess since they represent a more serious error that should be |
|
1022 * flagged with the AMD_INVALID error code. |
|
1023 */ |
|
1024 |
|
1025 int amd_preprocess |
|
1026 ( |
|
1027 int n, |
|
1028 const int Ap [ ], |
|
1029 const int Ai [ ], |
|
1030 int Rp [ ], |
|
1031 int Ri [ ] |
|
1032 ) ; |
|
1033 |
|
1034 long amd_l_preprocess |
|
1035 ( |
|
1036 long n, |
|
1037 const long Ap [ ], |
|
1038 const long Ai [ ], |
|
1039 long Rp [ ], |
|
1040 long Ri [ ] |
|
1041 ) ; |
|
1042 |
|
1043 /* Input arguments (not modified): |
|
1044 * |
|
1045 * n: the matrix A is n-by-n. |
|
1046 * Ap: an int/long array of size n+1, containing the column pointers of A. |
|
1047 * Ai: an int/long array of size nz, containing the row indices of A, |
|
1048 * where nz = Ap [n]. |
|
1049 * The nonzero pattern of column j of A is in Ai [Ap [j] ... Ap [j+1]-1]. |
|
1050 * Ap [0] must be zero, and Ap [j] <= Ap [j+1] must hold for all j in the |
|
1051 * range 0 to n-1. Row indices in Ai must be in the range 0 to n-1. |
|
1052 * The row indices in any one column need not be sorted, and duplicates |
|
1053 * may exist. |
|
1054 * |
|
1055 * Output arguments (not defined on input): |
|
1056 * |
|
1057 * Rp: an int/long array of size n+1, containing the column pointers of R. |
|
1058 * Ri: an int/long array of size rnz, containing the row indices of R, |
|
1059 * where rnz = Rp [n]. Note that Rp [n] will be less than Ap [n] if |
|
1060 * duplicates appear in A. In general, Rp [n] <= Ap [n]. |
|
1061 * The data structure for R is the same as A, except that each column of |
|
1062 * R contains sorted row indices, and no duplicates appear in any column. |
|
1063 * |
|
1064 * amd_preprocess returns: |
|
1065 * |
|
1066 * AMD_OK if the matrix A is valid and sufficient memory can be allocated |
|
1067 * to perform the preprocessing. |
|
1068 * |
|
1069 * AMD_OUT_OF_MEMORY if not enough memory can be allocated. |
|
1070 * |
|
1071 * AMD_INVALID if the input arguments n, Ap, Ai are invalid, or if Rp or |
|
1072 * Ri are NULL. |
|
1073 */ |
|
1074 |
|
1075 /* ------------------------------------------------------------------------- */ |
|
1076 /* AMD Control and Info arrays */ |
|
1077 /* ------------------------------------------------------------------------- */ |
|
1078 |
|
1079 /* amd_defaults: sets the default control settings */ |
|
1080 void amd_defaults (double Control [ ]) ; |
|
1081 void amd_l_defaults (double Control [ ]) ; |
|
1082 |
|
1083 /* amd_control: prints the control settings */ |
|
1084 void amd_control (double Control [ ]) ; |
|
1085 void amd_l_control (double Control [ ]) ; |
|
1086 |
|
1087 /* amd_info: prints the statistics */ |
|
1088 void amd_info (double Info [ ]) ; |
|
1089 void amd_l_info (double Info [ ]) ; |
|
1090 |
|
1091 #define AMD_CONTROL 5 /* size of Control array */ |
|
1092 #define AMD_INFO 20 /* size of Info array */ |
|
1093 |
|
1094 /* contents of Control */ |
|
1095 #define AMD_DENSE 0 /* "dense" if degree > Control [0] * sqrt (n) */ |
|
1096 #define AMD_AGGRESSIVE 1 /* do aggressive absorption if Control [1] != 0 */ |
|
1097 |
|
1098 /* default Control settings */ |
|
1099 #define AMD_DEFAULT_DENSE 10.0 /* default "dense" degree 10*sqrt(n) */ |
|
1100 #define AMD_DEFAULT_AGGRESSIVE 1 /* do aggressive absorption by default */ |
|
1101 |
|
1102 /* contents of Info */ |
|
1103 #define AMD_STATUS 0 /* return value of amd_order and amd_l_order */ |
|
1104 #define AMD_N 1 /* A is n-by-n */ |
|
1105 #define AMD_NZ 2 /* number of nonzeros in A */ |
|
1106 #define AMD_SYMMETRY 3 /* symmetry of pattern (1 is sym., 0 is unsym.) */ |
|
1107 #define AMD_NZDIAG 4 /* # of entries on diagonal */ |
|
1108 #define AMD_NZ_A_PLUS_AT 5 /* nz in A+A' */ |
|
1109 #define AMD_NDENSE 6 /* number of "dense" rows/columns in A */ |
|
1110 #define AMD_MEMORY 7 /* amount of memory used by AMD */ |
|
1111 #define AMD_NCMPA 8 /* number of garbage collections in AMD */ |
|
1112 #define AMD_LNZ 9 /* approx. nz in L, excluding the diagonal */ |
|
1113 #define AMD_NDIV 10 /* number of fl. point divides for LU and LDL' */ |
|
1114 #define AMD_NMULTSUBS_LDL 11 /* number of fl. point (*,-) pairs for LDL' */ |
|
1115 #define AMD_NMULTSUBS_LU 12 /* number of fl. point (*,-) pairs for LU */ |
|
1116 #define AMD_DMAX 13 /* max nz. in any column of L, incl. diagonal */ |
|
1117 |
|
1118 /* ------------------------------------------------------------------------- */ |
|
1119 /* return values of AMD */ |
|
1120 /* ------------------------------------------------------------------------- */ |
|
1121 |
|
1122 #define AMD_OK 0 /* success */ |
|
1123 #define AMD_OUT_OF_MEMORY -1 /* malloc failed */ |
|
1124 #define AMD_INVALID -2 /* input arguments are not valid */ |
|
1125 |
|
1126 #endif |
|
1127 \end{verbatim} |
|
1128 } |
|
1129 |
|
1130 %------------------------------------------------------------------------------ |
|
1131 \newpage |
|
1132 % References |
|
1133 %------------------------------------------------------------------------------ |
|
1134 |
|
1135 \bibliographystyle{plain} |
|
1136 \bibliography{AMD_UserGuide} |
|
1137 |
|
1138 \end{document} |