Mercurial > octave

--- a/doc/interpreter/vectorize.txi	Fri Jan 07 17:56:00 2022 +0900
+++ b/doc/interpreter/vectorize.txi	Fri Jan 07 18:05:24 2022 +0900
@@ -409,8 +409,54 @@
       +=  -=  .*=  ./=  .\=  .^=  &=  |=
 @end example

-Beware of resorting to broadcasting if a simpler operation will suffice.
-For matrices @var{a} and @var{b}, consider the following:
+Here is a real example of the power of broadcasting.  The Floyd-Warshall
+algorithm is used to calculate the shortest path lengths between every pair of
+vertices in a graph.  A naive implementation for a graph adjacency matrix of
+order @var{n} might look like this:
+
+@example
+@group
+for k = 1:n
+  for i = 1:n
+    for j = 1:n
+      dist(i,j) = min (dist(i,j), dist(i,k) + dist(k,j));
+    endfor
+  endfor
+endfor
+@end group
+@end example
+
+Upon vectorizing the innermost loop, it might look like this:
+
+@example
+@group
+for k = 1:n
+  for i = 1:n
+    dist(i,:) = min (dist(i,:), dist(i,k) + dist(k,:));
+  endfor
+endfor
+@end group
+@end example
+
+Using broadcasting in both directions, it looks like this:
+
+@example
+@group
+for k = 1:n
+  dist = min (dist, dist(:,k) + dist(k,:));
+endfor
+@end group
+@end example
+
+The relative time performance of the three techniques for a given graph with
+100 vertices is 7.3 seconds for the naive code, 87 milliseconds for the
+singly vectorized code, and 1.3 milliseconds for the fully broadcast code.
+For a graph with 1000 vertices, vectorization takes 11.7 seconds while
+broadcasting takes only 1.15 seconds.  Therefore in general it is worth
+writing code with broadcasting semantics for performance.
+
+However, beware of resorting to broadcasting if a simpler operation will
+suffice.  For matrices @var{a} and @var{b}, consider the following:

 @example
 @var{c} = sum (permute (@var{a}, [1, 3, 2]) .* permute (@var{b}, [3, 2, 1]), 3);