changeset 30605:02e4fd516f0a stable

doc: add example about broadcasting and vectorization (bug #61601). * doc/interpreter/vectorize.txi: Add a comparison example of broadcasting and vectorization.
author Arun Giridhar <arungiridhar@gmail.com>
date Fri, 07 Jan 2022 18:00:20 +0900
parents 824c67e2bd66
children 8df8aa9ea5e7 c98873eb2852
files doc/interpreter/vectorize.txi
diffstat 1 files changed, 48 insertions(+), 2 deletions(-) [+]
line wrap: on
line diff
--- a/doc/interpreter/vectorize.txi	Thu Jan 06 14:21:03 2022 -0800
+++ b/doc/interpreter/vectorize.txi	Fri Jan 07 18:00:20 2022 +0900
@@ -409,8 +409,54 @@
       +=  -=  .*=  ./=  .\=  .^=  &=  |=
 @end example
 
-Beware of resorting to broadcasting if a simpler operation will suffice.
-For matrices @var{a} and @var{b}, consider the following:
+Here is a real example of the power of broadcasting.  The Floyd-Warshall
+algorithm is used to calculate the shortest path lengths between every pair of
+vertices in a graph.  A naive implementation for a graph adjacency matrix of
+order @var{n} might look like this:
+
+@example
+@group
+for k = 1:n
+  for i = 1:n
+    for j = 1:n
+      dist(i,j) = min (dist(i,j), dist(i,k) + dist(k,j));
+    endfor
+  endfor
+endfor
+@end group
+@end example
+
+Upon vectorizing the innermost loop, it might look like this:
+
+@example
+@group
+for k = 1:n
+  for i = 1:n
+    dist(i,:) = min (dist(i,:), dist(i,k) + dist(k,:));
+  endfor
+endfor
+@end group
+@end example
+
+Using broadcasting in both directions, it looks like this:
+
+@example
+@group
+for k = 1:n
+  dist = min (dist, dist(:,k) + dist(k,:));
+endfor
+@end group
+@end example
+
+The relative time performance of the three techniques for a given graph with
+100 vertices is 7.3 seconds for the naive code, 87 milliseconds for the
+singly vectorized code, and 1.3 milliseconds for the fully broadcast code.
+For a graph with 1000 vertices, vectorization takes 11.7 seconds while
+broadcasting takes only 1.15 seconds.  Therefore in general it is worth
+writing code with broadcasting semantics for performance.
+
+However, beware of resorting to broadcasting if a simpler operation will
+suffice.  For matrices @var{a} and @var{b}, consider the following:
 
 @example
 @var{c} = sum (permute (@var{a}, [1, 3, 2]) .* permute (@var{b}, [3, 2, 1]), 3);