# HG changeset patch
# User Arun Giridhar <arungiridhar@gmail.com>
# Date 1641546020 -32400
# Node ID 02e4fd516f0aa05a407e153a1801e2dbffe73813
# Parent  824c67e2bd66d691d29d3b9f1d101a5233e860b2
doc: add example about broadcasting and vectorization (bug #61601).

* doc/interpreter/vectorize.txi: Add a comparison example of broadcasting
 and vectorization.

diff -r 824c67e2bd66 -r 02e4fd516f0a doc/interpreter/vectorize.txi
--- a/doc/interpreter/vectorize.txi	Thu Jan 06 14:21:03 2022 -0800
+++ b/doc/interpreter/vectorize.txi	Fri Jan 07 18:00:20 2022 +0900
@@ -409,8 +409,54 @@
       +=  -=  .*=  ./=  .\=  .^=  &=  |=
 @end example
 
-Beware of resorting to broadcasting if a simpler operation will suffice.
-For matrices @var{a} and @var{b}, consider the following:
+Here is a real example of the power of broadcasting.  The Floyd-Warshall
+algorithm is used to calculate the shortest path lengths between every pair of
+vertices in a graph.  A naive implementation for a graph adjacency matrix of
+order @var{n} might look like this:
+
+@example
+@group
+for k = 1:n
+  for i = 1:n
+    for j = 1:n
+      dist(i,j) = min (dist(i,j), dist(i,k) + dist(k,j));
+    endfor
+  endfor
+endfor
+@end group
+@end example
+
+Upon vectorizing the innermost loop, it might look like this:
+
+@example
+@group
+for k = 1:n
+  for i = 1:n
+    dist(i,:) = min (dist(i,:), dist(i,k) + dist(k,:));
+  endfor
+endfor
+@end group
+@end example
+
+Using broadcasting in both directions, it looks like this:
+
+@example
+@group
+for k = 1:n
+  dist = min (dist, dist(:,k) + dist(k,:));
+endfor
+@end group
+@end example
+
+The relative time performance of the three techniques for a given graph with
+100 vertices is 7.3 seconds for the naive code, 87 milliseconds for the
+singly vectorized code, and 1.3 milliseconds for the fully broadcast code.
+For a graph with 1000 vertices, vectorization takes 11.7 seconds while
+broadcasting takes only 1.15 seconds.  Therefore in general it is worth
+writing code with broadcasting semantics for performance.
+
+However, beware of resorting to broadcasting if a simpler operation will
+suffice.  For matrices @var{a} and @var{b}, consider the following:
 
 @example
 @var{c} = sum (permute (@var{a}, [1, 3, 2]) .* permute (@var{b}, [3, 2, 1]), 3);