# HG changeset patch # User Arun Giridhar # Date 1641546020 -32400 # Node ID 02e4fd516f0aa05a407e153a1801e2dbffe73813 # Parent 824c67e2bd66d691d29d3b9f1d101a5233e860b2 doc: add example about broadcasting and vectorization (bug #61601). * doc/interpreter/vectorize.txi: Add a comparison example of broadcasting and vectorization. diff -r 824c67e2bd66 -r 02e4fd516f0a doc/interpreter/vectorize.txi --- a/doc/interpreter/vectorize.txi Thu Jan 06 14:21:03 2022 -0800 +++ b/doc/interpreter/vectorize.txi Fri Jan 07 18:00:20 2022 +0900 @@ -409,8 +409,54 @@ += -= .*= ./= .\= .^= &= |= @end example -Beware of resorting to broadcasting if a simpler operation will suffice. -For matrices @var{a} and @var{b}, consider the following: +Here is a real example of the power of broadcasting. The Floyd-Warshall +algorithm is used to calculate the shortest path lengths between every pair of +vertices in a graph. A naive implementation for a graph adjacency matrix of +order @var{n} might look like this: + +@example +@group +for k = 1:n + for i = 1:n + for j = 1:n + dist(i,j) = min (dist(i,j), dist(i,k) + dist(k,j)); + endfor + endfor +endfor +@end group +@end example + +Upon vectorizing the innermost loop, it might look like this: + +@example +@group +for k = 1:n + for i = 1:n + dist(i,:) = min (dist(i,:), dist(i,k) + dist(k,:)); + endfor +endfor +@end group +@end example + +Using broadcasting in both directions, it looks like this: + +@example +@group +for k = 1:n + dist = min (dist, dist(:,k) + dist(k,:)); +endfor +@end group +@end example + +The relative time performance of the three techniques for a given graph with +100 vertices is 7.3 seconds for the naive code, 87 milliseconds for the +singly vectorized code, and 1.3 milliseconds for the fully broadcast code. +For a graph with 1000 vertices, vectorization takes 11.7 seconds while +broadcasting takes only 1.15 seconds. Therefore in general it is worth +writing code with broadcasting semantics for performance. + +However, beware of resorting to broadcasting if a simpler operation will +suffice. For matrices @var{a} and @var{b}, consider the following: @example @var{c} = sum (permute (@var{a}, [1, 3, 2]) .* permute (@var{b}, [3, 2, 1]), 3);