changeset 31421:21962c678648

mean.m: more Matlab compatibility (bugs #58116, #50571). Background: This improved function was released with the Statistics Package. * scripts/statistics/mean.m: now accepts vector dimensions and options to handle `NaN` values. The option `"a"` (arithmetic mean), `"g"` (geometric mean), and `"h"` (harmonic mean) are no longer accepted, only the arithmetic mean is computed. For the geometric and harmonic mean, please use respective functions `geomean` and `harmmean` from the Octave Statistics package. * etc/NEWS.8.md: announce changes. Co-authored-by: Kai T. Ohlhus <k.ohlhus@gmail.com>
author Andreas Bertsatos <abertsatos@biol.uoa.gr>
date Sun, 13 Nov 2022 01:44:54 +0900
parents e98fb9b4be86
children ff5bac1b9372
files etc/NEWS.8.md scripts/statistics/mean.m
diffstat 2 files changed, 227 insertions(+), 118 deletions(-) [+]
line wrap: on
line diff
--- a/etc/NEWS.8.md	Sat Nov 12 14:16:14 2022 +0100
+++ b/etc/NEWS.8.md	Sun Nov 13 01:44:54 2022 +0900
@@ -33,7 +33,7 @@
   Previously, the product of two large primes took much longer to factorize
   than highly composite inputs.
 
-- `Refine` option is now implemented in functions `ode45`, `ode23`, 
+- `Refine` option is now implemented in functions `ode45`, `ode23`,
   and `ode23s`.
 
 ### Graphical User Interface
@@ -70,6 +70,12 @@
 - `quadgk` now stops iterating when `error <= tolerance` while the previous
   condition was `error < tolerance`.
 
+- `mean` now accepts vector dimensions and options to handle `NaN` values.
+  The option `"a"` (arithmetic mean), `"g"` (geometric mean), and `"h"`
+  (harmonic mean) are no longer accepted, only the arithmetic mean is computed.
+  For the geometric and harmonic mean, please use respective functions
+  `geomean` and `harmmean` from the Octave Statistics package.
+
 - `var` and `std` now optionally output a second argument containing the mean
   or weighted mean.
 
@@ -85,9 +91,9 @@
   Object      | Property         | Default State
   ------------|------------------|------------
   `figure`    | `"dockcontrols"` | `"on"`
-  
+
 - `ode45`, `ode23`, and `ode23s` have improved results for options `Events`,
-  `OutputFcn`, and `Refine`, along with corrected orientation of struct 
+  `OutputFcn`, and `Refine`, along with corrected orientation of struct
   outputs.
 
 ### Alphabetical list of new functions added in Octave 8
--- a/scripts/statistics/mean.m	Sat Nov 12 14:16:14 2022 +0100
+++ b/scripts/statistics/mean.m	Sun Nov 13 01:44:54 2022 +0900
@@ -25,14 +25,17 @@
 
 ## -*- texinfo -*-
 ## @deftypefn  {} {@var{y} =} mean (@var{x})
+## @deftypefnx {} {@var{y} =} mean (@var{x}, "all")
 ## @deftypefnx {} {@var{y} =} mean (@var{x}, @var{dim})
-## @deftypefnx {} {@var{y} =} mean (@var{x}, @var{opt})
-## @deftypefnx {} {@var{y} =} mean (@var{x}, @var{dim}, @var{opt})
+## @deftypefnx {} {@var{y} =} mean (@var{x}, @var{vecdim})
 ## @deftypefnx {} {@var{y} =} mean (@dots{}, @var{outtype})
-## Compute the mean of the elements of the vector @var{x}.
+## @deftypefnx {} {@var{y} =} mean (@dots{}, @var{nanflag})
+## Compute the mean of the elements of @var{x}.
 ##
-## The mean is defined as
-##
+## @itemize
+## @item
+## If @var{x} is a vector, then @code{mean(@var{x})} returns the
+## mean of the elements in @var{x} defined as
 ## @tex
 ## $$ {\rm mean}(x) = \bar{x} = {1\over N} \sum_{i=1}^N x_i $$
 ## where $N$ is the number of elements of @var{x}.
@@ -48,136 +51,187 @@
 ## where @math{N} is the length of the @var{x} vector.
 ##
 ## @end ifnottex
-## If @var{x} is a matrix, compute the mean for each column and return them
-## in a row vector.
 ##
-## If the optional argument @var{dim} is given, operate along this dimension.
-##
-## The optional argument @var{opt} selects the type of mean to compute.
-## The following options are recognized:
+## @item
+## If @var{x} is a matrix, then @code{mean(@var{x})} returns a row vector
+## with the mean of each columns in @var{x}.
 ##
-## @table @asis
-## @item @qcode{"a"}
-## Compute the (ordinary) arithmetic mean.  [default]
+## @item
+## If @var{x} is a multidimensional array, then @code{mean(@var{x})}
+## operates along the first nonsingleton dimension of @var{x}.
+## @end itemize
 ##
-## @item @qcode{"g"}
-## Compute the geometric mean.
+## @code{mean(@var{x}, "all")} returns the mean of all the elements in @var{x}.
 ##
-## @item @qcode{"h"}
-## Compute the harmonic mean.
-## @end table
+## @code{mean(@var{x}, @var{dim})} returns the mean along the
+## operating dimension @var{dim} of @var{x}.
 ##
-## The optional argument @var{outtype} selects the data type of the
-## output value.  The following options are recognized:
-##
-## @table @asis
-## @item @qcode{"default"}
-## Output will be of class double unless @var{x} is of class single,
-## in which case the output will also be single.
+## @code{mean(@var{x}, @var{vecdim})} returns the mean over the
+## dimensions specified in the vector @var{vecdim}.  For example, if @var{x}
+## is a 2-by-3-by-4 array, then @code{mean(@var{x}, [1 2])} returns a 1-by-4
+## array.  Each element of the output array is the mean of the elements on
+## the corresponding page of @var{x}.  NOTE! @var{vecdim} MUST index at least
+## N-2 dimensions of @var{x}, where @code{N = length (size (@var{x}))} and
+## N < 8.  If @var{vecdim} indexes all dimensions of @var{x}, then it is
+## equivalent to @code{mean(@var{x}, "all")}.
 ##
-## @item @qcode{"double"}
-## Output will be of class double.
+## @code{mean(@dots{}, @var{outtype})} returns the mean with a specified data
+## type, using any of the input arguments in the previous syntaxes.
+## @var{outtype} can be "default", "double", or "native".
 ##
-## @item @qcode{"native"}
-## Output will be the same class as @var{x} unless @var{x} is of class
-## logical in which case it returns of class double.
+## @code{mean(@dots{}, @var{nanflag})} specifies whether to exclude NaN values
+## from the calculation, using any of the input argument combinations in
+## previous syntaxes.  By default, NaN values are included in the calculation
+## (@var{nanflag} has the value "includenan").  To exclude NaN values, set the
+## value of @var{nanflag} to "omitnan".
 ##
-## @end table
-##
-## Both @var{dim} and @var{opt} are optional.  If both are supplied, either
-## may appear first.
 ## @seealso{median, mode}
 ## @end deftypefn
 
 function y = mean (x, varargin)
 
-  if (nargin < 1 || nargin > 4)
+  if (nargin < 1 || nargin > 4 || any (cellfun (@isnumeric, varargin(2:end))))
+    print_usage ();
+  endif
+
+  ## Check all char arguments.
+  all_flag = false;
+  omitnan = false;
+  outtype = "default";
+
+  for i = 1:length (varargin)
+    if (ischar (varargin{i}))
+      switch (varargin{i})
+        case "all"
+          all_flag = true;
+        case "omitnan"
+          omitnan = true;
+        case "includenan"
+          omitnan = false;
+        case {"default", "double", "native"}
+          outtype = varargin{i};
+        otherwise
+          print_usage ();
+      endswitch
+    endif
+  endfor
+  varargin(cellfun (@ischar, varargin)) = [];
+
+  if (((length (varargin) == 1) && ! (isnumeric (varargin{1}))) ...
+      || (length (varargin) > 1))
     print_usage ();
   endif
 
   if (! (isnumeric (x) || islogical (x)))
-    error ("mean: X must be a numeric vector or matrix");
-  endif
-  nd = ndims (x);
-  sz = size (x);
-
-  ## We support too many options...
-
-  ## If OUTTYPE is set, it must be the last option.  If DIM and
-  ## MEAN_TYPE exist, they must be the first two options
-
-  out_type = "default";
-  if (numel (varargin))
-    maybe_out_type = tolower (varargin{end});
-    if (any (strcmpi (maybe_out_type, {"default", "double", "native"})))
-      out_type = maybe_out_type;
-      varargin(end) = [];
-    endif
-  endif
-
-  scalars = cellfun (@isscalar, varargin);
-  chars = cellfun (@ischar, varargin);
-  numerics = cellfun (@isnumeric, varargin);
-
-  dim_mask = numerics & scalars;
-  mean_type_mask = chars & scalars;
-  if (! all (dim_mask | mean_type_mask))
-    print_usage ();
+    error ("mean: X must be either a numeric or boolean vector or matrix");
   endif
 
-  switch (nnz (dim_mask))
-    case 0 # Find the first non-singleton dimension
-      (dim = find (sz > 1, 1)) || (dim = 1);
-    case 1
-      dim = varargin{dim_mask};
-      if (dim != fix (dim) || dim < 1)
-        error ("mean: DIM must be an integer and a valid dimension");
+  if (length (varargin) == 0)
+
+    ## Single numeric input argument, no dimensions given.
+    if (all_flag)
+      n = length (x(:));
+      if (omitnan)
+        n = length (x(! isnan (x)));
+        x(isnan (x)) = 0;
+      endif
+      y = sum (x(:), 1) ./ n;
+    else
+      sz = size (x);
+      dim = find (sz > 1, 1);
+      if length (dim) == 0
+        dim = 1;
+      endif
+      n = size (x, dim);
+      if (omitnan)
+        n = sum (! isnan (x), dim);
+        x(isnan (x)) = 0;
       endif
-    otherwise
-      print_usage ();
-  endswitch
+      y = sum (x, dim) ./ n;
+    endif
+
+  else
+
+    ## Two numeric input arguments, dimensions given.  Note scalar is vector!
+    vecdim = varargin{1};
+    if (! (isvector (vecdim) && all (vecdim)) || any (rem (vecdim, 1)))
+      error ("mean: Dimension must be a positive integer scalar or vector");
+    endif
+
+    if (isscalar (vecdim))
+
+      n = size (x, vecdim);
+      if (omitnan)
+        n = sum (! isnan (x), vecdim);
+        x(isnan (x)) = 0;
+      endif
+      y = sum (x, vecdim) ./ n;
+
+    else
+
+      sz = size (x);
+      ndims = length (sz);
+      misdim = [1:ndims];
 
-  switch (nnz (mean_type_mask))
-    case 0
-      mean_type = "a";
-    case 1
-      mean_type = varargin{mean_type_mask};
-    otherwise
-      print_usage ();
-  endswitch
+      ## keep remaining dimensions
+      for i = 1:length (vecdim)
+        misdim(misdim == vecdim(i)) = [];
+      endfor
+
+      switch (length (misdim))
+        ## if all dimensions are given, compute x(:)
+        case 0
+          n = length (x(:));
+          if (omitnan)
+            n = length (x(! isnan (x)));
+            x(isnan (x)) = 0;
+          endif
+          y = sum (x(:), 1) ./ n;
 
-  ## The actual mean computation
-  n = size (x, dim);
-  switch (mean_type)
-    case "a"
-      y = sum (x, dim) / n;
-    case "g"
-      if (! any (x(:) < 0))
-        y = exp (sum (log (x), dim) ./ n);
-      else
-        error ("mean: X must not contain any negative values");
-      endif
-    case "h"
-      y = n ./ sum (1 ./ x, dim);
-    otherwise
-      error ("mean: mean type '%s' not recognized", mean_type);
-  endswitch
+        ## for 1 dimension left, return column vector
+        case 1
+          x = permute (x, [misdim, vecdim]);
+          for i = 1:size (x, 1)
+            x_vec = x(i,:,:,:,:,:,:)(:);
+            if (omitnan)
+              x_vec = x_vec(! isnan (x_vec));
+            endif
+            y(i) = sum (x_vec, 1) ./ length (x_vec);
+          endfor
+
+        ## for 2 dimensions left, return matrix
+        case 2
+          x = permute (x, [misdim, vecdim]);
+          for i = 1:size (x, 1)
+            for j = 1:size (x, 2)
+              x_vec = x(i,j,:,:,:,:,:)(:);
+              if (omitnan)
+                x_vec = x_vec(! isnan (x_vec));
+              endif
+              y(i,j) = sum (x_vec, 1) ./ length (x_vec);
+            endfor
+          endfor
+
+        ## for more that 2 dimensions left, print usage
+        otherwise
+          error ("vecdim must index at least N-2 dimensions of X");
+      endswitch
+    endif
+
+  endif
 
   ## Convert output as requested
-  switch (out_type)
+  switch (outtype)
     case "default"
       ## do nothing, the operators already do the right thing
     case "double"
       y = double (y);
     case "native"
-      if (islogical (x))
-        ## ignore it, return double anyway
-      else
+      if (! islogical (x))
         y = cast (y, class (x));
       endif
     otherwise
-      ## this should have been filtered out during input check, but...
-      error ("mean: OUTTYPE '%s' not recognized", out_type);
+      error ("mean: OUTTYPE '%s' not recognized", outtype);
   endswitch
 
 endfunction
@@ -191,28 +245,26 @@
 %! assert (mean (y), 0);
 %! assert (mean (z), [0, 10]);
 
-## Test small numbers
-%!assert (mean (repmat (0.1,1,1000), "g"), 0.1, 20*eps)
-
 %!assert (mean (magic (3), 1), [5, 5, 5])
 %!assert (mean (magic (3), 2), [5; 5; 5])
-%!assert (mean ([2 8], "g"), 4)
-%!assert (mean ([4 4 2], "h"), 3)
 %!assert (mean (logical ([1 0 1 1])), 0.75)
 %!assert (mean (single ([1 0 1 1])), single (0.75))
 %!assert (mean ([1 2], 3), [1 2])
 
 ## Test input validation
 %!error <Invalid call to mean.  Correct usage is> mean ()
+%!error <Invalid call to mean.  Correct usage is> mean (1, 2, 3)
 %!error <Invalid call to mean.  Correct usage is> mean (1, 2, 3, 4)
-%!error <X must be a numeric> mean ({1:5})
-%!error <Invalid call to mean.  Correct usage is> mean (1, 2, 3)
-%!error <Invalid call to mean.  Correct usage is> mean (1, ones (2,2))
-%!error <DIM must be an integer> mean (1, 1.5)
-%!error <DIM must be .* a valid dimension> mean (1, 0)
-%!error <X must not contain any negative values> mean ([1 -1], "g")
-%!error <mean type 'b' not recognized> mean (1, "b")
+%!error <Invalid call to mean.  Correct usage is> mean (1, "all", 3)
+%!error <Invalid call to mean.  Correct usage is> mean (1, "b")
 %!error <Invalid call to mean.  Correct usage is> mean (1, 1, "foo")
+%!error <X must be either a numeric or boolean> mean ({1:5})
+%!error <X must be either a numeric or boolean> mean ("char")
+%!error <Dimension must be a positive integer> mean (1, ones (2,2))
+%!error <Dimension must be a positive integer> mean (1, 1.5)
+%!error <Dimension must be a positive integer> mean (1, 0)
+%!error <vecdim must index at least N-2 dimensions of X> ...
+%!  mean (repmat ([1:20;6:25], [5 2 6 3 5]), [1 2])
 
 ## Test outtype option
 %!test
@@ -240,3 +292,54 @@
 %! assert (mean (in, "default"), mean (in));
 %! assert (mean (in, "default"), out);
 %! assert (mean (in, "native"), out);  # logical ignores native option
+
+## Test single input and optional arguments "all", DIM, "omitnan")
+%!test
+%! x = [-10:10];
+%! y = [x;x+5;x-5];
+%! assert (mean (x), 0);
+%! assert (mean (y, 2), [0, 5, -5]');
+%! assert (mean (y, "all"), 0);
+%! y(2,4) = NaN;
+%! assert (mean (y', "omitnan"), [0 5.35 -5]);
+%! z = y + 20;
+%! assert (mean (z, "all"), NaN);
+%! m = [20 NaN 15];
+%! assert (mean (z'), m);
+%! assert (mean (z', "includenan"), m);
+%! m = [20 25.35 15];
+%! assert (mean (z', "omitnan"), m);
+%! assert (mean (z, 2, "omitnan"), m');
+%! assert (mean (z, 2, "native", "omitnan"), m');
+%! assert (mean (z, 2, "omitnan", "native"), m');
+
+## Test boolean input
+%!test
+%! assert (mean (true, "all"), 1);
+%! assert (mean (false), 0);
+%! assert (mean ([true false true]), 2/3, 4e-14);
+%! assert (mean ([true false true], 1), [1 0 1]);
+%! assert (mean ([true false NaN], 1), [1 0 NaN]);
+%! assert (mean ([true false NaN], 2), NaN);
+%! assert (mean ([true false NaN], 2, "omitnan"), 0.5);
+%! assert (mean ([true false NaN], 2, "omitnan", "native"), 0.5);
+
+## Test dimension indexing with vecdim in n-dimensional arrays
+%!test
+%! x = repmat ([1:20;6:25], [5 2 6 3]);
+%! assert (size (mean (x, [3 2])), [10 3]);
+%! assert (size (mean (x, [1 2])), [6 3]);
+%! assert (size (mean (x, [1 2 4])), [1 6]);
+%! assert (size (mean (x, [1 4 3])), [1 40]);
+%! assert (size (mean (x, [1 2 3 4])), [1 1]);
+
+## Test results with vecdim in n-dimensional arrays and "omitnan"
+%!test
+%! x = repmat ([1:20;6:25], [5 2 6 3]);
+%! m = repmat ([10.5;15.5], [5,3]);
+%! assert (mean (x, [3 2]), m, 4e-14);
+%! x(2,5,6,3) = NaN;
+%! m(2,3) = NaN;
+%! assert (mean (x, [3 2]), m, 4e-14);
+%! m(2,3) = 15.52301255230125;
+%! assert (mean (x, [3 2], "omitnan"), m, 4e-14);