# HG changeset patch # User Rik # Date 1294118588 28800 # Node ID e151e23f73bc5eed5a384fb3b1344ba86e5cd0b3 # Parent 20f53b3a558f9232276fbb58d00489731962b6ff Overhaul base statistics functions and documentation of same. Add or improve input validation. Add input validation tests. Add functional tests. Improve or re-write documentation strings. diff -r 20f53b3a558f -r e151e23f73bc doc/ChangeLog --- a/doc/ChangeLog Mon Jan 03 18:36:49 2011 -0800 +++ b/doc/ChangeLog Mon Jan 03 21:23:08 2011 -0800 @@ -1,3 +1,10 @@ +2011-01-03 Rik + + * interpreter/octave.texi: Add new menu item "Correlation and + Regression Analysis" + * interpreter/stats.txi: Update documentation chapter on + basic statistics. + 2010-12-31 Rik * interpreter/expr.txi: Add isindex function to documentation diff -r 20f53b3a558f -r e151e23f73bc doc/interpreter/octave.texi --- a/doc/interpreter/octave.texi Mon Jan 03 18:36:49 2011 -0800 +++ b/doc/interpreter/octave.texi Mon Jan 03 21:23:08 2011 -0800 @@ -658,7 +658,7 @@ * Basic Statistical Functions:: * Statistical Plots:: * Tests:: -* Models:: +* Correlation and Regression Analysis:: * Distributions:: * Random Number Generation:: diff -r 20f53b3a558f -r e151e23f73bc doc/interpreter/stats.txi --- a/doc/interpreter/stats.txi Mon Jan 03 18:36:49 2011 -0800 +++ b/doc/interpreter/stats.txi Mon Jan 03 21:23:08 2011 -0800 @@ -21,12 +21,12 @@ @chapter Statistics Octave has support for various statistical methods. This includes -basic descriptive statistics, statistical tests, random number generation, -and much more. +basic descriptive statistics, probability distributions, statistical tests, +random number generation, and much more. -The functions that analyze data all assume that multi-dimensional data +The functions that analyze data all assume that multidimensional data is arranged in a matrix where each row is an observation, and each -column is a variable. So, the matrix defined by +column is a variable. Thus, the matrix defined by @example @group @@ -42,91 +42,101 @@ different arrangements. It should be noted that the statistics functions don't test for data -containing NaN, NA, or Inf. Such values need to be handled explicitly. +containing NaN, NA, or Inf. These values need to be detected and dealt +with explicitly. See @ref{doc-isnan,,isnan}, @ref{doc-isna,,isna}, +@ref{doc-isinf,,isinf}, @ref{doc-isfinite,,isfinite}. @menu * Descriptive Statistics:: * Basic Statistical Functions:: * Statistical Plots:: +* Correlation and Regression Analysis:: +* Distributions:: * Tests:: -* Models:: -* Distributions:: * Random Number Generation:: @end menu @node Descriptive Statistics @section Descriptive Statistics -Octave can compute various statistics such as the moments of a data set. +One principal goal of descriptive statistics is to represent the essence of a +large data set concisely. Octave provides the mean, median, and mode functions +which all summarize a data set with just a single number corresponding to +the central tendency of the data. @DOCSTRING(mean) @DOCSTRING(median) -@DOCSTRING(quantile) +@DOCSTRING(mode) -@DOCSTRING(prctile) +Using just one number, such as the mean, to represent an entire data set may +not give an accurate picture of the data. One way to characterize the fit is +to measure the dispersion of the data. Octave provides several functions for +measuring dispersion. + +@DOCSTRING(range) + +@DOCSTRING(iqr) @DOCSTRING(meansq) @DOCSTRING(std) +In addition to knowing the size of a dispersion it is useful to know the shape +of the data set. For example, are data points massed to the left or right +of the mean? Octave provides several common measures to describe the shape +of the data set. Octave can also calculate moments allowing arbitrary shape +measures to be developed. + @DOCSTRING(var) -@DOCSTRING(mode) - -@DOCSTRING(cov) - -@DOCSTRING(cor) - -@DOCSTRING(corrcoef) +@DOCSTRING(skewness) @DOCSTRING(kurtosis) -@DOCSTRING(skewness) +@DOCSTRING(moment) + +A summary view of a data set can be generated quickly with the +@code{statistics} function. @DOCSTRING(statistics) -@DOCSTRING(moment) - @node Basic Statistical Functions @section Basic Statistical Functions -Octave also supports various helpful statistical functions. - -@DOCSTRING(mahalanobis) +Octave supports various helpful statistical functions. Many are useful as +initial steps to prepare a data set for further analysis. Others provide +different measures from those of the basic descriptive statistics. @DOCSTRING(center) @DOCSTRING(studentize) -@DOCSTRING(nchoosek) +@DOCSTRING(histc) + +@DOCSTRING(cut) -@DOCSTRING(histc) +@c FIXME: really want to put a reference to unique here +@c @DOCSTRING(values) + +@DOCSTRING(nchoosek) @DOCSTRING(perms) -@DOCSTRING(table) - -@DOCSTRING(spearman) +@DOCSTRING(ranks) @DOCSTRING(run_count) -@DOCSTRING(ranks) - -@DOCSTRING(range) - @DOCSTRING(probit) @DOCSTRING(logit) @DOCSTRING(cloglog) -@DOCSTRING(kendall) +@DOCSTRING(mahalanobis) -@DOCSTRING(iqr) - -@DOCSTRING(cut) +@DOCSTRING(table) @node Statistical Plots @section Statistical Plots @@ -146,127 +156,23 @@ @DOCSTRING(ppplot) -@node Tests -@section Tests +@node Correlation and Regression Analysis +@section Correlation and Regression Analysis -Octave can perform several different statistical tests. The following -table summarizes the available tests. +@c FIXME: Need Intro Here + +@DOCSTRING(cov) + +@DOCSTRING(cor) -@tex -\vskip 6pt -{\hbox to \hsize {\hfill\vbox{\offinterlineskip \tabskip=0pt -\halign{ -\vrule height2.0ex depth1.ex width 0.6pt #\tabskip=0.3em & -# \hfil & \vrule # & # \hfil & # \vrule width 0.6pt \tabskip=0pt\cr -\noalign{\hrule height 0.6pt} -& @strong{Hypothesis} && {\bf Test Functions} &\cr -\noalign{\hrule} -& Equal mean values && anova, hotelling\_test2, t\_test\_2, &\cr -& && welch\_test, wilcoxon\_test, z\_test\_2 &\cr -& Equal medians && kruskal\_wallis\_test, sign\_test &\cr -& Equal variances && bartlett\_test, manova, var\_test &\cr -& Equal distributions && chisquare\_test\_homogeneity, &\cr -& && kolmogorov\_smirnov\_test\_2, u\_test &\cr -& Equal marginal frequencies && mcnemar\_test &\cr -& Equal success probabilities && prop\_test\_2 &\cr -& Independent observations && chisquare\_test\_independence, &\cr -& && run\_test &\cr -& Uncorrelated observations && cor\_test &\cr -& Given mean value && hotelling\_test, t\_test, z\_test &\cr -& Observations from distribution && kolmogorov\_smirnov\_test &\cr -& Regression && f\_test\_regression, t\_test\_regression &\cr -\noalign{\hrule height 0.6pt} -}}\hfill}} -@end tex -@ifnottex -@multitable @columnfractions .4 .5 -@item @strong{Hypothesis} - @tab @strong{Test Functions} -@item Equal mean values - @tab @code{anova}, @code{hotelling_test2}, @code{t_test_2}, - @code{welch_test}, @code{wilcoxon_test}, @code{z_test_2} -@item Equal medians - @tab @code{kruskal_wallis_test}, @code{sign_test} -@item Equal variances - @tab @code{bartlett_test}, @code{manova}, @code{var_test} -@item Equal distributions - @tab @code{chisquare_test_homogeneity}, @code{kolmogorov_smirnov_test_2}, - @code{u_test} -@item Equal marginal frequencies - @tab @code{mcnemar_test} -@item Equal success probabilities - @tab @code{prop_test_2} -@item Independent observations - @tab @code{chisquare_test_independence}, @code{run_test} -@item Uncorrelated observations - @tab @code{cor_test} -@item Given mean value - @tab @code{hotelling_test}, @code{t_test}, @code{z_test} -@item Observations from given distribution - @tab @code{kolmogorov_smirnov_test} -@item Regression - @tab @code{f_test_regression}, @code{t_test_regression} -@end multitable -@end ifnottex +@DOCSTRING(corrcoef) + +@DOCSTRING(spearman) -The tests return a p-value that describes the outcome of the test. -Assuming that the test hypothesis is true, the p-value is the probability -of obtaining a worse result than the observed one. So large p-values -corresponds to a successful test. Usually a test hypothesis is accepted -if the p-value exceeds @math{0.05}. - -@DOCSTRING(anova) - -@DOCSTRING(bartlett_test) - -@DOCSTRING(chisquare_test_homogeneity) - -@DOCSTRING(chisquare_test_independence) - -@DOCSTRING(cor_test) - -@DOCSTRING(f_test_regression) - -@DOCSTRING(hotelling_test) - -@DOCSTRING(hotelling_test_2) - -@DOCSTRING(kolmogorov_smirnov_test) - -@DOCSTRING(kolmogorov_smirnov_test_2) - -@DOCSTRING(kruskal_wallis_test) +@DOCSTRING(kendall) -@DOCSTRING(manova) - -@DOCSTRING(mcnemar_test) - -@DOCSTRING(prop_test_2) - -@DOCSTRING(run_test) - -@DOCSTRING(sign_test) - -@DOCSTRING(t_test) - -@DOCSTRING(t_test_2) +@c FIXME: Need discussion of ols & gls and references to them in optim.txi -@DOCSTRING(t_test_regression) - -@DOCSTRING(u_test) - -@DOCSTRING(var_test) - -@DOCSTRING(welch_test) - -@DOCSTRING(wilcoxon_test) - -@DOCSTRING(z_test) - -@DOCSTRING(z_test_2) - -@node Models -@section Models @DOCSTRING(logistic_regression) @@ -275,12 +181,11 @@ Octave has functions for computing the Probability Density Function (PDF), the Cumulative Distribution function (CDF), and the quantile -(the inverse of the CDF) of a large number of distributions. +(the inverse of the CDF) for a large number of distributions. The following table summarizes the supported distributions (in alphabetical order). -@c Do the table explicitly in TeX if possible to get a better layout. @tex \vskip 6pt {\hbox to \hsize {\hfill\vbox{\offinterlineskip \tabskip=0pt @@ -414,133 +319,252 @@ @end multitable @end ifnottex +@DOCSTRING(betapdf) + @DOCSTRING(betacdf) @DOCSTRING(betainv) -@DOCSTRING(betapdf) +@DOCSTRING(binopdf) @DOCSTRING(binocdf) @DOCSTRING(binoinv) -@DOCSTRING(binopdf) +@DOCSTRING(cauchy_pdf) @DOCSTRING(cauchy_cdf) @DOCSTRING(cauchy_inv) -@DOCSTRING(cauchy_pdf) +@DOCSTRING(chi2pdf) @DOCSTRING(chi2cdf) @DOCSTRING(chi2inv) -@DOCSTRING(chi2pdf) +@DOCSTRING(discrete_pdf) @DOCSTRING(discrete_cdf) @DOCSTRING(discrete_inv) -@DOCSTRING(discrete_pdf) +@DOCSTRING(empirical_pdf) @DOCSTRING(empirical_cdf) @DOCSTRING(empirical_inv) -@DOCSTRING(empirical_pdf) +@DOCSTRING(exppdf) @DOCSTRING(expcdf) @DOCSTRING(expinv) -@DOCSTRING(exppdf) +@DOCSTRING(fpdf) @DOCSTRING(fcdf) @DOCSTRING(finv) -@DOCSTRING(fpdf) +@DOCSTRING(gampdf) @DOCSTRING(gamcdf) @DOCSTRING(gaminv) -@DOCSTRING(gampdf) +@DOCSTRING(geopdf) @DOCSTRING(geocdf) @DOCSTRING(geoinv) -@DOCSTRING(geopdf) +@DOCSTRING(hygepdf) @DOCSTRING(hygecdf) @DOCSTRING(hygeinv) -@DOCSTRING(hygepdf) +@DOCSTRING(kolmogorov_smirnov_cdf) -@DOCSTRING(kolmogorov_smirnov_cdf) +@DOCSTRING(laplace_pdf) @DOCSTRING(laplace_cdf) @DOCSTRING(laplace_inv) -@DOCSTRING(laplace_pdf) +@DOCSTRING(logistic_pdf) @DOCSTRING(logistic_cdf) @DOCSTRING(logistic_inv) -@DOCSTRING(logistic_pdf) +@DOCSTRING(lognpdf) @DOCSTRING(logncdf) @DOCSTRING(logninv) -@DOCSTRING(lognpdf) +@DOCSTRING(nbinpdf) @DOCSTRING(nbincdf) @DOCSTRING(nbininv) -@DOCSTRING(nbinpdf) +@DOCSTRING(normpdf) @DOCSTRING(normcdf) @DOCSTRING(norminv) -@DOCSTRING(normpdf) +@DOCSTRING(poisspdf) @DOCSTRING(poisscdf) @DOCSTRING(poissinv) -@DOCSTRING(poisspdf) +@DOCSTRING(tpdf) @DOCSTRING(tcdf) @DOCSTRING(tinv) -@DOCSTRING(tpdf) +@DOCSTRING(unidpdf) @DOCSTRING(unidcdf) @DOCSTRING(unidinv) -@DOCSTRING(unidpdf) +@DOCSTRING(unifpdf) @DOCSTRING(unifcdf) @DOCSTRING(unifinv) -@DOCSTRING(unifpdf) +@DOCSTRING(wblpdf) @DOCSTRING(wblcdf) @DOCSTRING(wblinv) -@DOCSTRING(wblpdf) +@node Tests +@section Tests + +Octave can perform many different statistical tests. The following +table summarizes the available tests. + +@tex +\vskip 6pt +{\hbox to \hsize {\hfill\vbox{\offinterlineskip \tabskip=0pt +\halign{ +\vrule height2.0ex depth1.ex width 0.6pt #\tabskip=0.3em & +# \hfil & \vrule # & # \hfil & # \vrule width 0.6pt \tabskip=0pt\cr +\noalign{\hrule height 0.6pt} +& @strong{Hypothesis} && {\bf Test Functions} &\cr +\noalign{\hrule} +& Equal mean values && anova, hotelling\_test2, t\_test\_2, &\cr +& && welch\_test, wilcoxon\_test, z\_test\_2 &\cr +& Equal medians && kruskal\_wallis\_test, sign\_test &\cr +& Equal variances && bartlett\_test, manova, var\_test &\cr +& Equal distributions && chisquare\_test\_homogeneity, &\cr +& && kolmogorov\_smirnov\_test\_2, u\_test &\cr +& Equal marginal frequencies && mcnemar\_test &\cr +& Equal success probabilities && prop\_test\_2 &\cr +& Independent observations && chisquare\_test\_independence, &\cr +& && run\_test &\cr +& Uncorrelated observations && cor\_test &\cr +& Given mean value && hotelling\_test, t\_test, z\_test &\cr +& Observations from distribution && kolmogorov\_smirnov\_test &\cr +& Regression && f\_test\_regression, t\_test\_regression &\cr +\noalign{\hrule height 0.6pt} +}}\hfill}} +@end tex +@ifnottex +@multitable @columnfractions .4 .5 +@item @strong{Hypothesis} + @tab @strong{Test Functions} +@item Equal mean values + @tab @code{anova}, @code{hotelling_test2}, @code{t_test_2}, + @code{welch_test}, @code{wilcoxon_test}, @code{z_test_2} +@item Equal medians + @tab @code{kruskal_wallis_test}, @code{sign_test} +@item Equal variances + @tab @code{bartlett_test}, @code{manova}, @code{var_test} +@item Equal distributions + @tab @code{chisquare_test_homogeneity}, @code{kolmogorov_smirnov_test_2}, + @code{u_test} +@item Equal marginal frequencies + @tab @code{mcnemar_test} +@item Equal success probabilities + @tab @code{prop_test_2} +@item Independent observations + @tab @code{chisquare_test_independence}, @code{run_test} +@item Uncorrelated observations + @tab @code{cor_test} +@item Given mean value + @tab @code{hotelling_test}, @code{t_test}, @code{z_test} +@item Observations from given distribution + @tab @code{kolmogorov_smirnov_test} +@item Regression + @tab @code{f_test_regression}, @code{t_test_regression} +@end multitable +@end ifnottex + +The tests return a p-value that describes the outcome of the test. +Assuming that the test hypothesis is true, the p-value is the probability +of obtaining a worse result than the observed one. So large p-values +corresponds to a successful test. Usually a test hypothesis is accepted +if the p-value exceeds 0.05. + +@DOCSTRING(anova) + +@DOCSTRING(bartlett_test) + +@DOCSTRING(chisquare_test_homogeneity) + +@DOCSTRING(chisquare_test_independence) + +@DOCSTRING(cor_test) + +@DOCSTRING(f_test_regression) + +@DOCSTRING(hotelling_test) + +@DOCSTRING(hotelling_test_2) + +@DOCSTRING(kolmogorov_smirnov_test) + +@DOCSTRING(kolmogorov_smirnov_test_2) + +@DOCSTRING(kruskal_wallis_test) + +@DOCSTRING(manova) + +@DOCSTRING(mcnemar_test) + +@DOCSTRING(prop_test_2) + +@DOCSTRING(run_test) + +@DOCSTRING(sign_test) + +@DOCSTRING(t_test) + +@DOCSTRING(t_test_2) + +@DOCSTRING(t_test_regression) + +@DOCSTRING(u_test) + +@DOCSTRING(var_test) + +@DOCSTRING(welch_test) + +@DOCSTRING(wilcoxon_test) + +@DOCSTRING(z_test) + +@DOCSTRING(z_test_2) @node Random Number Generation @section Random Number Generation diff -r 20f53b3a558f -r e151e23f73bc scripts/ChangeLog --- a/scripts/ChangeLog Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/ChangeLog Mon Jan 03 21:23:08 2011 -0800 @@ -1,3 +1,63 @@ +2011-01-03 Rik + + * statistics/base/center.m, statistics/base/corrcoef.m, + statistics/base/kendall.m, statistics/base/mean.m, + statistics/base/meansq.m, statistics/base/skewness.m, + statistics/base/studentize.m, statistics/base/var.m, + statistics/base/run_count.m, statistics/base/ranks.m: Improve input + validation. Add function tests. Improve docstring. + + * statistics/base/moment.m, statistics/base/prctile.m, + statistics/base/spearman.m, statistics/base/std.m : Improve input + validation. Add input validation tests. Improve docstring. + + * statistics/base/cloglog.m: Add function tests. + + * statistics/base/cor.m: Replace with call to corrcoef, now only an + alias. + + * statistics/base/cov.m: Add normalization option. Improve input + validation. Add function tests. Improve docstring. + + * statistics/base/cut.m: Use lowercase variable names. Improve + docstring. + + * statistics/base/histc.m, statistics/base/median.m: Use same variable + name in documentation and in function. Add input validation tests. + Improve docstring. + + * statistics/base/iqr.m, statistics/base/mode.m: Add input validation + tests. Improve docstring. + + * statistics/base/kurtosis.m: Return same class as input variable. Add + input validation tests. Improve docstring. + + + * statistics/base/logit.m, statistics/base/range.m: Add function tests. + Improve docstring. + + * statistics/base/mahalanobis.m: Use lower case variable names. + Improve input validation. Add input validation tests. + + * statistics/base/ols.m, statistics/base/gls.m: Use isargout to only + calculate necessary outputs. Use lowercase variable names. Add + functional tests. Improve docstring. + + * statistics/base/ppplot.m: Add input validation tests. + + * statistics/base/qqplot.m: Add ability to call "XXXinv" or "XXX_inv" + functions. Improve input validation. Improve docstring. + + * statistics/base/quantile.m: Use defaults for input arguments to + simplify code. Improve input validation. Add input validation tests. + Improve docstring. + + * statistics/base/statistics.m: Use lowercase variable names. Improve + input validation. Add input validation tests. Improve docstring. + + * statistics/base/table.m: Switch from deprecated function 'values' to + 'unique'. Add input validation tests. Improve docstring. + 2011-01-02 Ben Abbott * plot/legend.m: Only one legend per axes (bug 32022). Add / modify diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/center.m --- a/scripts/statistics/base/center.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/center.m Mon Jan 03 21:23:08 2011 -0800 @@ -23,8 +23,8 @@ ## @deftypefnx {Function File} {} center (@var{x}, @var{dim}) ## If @var{x} is a vector, subtract its mean. ## If @var{x} is a matrix, do the above for each column. -## If the optional argument @var{dim} is given, perform the above -## operation along this dimension +## If the optional argument @var{dim} is given, operate along this dimension. +## @seealso{studentize} ## @end deftypefn ## Author: KH @@ -36,18 +36,49 @@ print_usage (); endif - if (nargin < 2) - dim = [find(size (x) != 1, 1), 1](1); # First non-singleton dim. + if (!isnumeric (x)) + error ("center: X must be a numeric vector or matrix"); endif - n = size (x, dim); if (isinteger (x)) x = double (x); endif + nd = ndims (x); + sz = size (x); + if (nargin != 2) + ## Find the first non-singleton dimension. + dim = find (sz > 1, 1); + if (isempty (dim)) + dim = 1; + endif + else + if (!(isscalar (dim) && dim == fix (dim)) + || !(1 <= dim && dim <= nd)) + error ("center: DIM must be an integer and a valid dimension"); + endif + endif + + n = size (x, dim); + if (n == 0) retval = x; else retval = bsxfun (@minus, x, sum (x, dim) / n); endif + endfunction + +%!assert(center ([1,2,3]), [-1,0,1]) +%!assert(center (int8 ([1,2,3])), [-1,0,1]) +%!assert(center (ones (3,2,0,2)), zeros (3,2,0,2)) +%!assert(center (magic (3)), [3,-4,1;-2,0,2;-1,4,-3]) + +%% Test input validation +%!error center () +%!error center (1, 2, 3) +%!error center ([true true]) +%!error center (1, ones(2,2)) +%!error center (1, 1.5) +%!error center (1, 0) +%!error center (1, 3) diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/cloglog.m --- a/scripts/statistics/base/cloglog.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/cloglog.m Mon Jan 03 21:23:08 2011 -0800 @@ -46,3 +46,11 @@ y = - log (- log (x)); endfunction + +%!assert(cloglog(0), -Inf) +%!assert(cloglog(1), Inf) +%!assert(cloglog(1/e), 0) + +%% Test input validation +%!error cloglog () +%!error cloglog (1, 2) diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/cor.m --- a/scripts/statistics/base/cor.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/cor.m Mon Jan 03 21:23:08 2011 -0800 @@ -18,50 +18,21 @@ ## . ## -*- texinfo -*- -## @deftypefn {Function File} {} cor (@var{x}, @var{y}) -## Compute correlation. -## -## The (@var{i}, @var{j})-th entry of @code{cor (@var{x}, @var{y})} is -## the correlation between the @var{i}-th variable in @var{x} and the -## @var{j}-th variable in @var{y}. +## @deftypefn {Function File} {} cor (@var{x}) +## @deftypefnx {Function File} {} cor (@var{x}, @var{y}) +## Compute matrix of correlation coefficients. ## -## @tex -## $$ -## {\rm corrcoef}(x,y) = {{\rm cov}(x,y) \over {\rm std}(x) {\rm std}(y)} -## $$ -## @end tex -## @ifnottex -## -## @example -## corrcoef(x,y) = cov(x,y)/(std(x)*std(y)) -## @end example -## -## @end ifnottex -## For matrices, each row is an observation and each column a variable; -## vectors are always observations and may be row or column vectors. -## -## @code{cor (@var{x})} is equivalent to @code{cor (@var{x}, @var{x})}. -## -## Note that the @code{corrcoef} function does the same as @code{cor}. +## This is an alias for @code{corrcoef}. +## @seealso{corrcoef} ## @end deftypefn -## Author: KH -## Description: Compute correlations - -function retval = cor (x, y) +function retval = cor (x, y = []) if (nargin < 1 || nargin > 2) print_usage (); endif - if (nargin == 2) - c = cov (x, y); - s = std (x)' * std (y); - retval = c ./ s; - elseif (nargin == 1) - c = cov (x); - s = reshape (sqrt (diag (c)), 1, columns (c)); - retval = c ./ (s' * s); - endif + retval = corrcoef (x, y); endfunction + diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/corrcoef.m --- a/scripts/statistics/base/corrcoef.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/corrcoef.m Mon Jan 03 21:23:08 2011 -0800 @@ -18,14 +18,14 @@ ## . ## -*- texinfo -*- -## @deftypefn {Function File} {} corrcoef (@var{x}, @var{y}) -## Compute correlation. +## @deftypefn {Function File} {} corrcoef (@var{x}) +## @deftypefnx {Function File} {} corrcoef (@var{x}, @var{y}) +## Compute matrix of correlation coefficients. ## ## If each row of @var{x} and @var{y} is an observation and each column is -## a variable, the (@var{i}, @var{j})-th entry of +## a variable, then the @w{(@var{i}, @var{j})-th} entry of ## @code{corrcoef (@var{x}, @var{y})} is the correlation between the ## @var{i}-th variable in @var{x} and the @var{j}-th variable in @var{y}. -## ## @tex ## $$ ## {\rm corrcoef}(x,y) = {{\rm cov}(x,y) \over {\rm std}(x) {\rm std}(y)} @@ -38,27 +38,44 @@ ## @end example ## ## @end ifnottex -## If called with one argument, compute @code{corrcoef (@var{x}, @var{x})}. +## If called with one argument, compute @code{corrcoef (@var{x}, @var{x})}, +## the correlation between the columns of @var{x}. +## @seealso{cov} ## @end deftypefn ## Author: Kurt Hornik ## Created: March 1993 ## Adapted-By: jwe -function retval = corrcoef (x, y) +function retval = corrcoef (x, y = []) if (nargin < 1 || nargin > 2) print_usage (); endif + if (! (isnumeric (x) && isnumeric (y))) + error ("corrcoef: X and Y must be numeric matrices or vectors"); + endif + + if (ndims (x) != 2 || ndims (y) != 2) + error ("corrcoef: X and Y must be 2-D matrices or vectors"); + endif + + if (isscalar (x)) + retval = 1; + return; + endif + + ## No check for division by zero error, which happens only when + ## there is a constant vector and should be rare. if (nargin == 2) c = cov (x, y); s = std (x)' * std (y); retval = c ./ s; - elseif (nargin == 1) + else c = cov (x); - s = reshape (sqrt (diag (c)), 1, columns (c)); - retval = c ./ (s' * s); + s = sqrt (diag (c)); + retval = c ./ (s * s'); endif endfunction @@ -70,7 +87,13 @@ %! assert((size (cc1) == [10, 10] && size (cc2) == [10, 10] %! && abs (cc1 - cc2) < sqrt (eps))); -%!error corrcoef (); +%!assert(corrcoef (5), 1); +%% Test input validation +%!error corrcoef (); %!error corrcoef (1, 2, 3); +%!error corrcoef ([true, true]); +%!error corrcoef ([1, 2], [true, true]); +%!error corrcoef (ones (2,2,2)); +%!error corrcoef (ones (2,2), ones (2,2,2)); diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/cov.m --- a/scripts/statistics/base/cov.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/cov.m Mon Jan 03 21:23:08 2011 -0800 @@ -18,11 +18,14 @@ ## . ## -*- texinfo -*- -## @deftypefn {Function File} {} cov (@var{x}, @var{y}) -## Compute covariance. +## @deftypefn {Function File} {} cov (@var{x}) +## @deftypefnx {Function File} {} cov (@var{x}, @var{opt}) +## @deftypefnx {Function File} {} cov (@var{x}, @var{y}) +## @deftypefnx {Function File} {} cov (@var{x}, @var{y}, @var{opt}) +## Compute the covariance matrix. ## -## If each row of @var{x} and @var{y} is an observation and each column is -## a variable, the (@var{i}, @var{j})-th entry of +## If each row of @var{x} and @var{y} is an observation, and each column is +## a variable, then the @w{(@var{i}, @var{j})-th} entry of ## @code{cov (@var{x}, @var{y})} is the covariance between the @var{i}-th ## variable in @var{x} and the @var{j}-th variable in @var{y}. ## @tex @@ -31,36 +34,79 @@ ## $$ ## where $\bar{x}$ and $\bar{y}$ are the mean values of $x$ and $y$. ## @end tex -## If called with one argument, compute @code{cov (@var{x}, @var{x})}. +## @ifnottex +## +## @example +## cov (x) = 1/N-1 * SUM_i (x(i) - mean(x)) * (y(i) - mean(y)) +## @end example +## +## @end ifnottex +## +## If called with one argument, compute @code{cov (@var{x}, @var{x})}, the +## covariance between the columns of @var{x}. +## +## The argument @var{opt} determines the type of normalization to use. +## Valid values are +## +## @table @asis +## @item 0: +## normalize with @math{N-1}, provides the best unbiased estimator of the +## covariance [default] +## +## @item 1: +## normalize with @math{N}, this provides the second moment around the mean +## @end table +## @seealso{corrcoef, cor} ## @end deftypefn ## Author: KH ## Description: Compute covariances -function c = cov (x, y) +function c = cov (x, y = [], opt = 0) + + if (nargin < 1 || nargin > 3) + print_usage (); + endif + + if (! (isnumeric (x) && isnumeric (y))) + error ("cov: X and Y must be numeric matrices or vectors"); + endif - if (nargin < 1 || nargin > 2) - print_usage (); + if (ndims (x) != 2 || ndims (y) != 2) + error ("cov: X and Y must be 2-D matrices or vectors"); + endif + + if (nargin == 2 && isscalar(y)) + opt = y; + endif + + if (opt != 0 && opt != 1) + error ("cov: normalization OPT must be 0 or 1"); + endif + + if (isscalar (x)) + c = 0; + return; endif if (rows (x) == 1) x = x'; endif n = rows (x); - - if (nargin == 2) + + if (nargin == 1 || isscalar(y)) + x = center (x, 1); + c = conj (x' * x / (n - 1 + opt)); + else if (rows (y) == 1) y = y'; endif if (rows (y) != n) - error ("cov: x and y must have the same number of observations"); + error ("cov: X and Y must have the same number of observations"); endif x = center (x, 1); y = center (y, 1); - c = conj (x' * y / (n - 1)); - elseif (nargin == 1) - x = center (x, 1); - c = conj (x' * x / (n - 1)); + c = conj (x' * y / (n - 1 + opt)); endif endfunction @@ -71,7 +117,28 @@ %! cx2 = cov (x, x); %! assert(size (cx1) == [10, 10] && size (cx2) == [10, 10] && norm(cx1-cx2) < 1e1*eps); -%!error cov (); +%!test +%! x = [1:5]; +%! c = cov (x); +%! assert(isscalar (c)); +%! assert(c, 2.5); + +%!test +%! x = [1:5]; +%! c = cov (x, 0); +%! assert(c, 2.5); +%! c = cov (x, 1); +%! assert(c, 2); -%!error cov (1, 2, 3); +%!assert(cov (5), 0); +%% Test input validation +%!error cov (); +%!error cov (1, 2, 3, 4); +%!error cov ([true, true]); +%!error cov ([1, 2], [true, true]); +%!error cov (ones (2,2,2)); +%!error cov (ones (2,2), ones (2,2,2)); +%!error cov (1, 3); +%!error cov (ones (2,2), ones (3,2)); + diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/cut.m --- a/scripts/statistics/base/cut.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/cut.m Mon Jan 03 21:23:08 2011 -0800 @@ -19,7 +19,7 @@ ## -*- texinfo -*- ## @deftypefn {Function File} {} cut (@var{x}, @var{breaks}) -## Create categorical data out of numerical or continuous data by +## Create categorical data from numerical or continuous data by ## cutting into intervals. ## ## If @var{breaks} is a scalar, the data is cut into that many @@ -30,35 +30,36 @@ ## which group each point in @var{x} belongs to. Groups are labelled ## from 1 to the number of groups; points outside the range of ## @var{breaks} are labelled by @code{NaN}. +## @seealso{histc} ## @end deftypefn ## Author: KH ## Description: Cut data into intervals -function group = cut (X, BREAKS) +function group = cut (x, breaks) if (nargin != 2) print_usage (); endif - if (! isvector (X)) + if (!isvector (x)) error ("cut: X must be a vector"); endif - if isscalar (BREAKS) - BREAKS = linspace (min (X), max (X), BREAKS + 1); - BREAKS(1) = BREAKS(1) - 1; - elseif isvector (BREAKS) - BREAKS = sort (BREAKS); + if isscalar (breaks) + breaks = linspace (min (x), max (x), breaks + 1); + breaks(1) = breaks(1) - 1; + elseif isvector (breaks) + breaks = sort (breaks); else error ("cut: BREAKS must be a scalar or vector"); endif - group = NaN (size (X)); - m = length (BREAKS); - if any (k = find ((X >= min (BREAKS)) & (X < max (BREAKS)))) + group = NaN (size (x)); + m = length (breaks); + if any (k = find ((x >= min (breaks)) & (x < max (breaks)))) n = length (k); - group(k) = sum ((ones (m, 1) * reshape (X(k), 1, n)) - >= (reshape (BREAKS, m, 1) * ones (1, n))); + group(k) = sum ((ones (m, 1) * reshape (x(k), 1, n)) + >= (reshape (breaks, m, 1) * ones (1, n))); endif endfunction diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/gls.m --- a/scripts/statistics/base/gls.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/gls.m Mon Jan 03 21:23:08 2011 -0800 @@ -25,7 +25,7 @@ ## with $\bar{e} = 0$ and cov(vec($e$)) = $(s^2)o$, ## @end tex ## @ifnottex -## @math{y = x b + e} with @math{mean (e) = 0} and +## @w{@math{y = x*b + e}} with @math{mean (e) = 0} and ## @math{cov (vec (e)) = (s^2) o}, ## @end ifnottex ## where @@ -37,8 +37,8 @@ ## @ifnottex ## @math{y} is a @math{t} by @math{p} matrix, @math{x} is a @math{t} by ## @math{k} matrix, @math{b} is a @math{k} by @math{p} matrix, @math{e} -## is a @math{t} by @math{p} matrix, and @math{o} is a @math{t p} by -## @math{t p} matrix. +## is a @math{t} by @math{p} matrix, and @math{o} is a @math{t*p} by +## @math{t*p} matrix. ## @end ifnottex ## ## @noindent @@ -54,41 +54,82 @@ ## The GLS estimator for @math{s^2}. ## ## @item r -## The matrix of GLS residuals, @math{r = y - x beta}. +## The matrix of GLS residuals, @math{r = y - x*beta}. ## @end table +## @seealso{ols} ## @end deftypefn ## Author: Teresa Twaroch ## Created: May 1993 ## Adapted-By: jwe -function [BETA, v, R] = gls (Y, X, O) +function [beta, v, r] = gls (y, x, o) if (nargin != 3) print_usage (); endif - [rx, cx] = size (X); - [ry, cy] = size (Y); + if (! (isnumeric (x) && isnumeric (y) && isnumeric (o))) + error ("gls: X, Y, and O must be numeric matrices or vectors"); + endif + + if (ndims (x) != 2 || ndims (y) != 2 || ndims (o) != 2) + error ("gls: X, Y and O must be 2-D matrices or vectors"); + endif + + [rx, cx] = size (x); + [ry, cy] = size (y); + [ro, co] = size (o); if (rx != ry) - error ("gls: incorrect matrix dimensions"); + error ("gls: number of rows of X and Y must be equal"); + endif + if (!issquare(o) || ro != ry*cy) + error ("gls: matrix O must be square matrix with rows = rows (Y) * cols (Y)"); + endif + + o = o^(-1/2); + z = kron (eye (cy), x); + z = o * z; + y1 = o * reshape (y, ry*cy, 1); + u = z' * z; + r = rank (u); + + if (r == cx*cy) + b = inv (u) * z' * y1; + else + b = pinv (z) * y1; endif - O = O^(-1/2); - Z = kron (eye (cy), X); - Z = O * Z; - Y1 = O * reshape (Y, ry*cy, 1); - U = Z' * Z; - r = rank (U); + beta = reshape (b, cx, cy); - if (r == cx*cy) - B = inv (U) * Z' * Y1; - else - B = pinv (Z) * Y1; + if (isargout (2) || isargout (3)) + r = y - x * beta; + if (isargout (2)) + v = (reshape (r, ry*cy, 1))' * (o^2) * reshape (r, ry*cy, 1) / (rx*cy - r); + endif endif - BETA = reshape (B, cx, cy); - R = Y - X * BETA; - v = (reshape (R, ry*cy, 1))' * (O^2) * reshape (R, ry*cy, 1) / (rx*cy - r); +endfunction + + +%!test +%! x = [1:5]'; +%! y = 3*x + 2; +%! x = [x, ones(5,1)]; +%! o = diag (ones (5,1)); +%! assert (gls (y,x,o), [3; 2], 50*eps) -endfunction +%% Test input validation +%!error gls () +%!error gls (1) +%!error gls (1, 2) +%!error gls (1, 2, 3, 4) +%!error gls ([true, true], [1, 2], ones (2)) +%!error gls ([1, 2], [true, true], ones (2)) +%!error gls ([1, 2], [1, 2], true (2)) +%!error gls (ones (2,2,2), ones (2,2), ones (4,4)) +%!error gls (ones (2,2), ones (2,2,2), ones (4,4)) +%!error gls (ones (2,2), ones (2,2), ones (4,4,4)) +%!error gls (ones(1,2), ones(2,2), ones (2,2)) +%!error gls (ones(2,2), ones(2,2), ones (2,2)) + diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/histc.m --- a/scripts/statistics/base/histc.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/histc.m Mon Jan 03 21:23:08 2011 -0800 @@ -18,39 +18,38 @@ ## . ## -*- texinfo -*- -## @deftypefn {Function File} {@var{n} =} histc (@var{y}, @var{edges}) -## @deftypefnx {Function File} {@var{n} =} histc (@var{y}, @var{edges}, @var{dim}) +## @deftypefn {Function File} {@var{n} =} histc (@var{x}, @var{edges}) +## @deftypefnx {Function File} {@var{n} =} histc (@var{x}, @var{edges}, @var{dim}) ## @deftypefnx {Function File} {[@var{n}, @var{idx}] =} histc (@dots{}) ## Produce histogram counts. ## -## When @var{y} is a vector, the function counts the number of elements of -## @var{y} that fall in the histogram bins defined by @var{edges}. This must be -## a vector of monotonically non-decreasing values that define the edges of the -## histogram bins. So, @code{@var{n} (k)} contains the number of elements in -## @var{y} for which @code{@var{edges} (k) <= @var{y} < @var{edges} (k+1)}. -## The final element of @var{n} contains the number of elements of @var{y} -## that was equal to the last element of @var{edges}. +## When @var{x} is a vector, the function counts the number of elements of +## @var{x} that fall in the histogram bins defined by @var{edges}. This must be +## a vector of monotonically increasing values that define the edges of the +## histogram bins. @code{@var{n}(k)} contains the number of elements in +## @var{x} for which @code{@var{edges}(k) <= @var{x} < @var{edges}(k+1)}. +## The final element of @var{n} contains the number of elements of @var{x} +## exactly equal to the last element of @var{edges}. ## -## When @var{y} is a @math{N}-dimensional array, the same operation as above is -## repeated along dimension @var{dim}. If not specified @var{dim} defaults +## When @var{x} is an @math{N}-dimensional array, the computation is +## carried out along dimension @var{dim}. If not specified @var{dim} defaults ## to the first non-singleton dimension. ## -## If a second output argument is requested an index matrix is also returned. -## The @var{idx} matrix has same size as @var{y}. Each element of @var{idx} +## When a second output argument is requested an index matrix is also returned. +## The @var{idx} matrix has the same size as @var{x}. Each element of @var{idx} ## contains the index of the histogram bin in which the corresponding element -## of @var{y} was counted. -## +## of @var{x} was counted. ## @seealso{hist} ## @end deftypefn -function [n, idx] = histc (data, edges, dim) - ## Check input +function [n, idx] = histc (x, edges, dim) + if (nargin < 2 || nargin > 3) print_usage (); endif - if (!isreal (data)) - error ("histc: Y argument must be real-valued, not complex"); + if (!isreal (x)) + error ("histc: X argument must be real-valued, not complex"); endif num_edges = numel (edges); @@ -63,14 +62,14 @@ else ## Make sure 'edges' is sorted edges = edges (:); - if (! issorted (edges) || edges(1) > edges(end)) + if (!issorted (edges) || edges(1) > edges(end)) warning ("histc: edge values not sorted on input"); edges = sort (edges); endif endif - nd = ndims (data); - sz = size (data); + nd = ndims (x); + sz = size (x); if (nargin < 3) ## Find the first non-singleton dimension. dim = find (sz > 1, 1); @@ -78,14 +77,14 @@ dim = 1; endif else - if (!(isscalar (dim) && dim == round (dim)) + if (!(isscalar (dim) && dim == fix (dim)) || !(1 <= dim && dim <= nd)) error ("histc: DIM must be an integer and a valid dimension"); endif endif nsz = sz; - nsz (dim) = num_edges; + nsz(dim) = num_edges; ## the splitting point is 3 bins @@ -104,22 +103,22 @@ ## Prepare indices idx1 = cell (1, dim-1); for k = 1:length (idx1) - idx1 {k} = 1:sz (k); + idx1 {k} = 1:sz(k); endfor idx2 = cell (length (sz) - dim); for k = 1:length (idx2) - idx2 {k} = 1:sz (k+dim); + idx2 {k} = 1:sz(k+dim); endfor ## Compute the histograms for k = 1:num_edges-1 - b = (edges (k) <= data & data < edges (k+1)); + b = (edges (k) <= x & x < edges (k+1)); n (idx1 {:}, k, idx2 {:}) = sum (b, dim); if (nargout > 1) idx (b) = k; endif endfor - b = (data == edges (end)); + b = (x == edges (end)); n (idx1 {:}, num_edges, idx2 {:}) = sum (b, dim); if (nargout > 1) idx (b) = num_edges; @@ -130,48 +129,50 @@ ## This is the O(M*log(N) + N) algorithm. ## Look-up indices. - idx = lookup (edges, data); - ## Zero invalid ones (including NaNs). data < edges(1) are already zero. - idx(! (data <= edges(end))) = 0; + idx = lookup (edges, x); + ## Zero invalid ones (including NaNs). x < edges(1) are already zero. + idx(! (x <= edges(end))) = 0; - ## Don't accumulate the histogram if not needed. In that case, - ## histc() is just a (Matlab-compatible) wrapper for lookup. - if (isargout (1)) - iidx = idx; + iidx = idx; - ## In case of matrix input, we adjust the indices. - if (! isvector (data)) - nl = prod (sz(1:dim-1)); - nn = sz(dim); - nu = prod (sz(dim+1:end)); - if (nl != 1) - iidx = (iidx-1) * nl; - iidx += reshape (kron (ones (1, nn*nu), 1:nl), sz); - endif - if (nu != 1) - ne =length (edges); - iidx += reshape (kron (nl*ne*(0:nu-1), ones (1, nl*nn)), sz); - endif + ## In case of matrix input, we adjust the indices. + if (! isvector (x)) + nl = prod (sz(1:dim-1)); + nn = sz(dim); + nu = prod (sz(dim+1:end)); + if (nl != 1) + iidx = (iidx-1) * nl; + iidx += reshape (kron (ones (1, nn*nu), 1:nl), sz); endif - - ## Select valid elements. - iidx = iidx(idx != 0); + if (nu != 1) + ne =length (edges); + iidx += reshape (kron (nl*ne*(0:nu-1), ones (1, nl*nn)), sz); + endif + endif - ## Call accumarray to sum the indexed elements. - n = accumarray (iidx(:), 1, nsz); - endif + ## Select valid elements. + iidx = iidx(idx != 0); + + ## Call accumarray to sum the indexed elements. + n = accumarray (iidx(:), 1, nsz); endif endfunction %!test -%! data = linspace (0, 10, 1001); -%! n = histc (data, 0:10); +%! x = linspace (0, 10, 1001); +%! n = histc (x, 0:10); %! assert (n, [repmat(100, 1, 10), 1]); %!test -%! data = repmat (linspace (0, 10, 1001), [2, 1, 3]); -%! n = histc (data, 0:10, 2); +%! x = repmat (linspace (0, 10, 1001), [2, 1, 3]); +%! n = histc (x, 0:10, 2); %! assert (n, repmat ([repmat(100, 1, 10), 1], [2, 1, 3])); +%!error histc (); +%!error histc (1); +%!error histc (1, 2, 3, 4); +%!error histc ([1:10 1+i], 2); +%!error histc (1:10, []); +%!error histc (1, 1, 3); diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/iqr.m --- a/scripts/statistics/base/iqr.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/iqr.m Mon Jan 03 21:23:08 2011 -0800 @@ -18,13 +18,17 @@ ## . ## -*- texinfo -*- -## @deftypefn {Function File} {} iqr (@var{x}, @var{dim}) -## If @var{x} is a vector, return the interquartile range, i.e., the -## difference between the upper and lower quartile, of the input data. +## @deftypefn {Function File} {} iqr (@var{x}) +## @deftypefnx {Function File} {} iqr (@var{x}, @var{dim}) +## Return the interquartile range, i.e., the difference between the upper +## and lower quartile of the input data. If @var{x} is a matrix, do the +## above for first non-singleton dimension of @var{x}. ## -## If @var{x} is a matrix, do the above for first non-singleton -## dimension of @var{x}. If the option @var{dim} argument is given, -## then operate along this dimension. +## If the optional argument @var{dim} is given, operate along this dimension. +## +## As a measure of dispersion, the interquartile range is less affected by +## outliers than either @code{range} or @code{std}. +## @seealso{range, std} ## @end deftypefn ## Author KH @@ -36,8 +40,8 @@ print_usage (); endif - if (!ismatrix(x) || ischar(x)) - error ("iqr: X must be a numeric matrix or vector"); + if (!(ismatrix (x) && isnumeric (x)) || isscalar(x)) + error ("iqr: X must be a numeric vector or matrix"); endif nd = ndims (x); @@ -50,7 +54,7 @@ dim = 1; endif else - if (!(isscalar (dim) && dim == round (dim)) + if (!(isscalar (dim) && dim == fix (dim)) || !(1 <= dim && dim <= nd)) error ("iqr: DIM must be an integer and a valid dimension"); endif @@ -76,3 +80,16 @@ endfor endfunction + +%!assert (iqr (1:101), 50); + +%%!test +%%! x = [1:100]; +%%! n = iqr (x, 0:10); +%%! assert (n, [repmat(100, 1, 10), 1]); + +%!error iqr (); +%!error iqr (1, 2, 3); +%!error iqr (1); +%!error iqr ([true, true]); +%!error iqr (1:10, 3); diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/kendall.m --- a/scripts/statistics/base/kendall.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/kendall.m Mon Jan 03 21:23:08 2011 -0800 @@ -18,15 +18,10 @@ ## . ## -*- texinfo -*- -## @deftypefn {Function File} {} kendall (@var{x}, @var{y}) -## Compute Kendall's @var{tau} for each of the variables specified by -## the input arguments. -## -## For matrices, each row is an observation and each column a variable; -## vectors are always observations and may be row or column vectors. -## -## @code{kendall (@var{x})} is equivalent to @code{kendall (@var{x}, -## @var{x})}. +## @deftypefn {Function File} {} kendall (@var{x}) +## @deftypefnx {Function File} {} kendall (@var{x}, @var{y}) +## @cindex Kendall's Tau +## Compute Kendall's @var{tau}. ## ## For two data vectors @var{x}, @var{y} of common length @var{n}, ## Kendall's @var{tau} is the correlation of the signs of all rank @@ -65,15 +60,27 @@ ## @ifnottex ## @code{(2 * (2@var{n}+5)) / (9 * @var{n} * (@var{n}-1))}. ## @end ifnottex +## +## @code{kendall (@var{x})} is equivalent to @code{kendall (@var{x}, +## @var{x})}. +## @seealso{ranks, spearman} ## @end deftypefn ## Author: KH ## Description: Kendall's rank correlation tau -function tau = kendall (x, y) +function tau = kendall (x, y = []) + + if (nargin < 1 || nargin > 2) + print_usage (); + endif - if ((nargin < 1) || (nargin > 2)) - print_usage (); + if (! (isnumeric (x) && isnumeric (y))) + error ("kendall: X and Y must be numeric matrices or vectors"); + endif + + if (ndims (x) != 2 || ndims (y) != 2) + error ("kendall: X and Y must be 2-D matrices or vectors"); endif if (rows (x) == 1) @@ -86,7 +93,7 @@ y = y'; endif if (rows (y) != n) - error ("kendall: x and y must have the same number of observations"); + error ("kendall: X and Y must have the same number of observations"); else x = [x, y]; endif @@ -94,10 +101,20 @@ r = ranks (x); m = sign (kron (r, ones (n, 1)) - kron (ones (n, 1), r)); - tau = cor (m); + tau = corrcoef (m); if (nargin == 2) tau = tau (1 : c, (c + 1) : columns (x)); endif endfunction + + +%% Test input validation +%!error kendall (); +%!error kendall (1, 2, 3); +%!error kendall ([true, true]); +%!error kendall (ones(1,2), [true, true]); +%!error kendall (ones (2,2,2)); +%!error kendall (ones (2,2), ones (2,2,2)); +%!error kendall (ones (2,2), ones (3,2)); diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/kurtosis.m --- a/scripts/statistics/base/kurtosis.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/kurtosis.m Mon Jan 03 21:23:08 2011 -0800 @@ -18,11 +18,12 @@ ## . ## -*- texinfo -*- -## @deftypefn {Function File} {} kurtosis (@var{x}, @var{dim}) -## If @var{x} is a vector of length @math{N}, return the kurtosis +## @deftypefn {Function File} {} kurtosis (@var{x}) +## @deftypefnx {Function File} {} kurtosis (@var{x}, @var{dim}) +## Compute the kurtosis of the elements of the vector @var{x}. ## @tex ## $$ -## {\rm kurtosis} (x) = {1\over N \sigma(x)^4} \sum_{i=1}^N (x_i-\bar{x})^4 - 3 +## {\rm kurtosis} (x) = {1\over N \sigma^4} \sum_{i=1}^N (x_i-\bar{x})^4 - 3 ## $$ ## where $\bar{x}$ is the mean value of $x$. ## @end tex @@ -33,10 +34,15 @@ ## @end example ## ## @end ifnottex -## @noindent -## of @var{x}. If @var{x} is a matrix, return the kurtosis over the -## first non-singleton dimension. The optional argument @var{dim} -## can be given to force the kurtosis to be given over that dimension. +## If @var{x} is a matrix, return the kurtosis over the +## first non-singleton dimension of the matrix. If the optional +## @var{dim} argument is given, operate along this dimension. +## +## Note: The definition of kurtosis above yields a kurtosis of zero for the +## stdnormal distribution and is sometimes referred to as "excess kurtosis". +## To calculate kurtosis without the normalization factor of @math{-3} use +## @code{moment (@var{x}, 4, 'c') / std (@var{x})^4}. +## @seealso{var,skewness,moment} ## @end deftypefn ## Author: KH @@ -49,8 +55,8 @@ print_usage (); endif - if (!ismatrix(x) || ischar(x)) - error ("kurtosis: X must be a numeric matrix or vector"); + if (!isnumeric (x)) + error ("kurtosis: X must be a numeric vector or matrix"); endif nd = ndims (x); @@ -62,7 +68,7 @@ dim = 1; endif else - if (!(isscalar (dim) && dim == round (dim)) + if (!(isscalar (dim) && dim == fix (dim)) || !(1 <= dim && dim <= nd)) error ("kurtosis: DIM must be an integer and a valid dimension"); endif @@ -73,20 +79,26 @@ idx = ones (1, nd); idx(dim) = c; x = x - repmat (mean (x, dim), idx); - retval = zeros (sz); + retval = zeros (sz, class (x)); s = std (x, [], dim); - x = sum(x.^4, dim); + x = sum (x.^4, dim); ind = find (s > 0); retval(ind) = x(ind) ./ (c * s(ind) .^ 4) - 3; endfunction + %!test %! x = [-1; 0; 0; 0; 1]; %! y = [x, 2*x]; %! assert(all (abs (kurtosis (y) - [-1.4, -1.4]) < sqrt (eps))); -%!error kurtosis (); +%% Test input validation +%!error kurtosis () +%!error kurtosis (1, 2, 3) +%!error kurtosis (true(1,2)) +%!error kurtosis (1, ones(2,2)) +%!error kurtosis (1, 1.5) +%!error kurtosis (1, 0) +%!error kurtosis (1, 3) -%!error kurtosis (1, 2, 3); - diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/logit.m --- a/scripts/statistics/base/logit.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/logit.m Mon Jan 03 21:23:08 2011 -0800 @@ -32,6 +32,7 @@ ## @end example ## ## @end ifnottex +## @seealso{logistic_cdf} ## @end deftypefn ## Author: KH @@ -39,10 +40,20 @@ function y = logit (p) - if (nargin == 1) - y = logistic_inv (p); - else + if (nargin != 1) print_usage (); endif + y = logistic_inv (p); + endfunction + +%!test +%! p = [0.01:0.01:0.99]; +%! assert(logit (p), log (p ./ (1-p)), 25*eps) + +%!assert(logit ([-1, 0, 0.5, 1, 2]), [NaN, -Inf, 0, +Inf, NaN]) + +%% Test input validation +%!error logit () +%!error logit (1, 2) diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/mahalanobis.m --- a/scripts/statistics/base/mahalanobis.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/mahalanobis.m Mon Jan 03 21:23:08 2011 -0800 @@ -29,35 +29,46 @@ ## Created: July 1993 ## Adapted-By: jwe -function retval = mahalanobis (X, Y) +function retval = mahalanobis (x, y) if (nargin != 2) print_usage (); endif - [xr, xc] = size (X); - [yr, yc] = size (Y); + if (! (isnumeric (x) && isnumeric (y))) + error ("mahalanobis: X and Y must be numeric matrices or vectors"); + endif + + if (ndims (x) != 2 || ndims (y) != 2) + error ("mahalanobis: X and Y must be 2-D matrices or vectors"); + endif + + [xr, xc] = size (x); + [yr, yc] = size (y); if (xc != yc) error ("mahalanobis: X and Y must have the same number of columns"); endif - Xm = sum (X) / xr; - Ym = sum (Y) / yr; + xm = mean (x); + ym = mean (y); - X = X - ones (xr, 1) * Xm; - Y = Y - ones (yr, 1) * Ym; + x = x - ones (xr, 1) * xm; + y = y - ones (yr, 1) * ym; - W = (X' * X + Y' * Y) / (xr + yr - 2); + w = (x' * x + y' * y) / (xr + yr - 2); - Winv = inv (W); + winv = inv (w); - retval = (Xm - Ym) * Winv * (Xm - Ym)'; + retval = (xm - ym) * winv * (xm - ym)'; endfunction +%% Test input validation %!error mahalanobis (); - %!error mahalanobis (1, 2, 3); - - +%!error mahalanobis ([true], [true]); +%!error mahalanobis ([1, 2], [true, true]); +%!error mahalanobis (ones (2,2,2)); +%!error mahalanobis (ones (2,2), ones (2,2,2)); +%!error mahalanobis (ones (2,2), ones (2,3)); diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/mean.m --- a/scripts/statistics/base/mean.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/mean.m Mon Jan 03 21:23:08 2011 -0800 @@ -22,7 +22,7 @@ ## @deftypefnx {Function File} {} mean (@var{x}, @var{dim}) ## @deftypefnx {Function File} {} mean (@var{x}, @var{opt}) ## @deftypefnx {Function File} {} mean (@var{x}, @var{dim}, @var{opt}) -## If @var{x} is a vector, compute the mean of the elements of @var{x} +## Compute the mean of the elements of the vector @var{x}. ## @tex ## $$ {\rm mean}(x) = \bar{x} = {1\over N} \sum_{i=1}^N x_i $$ ## @end tex @@ -50,8 +50,7 @@ ## Compute the harmonic mean. ## @end table ## -## If the optional argument @var{dim} is supplied, work along dimension -## @var{dim}. +## If the optional argument @var{dim} is given, operate along this dimension. ## ## Both @var{dim} and @var{opt} are optional. If both are supplied, ## either may appear first. @@ -63,6 +62,14 @@ function y = mean (x, opt1, opt2) + if (nargin < 1 || nargin > 3) + print_usage (); + endif + + if (!isnumeric (x)) + error ("mean: X must be a numeric vector or matrix"); + endif + need_dim = 0; if (nargin == 1) @@ -84,26 +91,31 @@ opt = opt2; dim = opt1; else - error ("mean: expecting opt to be a string"); + error ("mean: OPT must be a string"); endif else print_usage (); endif + nd = ndims (x); + sz = size (x); if (need_dim) - t = find (size (x) != 1); - if (isempty (t)) + ## Find the first non-singleton dimension. + dim = find (sz > 1, 1); + if (isempty (dim)) dim = 1; - else - dim = t(1); endif endif - if (dim > ndims (x)) + if (!(isscalar (dim) && dim == fix (dim)) + || !(1 <= dim && dim <= nd)) + error ("mean: DIM must be an integer and a valid dimension"); + endif + + if (dim > nd) n = 1; - else - sz = size (x); - n = sz (dim); + else + n = sz(dim); endif if (strcmp (opt, "a")) @@ -122,9 +134,22 @@ %! x = -10:10; %! y = x'; %! z = [y, y+10]; -%! assert(mean (x) == 0 && mean (y) == 0 && mean (z) == [0, 10]); +%! assert(mean (x) == 0); +%! assert(mean (y) == 0); +%! assert(mean (z) == [0, 10]); +%!assert(mean ([2 8], 'g'), 4); +%!assert(mean ([4 4 2], 'h'), 3); + +%% Test input validation %!error mean (); - +%!error mean (1, 2, 3, 4); +%!error mean ({1:5}); +%!error mean (true(1, 5)); %!error mean (1, 2, 3); +%!error mean (1, ones(2,2)); +%!error mean (1, 1.5); +%!error mean (1, 0); +%!error mean (1, 3); +%!error mean (1, 'b'); diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/meansq.m --- a/scripts/statistics/base/meansq.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/meansq.m Mon Jan 03 21:23:08 2011 -0800 @@ -21,10 +21,27 @@ ## -*- texinfo -*- ## @deftypefn {Function File} {} meansq (@var{x}) ## @deftypefnx {Function File} {} meansq (@var{x}, @var{dim}) -## For vector arguments, return the mean square of the values. +## Compute the mean square of the elements of the vector @var{x}. +## @tex +## $$ +## {\rm meansq} (x) = {\sum_{i=1}^N {x_i}^2 \over N} +## $$ +## where $\bar{x}$ is the mean value of $x$. +## @end tex +## @ifnottex +## +## @example +## @group +## std (x) = 1/N SUM_i x(i)^2 +## @end group +## @end example +## +## @end ifnottex ## For matrix arguments, return a row vector containing the mean square -## of each column. With the optional @var{dim} argument, returns the -## mean squared of the values along this dimension. +## of each column. +## +## If the optional argument @var{dim} is given, operate along this dimension. +## @seealso{var,std,moment} ## @end deftypefn ## Author: KH @@ -36,15 +53,39 @@ print_usage (); endif + if (!isnumeric (x)) + error ("mean: X must be a numeric vector or matrix"); + endif + + nd = ndims (x); + sz = size (x); if (nargin < 2) - t = find (size (x) != 1); - if (isempty (t)) + ## Find the first non-singleton dimension. + dim = find (sz > 1, 1); + if (isempty (dim)) dim = 1; - else - dim = t(1); endif endif + if (!(isscalar (dim) && dim == fix (dim)) + || !(1 <= dim && dim <= nd)) + error ("mean: DIM must be an integer and a valid dimension"); + endif + y = sumsq (x, dim) / size (x, dim); endfunction + + +%!assert(meansq (1:5), 11) +%!assert(meansq (magic (4)), [94.5, 92.5, 92.5, 94.5]) + +%% Test input validation +%!error meansq () +%!error meansq (1, 2, 3) +%!error kurtosis ([true true]) +%!error meansq (1, ones(2,2)) +%!error meansq (1, 1.5) +%!error meansq (1, 0) +%!error meansq (1, 3) + diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/median.m --- a/scripts/statistics/base/median.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/median.m Mon Jan 03 21:23:08 2011 -0800 @@ -21,8 +21,8 @@ ## -*- texinfo -*- ## @deftypefn {Function File} {} median (@var{x}) ## @deftypefnx {Function File} {} median (@var{x}, @var{dim}) -## If @var{x} is a vector, compute the median value of the elements of -## @var{x}. If the elements of @var{x} are sorted, the median is defined +## Compute the median value of the elements of the vector @var{x}. +## If the elements of @var{x} are sorted, the median is defined ## as ## @tex ## $$ @@ -50,25 +50,41 @@ ## Author: jwe -function retval = median (a, dim) +function retval = median (x, dim) if (nargin != 1 && nargin != 2) print_usage (); endif - if (nargin < 2) - [~, dim] = max (size (a) != 1); # First non-singleton dim. + + if (!isnumeric (x)) + error ("median: X must be a numeric vector or matrix"); endif - if (numel (a) > 0) - n = size (a, dim); + nd = ndims (x); + sz = size (x); + if (nargin != 2) + ## Find the first non-singleton dimension. + dim = find (sz > 1, 1); + if (isempty (dim)) + dim = 1; + endif + else + if (!(isscalar (dim) && dim == fix (dim)) + || !(1 <= dim && dim <= nd)) + error ("median: DIM must be an integer and a valid dimension"); + endif + endif + + if (numel (x) > 0) + n = size (x, dim); k = floor ((n+1) / 2); if (mod (n, 2) == 1) - retval = nth_element (a, k, dim); + retval = nth_element (x, k, dim); else - retval = mean (nth_element (a, k:k+1, dim), dim); + retval = mean (nth_element (x, k:k+1, dim), dim); endif ## Inject NaNs where needed, to be consistent with Matlab. - retval(any (isnan (a), dim)) = NaN; + retval(any (isnan (x), dim)) = NaN; else error ("median: invalid matrix argument"); endif @@ -81,13 +97,19 @@ %! y = [1, 2, 3, 4, 5, 6, 7]; %! y2 = y'; %! -%! assert((median (x) == median (x2) && median (x) == 3.5 -%! && median (y) == median (y2) && median (y) == 4 -%! && median ([x2, 2*x2]) == [3.5, 7] -%! && median ([y2, 3*y2]) == [4, 12])); +%! assert(median (x) == median (x2) && median (x) == 3.5); +%! assert(median (y) == median (y2) && median (y) == 4); +%! assert(median ([x2, 2*x2]) == [3.5, 7]); +%! assert(median ([y2, 3*y2]) == [4, 12]); -%!assert (median ([1, 2, 3, NaN]), NaN) +%!assert(median ([1,2,NaN;4,5,6;NaN,8,9]), [NaN, 5, NaN]); +%% Test input validation %!error median (); %!error median (1, 2, 3); +%!error median ({1:5}); +%!error median (true(1,5)); +%!error median (1, ones(2,2)); +%!error median (1, 1.5); +%!error median (1, 0); diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/mode.m --- a/scripts/statistics/base/mode.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/mode.m Mon Jan 03 21:23:08 2011 -0800 @@ -25,8 +25,7 @@ ## dimension and returns the value with the highest frequency. If two, or ## more, values have the same frequency @code{mode} returns the smallest. ## -## If the optional argument @var{dim} is supplied, work along dimension -## @var{dim}. +## If the optional argument @var{dim} is given, operate along this dimension. ## ## The return variable @var{f} is the number of occurrences of the mode in ## in the dataset. The cell array @var{c} contains all of the elements @@ -40,8 +39,8 @@ print_usage (); endif - if (!ismatrix(x) || ischar(x)) - error ("mode: X must be a numeric matrix or vector"); + if (!isnumeric(x)) + error ("mode: X must be a numeric vector or matrix"); endif nd = ndims (x); @@ -60,9 +59,9 @@ endif sz2 = sz; - sz2 (dim) = 1; + sz2(dim) = 1; sz3 = ones (1, nd); - sz3 (dim) = sz (dim); + sz3(dim) = sz(dim); if (issparse (x)) t2 = sparse (sz(1), sz(2)); @@ -131,25 +130,36 @@ %! x(:,:,3) = circshift (toeplitz (1:3), 2); %!test %! [m, f, c] = mode (x, 1); -%! assert (reshape (m, [3, 3]), [1 1 1; 2 2 2; 1 1 1]) -%! assert (reshape (f, [3, 3]), [1 1 1; 2 2 2; 1 1 1]) +%! assert (reshape (m, [3, 3]), [1 1 1; 2 2 2; 1 1 1]); +%! assert (reshape (f, [3, 3]), [1 1 1; 2 2 2; 1 1 1]); %! c = reshape (c, [3, 3]); -%! assert (c{1}, [1; 2; 3]) -%! assert (c{2}, 2) -%! assert (c{3}, [1; 2; 3]) +%! assert (c{1}, [1; 2; 3]); +%! assert (c{2}, 2); +%! assert (c{3}, [1; 2; 3]); %!test %! [m, f, c] = mode (x, 2); -%! assert (reshape (m, [3, 3]), [1 1 2; 2 1 1; 1 2 1]) -%! assert (reshape (f, [3, 3]), [1 1 2; 2 1 1; 1 2 1]) +%! assert (reshape (m, [3, 3]), [1 1 2; 2 1 1; 1 2 1]); +%! assert (reshape (f, [3, 3]), [1 1 2; 2 1 1; 1 2 1]); %! c = reshape (c, [3, 3]); -%! assert (c{1}, [1; 2; 3]) -%! assert (c{2}, 2) -%! assert (c{3}, [1; 2; 3]) +%! assert (c{1}, [1; 2; 3]); +%! assert (c{2}, 2); +%! assert (c{3}, [1; 2; 3]); %!test %! [m, f, c] = mode (x, 3); -%! assert (reshape (m, [3, 3]), [1 2 1; 1 2 1; 1 2 1]) -%! assert (reshape (f, [3, 3]), [1 2 1; 1 2 1; 1 2 1]) +%! assert (reshape (m, [3, 3]), [1 2 1; 1 2 1; 1 2 1]); +%! assert (reshape (f, [3, 3]), [1 2 1; 1 2 1; 1 2 1]); %! c = reshape (c, [3, 3]); -%! assert (c{1}, [1; 2; 3]) -%! assert (c{2}, [1; 2; 3]) -%! assert (c{3}, [1; 2; 3]) +%! assert (c{1}, [1; 2; 3]); +%! assert (c{2}, [1; 2; 3]); +%! assert (c{3}, [1; 2; 3]); + +%% Test input validation +%!error mode () +%!error mode (1, 2, 3) +%!error mode ({1 2 3}) +%!error mode (true(1,3)) +%!error mode (1, ones(2,2)) +%!error mode (1, 1.5) +%!error mode (1, 0) +%!error mode (1, 3) + diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/moment.m --- a/scripts/statistics/base/moment.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/moment.m Mon Jan 03 21:23:08 2011 -0800 @@ -18,25 +18,89 @@ ## . ## -*- texinfo -*- -## @deftypefn {Function File} {} moment (@var{x}, @var{p}, @var{opt}, @var{dim}) -## If @var{x} is a vector, compute the @var{p}-th moment of @var{x}. +## @deftypefn {Function File} {} moment (@var{x}, @var{p}) +## @deftypefnx {Function File} {} moment (@var{x}, @var{p}, @var{type}) +## @deftypefnx {Function File} {} moment (@var{x}, @var{p}, @var{dim}) +## @deftypefnx {Function File} {} moment (@var{x}, @var{p}, @var{type}, @var{dim}) +## @deftypefnx {Function File} {} moment (@var{x}, @var{p}, @var{dim}, @var{type}) +## Compute the @var{p}-th moment of the vector @var{x} about zero. +## @tex +## $$ +## {\rm moment} (x) = { \sum_{i=1}^N {x_i}^p \over N } +## $$ +## @end tex +## @ifnottex +## +## @example +## @group +## moment (x) = 1/N SUM_i x(i)^p +## @end group +## @end example +## +## @end ifnottex ## ## If @var{x} is a matrix, return the row vector containing the ## @var{p}-th moment of each column. ## -## With the optional string opt, the kind of moment to be computed can -## be specified. If opt contains @code{"c"} or @code{"a"}, central -## and/or absolute moments are returned. For example, +## The optional string @var{type} specifies the type of moment to be computed. +## Valid options are: +## @table @code +## @item "c" +## Central Moment. The moment about the mean defined as +## @tex +## $$ +## {\sum_{i=1}^N (x_i - \bar{x})^p \over N} +## $$ +## @end tex +## @ifnottex +## +## @example +## @group +## 1/N SUM_i (x(i) - mean(x))^p +## @end group +## @end example +## +## @end ifnottex +## +## @item "a" +## Absolute Moment. The moment about zero ignoring sign defined as +## @tex +## $$ +## {\sum_{i=1}^N {\left| x_i \right|}^p \over N} +## $$ +## @end tex +## @ifnottex ## ## @example -## moment (x, 3, "ac") +## @group +## 1/N SUM_i ( abs(x(i)) )^p +## @end group ## @end example ## -## @noindent -## computes the third central absolute moment of @var{x}. +## @end ifnottex +## +## @item "ac" +## Absolute Central Moment. Defined as +## @tex +## $$ +## {\sum_{i=1}^N {\left| x_i - \bar{x} \right|}^p \over N} +## $$ +## @end tex +## @ifnottex ## -## If the optional argument @var{dim} is supplied, work along dimension -## @var{dim}. +## @example +## @group +## 1/N SUM_i ( abs(x(i) - mean(x)) )^p +## @end group +## @end example +## +## @end ifnottex +## @end table +## +## If the optional argument @var{dim} is given, operate along this dimension. +## +## If both @var{type} and @var{dim} are given they may appear in any order. +## @seealso{var,skewness,kurtosis} ## @end deftypefn ## Can easily be made to work for continuous distributions (using quad) @@ -51,58 +115,80 @@ print_usage (); endif + if (!isnumeric(x) || isempty(x) ) + error ("moment: X must be a non-empty numeric matrix or vector"); + endif + + if (!(isnumeric(p) && isscalar(p))) + error ("moment: P must be a numeric scalar"); + endif + need_dim = 0; if (nargin == 2) - opt = ""; + type = ""; need_dim = 1; elseif (nargin == 3) if (ischar (opt1)) - opt = opt1; + type = opt1; need_dim = 1; else dim = opt1; - opt = ""; + type = ""; endif elseif (nargin == 4) if (ischar (opt1)) - opt = opt1; + type = opt1; dim = opt2; elseif (ischar (opt2)) - opt = opt2; + type = opt2; dim = opt1; else - error ("moment: expecting opt to be a string"); - endif - else - print_usage (); - endif - - if (need_dim) - t = find (size (x) != 1); - if (isempty (t)) - dim = 1; - else - dim = t(1); + error ("moment: expecting TYPE to be a string"); endif endif + nd = ndims (x); sz = size (x); - n = sz (dim); - - if (numel (x) < 1) - error ("moment: x must not be empty"); + if (need_dim) + ## Find the first non-singleton dimension. + dim = find (sz > 1, 1); + if (isempty (dim)) + dim = 1; + endif + else + if (!(isscalar (dim) && dim == fix (dim)) || + !(1 <= dim && dim <= nd)) + error ("moment: DIM must be an integer and a valid dimension"); + endif endif - if any (opt == "c") + n = sz(dim); + + if any (type == "c") rng = ones (1, length (sz)); rng(dim) = sz(dim); x = x - repmat (sum (x, dim), rng) / n; endif - if any (opt == "a") + if any (type == "a") x = abs (x); endif m = sum (x .^ p, dim) / n; endfunction + + +%% Test input validation +%!error moment () +%!error moment (1) +%!error moment (1, 2, 3, 4, 5) +%!error moment ([true true], 2) +%!error moment (ones(2,0,3), 2) +%!error moment (1, true) +%!error moment (1, ones(2,2)) +%!error moment (1, 2, 3, 4) +%!error moment (1, 2, ones(2,2)) +%!error moment (1, 2, 1.5) +%!error moment (1, 2, 4) + diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/ols.m --- a/scripts/statistics/base/ols.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/ols.m Mon Jan 03 21:23:08 2011 -0800 @@ -48,9 +48,17 @@ ## ## @table @var ## @item beta -## The OLS estimator for @var{b}, @code{@var{beta} = pinv (@var{x}) * -## @var{y}}, where @code{pinv (@var{x})} denotes the pseudoinverse of -## @var{x}. +## The OLS estimator for @math{b}. +## @tex +## $beta$ is calculated directly via $(x^Tx)^{-1} x^T y$ if the matrix $x^Tx$ is +## of full rank. +## @end tex +## @ifnottex +## @var{beta} is calculated directly via @code{inv (x'*x) * x' * y} if the +## matrix @code{x'*x} is of full rank. +## @end ifnottex +## Otherwise, @code{@var{beta} = pinv (@var{x}) * @var{y}} where +## @code{pinv (@var{x})} denotes the pseudoinverse of @var{x}. ## ## @item sigma ## The OLS estimator for the matrix @var{s}, @@ -66,34 +74,63 @@ ## @item r ## The matrix of OLS residuals, @code{@var{r} = @var{y} - @var{x}*@var{beta}}. ## @end table +## @seealso{gls, pinv} ## @end deftypefn ## Author: Teresa Twaroch ## Created: May 1993 ## Adapted-By: jwe -function [BETA, SIGMA, R] = ols (Y, X) +function [beta, sigma, r] = ols (y, x) if (nargin != 2) print_usage (); endif - [nr, nc] = size (X); - [ry, cy] = size (Y); - if (nr != ry) - error ("ols: incorrect matrix dimensions"); + if (! (isnumeric (x) && isnumeric (y))) + error ("ols: X and Y must be numeric matrices or vectors"); + endif + + if (ndims (x) != 2 || ndims (y) != 2) + error ("ols: X and Y must be 2-D matrices or vectors"); endif - Z = X' * X; - r = rank (Z); + [nr, nc] = size (x); + [ry, cy] = size (y); + if (nr != ry) + error ("ols: number of rows of X and Y must be equal"); + endif + + z = x' * x; + r = rank (z); if (r == nc) - BETA = inv (Z) * X' * Y; + beta = inv (z) * x' * y; else - BETA = pinv (X) * Y; + beta = pinv (x) * y; + endif + + if (isargout (2) || isargout (3)) + r = y - x * beta; + endif + if (isargout (2)) + sigma = r' * r / (nr - r); endif - R = Y - X * BETA; - SIGMA = R' * R / (nr - r); +endfunction + +%!test +%! x = [1:5]'; +%! y = 3*x + 2; +%! x = [x, ones(5,1)]; +%! assert (ols(y,x), [3; 2], 50*eps) -endfunction +%% Test input validation +%!error ols (); +%!error ols (1); +%!error ols (1, 2, 3); +%!error ols ([true, true], [1, 2]); +%!error ols ([1, 2], [true, true]); +%!error ols (ones (2,2,2), ones (2,2)); +%!error ols (ones (2,2), ones (2,2,2)); +%!error ols (ones(1,2), ones(2,2)); diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/ppplot.m --- a/scripts/statistics/base/ppplot.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/ppplot.m Mon Jan 03 21:23:08 2011 -0800 @@ -54,7 +54,7 @@ endif if (! isvector (x)) - error ("ppplot: x must be a vector"); + error ("ppplot: X must be a vector"); endif s = sort (x); @@ -63,7 +63,7 @@ if (nargin == 1) F = @stdnormal_cdf; else - F = str2func (sprintf ("%s_cdf", dist)); + F = str2func (sprintf ("%scdf", dist)); endif; if (nargin <= 2) y = feval (F, s); @@ -77,3 +77,8 @@ endif endfunction + +%% Test input validation +%!error ppplot (); +%!error ppplot (ones(2,2)); + diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/prctile.m --- a/scripts/statistics/base/prctile.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/prctile.m Mon Jan 03 21:23:08 2011 -0800 @@ -20,8 +20,8 @@ ## @deftypefn {Function File} {@var{y} =} prctile (@var{x}, @var{p}) ## @deftypefnx {Function File} {@var{q} =} prctile (@var{x}, @var{p}, @var{dim}) ## For a sample @var{x}, compute the quantiles, @var{y}, corresponding -## to the cumulative probability values, P, in percent. All non-numeric -## values (NaNs) of X are ignored. +## to the cumulative probability values, @var{p}, in percent. All non-numeric +## values (NaNs) of @var{x} are ignored. ## ## If @var{x} is a matrix, compute the percentiles for each column and ## return them in a matrix, such that the i-th row of @var{y} contains the @@ -29,10 +29,10 @@ ## ## The optional argument @var{dim} determines the dimension along which ## the percentiles are calculated. If @var{dim} is omitted, and @var{x} is -## a vector or matrix, it defaults to 1 (column wise quantiles). In the -## instance that @var{x} is a N-d array, @var{dim} defaults to the first -## dimension whose size greater than unity. -## +## a vector or matrix, it defaults to 1 (column-wise quantiles). When +## @var{x} is an N-d array, @var{dim} defaults to the first non-singleton +## dimension. +## @seealso{quantile} ## @end deftypefn ## Author: Ben Abbott @@ -44,30 +44,42 @@ print_usage (); endif - if (nargin < 2) - p = 100*[0.00 0.25, 0.50, 0.75, 1.00]; + if (!isnumeric(x)) + error ("prctile: X must be a numeric vector or matrix"); + endif + if (!isnumeric(p)) + error ("prctile: P must be a numeric vector"); endif - if (nargin < 3) - if (ndims (x) == 2) + if (nargin == 1) + p = [0, 25, 50, 75, 100]; + endif + + nd = ndims (x); + if (nargin == 2) + if (nd == 2) ## If a matrix or vector, use the 1st dimension. dim = 1; else - ## If an N-d array, use the firt dimension with a length > 1. - dim = find (size(v) != 1); + ## If an N-d array, find the first non-singleton dimension. + dim = find (size(v) > 1, 1); if (isempty (dim)) dim = 1; endif endif + else + if (!(isscalar (dim) && dim == fix (dim)) || + !(1 <= dim && dim <= nd)) + error ("prctile: DIM must be an integer and a valid dimension"); + endif endif - ## Convert from percent. - p = 0.01 * p; + ## Convert from percent to decimal. + p = p / 100; ## The 5th method is compatible with Matlab. method = 5; - ## Call the quantile function q = quantile (x, p, dim, method); endfunction @@ -154,3 +166,11 @@ %! qa = [0.1270; 0.2041; 0.6437; 0.6477; 0.9322]; %! assert (q, qa, tol); +%% Test input validation +%!error prctile () +%!error prctile (1, 2, 3, 4) +%!error prctile ([true, false], 10) +%!error prctile (1:10, [true, false]) +%!error prctile (1, 1, 1.5) +%!error prctile (1, 1, 0) +%!error prctile (1, 1, 3) diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/qqplot.m --- a/scripts/statistics/base/qqplot.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/qqplot.m Mon Jan 03 21:23:08 2011 -0800 @@ -18,7 +18,9 @@ ## . ## -*- texinfo -*- -## @deftypefn {Function File} {[@var{q}, @var{s}] =} qqplot (@var{x}, @var{dist}, @var{params}) +## @deftypefn {Function File} {[@var{q}, @var{s}] =} qqplot (@var{x}) +## @deftypefnx {Function File} {[@var{q}, @var{s}] =} qqplot (@var{x}, @var{dist}) +## @deftypefnx {Function File} {[@var{q}, @var{s}] =} qqplot (@var{x}, @var{dist}, @var{params}) ## Perform a QQ-plot (quantile plot). ## ## If F is the CDF of the distribution @var{dist} with parameters @@ -27,7 +29,7 @@ ## largest element of x versus abscissa @var{q}(@var{i}f) = G((@var{i} - ## 0.5)/@var{n}). ## -## If the sample comes from F except for a transformation of location +## If the sample comes from F, except for a transformation of location ## and scale, the pairs will approximately follow a straight line. ## ## The default for @var{dist} is the standard normal distribution. The @@ -40,8 +42,9 @@ ## @end example ## ## @noindent -## @var{dist} can be any string for which a function @var{dist_inv} -## that calculates the inverse CDF of distribution @var{dist} exists. +## @var{dist} can be any string for which a function @var{distinv} or +## @var{dist_inv} exists that calculates the inverse CDF of distribution +## @var{dist}. ## ## If no output arguments are given, the data are plotted directly. ## @end deftypefn @@ -55,18 +58,24 @@ print_usage (); endif - if (! (isvector(x))) - error ("qqplot: x must be a vector"); + if (!(isnumeric (x) && isvector(x))) + error ("qqplot: X must be a numeric vector"); endif + if (nargin == 1) + f = @stdnormal_inv; + else + if ( exist (invname = sprintf ("%sinv", dist)) + || exist (invname = sprintf ("%s_inv", dist))) + f = str2func (invname); + else + error ("qqplot: no inverse CDF found for distribution DIST"); + endif + endif; + s = sort (x); n = length (x); t = ((1 : n)' - .5) / n; - if (nargin == 1) - f = @stdnormal_inv; - else - f = str2func (sprintf ("%s_inv", dist)); - endif; if (nargin <= 2) q = feval (f, t); q_label = func2str (f); @@ -77,8 +86,8 @@ else tmp = ""; endif - q_label = sprintf ("%s with parameter(s) %g%s", func2str (f), - varargin{1}, tmp); + q_label = sprintf ("%s with parameter(s) %g%s", + func2str (f), varargin{1}, tmp); endif if (nargout == 0) diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/quantile.m --- a/scripts/statistics/base/quantile.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/quantile.m Mon Jan 03 21:23:08 2011 -0800 @@ -22,18 +22,17 @@ ## @deftypefnx {Function File} {@var{q} =} quantile (@var{x}, @var{p}, @var{dim}, @var{method}) ## For a sample, @var{x}, calculate the quantiles, @var{q}, corresponding to ## the cumulative probability values in @var{p}. All non-numeric values (NaNs) -## of -## @var{x} are ignored. +## of @var{x} are ignored. ## ## If @var{x} is a matrix, compute the quantiles for each column and ## return them in a matrix, such that the i-th row of @var{q} contains ## the @var{p}(i)th quantiles of each column of @var{x}. ## ## The optional argument @var{dim} determines the dimension along which -## the percentiles are calculated. If @var{dim} is omitted, and @var{x} is -## a vector or matrix, it defaults to 1 (column wise quantiles). In the -## instance that @var{x} is a N-d array, @var{dim} defaults to the first -## dimension whose size greater than unity. +## the quantiles are calculated. If @var{dim} is omitted, and @var{x} is +## a vector or matrix, it defaults to 1 (column-wise quantiles). If +## @var{x} is an N-d array, @var{dim} defaults to the first non-singleton +## dimension. ## ## The methods available to calculate sample quantiles are the nine methods ## used by R (http://www.r-project.org/). The default value is METHOD = 5. @@ -55,21 +54,21 @@ ## @item Method 4: p(k) = k / n. That is, linear interpolation of the ## empirical cdf. ## -## @item Method 5: p(k) = (k - 0.5) / n. That is a piecewise linear -## function where the knots are the values midway through the steps of -## the empirical cdf. +## @item Method 5: p(k) = (k - 0.5) / n. That is a piecewise linear function +## where the knots are the values midway through the steps of the empirical +## cdf. ## ## @item Method 6: p(k) = k / (n + 1). ## ## @item Method 7: p(k) = (k - 1) / (n - 1). ## -## @item Method 8: p(k) = (k - 1/3) / (n + 1/3). The resulting quantile -## estimates are approximately median-unbiased regardless of the -## distribution of @var{x}. +## @item Method 8: p(k) = (k - 1/3) / (n + 1/3). The resulting quantile +## estimates are approximately median-unbiased regardless of the distribution +## of @var{x}. ## -## @item Method 9: p(k) = (k - 3/8) / (n + 1/4). The resulting quantile -## estimates are approximately unbiased for the expected order -## statistics if @var{x} is normally distributed. +## @item Method 9: p(k) = (k - 3/8) / (n + 1/4). The resulting quantile +## estimates are approximately unbiased for the expected order statistics if +## @var{x} is normally distributed. ## @end enumerate ## ## Hyndman and Fan (1996) recommend method 8. Maxima, S, and R @@ -88,12 +87,13 @@ ## @item R: A Language and Environment for Statistical Computing; ## @url{http://cran.r-project.org/doc/manuals/fullrefman.pdf}. ## @end itemize +## @seealso{prctile} ## @end deftypefn ## Author: Ben Abbott ## Description: Matlab style quantile function of a discrete/continuous distribution -function q = quantile (x, p, dim, method) +function q = quantile (x, p, dim = 1, method = 5) if (nargin < 1 || nargin > 4) print_usage (); @@ -103,16 +103,9 @@ p = [0.00 0.25, 0.50, 0.75, 1.00]; endif - if (nargin < 3) - dim = 1; - endif - - if (nargin < 4) - method = 5; - endif - - if (dim > ndims(x)) - error ("quantile: invalid dimension"); + if (!(isscalar (dim) && dim == fix (dim)) || + !(1 <= dim && dim <= ndims (x))) + error ("quantile: DIM must be an integer and a valid dimension"); endif ## Set the permutation vector. @@ -276,6 +269,13 @@ %! yexp = median (x, dim); %! assert (yobs, yexp); +%% Test input validation +%!error quantile () +%!error quantile (1, 2, 3, 4, 5) +%!error quantile (1, 1, 1.5) +%!error quantile (1, 1, 0) +%!error quantile (1, 1, 3) + ## For the cumulative probability values in @var{p}, compute the ## quantiles, @var{q} (the inverse of the cdf), for the sample, @var{x}. ## @@ -286,7 +286,7 @@ ## Author: Ben Abbott ## Vectorized version: Jaroslav Hajek -## Description: Quantile function of a empirical samples +## Description: Quantile function of empirical samples function inv = __quantile__ (x, p, method = 5) @@ -294,8 +294,8 @@ print_usage (); endif - if (! ismatrix (x)) - error ("quantile: x must be a matrix"); + if (!isnumeric (x)) + error ("quantile: X must be a numeric vector or matrix"); endif ## Save length and set shape of quantiles. @@ -389,3 +389,4 @@ endif endfunction + diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/range.m --- a/scripts/statistics/base/range.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/range.m Mon Jan 03 21:23:08 2011 -0800 @@ -21,13 +21,16 @@ ## -*- texinfo -*- ## @deftypefn {Function File} {} range (@var{x}) ## @deftypefnx {Function File} {} range (@var{x}, @var{dim}) -## If @var{x} is a vector, return the range, i.e., the difference -## between the maximum and the minimum, of the input data. +## Return the range, i.e., the difference between the maximum and the minimum +## of the input data. If @var{x} is a vector, the range is calculated over +## the elements of @var{x}. If @var{x} is a matrix, the range is calculated +## over each column of @var{x}. ## -## If @var{x} is a matrix, do the above for each column of @var{x}. +## If the optional argument @var{dim} is given, operate along this dimension. ## -## If the optional argument @var{dim} is supplied, work along dimension -## @var{dim}. +## The range is a quickly computed measure of the dispersion of a data set, but +## is less accurate than @code{iqr} if there are outlying data points. +## @seealso{iqr, std} ## @end deftypefn ## Author: KH @@ -44,3 +47,12 @@ endif endfunction + +%!assert(range (1:10), 9) +%!assert(range (magic (3)), [5, 8, 5]) +%!assert(range (magic (3), 2), [7; 4; 7]) +%!assert(range (2), 0) + +%% Test input validation +%!error range () +%!error range (1, 2, 3) diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/ranks.m --- a/scripts/statistics/base/ranks.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/ranks.m Mon Jan 03 21:23:08 2011 -0800 @@ -22,6 +22,7 @@ ## Return the ranks of @var{x} along the first non-singleton dimension ## adjusted for ties. If the optional argument @var{dim} is ## given, operate along this dimension. +## @seealso{spearman,kendall} ## @end deftypefn ## Author: KH @@ -37,8 +38,8 @@ print_usage (); endif - if (!ismatrix(x) || ischar(x)) - error ("ranks: X must be a numeric matrix or vector"); + if (!isnumeric(x)) + error ("ranks: X must be a numeric vector or matrix"); endif nd = ndims (x); @@ -50,7 +51,7 @@ dim = 1; endif else - if (!(isscalar (dim) && dim == round (dim)) + if (!(isscalar (dim) && dim == fix (dim)) || !(1 <= dim && dim <= nd)) error ("ranks: DIM must be an integer and a valid dimension"); endif @@ -87,3 +88,21 @@ endif endfunction + + +%!assert(ranks (1:2:10), 1:5) +%!assert(ranks (10:-2:1), 5:-1:1) +%!assert(ranks ([2, 1, 2, 4]), [2.5, 1, 2.5, 4]) +%!assert(ranks (ones(1, 5)), 3*ones(1, 5)) +%!assert(ranks (1e6*ones(1, 5)), 3*ones(1, 5)) +%!assert(ranks (rand (1, 5), 1), ones(1, 5)) + +%% Test input validation +%!error ranks () +%!error ranks (1, 2, 3) +%!error ranks ({1, 2}) +%!error ranks (true(2,1)) +%!error ranks (1, 1.5) +%!error ranks (1, 0) +%!error ranks (1, 3) + diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/run_count.m --- a/scripts/statistics/base/run_count.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/run_count.m Mon Jan 03 21:23:08 2011 -0800 @@ -18,10 +18,13 @@ ## . ## -*- texinfo -*- -## @deftypefn {Function File} {} run_count (@var{x}, @var{n}) +## @deftypefn {Function File} {} run_count (@var{x}, @var{n}) +## @deftypefnx {Function File} {} run_count (@var{x}, @var{n}, @var{dim}) ## Count the upward runs along the first non-singleton dimension of ## @var{x} of length 1, 2, @dots{}, @var{n}-1 and greater than or equal -## to @var{n}. If the optional argument @var{dim} is given then operate +## to @var{n}. +## +## If the optional argument @var{dim} is given then operate ## along this dimension. ## @end deftypefn @@ -34,11 +37,11 @@ print_usage (); endif - if (!ismatrix(x) || ischar(x)) - error ("run_count: X must be a numeric matrix or vector"); + if (!isnumeric(x)) + error ("run_count: X must be a numeric vector or matrix"); endif - if (!(isscalar (n) && n == round (n)) || n < 1) + if (!(isscalar (n) && n == fix (n) && n > 0)) error ("run_count: N must be a positive integer"); endif @@ -51,7 +54,7 @@ dim = 1; endif else - if (!(isscalar (dim) && dim == round (dim)) + if (!(isscalar (dim) && dim == fix (dim)) || !(1 <= dim && dim <= nd)) error ("run_count: DIM must be an integer and a valid dimension"); endif @@ -91,3 +94,23 @@ endif endfunction + + +%!assert(run_count (magic(3), 4), [1,0,1;1,0,1;0,1,0;0,0,0]) +%!assert(run_count (magic(3), 4, 2), [1,0,1;1,0,1;0,1,0;0,0,0]') +%!assert(run_count (5:-1:1, 5), [5, 0, 0, 0, 0]) +%!assert(run_count (ones(3), 4), [0,0,0;0,0,0;1,1,1;0,0,0]) + +%% Test input validation +%!error run_count () +%!error run_count (1) +%!error run_count (1, 2, 3, 4) +%!error run_count ({1, 2}, 3) +%!error run_count (true(3), 3) +%!error run_count (1:5, ones(2,2)) +%!error run_count (1:5, 1.5) +%!error run_count (1:5, -2) +%!error run_count (1:5, 3, ones(2,2)) +%!error run_count (1:5, 3, 1.5) +%!error run_count (1:5, 3, 0) + diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/skewness.m --- a/scripts/statistics/base/skewness.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/skewness.m Mon Jan 03 21:23:08 2011 -0800 @@ -18,11 +18,12 @@ ## . ## -*- texinfo -*- -## @deftypefn {Function File} {} skewness (@var{x}, @var{dim}) -## If @var{x} is a vector of length @math{n}, return the skewness +## @deftypefn {Function File} {} skewness (@var{x}) +## @deftypefnx {Function File} {} skewness (@var{x}, @var{dim}) +## Compute the skewness of the elements of the vector @var{x}. ## @tex ## $$ -## {\rm skewness} (x) = {1\over N \sigma(x)^3} \sum_{i=1}^N (x_i-\bar{x})^3 +## {\rm skewness} (x) = {1\over N \sigma^3} \sum_{i=1}^N (x_i-\bar{x})^3 ## $$ ## where $\bar{x}$ is the mean value of $x$. ## @end tex @@ -35,9 +36,10 @@ ## @end ifnottex ## ## @noindent -## of @var{x}. If @var{x} is a matrix, return the skewness along the +## If @var{x} is a matrix, return the skewness along the ## first non-singleton dimension of the matrix. If the optional ## @var{dim} argument is given, operate along this dimension. +## @seealso{var,kurtosis,moment} ## @end deftypefn ## Author: KH @@ -50,8 +52,8 @@ print_usage (); endif - if (!ismatrix(x) || ischar(x)) - error ("skewness: X must be a numeric matrix or vector"); + if (!isnumeric(x)) + error ("skewness: X must be a numeric vector or matrix"); endif nd = ndims (x); @@ -71,7 +73,7 @@ c = sz(dim); idx = ones (1, nd); - idx (dim) = c; + idx(dim) = c; x = x - repmat (mean (x, dim), idx); sz(dim) = 1; retval = zeros (sz, class (x)); @@ -82,7 +84,20 @@ endfunction -%!error skewness (); +%!assert(skewness ([-1,0,1]), 0); +%!assert(skewness ([-2,0,1]) < 0); +%!assert(skewness ([-1,0,2]) > 0); +%!assert(skewness ([-3,0,1]) == -1*skewness([-1,0,3])); +%!test +%! x = [0; 0; 0; 1]; +%! y = [x, 2*x]; +%! assert(all (abs (skewness (y) - [0.75, 0.75]) < sqrt (eps))); -%!error skewness (1, 2, 3); - +%% Test input validation +%!error skewness () +%!error skewness (1, 2, 3) +%!error skewness ([true true]) +%!error skewness (1, ones(2,2)) +%!error skewness (1, 1.5) +%!error skewness (1, 0) +%!error skewness (1, 3) diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/spearman.m --- a/scripts/statistics/base/spearman.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/spearman.m Mon Jan 03 21:23:08 2011 -0800 @@ -18,48 +18,64 @@ ## . ## -*- texinfo -*- -## @deftypefn {Function File} {} spearman (@var{x}, @var{y}) -## Compute Spearman's rank correlation coefficient @var{rho} for each of -## the variables specified by the input arguments. -## -## For matrices, each row is an observation and each column a variable; -## vectors are always observations and may be row or column vectors. -## -## @code{spearman (@var{x})} is equivalent to @code{spearman (@var{x}, -## @var{x})}. +## @deftypefn {Function File} {} spearman (@var{x}) +## @deftypefnx {Function File} {} spearman (@var{x}, @var{y}) +## @cindex Spearman's Rho +## Compute Spearman's rank correlation coefficient @var{rho}. ## ## For two data vectors @var{x} and @var{y}, Spearman's @var{rho} is the -## correlation of the ranks of @var{x} and @var{y}. +## correlation coefficient of the ranks of @var{x} and @var{y}. ## ## If @var{x} and @var{y} are drawn from independent distributions, ## @var{rho} has zero mean and variance @code{1 / (n - 1)}, and is ## asymptotically normally distributed. +## +## @code{spearman (@var{x})} is equivalent to @code{spearman (@var{x}, +## @var{x})}. +## @seealso{ranks, kendall} ## @end deftypefn ## Author: KH ## Description: Spearman's rank correlation rho -function rho = spearman (x, y) +function rho = spearman (x, y = []) if ((nargin < 1) || (nargin > 2)) print_usage (); endif + if (! (isnumeric (x) && isnumeric (y))) + error ("spearman: X and Y must be numeric matrices or vectors"); + endif + + if (ndims (x) != 2 || ndims (y) != 2) + error ("spearman: X and Y must be 2-D matrices or vectors"); + endif + if (rows (x) == 1) x = x'; endif n = rows (x); if (nargin == 1) - rho = cor (ranks (x)); + rho = corrcoef (ranks (x)); else if (rows (y) == 1) y = y'; endif if (rows (y) != n) - error ("spearman: x and y must have the same number of observations"); + error ("spearman: X and Y must have the same number of observations"); endif - rho = cor (ranks (x), ranks (y)); + rho = corrcoef (ranks (x), ranks (y)); endif endfunction + +%% Test input validation +%!error spearman (); +%!error spearman (1, 2, 3); +%!error spearman ([true, true]); +%!error spearman (ones(1,2), [true, true]); +%!error spearman (ones (2,2,2)); +%!error spearman (ones (2,2), ones (2,2,2)); +%!error spearman (ones (2,2), ones (3,2)); diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/statistics.m --- a/scripts/statistics/base/statistics.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/statistics.m Mon Jan 03 21:23:08 2011 -0800 @@ -20,30 +20,31 @@ ## -*- texinfo -*- ## @deftypefn {Function File} {} statistics (@var{x}) ## @deftypefnx {Function File} {} statistics (@var{x}, @var{dim}) -## If @var{x} is a matrix, return a matrix with the minimum, first -## quartile, median, third quartile, maximum, mean, standard deviation, -## skewness, and kurtosis of the columns of @var{x} as its columns. +## Return a vector with the minimum, first quartile, median, third quartile, +## maximum, mean, standard deviation, skewness, and kurtosis of the elements of +## the vector @var{x}. ## -## If @var{x} is a vector, calculate the statistics along the first +## If @var{x} is a matrix, calculate statistics over the first ## non-singleton dimension. -## +## If the optional argument @var{dim} is given, operate along this dimension. +## @seealso{min,max,median,mean,std,skewness,kurtosis} ## @end deftypefn ## Author: KH ## Description: Compute basic statistics -function S = statistics (X, dim) +function stats = statistics (x, dim) if (nargin != 1 && nargin != 2) print_usage (); endif - if (!ismatrix(X) || ischar(X)) - error ("statistics: X must be a numeric matrix or vector"); + if (!isnumeric(x)) + error ("statistics: X must be a numeric vector or matrix"); endif - nd = ndims (X); - sz = size (X); + nd = ndims (x); + sz = size (x); if (nargin != 2) ## Find the first non-singleton dimension. dim = find (sz > 1, 1); @@ -61,10 +62,10 @@ error ("statistics: dimension of X is too small (<2)"); endif - emp_inv = quantile (X, [0.25; 0.5; 0.75], dim, 7); + emp_inv = quantile (x, [0.25; 0.5; 0.75], dim, 7); - S = cat (dim, min (X, [], dim), emp_inv, max (X, [], dim), mean (X, dim), - std (X, [], dim), skewness (X, dim), kurtosis (X, dim)); + stats = cat (dim, min (x, [], dim), emp_inv, max (x, [], dim), mean (x, dim), + std (x, [], dim), skewness (x, dim), kurtosis (x, dim)); endfunction @@ -73,3 +74,14 @@ %! s = statistics (x); %! m = median (x); %! assert (m, s(3,:), eps); + +%% Test input validation +%!error statistics () +%!error statistics (1, 2, 3) +%!error statistics ([true true]) +%!error statistics (1, ones(2,2)) +%!error statistics (1, 1.5) +%!error statistics (1, 0) +%!error statistics (1, 3) +%!error statistics (1) + diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/std.m --- a/scripts/statistics/base/std.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/std.m Mon Jan 03 21:23:08 2011 -0800 @@ -21,73 +21,85 @@ ## @deftypefn {Function File} {} std (@var{x}) ## @deftypefnx {Function File} {} std (@var{x}, @var{opt}) ## @deftypefnx {Function File} {} std (@var{x}, @var{opt}, @var{dim}) -## If @var{x} is a vector, compute the standard deviation of the elements -## of @var{x}. +## Compute the standard deviation of the elements of the vector @var{x}. ## @tex ## $$ -## {\rm std} (x) = \sigma (x) = \sqrt{{\sum_{i=1}^N (x_i - \bar{x})^2 \over N - 1}} +## {\rm std} (x) = \sigma = \sqrt{{\sum_{i=1}^N (x_i - \bar{x})^2 \over N - 1}} ## $$ -## where $\bar{x}$ is the mean value of $x$. +## where $\bar{x}$ is the mean value of $x$ and $N$ is the number of elements. ## @end tex ## @ifnottex ## ## @example ## @group -## std (x) = sqrt (sumsq (x - mean (x)) / (n - 1)) +## std (x) = sqrt ( 1/(N-1) SUM_i (x(i) - mean(x))^2 ) ## @end group ## @end example ## +## @noindent +## where @math{N} is the number of elements. ## @end ifnottex +## ## If @var{x} is a matrix, compute the standard deviation for ## each column and return them in a row vector. ## -## The argument @var{opt} determines the type of normalization to use. Valid -## values are +## The argument @var{opt} determines the type of normalization to use. +## Valid values are ## -## @table @asis +## @table @asis ## @item 0: -## normalizes with @math{N-1}, provides the square root of best unbiased -## estimator of the variance [default] +## normalize with @math{N-1}, provides the square root of the best unbiased +## estimator of the variance [default] ## ## @item 1: -## normalizes with @math{N}, this provides the square root of the second -## moment around the mean +## normalize with @math{N}, this provides the square root of the second +## moment around the mean ## @end table ## -## The third argument @var{dim} determines the dimension along which the -## standard -## deviation is calculated. -## @seealso{mean, median} +## If the optional argument @var{dim} is given, operate along this dimension. +## @seealso{var, range, iqr, mean, median} ## @end deftypefn ## Author: jwe -function retval = std (a, opt, dim) +function retval = std (x, opt = 0, dim) if (nargin < 1 || nargin > 3) print_usage (); endif + + if (! (isnumeric (x))) + error ("std: X must be a numeric vector or matrix"); + endif + + if (isempty (opt)) + opt = 0; + endif + if (opt != 0 && opt != 1) + error ("std: normalization OPT must be 0 or 1"); + endif + + sz = size (x); if (nargin < 3) - dim = find (size (a) > 1, 1); + ## Find the first non-singleton dimension. + dim = find (sz > 1, 1); if (isempty (dim)) dim = 1; endif endif - if (nargin < 2 || isempty (opt)) - opt = 0; - endif - n = size (a, dim); + n = size (x, dim); if (n == 1) - retval = zeros (size (a)); - elseif (numel (a) > 0) - retval = sqrt (sumsq (center (a, dim), dim) / (n + opt - 1)); + retval = zeros (sz); + elseif (numel (x) > 0) + retval = sqrt (sumsq (center (x, dim), dim) / (n - 1 + opt)); else - error ("std: x must not be empty"); + error ("std: X must not be empty"); endif endfunction + %!test %! x = ones (10, 2); %! y = [1, 3]; @@ -95,6 +107,10 @@ %! assert (std (x, 0, 3), zeros (10, 2)) %! assert (std (ones (3, 1, 2), 0, 2), zeros (3, 1, 2)) +%% Test input validation %!error std (); +%!error std (1, 2, 3, 4); +%!error std (true(1,2)) +%!error std (1, -1); +%!error std ([], 1); -%!error std (1, 2, 3, 4); diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/studentize.m --- a/scripts/statistics/base/studentize.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/studentize.m Mon Jan 03 21:23:08 2011 -0800 @@ -18,13 +18,15 @@ ## . ## -*- texinfo -*- -## @deftypefn {Function File} {} studentize (@var{x}, @var{dim}) +## @deftypefn {Function File} {} studentize (@var{x}) +## @deftypefnx {Function File} {} studentize (@var{x}, @var{dim}) ## If @var{x} is a vector, subtract its mean and divide by its standard ## deviation. ## ## If @var{x} is a matrix, do the above along the first non-singleton -## dimension. If the optional argument @var{dim} is given then operate -## along this dimension. +## dimension. +## If the optional argument @var{dim} is given, operate along this dimension. +## @seealso{center} ## @end deftypefn ## Author: KH @@ -36,8 +38,12 @@ print_usage (); endif - if (!ismatrix(x) || ischar(x)) - error ("studentize: X must be a numeric matrix or vector"); + if (! isnumeric(x)) + error ("studentize: X must be a numeric vector or matrix"); + endif + + if (isinteger (x)) + x = double (x); endif nd = ndims (x); @@ -49,16 +55,35 @@ dim = 1; endif else - if (!(isscalar (dim) && dim == round (dim)) + if (!(isscalar (dim) && dim == fix (dim)) || !(1 <= dim && dim <= nd)) error ("studentize: DIM must be an integer and a valid dimension"); endif endif c = sz(dim); - idx = ones (1, nd); - idx(dim) = c; - t = x - repmat (mean (x, dim), idx); - t = t ./ repmat (max (cat (dim, std(t, [], dim), ! any (t, dim)), [], dim), idx); + if (c == 0) + t = x; + else + idx = ones (1, nd); + idx(dim) = c; + t = x - repmat (mean (x, dim), idx); + t = t ./ repmat (max (cat (dim, std(t, [], dim), ! any (t, dim)), [], dim), idx); + endif endfunction + +%!assert(studentize ([1,2,3]), [-1,0,1]) +%!assert(studentize (int8 ([1,2,3])), [-1,0,1]) +#%!assert(studentize (ones (3,2,0,2)), zeros (3,2,0,2)) +%!assert(studentize ([2,0,-2;0,2,0;-2,-2,2]), [1,0,-1;0,1,0;-1,-1,1]) + +%% Test input validation +%!error studentize () +%!error studentize (1, 2, 3) +%!error studentize ([true true]) +%!error studentize (1, ones(2,2)) +%!error studentize (1, 1.5) +%!error studentize (1, 0) +%!error studentize (1, 3) + diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/table.m --- a/scripts/statistics/base/table.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/table.m Mon Jan 03 21:23:08 2011 -0800 @@ -20,8 +20,8 @@ ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{t}, @var{l_x}] =} table (@var{x}) ## @deftypefnx {Function File} {[@var{t}, @var{l_x}, @var{l_y}] =} table (@var{x}, @var{y}) -## Create a contingency table @var{t} from data vectors. The @var{l} -## vectors are the corresponding levels. +## Create a contingency table @var{t} from data vectors. The @var{l_x} and +## @var{l_y} vectors are the corresponding levels. ## ## Currently, only 1- and 2-dimensional tables are supported. ## @end deftypefn @@ -31,28 +31,43 @@ function [t, v, w] = table (x, y) + if (nargin < 1 || nargin > 2) + print_usage (); + endif + if (nargin == 1) - if (! (isvector (x))) - error ("table: x must be a vector"); + if (!isnumeric (x) || !isvector (x)) + error ("table: X must be a numeric vector"); endif - v = values (x); + v = unique (x); for i = 1 : length (v) t(i) = sum (x == v(i) | isnan (v(i)) * isnan (x)); endfor elseif (nargin == 2) - if (! (isvector (x) && isvector (y) && (length (x) == length (y)))) - error ("table: x and y must be vectors of the same length"); + if (! ( isvector (x) && isnumeric (x) + && isvector (y) && isnumeric (y) + && (length (x) == length (y)))) + error ("table: X and Y must be numeric vectors of the same length"); endif - v = values (x); - w = values (y); + v = unique (x); + w = unique (y); for i = 1 : length (v) for j = 1 : length (w) t(i,j) = sum ((x == v(i) | isnan (v(i)) * isnan (x)) & (y == w(j) | isnan (w(j)) * isnan (y))); endfor endfor - else - print_usage (); endif -endfunction \ No newline at end of file +endfunction + +%% Test input validation +%!error table () +%!error table (1, 2, 3) +%!error table (ones (2)) +%!error table ([true true]) +%!error table (ones (2,1), true (2,1)) +%!error table (true (2,1), ones (2,1)) +%!error table (ones (2,2), ones (2,1)) +%!error table (ones (2,1), ones (2,2)) +%!error table (ones (2,1), ones (3,1)) diff -r 20f53b3a558f -r e151e23f73bc scripts/statistics/base/var.m --- a/scripts/statistics/base/var.m Mon Jan 03 18:36:49 2011 -0800 +++ b/scripts/statistics/base/var.m Mon Jan 03 21:23:08 2011 -0800 @@ -18,78 +18,91 @@ ## . ## -*- texinfo -*- -## @deftypefn {Function File} {} var (@var{x}) -## For vector arguments, return the (real) variance of the values. -## For matrix arguments, return a row vector containing the variance for -## each column. +## @deftypefn {Function File} {} var (@var{x}) +## @deftypefnx {Function File} {} var (@var{x}, @var{opt}) +## @deftypefnx {Function File} {} var (@var{x}, @var{opt}, @var{dim}) +## Compute the variance of the elements of the vector @var{x}. +## @tex +## $$ +## {\rm std} (x) = \sigma^2 = {\sum_{i=1}^N (x_i - \bar{x})^2 \over N - 1} +## $$ +## where $\bar{x}$ is the mean value of $x$. +## @end tex +## @ifnottex ## +## @example +## @group +## std (x) = 1/(N-1) SUM_i (x(i) - mean(x))^2 +## @end group +## @end example +## +## @end ifnottex +## If @var{x} is a matrix, compute the variance for each column +## and return them in a row vector. +## ## The argument @var{opt} determines the type of normalization to use. ## Valid values are ## ## @table @asis ## @item 0: -## Normalizes with @math{N-1}, provides the best unbiased estimator of the -## variance [default]. +## normalize with @math{N-1}, provides the best unbiased estimator of the +## variance [default] ## ## @item 1: -## Normalizes with @math{N}, this provides the second moment around the mean. +## normalizes with @math{N}, this provides the second moment around the mean ## @end table ## -## The third argument @var{dim} determines the dimension along which the -## variance is calculated. +## If the optional argument @var{dim} is given, operate along this dimension. +## @seealso{cov,std,skewness,kurtosis,moment} ## @end deftypefn ## Author: KH ## Description: Compute variance -function retval = var (x, opt, dim) +function retval = var (x, opt = 0, dim) if (nargin < 1 || nargin > 3) print_usage (); endif + + if (!isnumeric (x)) + error ("var: X must be a numeric vector or matrix"); + endif + + if (isempty (opt)) + opt = 0; + endif + if (opt != 0 && opt != 1) + error ("var: normalization OPT must be 0 or 1"); + endif + if (nargin < 3) dim = find (size (x) > 1, 1); if (isempty (dim)) dim = 1; endif endif - if (nargin < 2 || isempty (opt)) - opt = 0; - endif - sz = size (x); - n = sz(dim); - if (isempty (x)) - ## FIXME -- is there a way to obtain these results without all the - ## special cases? - if (ndims (x) == 2 && sz(1) == 0 && sz(2) == 0) - retval = NaN; - else - sz(dim) = 1; - if (n == 0) - if (prod (sz) == 0) - retval = zeros (sz); - else - retval = NaN (sz); - endif - else - retval = zeros (sz); - endif - endif - elseif (n == 1) - retval = zeros (sz); + n = size (x, dim); + if (n == 1) + retval = zeros (size (x), class (x)); + elseif (numel (x) > 0) + retval = sumsq (center (x, dim), dim) / (n - 1 + opt); else - retval = sumsq (center (x, dim), dim) / (n + opt - 1); + error ("var: X must not be empty"); endif endfunction %!assert (var (13), 0) -%!assert (var ([]), NaN) -%!assert (var (ones (0, 0, 0), 0, 1), zeros (1, 0, 0)) -%!assert (var (ones (0, 0, 0), 0, 2), zeros (0, 1, 0)) -%!assert (var (ones (0, 0, 0), 0, 3), zeros (0, 0)) -%!assert (var (ones (1, 2, 0)), zeros (1, 1, 0)) -%!assert (var (ones (1, 2, 0, 0), 0, 1), zeros (1, 2, 0, 0)) -%!assert (var (ones (1, 2, 0, 0), 0, 2), zeros (1, 1, 0, 0)) -%!assert (var (ones (1, 2, 0, 0), 0, 3), zeros (1, 2, 1, 0)) +%!assert (var ([1,2,3]), 1) +%!assert (var ([1,2,3], 1), 2/3, eps) +%!assert (var ([1,2,3], [], 1), [0,0,0]) + +%% Test input validation +%!error var () +%!error var (1,2,3,4) +%!error var (true(1,2)) +%!error var (1, -1); +%!error var ([],1) +