view doc/interpreter/stats.txi @ 20537:afdb856e44f1

Deprecate mahalanobis function. * NEWS: Announce deprecation. * stats.txi: Remove from manual * scripts/deprecated/module.mk: Add to deprecated build dir. * scripts/statistics/base/module.mk: Remove from existing dir * scripts/deprecated/mahalanobis.m: Add warning message to deprecated function. * scripts/statistics/base/mahalanobis.m: Delete deprecated function.
author Rik <rik@octave.org>
date Tue, 22 Sep 2015 05:45:05 -0700
parents 4197fc428c7d
children
line wrap: on
line source

@c Copyright (C) 1996-2015 John W. Eaton
@c
@c This file is part of Octave.
@c
@c Octave is free software; you can redistribute it and/or modify it
@c under the terms of the GNU General Public License as published by the
@c Free Software Foundation; either version 3 of the License, or (at
@c your option) any later version.
@c
@c Octave is distributed in the hope that it will be useful, but WITHOUT
@c ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
@c FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
@c for more details.
@c
@c You should have received a copy of the GNU General Public License
@c along with Octave; see the file COPYING.  If not, see
@c <http://www.gnu.org/licenses/>.

@node Statistics
@chapter Statistics

Octave has support for various statistical methods.  This includes
basic descriptive statistics, probability distributions, statistical tests,
random number generation, and much more.

The functions that analyze data all assume that multi-dimensional data
is arranged in a matrix where each row is an observation, and each
column is a variable.  Thus, the matrix defined by

@example
@group
a = [ 0.9, 0.7;
      0.1, 0.1;
      0.5, 0.4 ];
@end group
@end example

@noindent
contains three observations from a two-dimensional distribution.
While this is the default data arrangement, most functions support
different arrangements.

It should be noted that the statistics functions don't test for data
containing NaN, NA, or Inf.  These values need to be detected and dealt
with explicitly.  See @ref{XREFisnan,,isnan}, @ref{XREFisna,,isna},
@ref{XREFisinf,,isinf}, @ref{XREFisfinite,,isfinite}.

@menu
* Descriptive Statistics::
* Basic Statistical Functions::
* Statistical Plots::
* Correlation and Regression Analysis::
* Distributions::
* Tests::
* Random Number Generation::
@end menu

@node Descriptive Statistics
@section Descriptive Statistics

One principal goal of descriptive statistics is to represent the essence of a
large data set concisely.  Octave provides the mean, median, and mode functions
which all summarize a data set with just a single number corresponding to
the central tendency of the data.

@DOCSTRING(mean)

@DOCSTRING(median)

@DOCSTRING(mode)

Using just one number, such as the mean, to represent an entire data set may
not give an accurate picture of the data.  One way to characterize the fit is
to measure the dispersion of the data.  Octave provides several functions for
measuring dispersion.

@DOCSTRING(range)

@DOCSTRING(iqr)

@DOCSTRING(meansq)

@DOCSTRING(std)

In addition to knowing the size of a dispersion it is useful to know the shape
of the data set.  For example, are data points massed to the left or right
of the mean?  Octave provides several common measures to describe the shape
of the data set.  Octave can also calculate moments allowing arbitrary shape
measures to be developed.

@DOCSTRING(var)

@DOCSTRING(skewness)

@DOCSTRING(kurtosis)

@DOCSTRING(moment)

@DOCSTRING(quantile)

@DOCSTRING(prctile)

A summary view of a data set can be generated quickly with the
@code{statistics} function.

@DOCSTRING(statistics)

@node Basic Statistical Functions
@section Basic Statistical Functions

Octave supports various helpful statistical functions.  Many are useful as
initial steps to prepare a data set for further analysis.  Others provide
different measures from those of the basic descriptive statistics.

@DOCSTRING(center)

@DOCSTRING(zscore)

@DOCSTRING(histc)

@c FIXME: really want to put a reference to unique here
@c @DOCSTRING(values)

@DOCSTRING(nchoosek)

@DOCSTRING(perms)

@DOCSTRING(ranks)

@DOCSTRING(run_count)

@DOCSTRING(runlength)

@DOCSTRING(probit)

@DOCSTRING(logit)

@DOCSTRING(cloglog)

@DOCSTRING(table)

@node Statistical Plots
@section Statistical Plots

@c Should hist be moved to here, or perhaps the qqplot and ppplot
@c functions should be moved to the Plotting Chapter?

Octave can create Quantile Plots (QQ-Plots), and Probability Plots
(PP-Plots).  These are simple graphical tests for determining if a
data set comes from a certain distribution.

Note that Octave can also show histograms of data
using the @code{hist} function as described in
@ref{Two-Dimensional Plots}.

@DOCSTRING(qqplot)

@DOCSTRING(ppplot)

@node Correlation and Regression Analysis
@section Correlation and Regression Analysis

@c FIXME: Need Intro Here

@DOCSTRING(cov)

@DOCSTRING(corr)

@DOCSTRING(spearman)

@DOCSTRING(kendall)

@c FIXME: Need discussion of ols & gls and references to them in optim.txi


@DOCSTRING(logistic_regression)

@node Distributions
@section Distributions

Octave has functions for computing the Probability Density Function
(PDF), the Cumulative Distribution function (CDF), and the quantile
(the inverse of the CDF) for a large number of distributions.

The following table summarizes the supported distributions (in
alphabetical order).

@tex
\vskip 6pt
{\hbox to \hsize {\hfill\vbox{\offinterlineskip \tabskip=0pt
\halign{
\vrule height2.0ex depth1.ex width 0.6pt #\tabskip=0.3em &
# \hfil & \vrule # & # \hfil & \vrule # & # \hfil & \vrule # & # \hfil &
# \vrule width 0.6pt \tabskip=0pt\cr
\noalign{\hrule height 0.6pt}
& {\bf Distribution} && {\bf PDF}      && {\bf CDF}     && {\bf Quantile}&\cr
\noalign{\hrule}
&Beta         && betapdf        && betacdf       && betainv&\cr
&Binomial     && binopdf        && binocdf       && binoinv&\cr
&Cauchy       && cauchy\_pdf    && cauchy\_cdf   && cauchy\_inv&\cr
&Chi-Square   && chi2pdf        && chi2cdf       && chi2inv&\cr
&Univariate Discrete       && discrete\_pdf  && discrete\_cdf && discrete\_inv&\cr
&Empirical    && empirical\_pdf  && empirical\_cdf && empirical\_inv&\cr
&Exponential  && exppdf         && expcdf        && expinv&\cr
&F            && fpdf           && fcdf          && finv&\cr
&Gamma        && gampdf         && gamcdf        && gaminv&\cr
&Geometric    && geopdf         && geocdf        && geoinv&\cr
&Hypergeometric  && hygepdf     && hygecdf       && hygeinv&\cr
&Kolmogorov Smirnov && {\it Not Available} && kolmogorov\_&& {\it Not Available}&\cr
&             &&                && smirnov\_cdf &&&\cr
&Laplace      && laplace\_pdf   && laplace\_cdf  && laplace\_inv&\cr
&Logistic     && logistic\_pdf  && logistic\_cdf && logistic\_inv&\cr
&Log-Normal   && lognpdf        && logncdf       && logninv&\cr
&Univariate Normal && normpdf   && normcdf       && norminv&\cr
&Pascal       && nbinpdf        && nbincdf       && nbininv&\cr
&Poisson      && poisspdf       && poisscdf      && poissinv&\cr
&Standard Normal && stdnormal\_pdf  && stdnormal\_cdf && stdnormal\_inv&\cr
&t (Student)  && tpdf           && tcdf          && tinv&\cr
&Uniform Discrete && unidpdf    && unidcdf       && unidinv&\cr
&Uniform      && unifpdf        && unifcdf       && unifinv&\cr
&Weibull      && wblpdf         && wblcdf        && wblinv&\cr
\noalign{\hrule height 0.6pt}
}}\hfill}}
@end tex
@ifnottex
@multitable @columnfractions .31 .23 .23 .23
@headitem Distribution
  @tab PDF
  @tab CDF
  @tab Quantile
@item Beta Distribution
  @tab @code{betapdf}
  @tab @code{betacdf}
  @tab @code{betainv}
@item Binomial Distribution
  @tab @code{binopdf}
  @tab @code{binocdf}
  @tab @code{binoinv}
@item Cauchy Distribution
  @tab @code{cauchy_pdf}
  @tab @code{cauchy_cdf}
  @tab @code{cauchy_inv}
@item Chi-Square Distribution
  @tab @code{chi2pdf}
  @tab @code{chi2cdf}
  @tab @code{chi2inv}
@item Univariate Discrete Distribution
  @tab @code{discrete_pdf}
  @tab @code{discrete_cdf}
  @tab @code{discrete_inv}
@item Empirical Distribution
  @tab @code{empirical_pdf}
  @tab @code{empirical_cdf}
  @tab @code{empirical_inv}
@item Exponential Distribution
  @tab @code{exppdf}
  @tab @code{expcdf}
  @tab @code{expinv}
@item F Distribution
  @tab @code{fpdf}
  @tab @code{fcdf}
  @tab @code{finv}
@item Gamma Distribution
  @tab @code{gampdf}
  @tab @code{gamcdf}
  @tab @code{gaminv}
@item Geometric Distribution
  @tab @code{geopdf}
  @tab @code{geocdf}
  @tab @code{geoinv}
@item Hypergeometric Distribution
  @tab @code{hygepdf}
  @tab @code{hygecdf}
  @tab @code{hygeinv}
@item Kolmogorov Smirnov Distribution
  @tab @emph{Not Available}
  @tab @code{kolmogorov_smirnov_cdf}
  @tab @emph{Not Available}
@item Laplace Distribution
  @tab @code{laplace_pdf}
  @tab @code{laplace_cdf}
  @tab @code{laplace_inv}
@item Logistic Distribution
  @tab @code{logistic_pdf}
  @tab @code{logistic_cdf}
  @tab @code{logistic_inv}
@item Log-Normal Distribution
  @tab @code{lognpdf}
  @tab @code{logncdf}
  @tab @code{logninv}
@item Univariate Normal Distribution
  @tab @code{normpdf}
  @tab @code{normcdf}
  @tab @code{norminv}
@item Pascal Distribution
  @tab @code{nbinpdf}
  @tab @code{nbincdf}
  @tab @code{nbininv}
@item Poisson Distribution
  @tab @code{poisspdf}
  @tab @code{poisscdf}
  @tab @code{poissinv}
@item Standard Normal Distribution
  @tab @code{stdnormal_pdf}
  @tab @code{stdnormal_cdf}
  @tab @code{stdnormal_inv}
@item t (Student) Distribution
  @tab @code{tpdf}
  @tab @code{tcdf}
  @tab @code{tinv}
@item Univariate Discrete Distribution
  @tab @code{unidpdf}
  @tab @code{unidcdf}
  @tab @code{unidinv}
@item Uniform Distribution
  @tab @code{unifpdf}
  @tab @code{unifcdf}
  @tab @code{unifinv}
@item Weibull Distribution
  @tab @code{wblpdf}
  @tab @code{wblcdf}
  @tab @code{wblinv}
@end multitable
@end ifnottex

@DOCSTRING(betapdf)

@DOCSTRING(betacdf)

@DOCSTRING(betainv)

@DOCSTRING(binopdf)

@DOCSTRING(binocdf)

@DOCSTRING(binoinv)

@DOCSTRING(cauchy_pdf)

@DOCSTRING(cauchy_cdf)

@DOCSTRING(cauchy_inv)

@DOCSTRING(chi2pdf)

@DOCSTRING(chi2cdf)

@DOCSTRING(chi2inv)

@DOCSTRING(discrete_pdf)

@DOCSTRING(discrete_cdf)

@DOCSTRING(discrete_inv)

@DOCSTRING(empirical_pdf)

@DOCSTRING(empirical_cdf)

@DOCSTRING(empirical_inv)

@DOCSTRING(exppdf)

@DOCSTRING(expcdf)

@DOCSTRING(expinv)

@DOCSTRING(fpdf)

@DOCSTRING(fcdf)

@DOCSTRING(finv)

@DOCSTRING(gampdf)

@DOCSTRING(gamcdf)

@DOCSTRING(gaminv)

@DOCSTRING(geopdf)

@DOCSTRING(geocdf)

@DOCSTRING(geoinv)

@DOCSTRING(hygepdf)

@DOCSTRING(hygecdf)

@DOCSTRING(hygeinv)

@DOCSTRING(kolmogorov_smirnov_cdf)

@DOCSTRING(laplace_pdf)

@DOCSTRING(laplace_cdf)

@DOCSTRING(laplace_inv)

@DOCSTRING(logistic_pdf)

@DOCSTRING(logistic_cdf)

@DOCSTRING(logistic_inv)

@DOCSTRING(lognpdf)

@DOCSTRING(logncdf)

@DOCSTRING(logninv)

@DOCSTRING(nbinpdf)

@DOCSTRING(nbincdf)

@DOCSTRING(nbininv)

@DOCSTRING(normpdf)

@DOCSTRING(normcdf)

@DOCSTRING(norminv)

@DOCSTRING(poisspdf)

@DOCSTRING(poisscdf)

@DOCSTRING(poissinv)

@DOCSTRING(stdnormal_pdf)

@DOCSTRING(stdnormal_cdf)

@DOCSTRING(stdnormal_inv)

@DOCSTRING(tpdf)

@DOCSTRING(tcdf)

@DOCSTRING(tinv)

@DOCSTRING(unidpdf)

@DOCSTRING(unidcdf)

@DOCSTRING(unidinv)

@DOCSTRING(unifpdf)

@DOCSTRING(unifcdf)

@DOCSTRING(unifinv)

@DOCSTRING(wblpdf)

@DOCSTRING(wblcdf)

@DOCSTRING(wblinv)

@node Tests
@section Tests

Octave can perform many different statistical tests.  The following
table summarizes the available tests.

@tex
\vskip 6pt
{\hbox to \hsize {\hfill\vbox{\offinterlineskip \tabskip=0pt
\halign{
\vrule height2.0ex depth1.ex width 0.6pt #\tabskip=0.3em &
# \hfil & \vrule # & # \hfil & # \vrule width 0.6pt \tabskip=0pt\cr
\noalign{\hrule height 0.6pt}
& @strong{Hypothesis} && {\bf Test Functions} &\cr
\noalign{\hrule}
& Equal mean values && anova, hotelling\_test2, t\_test\_2, &\cr
&                   && welch\_test, wilcoxon\_test, z\_test\_2 &\cr
& Equal medians && kruskal\_wallis\_test, sign\_test &\cr
& Equal variances && bartlett\_test, manova, var\_test &\cr
& Equal distributions && chisquare\_test\_homogeneity, &\cr
&                     && kolmogorov\_smirnov\_test\_2, u\_test &\cr
& Equal marginal frequencies && mcnemar\_test &\cr
& Equal success probabilities && prop\_test\_2 &\cr
& Independent observations && chisquare\_test\_independence, &\cr
&                          && run\_test &\cr
& Uncorrelated observations && cor\_test &\cr
& Given mean value && hotelling\_test, t\_test, z\_test &\cr
& Observations from distribution && kolmogorov\_smirnov\_test &\cr
& Regression && f\_test\_regression, t\_test\_regression &\cr
\noalign{\hrule height 0.6pt}
}}\hfill}}
@end tex
@ifnottex
@multitable @columnfractions .4 .5
@headitem Hypothesis
  @tab Test Functions
@item Equal mean values
  @tab @code{anova}, @code{hotelling_test2}, @code{t_test_2},
       @code{welch_test}, @code{wilcoxon_test}, @code{z_test_2}
@item Equal medians
  @tab @code{kruskal_wallis_test}, @code{sign_test}
@item Equal variances
  @tab @code{bartlett_test}, @code{manova}, @code{var_test}
@item Equal distributions
  @tab @code{chisquare_test_homogeneity}, @code{kolmogorov_smirnov_test_2},
       @code{u_test}
@item Equal marginal frequencies
  @tab @code{mcnemar_test}
@item Equal success probabilities
  @tab @code{prop_test_2}
@item Independent observations
  @tab @code{chisquare_test_independence}, @code{run_test}
@item Uncorrelated observations
  @tab @code{cor_test}
@item Given mean value
  @tab @code{hotelling_test}, @code{t_test}, @code{z_test}
@item Observations from given distribution
  @tab @code{kolmogorov_smirnov_test}
@item Regression
  @tab @code{f_test_regression}, @code{t_test_regression}
@end multitable
@end ifnottex

The tests return a p-value that describes the outcome of the test.
Assuming that the test hypothesis is true, the p-value is the probability
of obtaining a worse result than the observed one.  So large p-values
corresponds to a successful test.  Usually a test hypothesis is accepted
if the p-value exceeds 0.05.

@DOCSTRING(anova)

@DOCSTRING(bartlett_test)

@DOCSTRING(chisquare_test_homogeneity)

@DOCSTRING(chisquare_test_independence)

@DOCSTRING(cor_test)

@DOCSTRING(f_test_regression)

@DOCSTRING(hotelling_test)

@DOCSTRING(hotelling_test_2)

@DOCSTRING(kolmogorov_smirnov_test)

@DOCSTRING(kolmogorov_smirnov_test_2)

@DOCSTRING(kruskal_wallis_test)

@DOCSTRING(manova)

@DOCSTRING(mcnemar_test)

@DOCSTRING(prop_test_2)

@DOCSTRING(run_test)

@DOCSTRING(sign_test)

@DOCSTRING(t_test)

@DOCSTRING(t_test_2)

@DOCSTRING(t_test_regression)

@DOCSTRING(u_test)

@DOCSTRING(var_test)

@DOCSTRING(welch_test)

@DOCSTRING(wilcoxon_test)

@DOCSTRING(z_test)

@DOCSTRING(z_test_2)

@node Random Number Generation
@section Random Number Generation

Octave can generate random numbers from a large number of distributions.
The random number generators are based on the random number generators
described in @ref{Special Utility Matrices}.
@c Should rand, randn, rande, randp, and randg be moved to here?

The following table summarizes the available random number generators
(in alphabetical order).

@tex
\vskip 6pt
{\hbox to \hsize {\hfill\vbox{\offinterlineskip \tabskip=0pt
\halign{
\vrule height2.0ex depth1.ex width 0.6pt #\tabskip=0.3em &
# \hfil & \vrule # & # \hfil & # \vrule width 0.6pt \tabskip=0pt\cr
\noalign{\hrule height 0.6pt}
& {\bf Distribution}                && {\bf Function} &\cr
\noalign{\hrule}
& Beta Distribution                 && betarnd &\cr
& Binomial Distribution             && binornd &\cr
& Cauchy Distribution               && cauchy\_rnd &\cr
& Chi-Square Distribution           && chi2rnd &\cr
& Univariate Discrete Distribution  && discrete\_rnd &\cr
& Empirical Distribution            && empirical\_rnd &\cr
& Exponential Distribution          && exprnd &\cr
& F Distribution                    && frnd &\cr
& Gamma Distribution                && gamrnd &\cr
& Geometric Distribution            && geornd &\cr
& Hypergeometric Distribution       && hygernd &\cr
& Laplace Distribution              && laplace\_rnd &\cr
& Logistic Distribution             && logistic\_rnd &\cr
& Log-Normal Distribution           && lognrnd &\cr
& Pascal Distribution               && nbinrnd &\cr
& Univariate Normal Distribution    && normrnd &\cr
& Poisson Distribution              && poissrnd &\cr
& Standard Normal Distribution      && stdnormal\_rnd &\cr
& t (Student) Distribution          && trnd &\cr
& Univariate Discrete Distribution  && unidrnd &\cr
& Uniform Distribution              && unifrnd &\cr
& Weibull Distribution              && wblrnd &\cr
& Wiener Process                    && wienrnd &\cr
\noalign{\hrule height 0.6pt}
}}\hfill}}
@end tex
@ifnottex
@multitable @columnfractions .4 .3
@headitem Distribution                  @tab Function
@item Beta Distribution                 @tab @code{betarnd}
@item Binomial Distribution             @tab @code{binornd}
@item Cauchy Distribution               @tab @code{cauchy_rnd}
@item Chi-Square Distribution           @tab @code{chi2rnd}
@item Univariate Discrete Distribution  @tab @code{discrete_rnd}
@item Empirical Distribution            @tab @code{empirical_rnd}
@item Exponential Distribution          @tab @code{exprnd}
@item F Distribution                    @tab @code{frnd}
@item Gamma Distribution                @tab @code{gamrnd}
@item Geometric Distribution            @tab @code{geornd}
@item Hypergeometric Distribution       @tab @code{hygernd}
@item Laplace Distribution              @tab @code{laplace_rnd}
@item Logistic Distribution             @tab @code{logistic_rnd}
@item Log-Normal Distribution           @tab @code{lognrnd}
@item Pascal Distribution               @tab @code{nbinrnd}
@item Univariate Normal Distribution    @tab @code{normrnd}
@item Poisson Distribution              @tab @code{poissrnd}
@item Standard Normal Distribution      @tab @code{stdnormal_rnd}
@item t (Student) Distribution          @tab @code{trnd}
@item Univariate Discrete Distribution  @tab @code{unidrnd}
@item Uniform Distribution              @tab @code{unifrnd}
@item Weibull Distribution              @tab @code{wblrnd}
@item Wiener Process                    @tab @code{wienrnd}
@end multitable
@end ifnottex

@DOCSTRING(betarnd)

@DOCSTRING(binornd)

@DOCSTRING(cauchy_rnd)

@DOCSTRING(chi2rnd)

@DOCSTRING(discrete_rnd)

@DOCSTRING(empirical_rnd)

@DOCSTRING(exprnd)

@DOCSTRING(frnd)

@DOCSTRING(gamrnd)

@DOCSTRING(geornd)

@DOCSTRING(hygernd)

@DOCSTRING(laplace_rnd)

@DOCSTRING(logistic_rnd)

@DOCSTRING(lognrnd)

@DOCSTRING(nbinrnd)

@DOCSTRING(normrnd)

@DOCSTRING(poissrnd)

@DOCSTRING(stdnormal_rnd)

@DOCSTRING(trnd)

@DOCSTRING(unidrnd)

@DOCSTRING(unifrnd)

@DOCSTRING(wblrnd)

@DOCSTRING(wienrnd)