view doc/interpreter/stats.txi @ 30564:796f54d4ddbf stable

update Octave Project Developers copyright for the new year In files that have the "Octave Project Developers" copyright notice, update for 2021. In all .txi and .texi files except gpl.txi and gpl.texi in the doc/liboctave and doc/interpreter directories, change the copyright to "Octave Project Developers", the same as used for other source files. Update copyright notices for 2022 (not done since 2019). For gpl.txi and gpl.texi, change the copyright notice to be "Free Software Foundation, Inc." and leave the date at 2007 only because this file only contains the text of the GPL, not anything created by the Octave Project Developers. Add Paul Thomas to contributors.in.
author John W. Eaton <jwe@octave.org>
date Tue, 28 Dec 2021 18:22:40 -0500
parents 7fa1d6f670f5
children 4c6c8f14766c

@c Copyright (C) 1996-2022 The Octave Project Developers
@c
@c This file is part of Octave.
@c
@c Octave is free software: you can redistribute it and/or modify it
@c under the terms of the GNU General Public License as published by
@c the Free Software Foundation, either version 3 of the License, or
@c (at your option) any later version.
@c
@c Octave is distributed in the hope that it will be useful, but
@c WITHOUT ANY WARRANTY; without even the implied warranty of
@c MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
@c GNU General Public License for more details.
@c
@c You should have received a copy of the GNU General Public License
@c along with Octave; see the file COPYING.  If not, see
@c <https://www.gnu.org/licenses/>.

@node Statistics
@chapter Statistics

Octave has support for various statistical methods.  The emphasis is on basic
descriptive statistics, but the Octave Forge statistics package includes
probability distributions, statistical tests, random number generation, and
much more.

The functions that analyze data all assume that multi-dimensional data is
arranged in a matrix where each row is an observation, and each column is a
variable.  Thus, the matrix defined by

@example
@group
a = [ 0.9, 0.7;
      0.1, 0.1;
      0.5, 0.4 ];
@end group
@end example

@noindent
contains three observations from a two-dimensional distribution.  While this is
the default data arrangement, most functions support different arrangements.
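
As a small illustration of this convention (the variable names
@code{col_means} and @code{row_means} are chosen here only for the example),
@code{mean} reduces along columns by default, and an explicit dimension
argument selects a row-wise arrangement instead:

@example
@group
a = [ 0.9, 0.7;
      0.1, 0.1;
      0.5, 0.4 ];
col_means = mean (a)      # one value per variable: about [0.5, 0.4]
row_means = mean (a, 2)   # one value per observation (row)
@end group
@end example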

Note that the statistics functions do not test for data containing NaN, NA, or
Inf.  These values must be detected and handled explicitly.  See
@ref{XREFisnan,,isnan}, @ref{XREFisna,,isna}, @ref{XREFisinf,,isinf}, and
@ref{XREFisfinite,,isfinite}.
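
A minimal sketch of one way to handle such values, assuming that simply
discarding the non-finite entries is acceptable for the analysis at hand (the
variable names are illustrative only):

@example
@group
x = [1, 2, NaN, 4, Inf];
mean (x)                  # NaN propagates into the result
ok = isfinite (x);        # false for NaN, NA, and Inf
mean (x(ok))              # mean of the remaining values 1, 2, 4
@end group
@end example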

@menu
* Descriptive Statistics::
* Statistics on Sliding Windows of Data::
* Basic Statistical Functions::
* Correlation and Regression Analysis::
* Distributions::
* Random Number Generation::
@end menu

@node Descriptive Statistics
@section Descriptive Statistics

One principal goal of descriptive statistics is to represent the essence of a
large data set concisely.  Octave provides the @code{mean}, @code{median}, and
@code{mode} functions, each of which summarizes a data set with a single number
corresponding to the central tendency of the data.

@DOCSTRING(mean)

@DOCSTRING(median)

@DOCSTRING(mode)
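
The three measures can disagree noticeably when the data contain an outlier.
The following sketch uses a made-up sample chosen only for illustration:

@example
@group
x = [1, 2, 2, 3, 14];
mean (x)      # 4.4, pulled upward by the value 14
median (x)    # 2, the middle value of the sorted data
mode (x)      # 2, the most frequently occurring value
@end group
@end example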

Using just one number, such as the mean, to represent an entire data set may
not give an accurate picture of the data.  One way to characterize the fit is
to measure the dispersion of the data.  Octave provides several functions for
measuring dispersion.

@DOCSTRING(bounds)

@DOCSTRING(range)

@DOCSTRING(iqr)

@DOCSTRING(mad)

@DOCSTRING(meansq)

@DOCSTRING(std)
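
As a brief, non-authoritative sketch, several of the dispersion measures above
can be compared on one made-up sample (the variable names @code{x},
@code{lo}, and @code{hi} are illustrative):

@example
@group
x = [2, 4, 4, 4, 5, 5, 7, 9];
[lo, hi] = bounds (x)   # smallest and largest values: 2 and 9
range (x)               # hi - lo = 7
std (x)                 # sample standard deviation (normalized by N-1)
std (x, 1)              # population standard deviation (normalized by N): 2
mad (x)                 # mean absolute deviation: 1.5
meansq (x)              # mean of the squared values: 29
@end group
@end example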

In addition to knowing the size of a dispersion, it is useful to know the shape
of the data set.  For example, are the data points massed to the left or right
of the mean?  Octave provides several common measures to describe the shape
of the data set.  Octave can also calculate moments, which allow arbitrary
shape measures to be developed.

@DOCSTRING(var)

@DOCSTRING(skewness)

@DOCSTRING(kurtosis)

@DOCSTRING(moment)

@DOCSTRING(quantile)

@DOCSTRING(prctile)
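
The shape measures can be tried on a small symmetric sample.  This is only a
sketch with made-up data, and it relies on the default options of each
function:

@example
@group
x = [1; 2; 3; 4; 5];
var (x)               # sample variance: 2.5
skewness (x)          # 0, because the data are symmetric about the mean
kurtosis (x)          # 1.7, flatter than a normal distribution (3)
moment (x, 2, "c")    # second central moment: 2
quantile (x, 0.5)     # 3, the same as median (x)
prctile (x, 50)       # 3, percentiles are quantiles scaled by 100
@end group
@end example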

A summary view of a data set can be generated quickly with the
@code{statistics} function.

@DOCSTRING(statistics)
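
A short sketch of its use, assuming the ordering of the results given in the
function's documentation string (minimum, first quartile, median, third
quartile, maximum, mean, standard deviation, skewness, kurtosis); the variable
name @code{s} is illustrative:

@example
@group
x = [1; 2; 3; 4; 5; 6; 7; 8; 9];
s = statistics (x);
numel (s)     # nine summary values
s(1)          # minimum: 1
s(3)          # median: 5
s(5)          # maximum: 9
s(6)          # mean: 5
@end group
@end example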

@node Statistics on Sliding Windows of Data
@section Statistics on Sliding Windows of Data

It is often useful to calculate descriptive statistics over a subsection
(i.e., window) of a full data set.  Octave provides the function @code{movfun},
which calls an arbitrary function handle on successive windows of data and
accumulates the results.  Many of the most commonly desired functions, such as
the moving average over a window of data (@code{movmean}), are already
provided.

@DOCSTRING(movfun)

@DOCSTRING(movslice)

@DOCSTRING(movmad)

@DOCSTRING(movmax)

@DOCSTRING(movmean)

@DOCSTRING(movmedian)

@DOCSTRING(movmin)

@DOCSTRING(movprod)

@DOCSTRING(movstd)

@DOCSTRING(movsum)

@DOCSTRING(movvar)
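
The following sketch applies two of the ready-made window functions to a short
made-up vector with a window length of 3.  It assumes the default treatment of
the end points, in which the window is shortened where it would extend past
the data:

@example
@group
x = [1, 5, 2, 8, 3];
m = movmean (x, 3)    # average of each sample and its two neighbors
p = movmax (x, 3)     # largest value in each window: [5, 5, 8, 8, 8]
@end group
@end example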

@node Basic Statistical Functions
@section Basic Statistical Functions

Octave supports various helpful statistical functions.  Many are useful as
initial steps to prepare a data set for further analysis.  Others provide
different measures from those of the basic descriptive statistics.

@DOCSTRING(center)

@DOCSTRING(zscore)

@DOCSTRING(histc)
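
As a small illustration with made-up data, centering and standardizing prepare
a sample for further analysis, while @code{histc} counts how many values fall
at or above each edge and below the next:

@example
@group
x = [1, 2, 3];
center (x)            # subtract the mean: [-1, 0, 1]
zscore (x)            # also divide by the standard deviation: [-1, 0, 1]
data = [1, 2, 2, 3, 5];
histc (data, 1:5)     # counts per bin: [1, 2, 1, 0, 1]
@end group
@end example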

@noindent
The @code{unique} function, documented at @ref{XREFunique,,unique}, is also
often useful for statistics.

@DOCSTRING(nchoosek)

@DOCSTRING(perms)

@DOCSTRING(ranks)

@DOCSTRING(run_count)

@DOCSTRING(runlength)
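
A few of these helpers are sketched below on small made-up inputs.  The
comment on @code{ranks} assumes its default handling of ties, which assigns
tied values the average of their ranks:

@example
@group
nchoosek (5, 2)                 # number of 2-element subsets of 5 items: 10
rows (perms ([1, 2, 3]))        # 3! = 6 permutations, one per row
ranks ([10, 30, 20, 20])        # [1, 4, 2.5, 2.5]
runlength ([2, 2, 0, 4, 4, 4])  # lengths of consecutive runs: [2, 1, 3]
@end group
@end example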

@node Correlation and Regression Analysis
@section Correlation and Regression Analysis

@c FIXME: Need Intro Here

@DOCSTRING(cov)

@DOCSTRING(corr)

@DOCSTRING(corrcoef)

@DOCSTRING(spearman)

@DOCSTRING(kendall)
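
The difference between linear and rank-based correlation can be seen in a
short sketch with made-up data.  Here @code{y} is a monotonic but nonlinear
function of @code{x}, so the rank correlations are exactly 1 while the linear
correlation coefficient is slightly lower:

@example
@group
x = [1; 2; 3; 4; 5];
y = x .^ 3;
corr (x, 2*x + 1)     # 1, an exactly linear relationship
corr (x, y)           # less than 1 (about 0.94), the relation is not linear
spearman (x, y)       # 1, the ranks of x and y agree perfectly
kendall (x, y)        # 1, every pair of observations is concordant
@end group
@end example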

@node Distributions
@section Distributions

Octave has functions for computing the Probability Density Function (PDF), the
Cumulative Distribution Function (CDF), and the quantile (the inverse of the
CDF) for arbitrary user-defined distributions (discrete) and for experimental
data (empirical).

The following table summarizes the supported distributions.

@tex
\vskip 6pt
{\hbox to \hsize {\hfill\vbox{\offinterlineskip \tabskip=0pt
\halign{
\vrule height2.0ex depth1.ex width 0.6pt #\tabskip=0.3em &
# \hfil & \vrule # & # \hfil & \vrule # & # \hfil & \vrule # & # \hfil &
# \vrule width 0.6pt \tabskip=0pt\cr
\noalign{\hrule height 0.6pt}
& {\bf Distribution} && {\bf PDF}      && {\bf CDF}     && {\bf Quantile}&\cr
\noalign{\hrule}
&Univariate Discrete       && discrete\_pdf  && discrete\_cdf && discrete\_inv&\cr
&Empirical    && empirical\_pdf  && empirical\_cdf && empirical\_inv&\cr
\noalign{\hrule height 0.6pt}
}}\hfill}}
@end tex
@ifnottex
@multitable @columnfractions .31 .23 .23 .23
@headitem Distribution
  @tab PDF
  @tab CDF
  @tab Quantile
@item Univariate Discrete Distribution
  @tab @code{discrete_pdf}
  @tab @code{discrete_cdf}
  @tab @code{discrete_inv}
@item Empirical Distribution
  @tab @code{empirical_pdf}
  @tab @code{empirical_cdf}
  @tab @code{empirical_inv}
@end multitable
@end ifnottex

@DOCSTRING(discrete_pdf)

@DOCSTRING(discrete_cdf)

@DOCSTRING(discrete_inv)

@DOCSTRING(empirical_pdf)

@DOCSTRING(empirical_cdf)

@DOCSTRING(empirical_inv)
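
As a hedged sketch, the functions above can be tried on a made-up
three-valued discrete distribution and a small made-up sample.  The names
@code{v}, @code{p}, and @code{data} are illustrative, and the sample contains
no repeated values:

@example
@group
v = [1, 2, 3];               # possible values
p = [0.2, 0.5, 0.3];         # their probabilities
discrete_pdf (2, v, p)       # probability of the value 2: 0.5
discrete_cdf (2, v, p)       # probability of a value <= 2: 0.7
discrete_inv (0.6, v, p)     # smallest value whose CDF reaches 0.6: 2

data = [1, 2, 3, 5, 8];      # observed sample
empirical_cdf (3, data)      # fraction of observations <= 3: 0.6
empirical_inv (0.5, data)    # empirical median: 3
@end group
@end example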

@node Random Number Generation
@section Random Number Generation

Octave can generate random numbers from a large number of distributions.  These
generators build on the basic random number generators described in
@ref{Special Utility Matrices}.

The following table summarizes the available random number generators.

@tex
\vskip 6pt
{\hbox to \hsize {\hfill\vbox{\offinterlineskip \tabskip=0pt
\halign{
\vrule height2.0ex depth1.ex width 0.6pt #\tabskip=0.3em &
# \hfil & \vrule # & # \hfil & # \vrule width 0.6pt \tabskip=0pt\cr
\noalign{\hrule height 0.6pt}
& {\bf Distribution}                && {\bf Function} &\cr
\noalign{\hrule}
& Univariate Discrete Distribution  && discrete\_rnd &\cr
& Empirical Distribution            && empirical\_rnd &\cr
& Exponential Distribution          && rande &\cr
& Gamma Distribution                && randg &\cr
& Poisson Distribution              && randp &\cr
& Standard Normal Distribution      && randn &\cr
& Uniform Distribution              && rand &\cr
& Uniform Distribution (integers)   && randi &\cr
\noalign{\hrule height 0.6pt}
}}\hfill}}
@end tex
@ifnottex
@multitable @columnfractions .4 .3
@headitem Distribution                  @tab Function
@item Univariate Discrete Distribution  @tab @code{discrete_rnd}
@item Empirical Distribution            @tab @code{empirical_rnd}
@item Exponential Distribution          @tab @code{rande}
@item Gamma Distribution                @tab @code{randg}
@item Poisson Distribution              @tab @code{randp}
@item Standard Normal Distribution      @tab @code{randn}
@item Uniform Distribution              @tab @code{rand}
@item Uniform Distribution (integers)   @tab @code{randi}
@end multitable
@end ifnottex

@DOCSTRING(discrete_rnd)

@DOCSTRING(empirical_rnd)
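
The following sketch draws a few samples with the generators listed in the
table.  The seed value, the sizes, and the variable names are arbitrary
choices made for this example, and seeding is optional:

@example
@group
rand ("state", 42);                # optional: fix the state of rand
u = rand (3, 2);                   # 3-by-2 uniform values in (0, 1)
k = randi (6, 1, 10);              # ten integers between 1 and 6
z = randn (1, 4);                  # four standard normal values

v = [1, 2, 3];                     # values of a discrete distribution
p = [0.2, 0.5, 0.3];               # and their probabilities
d = discrete_rnd (v, p, 1, 10);    # ten draws from v with probabilities p

data = [1, 2, 3, 5, 8];            # observed sample
e = empirical_rnd (data, 2, 3);    # 2-by-3 values resampled from data
@end group
@end example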