annotate scripts/strings/strsplit.m @ 13701:46e68badedb8

strsplit.m: Expand to accept 2-D character arrays. Improve input validation. Add tests. Document new feature.
author Rik <octave@nomad.inbox5.com>
date Fri, 14 Oct 2011 10:15:01 -0700
parents 9e1b9ca119eb
children 73b2b3ca6524
rev   line source
11523
fd0a3ac60b0e update copyright notices
John W. Eaton <jwe@octave.org>
parents: 11104
diff changeset
1 ## Copyright (C) 2009-2011 Jaroslav Hajek
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
2 ##
11104
2c356a35d7f5 fix copyright notices
John W. Eaton <jwe@octave.org>
parents: 8884
diff changeset
3 ## This file is part of Octave.
2c356a35d7f5 fix copyright notices
John W. Eaton <jwe@octave.org>
parents: 8884
diff changeset
4 ##
2c356a35d7f5 fix copyright notices
John W. Eaton <jwe@octave.org>
parents: 8884
diff changeset
5 ## Octave is free software; you can redistribute it and/or modify it
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
6 ## under the terms of the GNU General Public License as published by
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
7 ## the Free Software Foundation; either version 3 of the License, or (at
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
8 ## your option) any later version.
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
9 ##
11104
2c356a35d7f5 fix copyright notices
John W. Eaton <jwe@octave.org>
parents: 8884
diff changeset
10 ## Octave is distributed in the hope that it will be useful, but
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
11 ## WITHOUT ANY WARRANTY; without even the implied warranty of
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
12 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
13 ## General Public License for more details.
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
14 ##
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
15 ## You should have received a copy of the GNU General Public License
11104
2c356a35d7f5 fix copyright notices
John W. Eaton <jwe@octave.org>
parents: 8884
diff changeset
16 ## along with Octave; see the file COPYING. If not, see
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
17 ## <http://www.gnu.org/licenses/>.
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
18
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
19 ## -*- texinfo -*-
13701
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
20 ## @deftypefn {Function File} {[@var{cstr}] =} strsplit (@var{p}, @var{sep}, @var{strip_empty})
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
21 ## Split a string using one or more delimiters and return a cell
8884
579de77acd90 strsplit.m: style fixes
John W. Eaton <jwe@octave.org>
parents: 8883
diff changeset
22 ## array of strings. Consecutive delimiters and delimiters at
579de77acd90 strsplit.m: style fixes
John W. Eaton <jwe@octave.org>
parents: 8883
diff changeset
23 ## boundaries result in empty strings, unless @var{strip_empty} is true.
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
24 ## The default value of @var{strip_empty} is false.
13701
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
25 ##
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
26 ## 2-D character arrays are split at delimiters and at the original row
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
27 ## boundaries.
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
28 ##
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
29 ## Example:
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
30 ## @example
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
31 ## strsplit ("a,b,c", ",")
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
32 ## @result{}
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
33 ## @{
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
34 ## [1,1] = a
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
35 ## [1,2] = b
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
36 ## [1,3] = c
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
37 ## @}
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
38 ##
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
39 ## strsplit (["a,b" ; "cde"], ",")
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
40 ## @result{}
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
41 ## @{
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
42 ## [1,1] = a
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
43 ## [1,2] = b
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
44 ## [1,3] = cde
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
45 ## @}
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
46 ## @group
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
47 ## @end group
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
48 ## @end example
8884
579de77acd90 strsplit.m: style fixes
John W. Eaton <jwe@octave.org>
parents: 8883
diff changeset
49 ## @seealso{strtok}
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
50 ## @end deftypefn
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
51
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
52 function s = strsplit (p, sep, strip_empty = false)
8884
579de77acd90 strsplit.m: style fixes
John W. Eaton <jwe@octave.org>
parents: 8883
diff changeset
53
13701
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
54 if (nargin < 2 || nargin > 3)
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
55 print_usage ();
13701
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
56 elseif (! ischar (p) || ! ischar (sep))
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
57 error ("strsplit: P and SEP must be string values");
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
58 elseif (! isscalar (strip_empty))
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
59 error ("strsplit: STRIP_EMPTY must be a scalar value");
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
60 endif
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
61
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
62 if (isempty (p))
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
63 s = cell (size (p));
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
64 else
13701
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
65 if (rows (p) > 1)
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
66 ## For 2-D arrays, add separator character at line boundaries
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
67 ## and transform to single string
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
68 p(:, end+1) = sep(1);
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
69 p = reshape (p.', 1, numel (p));
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
70 p(end) = [];
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
71 endif
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
72
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
73 ## Split p according to delimiter
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
74 if (isscalar (sep))
13701
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
75 ## Single separator
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
76 idx = find (p == sep);
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
77 else
13701
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
78 ## Multiple separators
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
79 idx = strchr (p, sep);
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
80 endif
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
81
13701
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
82 ## Get substring lengths.
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
83 if (isempty (idx))
13701
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
84 strlens = length (p);
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
85 else
13701
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
86 strlens = [idx(1)-1, diff(idx)-1, numel(p)-idx(end)];
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
87 endif
8884
579de77acd90 strsplit.m: style fixes
John W. Eaton <jwe@octave.org>
parents: 8883
diff changeset
88 ## Remove separators.
11587
c792872f8942 all script files: untabify and strip trailing whitespace
John W. Eaton <jwe@octave.org>
parents: 11523
diff changeset
89 p(idx) = [];
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
90 if (strip_empty)
8884
579de77acd90 strsplit.m: style fixes
John W. Eaton <jwe@octave.org>
parents: 8883
diff changeset
91 ## Omit zero lengths.
13701
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
92 strlens = strlens(strlens != 0);
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
93 endif
13701
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
94
8884
579de77acd90 strsplit.m: style fixes
John W. Eaton <jwe@octave.org>
parents: 8883
diff changeset
95 ## Convert!
13701
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
96 s = mat2cell (p, 1, strlens);
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
97 endif
8884
579de77acd90 strsplit.m: style fixes
John W. Eaton <jwe@octave.org>
parents: 8883
diff changeset
98
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
99 endfunction
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
100
13701
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
101
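The length-based split in the function body (source lines 74-96) can be traced by hand on a small input. The following is a minimal sketch rather than part of the committed file; the input string and the variable name s_stripped are chosen here purely for illustration.

```octave
## Worked example of the length-based split, using a fixed input so
## that each intermediate value is visible.
p   = "a,,bc";
sep = ",";

idx = find (p == sep);        # separator positions: [2 3]

## Lengths of the pieces before, between, and after the separators.
strlens = [idx(1)-1, diff(idx)-1, numel(p)-idx(end)];   # [1 0 2]

p(idx) = [];                  # drop the separators: "abc"
s = mat2cell (p, 1, strlens); # three pieces; the middle one is empty

## With strip_empty, zero-length pieces are dropped before mat2cell.
s_stripped = mat2cell (p, 1, strlens(strlens != 0));    # {"a", "bc"}
```

Because the separators are deleted before mat2cell is called, the lengths always sum to the length of what remains, which is why zero-length entries can simply be filtered out.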
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
102 %!assert (strsplit ("road to hell", " "), {"road", "to", "hell"})
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
103 %!assert (strsplit ("road to^hell", " ^"), {"road", "to", "hell"})
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
104 %!assert (strsplit ("road to--hell", " -", true), {"road", "to", "hell"})
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
105 %!assert (strsplit (["a,bc";",de"], ","), {"a", "bc", ones(1,0), "de "})
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
106 %!assert (strsplit (["a,bc";",de"], ",", true), {"a", "bc", "de "})
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
107 %!assert (strsplit (["a,bc";",de"], ", ", true), {"a", "bc", "de"})
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
108
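The 2-D tests on source lines 105-107 rely on two behaviours that are easy to miss: Octave pads unequal rows in a character matrix literal, and the function flattens the matrix after appending a separator column. The sketch below replays those steps outside the function, assuming the default string_fill_char (a space); it is illustrative only.

```octave
## Rows of unequal length are padded with string_fill_char (default " "),
## so the second row becomes ",de " and the matrix is 2x4.
p   = ["a,bc"; ",de"];
sep = ",";
padded_row = p(2,:);               # ",de "

## Flattening steps from source lines 68-70.
p(:, end+1) = sep(1);              # append a separator column
p = reshape (p.', 1, numel (p));   # read the rows out left to right
p(end) = [];                       # drop the trailing separator

## p is now "a,bc,,de ": the doubled comma yields the empty piece and the
## padding yields the trailing space in "de ".
```

Splitting on ", " (comma and space) with strip_empty set, as on source line 107, removes both the empty piece and the padding.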
13701
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
109 %% Test input validation
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
110 %!error strsplit ()
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
111 %!error strsplit ("abc")
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
112 %!error strsplit ("abc", "b", true, 4)
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
113 %!error <P and SEP must be string values> strsplit (123, "b")
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
114 %!error <P and SEP must be string values> strsplit ("abc", 1)
46e68badedb8 strsplit.m: Expand to accept 2-D character arrays. Improve input validation.
Rik <octave@nomad.inbox5.com>
parents: 12915
diff changeset
115 %!error <STRIP_EMPTY must be a scalar value> strsplit ("abc", "def", ones(3,3))
8877
2c8b2399247b implement strsplit; deprecate split
Jaroslav Hajek <highegg@gmail.com>
parents:
diff changeset
116