@c Copyright (C) 1996, 1997, 1999, 2000, 2002, 2003, 2004, 2005,
@c               2006, 2007 John W. Eaton
@c
@c This file is part of Octave.
@c
@c Octave is free software; you can redistribute it and/or modify it
@c under the terms of the GNU General Public License as published by the
@c Free Software Foundation; either version 3 of the License, or (at
@c your option) any later version.
@c
@c Octave is distributed in the hope that it will be useful, but WITHOUT
@c ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
@c FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
@c for more details.
@c
@c You should have received a copy of the GNU General Public License
@c along with Octave; see the file COPYING.  If not, see
@c <http://www.gnu.org/licenses/>.

@node Strings
@chapter Strings
@cindex strings
@cindex character strings
@opindex "
@opindex '

A @dfn{string constant} consists of a sequence of characters enclosed in
either double-quote or single-quote marks.  For example, both of the
following expressions

@example
@group
"parrot"
'parrot'
@end group
@end example

@noindent
represent the string whose contents are @samp{parrot}.  Strings in
Octave can be of any length.

Since the single-quote mark is also used for the transpose operator
(@pxref{Arithmetic Ops}) but double-quote marks have no other purpose in
Octave, it is best to use double-quote marks to denote strings.

@cindex escape sequence notation
In double-quoted strings, the backslash character is used to introduce
@dfn{escape sequences} that represent other characters.  For example,
@samp{\n} embeds a newline character in a double-quoted string and
@samp{\"} embeds a double quote character.

In single-quoted strings, backslash is not a special character.

Here is an example showing the difference:

@example
@group
toascii ("\n")
     @result{} 10
toascii ('\n')
     @result{} [ 92 110 ]
@end group
@end example

You may also insert a single quote character in a single-quoted string
by using two single quote characters in succession.  For example,

@example
'I can''t escape'
     @result{} I can't escape
@end example

Here is a table of all the escape sequences used in Octave.  They are
the same as those used in the C programming language.

@table @code
@item \\
Represents a literal backslash, @samp{\}.

@item \"
Represents a literal double-quote character, @samp{"}.

@item \'
Represents a literal single-quote character, @samp{'}.

@item \0
Represents the ``nul'' character, control-@@, ASCII code 0.

@item \a
Represents the ``alert'' character, control-g, ASCII code 7.

@item \b
Represents a backspace, control-h, ASCII code 8.

@item \f
Represents a formfeed, control-l, ASCII code 12.

@item \n
Represents a newline, control-j, ASCII code 10.

@item \r
Represents a carriage return, control-m, ASCII code 13.

@item \t
Represents a horizontal tab, control-i, ASCII code 9.

@item \v
Represents a vertical tab, control-k, ASCII code 11.

@c We don't do octal or hex this way yet.
@c
@c @item \@var{nnn}
@c Represents the octal value @var{nnn}, where @var{nnn} are one to three
@c digits between 0 and 7.  For example, the code for the ASCII ESC
@c (escape) character is @samp{\033}.@refill
@c
@c @item \x@var{hh}@dots{}
@c Represents the hexadecimal value @var{hh}, where @var{hh} are hexadecimal
@c digits (@samp{0} through @samp{9} and either @samp{A} through @samp{F} or
@c @samp{a} through @samp{f}).  Like the same construct in @sc{ansi} C,
@c the escape
@c sequence continues until the first non-hexadecimal digit is seen.  However,
@c using more than two hexadecimal digits produces undefined results.  (The
@c @samp{\x} escape sequence is not allowed in @sc{posix} @code{awk}.)@refill
@end table
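
As a brief illustration of the table (a sketch only; the strings are
chosen for this example and @code{double} is used here to show the
character codes), each escape sequence in a double-quoted string counts
as a single character, whereas the same text in single quotes is stored
literally:

@example
@group
length ("a\tb")
     @result{} 3
length ('a\tb')
     @result{} 4
double ("\a\b\t")
     @result{} [ 7 8 9 ]
@end group
@end example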

Strings may be concatenated using the notation for defining matrices.
For example, the expression

@example
[ "foo" , "bar" , "baz" ]
@end example

@noindent
produces the string whose contents are @samp{foobarbaz}.  @xref{Numeric
Data Types}, for more information about creating matrices.

@menu
* Creating Strings::
* Comparing Strings::
* Manipulating Strings::
* String Conversions::
* Character Class Functions::
@end menu

@node Creating Strings
@section Creating Strings

The easiest way to create a string is, as illustrated in the introduction,
to enclose text in double-quotes or single-quotes.  It is, however,
possible to create a string without actually writing any text.  The
function @code{blanks} creates a string of a given length consisting
only of blank characters (ASCII code 32).

@DOCSTRING(blanks)

The string representation used by Octave is an array of characters, so
the result of @code{blanks(10)} is actually a row vector of length 10
containing the value 32 in all places.  This lends itself to the obvious
generalisation to character matrices.  Using a matrix of characters, it
is possible to represent a collection of same-length strings in one
variable.  The convention used in Octave is that each row in a
character matrix is a separate string, but letting each column represent
a string is equally possible.

The easiest way to create a character matrix is to put several strings
together into a matrix.

@example
collection = [ "String #1"; "String #2" ];
@end example

@noindent
This creates a 2-by-9 character matrix.
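
The array view of strings can be checked directly.  The following
sketch reuses the @code{collection} matrix from the example above; it
is an illustration only, showing the character codes produced by
@code{blanks}, the size of the matrix, and how ordinary row indexing
extracts one string:

@example
@group
double (blanks (3))
     @result{} [ 32 32 32 ]
collection = [ "String #1"; "String #2" ];
size (collection)
     @result{} [ 2 9 ]
collection(2,:)
     @result{} String #2
@end group
@end example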

One relevant question is what happens when a character matrix is
created from strings of different lengths.  The answer is that Octave
puts blank characters at the end of strings shorter than the longest
string.  It is possible to pad with a character other than the blank
by using the @code{string_fill_char} function, but this highlights a
limitation of character matrices: they simply cannot represent strings
of different lengths.  The solution is to use a cell array of strings,
which is described in @ref{Cell Arrays of Strings}.
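
The padding can be observed with a short sketch.  It uses the
@code{char} function rather than the bracket notation, on the
assumption that bracket concatenation of rows with unequal lengths may
behave differently across Octave versions; the variable name
@code{padded} is purely illustrative.

@example
@group
padded = char ("abc", "de");
size (padded)
     @result{} [ 2 3 ]
double (padded(2,:))
     @result{} [ 100 101 32 ]
deblank (padded(2,:))
     @result{} de
@end group
@end example

@noindent
The trailing 32 in the second row is the blank that was added as
padding; @code{deblank} removes it again when the original string is
needed.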

@DOCSTRING(char)

@DOCSTRING(strcat)

@DOCSTRING(strvcat)

@DOCSTRING(strtrunc)

@DOCSTRING(string_fill_char)

@DOCSTRING(str2mat)

@DOCSTRING(ischar)

@DOCSTRING(mat2str)

@DOCSTRING(num2str)

@DOCSTRING(int2str)

@node Comparing Strings
@section Comparing Strings

Since a string is a character array, comparisons between strings work
element by element, as the following example shows.

@example
GNU = "GNU's Not UNIX";
spaces = (GNU == " ")
     @result{} spaces =
       0   0   0   0   0   1   0   0   0   1   0   0   0   0
@end example

@noindent
To determine whether two strings are identical it is therefore necessary
to use the @code{strcmp} or @code{strncmp} functions.  Similar
functions exist for doing case-insensitive comparisons.
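
The difference between the two kinds of comparison can be sketched as
follows (the strings are arbitrary examples).  The @code{==} operator
yields one result per character and requires operands of conforming
size, whereas the comparison functions return a single truth value:

@example
@group
"abc" == "abd"
     @result{} [ 1 1 0 ]
strcmp ("abc", "abd")
     @result{} 0
strcmp ("abc", "abc")
     @result{} 1
strncmp ("abcdef", "abcxyz", 3)
     @result{} 1
strcmpi ("Octave", "OCTAVE")
     @result{} 1
@end group
@end example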

@DOCSTRING(strcmp)

@DOCSTRING(strcmpi)

@DOCSTRING(strncmp)

@DOCSTRING(strncmpi)

@node Manipulating Strings
@section Manipulating Strings

Octave supports a wide range of functions for manipulating strings.
Since a string is just a matrix, simple manipulations can be accomplished
using standard operators.  The following example shows how to replace
all blank characters with underscores.

@example
quote = "First things first, but not necessarily in that order";
quote( quote == " " ) = "_"
@print{} quote = First_things_first,_but_not_necessarily_in_that_order
@end example

For more complex manipulations, such as searching, replacing, and
general regular expressions, the following functions come with Octave.
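
Before the reference entries, here is a small sketch that combines
plain indexing with a few of these functions.  The input string is
arbitrary, and the regular expression pattern is written in single
quotes so that no backslash processing can interfere with it.

@example
@group
s = "hello world";
s(end:-1:1)
     @result{} dlrow olleh
toupper (s(1:5))
     @result{} HELLO
strrep (s, "world", "Octave")
     @result{} hello Octave
regexprep (s, 'o', "0")
     @result{} hell0 w0rld
@end group
@end example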

@DOCSTRING(deblank)

@DOCSTRING(findstr)

@DOCSTRING(index)

@DOCSTRING(rindex)

@DOCSTRING(strfind)

@DOCSTRING(strmatch)

@DOCSTRING(strtok)

@DOCSTRING(split)

@DOCSTRING(strrep)

@DOCSTRING(substr)

@DOCSTRING(regexp)

@DOCSTRING(regexpi)

@DOCSTRING(regexprep)

@node String Conversions
@section String Conversions

Octave supports various kinds of conversions between strings and
numbers.  As an example, it is possible to convert a string containing
a hexadecimal number to a floating point number.

@example
hex2dec ("FF")
     @result{} ans = 255
@end example
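
A few more conversions in both directions are sketched below, using
functions documented in this section.  The inputs are arbitrary
examples; note that the functions converting a number to a given base
return strings, not numbers.

@example
@group
dec2bin (5)
     @result{} 101
bin2dec ("101")
     @result{} 5
dec2hex (255)
     @result{} FF
dec2base (10, 3)
     @result{} 101
str2double ("2.5") + 1
     @result{} 3.5
str2num ("[1, 2, 3]")
     @result{} [ 1 2 3 ]
@end group
@end example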

@DOCSTRING(bin2dec)

@DOCSTRING(dec2bin)

@DOCSTRING(dec2hex)

@DOCSTRING(hex2dec)

@DOCSTRING(dec2base)

@DOCSTRING(base2dec)

@DOCSTRING(str2double)

@DOCSTRING(strjust)

@DOCSTRING(str2num)

@DOCSTRING(toascii)

@DOCSTRING(tolower)

@DOCSTRING(toupper)

@DOCSTRING(do_string_escapes)

@DOCSTRING(undo_string_escapes)

@node Character Class Functions
@section Character Class Functions

Octave also provides the following character class test functions
patterned after the functions in the standard C library.  They all
operate on string arrays and return matrices of zeros and ones.
Elements that are nonzero indicate that the condition was true for the
corresponding character in the string array.  For example,

@example
@group
isalpha ("!Q@@WERT^Y&")
     @result{} [ 0, 1, 0, 1, 1, 1, 1, 0, 1, 0 ]
@end group
@end example
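
Because the results have the same shape as the input, they can also be
used as an index into the original string.  The following sketch
assumes that these functions return logical values suitable for
indexing; the string @code{s} is an arbitrary example.

@example
@group
s = "Octave 3.0";
isdigit (s)
     @result{} [ 0, 0, 0, 0, 0, 0, 0, 1, 0, 1 ]
s(isdigit (s))
     @result{} 30
s(isalpha (s))
     @result{} Octave
any (isspace (s))
     @result{} 1
@end group
@end example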

@DOCSTRING(isalnum)

@DOCSTRING(isalpha)

@DOCSTRING(isascii)

@DOCSTRING(iscntrl)

@DOCSTRING(isdigit)

@DOCSTRING(isgraph)

@DOCSTRING(isletter)

@DOCSTRING(islower)

@DOCSTRING(isprint)

@DOCSTRING(ispunct)

@DOCSTRING(isspace)

@DOCSTRING(isupper)

@DOCSTRING(isxdigit)