@c Copyright (C) 1996, 1997, 2007 John W. Eaton
@c This is part of the Octave manual.
@c For copying conditions, see the file gpl.texi.

@node Strings
@chapter Strings
@cindex strings
@cindex character strings
@opindex "
@opindex '

A @dfn{string constant} consists of a sequence of characters enclosed in
either double-quote or single-quote marks.  For example, both of the
following expressions

@example
@group
"parrot"
'parrot'
@end group
@end example

@noindent
represent the string whose contents are @samp{parrot}.  Strings in
Octave can be of any length.

Because the single-quote mark is also used for the transpose operator
(@pxref{Arithmetic Ops}), while double-quote marks have no other purpose
in Octave, it is best to use double-quote marks to denote strings.

@cindex escape sequence notation
In double-quoted strings, the backslash character is used to introduce
@dfn{escape sequences} that represent other characters.  For example,
@samp{\n} embeds a newline character in a double-quoted string and
@samp{\"} embeds a double quote character.

In single-quoted strings, backslash is not a special character.

Here is an example showing the difference:

@example
@group
toascii ("\n")
     @result{} 10
toascii ('\n')
     @result{} [ 92 110 ]
@end group
@end example

You may also insert a single quote character in a single-quoted string
by using two single quote characters in succession.  For example,

@example
'I can''t escape'
     @result{} I can't escape
@end example

Here is a table of all the escape sequences recognized in Octave.  Each
has the same meaning as the corresponding escape sequence in the C
programming language.

@table @code
@item \\
Represents a literal backslash, @samp{\}.

@item \"
Represents a literal double-quote character, @samp{"}.

@item \'
Represents a literal single-quote character, @samp{'}.

@item \0
Represents the ``nul'' character, control-@@, ASCII code 0.

@item \a
Represents the ``alert'' character, control-g, ASCII code 7.

@item \b
Represents a backspace, control-h, ASCII code 8.

@item \f
Represents a formfeed, control-l, ASCII code 12.

@item \n
Represents a newline, control-j, ASCII code 10.

@item \r
Represents a carriage return, control-m, ASCII code 13.

@item \t
Represents a horizontal tab, control-i, ASCII code 9.

@item \v
Represents a vertical tab, control-k, ASCII code 11.

@c We don't do octal or hex this way yet.
@c
@c @item \@var{nnn}
@c Represents the octal value @var{nnn}, where @var{nnn} are one to three
@c digits between 0 and 7.  For example, the code for the ASCII ESC
@c (escape) character is @samp{\033}.@refill
@c
@c @item \x@var{hh}@dots{}
@c Represents the hexadecimal value @var{hh}, where @var{hh} are hexadecimal
@c digits (@samp{0} through @samp{9} and either @samp{A} through @samp{F} or
@c @samp{a} through @samp{f}).  Like the same construct in @sc{ansi} C,
@c the escape
@c sequence continues until the first non-hexadecimal digit is seen.  However,
@c using more than two hexadecimal digits produces undefined results.  (The
@c @samp{\x} escape sequence is not allowed in @sc{posix} @code{awk}.)@refill
@end table
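
As a quick check of the table, @code{toascii} can be used to reveal the
character codes stored in a string.  The inputs below are arbitrary; the
point is that the escape sequence @samp{\t} becomes a single tab character
(ASCII code 9) only inside double quotes.

@example
@group
# double quotes: \t is one character, code 9
toascii ("a\tb")
     @result{} [ 97 9 98 ]
# single quotes: backslash and t stay as two characters
toascii ('a\tb')
     @result{} [ 97 92 116 98 ]
# \" embeds a double quote, so this string has three characters
length ("a\"b")
     @result{} 3
@end group
@end example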

Strings may be concatenated using the notation for defining matrices.
For example, the expression

@example
[ "foo" , "bar" , "baz" ]
@end example

@noindent
produces the string whose contents are @samp{foobarbaz}.  @xref{Numeric
Data Types}, for more information about creating matrices.
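
Because concatenation is ordinary matrix construction, it is also a
convenient way to assemble a message from several pieces.  In the
following sketch the variable @code{name} is only illustrative, and
@code{num2str} converts a number to a string before it is concatenated
(@pxref{Creating Strings}).

@example
@group
name = "world";
[ "Hello, ", name, "!" ]
     @result{} Hello, world!
# convert the number first, then concatenate
[ "Total: ", num2str (42) ]
     @result{} Total: 42
@end group
@end example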

@menu
* Creating Strings::
* Comparing Strings::
* Manipulating Strings::
* String Conversions::
* Character Class Functions::
@end menu

@node Creating Strings
@section Creating Strings

The easiest way to create a string is, as illustrated in the introduction,
to enclose text in double-quotes or single-quotes.  It is, however,
possible to create a string without writing out its characters.  The
function @code{blanks} creates a string of a given length consisting
only of blank characters (ASCII code 32).

@DOCSTRING(blanks)

The string representation used by Octave is an array of characters, so
the result of @code{blanks(10)} is actually a 1-by-10 character array in
which every element is the blank character (ASCII code 32).  This lends
itself to the obvious generalisation to character matrices.  Using a
matrix of characters, it is possible to represent a collection of
same-length strings in one variable.  The convention used in Octave is
that each row in a character matrix is a separate string, but letting
each column represent a string is equally possible.
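
Because a string is just a character array, the usual size and
conversion functions apply to it directly.  For example, applying
@code{double} to the result of @code{blanks} exposes the underlying
character codes (the length 3 here is arbitrary):

@example
@group
s = blanks (3);
size (s)
     @result{} [ 1 3 ]
# the character codes of three blanks
double (s)
     @result{} [ 32 32 32 ]
@end group
@end example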

The easiest way to create a character matrix is to put several strings
together into a matrix.

@example
collection = [ "String #1"; "String #2" ];
@end example

@noindent
This creates a 2-by-9 character matrix.

A natural question is what happens when a character matrix is created
from strings of different lengths.  Octave pads the shorter strings with
blank characters at the end, so that every row is as long as the longest
string.  A different padding character can be chosen with the
@code{string_fill_char} function, but the padding itself reveals a
limitation of character matrices: they cannot represent strings of
different lengths.  The solution is to use a cell array of strings,
which is described in @ref{Cell Arrays of Strings}.
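
The padding is easy to observe with @code{char}, which stacks strings of
unequal length into one character matrix, whereas a cell array keeps each
string at its own length.  A minimal comparison, using arbitrary strings
and assuming the default blank padding:

@example
@group
m = char ("abc", "de");
size (m)
     @result{} [ 2 3 ]
# the second row was padded with a blank (ASCII code 32)
double (m(2,:))
     @result{} [ 100 101 32 ]
# a cell array stores "de" without padding
c = @{"abc", "de"@};
length (c@{2@})
     @result{} 2
@end group
@end example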

@DOCSTRING(char)

@DOCSTRING(strcat)

@DOCSTRING(strvcat)

@DOCSTRING(strtrunc)

@DOCSTRING(string_fill_char)

@DOCSTRING(str2mat)

@DOCSTRING(ischar)

@DOCSTRING(mat2str)

@DOCSTRING(num2str)

@DOCSTRING(int2str)

@node Comparing Strings
@section Comparing Strings

Since a string is a character array, comparisons between strings work
element by element, as the following example shows.

@example
GNU = "GNU's Not UNIX";
spaces = (GNU == " ")
     @result{} spaces =
          0 0 0 0 0 1 0 0 0 1 0 0 0 0
@end example

@noindent
To test whether two entire strings are identical, it is therefore
necessary to use @code{strcmp}; @code{strncmp} performs the same test on
only the first @var{n} characters.  Similar functions exist for doing
case-insensitive comparisons.
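
For example, for two strings @code{strcmp} returns a single true or false
value rather than an element-by-element result, so it also works when the
lengths differ, where @code{==} would fail because the operands do not
conform.  @code{strncmp} compares only the first @var{n} characters, and
@code{strcmpi} ignores case.  The strings below are arbitrary:

@example
@group
strcmp ("abc", "abc")
     @result{} 1
# different lengths: no error, just false
strcmp ("abc", "abcd")
     @result{} 0
# only the first 3 characters are compared
strncmp ("abcdef", "abcxyz", 3)
     @result{} 1
strcmpi ("ABC", "abc")
     @result{} 1
@end group
@end example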

@DOCSTRING(strcmp)

@DOCSTRING(strcmpi)

@DOCSTRING(strncmp)

@DOCSTRING(strncmpi)

@node Manipulating Strings
@section Manipulating Strings

Octave supports a wide range of functions for manipulating strings.
Since a string is just a matrix, simple manipulations can be accomplished
using standard operators.  The following example shows how to replace
all blank characters with underscores:

@example
quote = "First things first, but not necessarily in that order";
quote( quote == " " ) = "_"
@print{} quote = First_things_first,_but_not_necessarily_in_that_order
@end example

For more complex manipulations, such as searching, replacing, and
general regular expression matching, Octave provides the following
functions.

@DOCSTRING(deblank)

@DOCSTRING(findstr)

@DOCSTRING(index)

@DOCSTRING(rindex)

@DOCSTRING(strfind)

@DOCSTRING(strmatch)

@DOCSTRING(strtok)

@DOCSTRING(split)

@DOCSTRING(strrep)

@DOCSTRING(substr)

@DOCSTRING(regexp)

@DOCSTRING(regexpi)

@DOCSTRING(regexprep)
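
As a short illustration of a few of the functions above, applied to
arbitrary example strings:

@example
@group
# positions at which "bc" starts
strfind ("abcabc", "bc")
     @result{} [ 2 5 ]
strrep ("a-b-c", "-", "+")
     @result{} a+b+c
# every match of the pattern is replaced
regexprep ("hello world", "o", "0")
     @result{} hell0 w0rld
# by default strtok splits at whitespace
strtok ("hello world")
     @result{} hello
@end group
@end example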

@node String Conversions
@section String Conversions

Octave supports various kinds of conversions between strings and
numbers.  As an example, it is possible to convert a string containing
a hexadecimal number to a floating point number:

@example
hex2dec ("FF")
     @result{} ans = 255
@end example
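
The functions documented below convert in both directions.  A few round
trips, with arbitrarily chosen values:

@example
@group
dec2bin (10)
     @result{} 1010
bin2dec ("1010")
     @result{} 10
dec2hex (255)
     @result{} FF
# 10 in base 3 is 1*9 + 0*3 + 1
dec2base (10, 3)
     @result{} 101
str2double ("3.25")
     @result{} 3.2500
@end group
@end example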

@DOCSTRING(bin2dec)

@DOCSTRING(dec2bin)

@DOCSTRING(dec2hex)

@DOCSTRING(hex2dec)

@DOCSTRING(dec2base)

@DOCSTRING(base2dec)

@DOCSTRING(str2double)

@DOCSTRING(strjust)

@DOCSTRING(str2num)

@DOCSTRING(toascii)

@DOCSTRING(tolower)

@DOCSTRING(toupper)

@DOCSTRING(do_string_escapes)

@DOCSTRING(undo_string_escapes)

@node Character Class Functions
@section Character Class Functions

Octave also provides the following character class test functions
patterned after the functions in the standard C library.  They all
operate on string arrays and return matrices of zeros and ones.
Elements that are nonzero indicate that the condition was true for the
corresponding character in the string array.  For example,

@example
@group
isalpha ("!Q@@WERT^Y&")
     @result{} [ 0, 1, 0, 1, 1, 1, 1, 0, 1, 0 ]
@end group
@end example
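
Because the results are arrays of the same size as the input, they can be
used directly to index the string.  For example, the following keeps only
the digits of an arbitrary string (this relies on the result being a
logical array, as it is in current versions of Octave):

@example
@group
s = "a1b2c3";
isdigit (s)
     @result{} [ 0, 1, 0, 1, 0, 1 ]
# logical indexing selects the characters that are digits
s(isdigit (s))
     @result{} 123
@end group
@end example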

@DOCSTRING(isalnum)

@DOCSTRING(isalpha)

@DOCSTRING(isascii)

@DOCSTRING(iscntrl)

@DOCSTRING(isdigit)

@DOCSTRING(isgraph)

@DOCSTRING(isletter)

@DOCSTRING(islower)

@DOCSTRING(isprint)

@DOCSTRING(ispunct)

@DOCSTRING(isspace)

@DOCSTRING(isupper)

@DOCSTRING(isxdigit)