Mercurial > octave
changeset 27537:7dc31256c5e4
Document that regexp* functions need UTF-8 encoded input (bug #35910).
* regexp.cc (Fregexp, Fregexpi, Fregexpreg): Document that the input strings
must be UTF-8 encoded.
* NEWS: Announce support for UTF-8 encoded strings in regexp* functions.
author | Markus Mützel <markus.muetzel@gmx.de> |
---|---|
date | Thu, 17 Oct 2019 20:41:03 +0200 |
parents | d389416f0e50 |
children | 7f1fbc0541bd |
files | NEWS libinterp/corefcn/regexp.cc |
diffstat | 2 files changed, 13 insertions(+), 5 deletions(-) [+] |
line wrap: on
line diff
--- a/NEWS Mon Oct 21 11:50:20 2019 -0400 +++ b/NEWS Thu Oct 17 20:41:03 2019 +0200 @@ -40,6 +40,12 @@ Octave:colon-complex-argument : when any arg is complex Octave:colon-nonscalar-argument : when any arg is non-scalar +- The `regexp` and related functions now correctly handle and *require* + strings in UTF-8 encoding. As with any other function that requires + strings to be encoded in Octave's native encoding, you can use + "native2unicode" to convert from your preferred locale. For example, + the copyright symbol in UTF-8 is `native2unicode (169, "latin1")`. + #### Graphics backend - Graphic primitives now accept a color property value of `"none"`
--- a/libinterp/corefcn/regexp.cc Mon Oct 21 11:50:20 2019 -0400 +++ b/libinterp/corefcn/regexp.cc Thu Oct 17 20:41:03 2019 +0200 @@ -662,8 +662,8 @@ @deftypefnx {} {[@dots{}] =} regexp (@var{str}, @var{pat}, "@var{opt1}", @dots{}) Regular expression string matching. -Search for @var{pat} in @var{str} and return the positions and substrings of -any matches, or empty values if there are none. +Search for @var{pat} in UTF-8 encoded @var{str} and return the positions and +substrings of any matches, or empty values if there are none. The matched pattern @var{pat} can include any of the standard regex operators, including: @@ -1195,9 +1195,9 @@ Case insensitive regular expression string matching. -Search for @var{pat} in @var{str} and return the positions and substrings of -any matches, or empty values if there are none. @xref{XREFregexp,,regexp}, -for details on the syntax of the search pattern. +Search for @var{pat} in UTF-8 encoded @var{str} and return the positions and +substrings of any matches, or empty values if there are none. +@xref{XREFregexp,,regexp}, for details on the syntax of the search pattern. @seealso{regexp} @end deftypefn */) { @@ -1396,6 +1396,8 @@ The pattern is a regular expression as documented for @code{regexp}. @xref{XREFregexp,,regexp}. +All strings must be UTF-8 encoded. + The replacement string may contain @code{$i}, which substitutes for the ith set of parentheses in the match string. For example,