changeset 17244:9de751a10910

Add support for hexadecimal escape sequences in double-quoted strings (bug #39775) * lex.ll (<DQ_STRING_START>\\x[0-9a-fA-F]+): New pattern to parse hex escape sequences in double-quoted strings. * strings.txi: Uncomment and clean up the descriptions of octal and hex escape sequences.
author Mike Miller <mtmiller@ieee.org>
date Wed, 14 Aug 2013 02:07:07 -0400
parents f4c8c66faf34
children 7babcdb9bc13
files doc/interpreter/strings.txi libinterp/parse-tree/lex.ll
diffstat 2 files changed, 26 insertions(+), 15 deletions(-) [+]
line wrap: on
line diff
--- a/doc/interpreter/strings.txi	Wed Aug 14 00:13:40 2013 -0400
+++ b/doc/interpreter/strings.txi	Wed Aug 14 02:07:07 2013 -0400
@@ -119,21 +119,17 @@
 @item \v
 Represents a vertical tab, control-k, ASCII code 11.
 
-@c We don't do octal or hex this way yet.
-@c
-@c @item \@var{nnn}
-@c Represents the octal value @var{nnn}, where @var{nnn} are one to three
-@c digits between 0 and 7.  For example, the code for the ASCII ESC
-@c (escape) character is @samp{\033}.@refill
-@c 
-@c @item \x@var{hh}@dots{}
-@c Represents the hexadecimal value @var{hh}, where @var{hh} are hexadecimal
-@c digits (@samp{0} through @samp{9} and either @samp{A} through @samp{F} or
-@c @samp{a} through @samp{f}).  Like the same construct in @sc{ansi} C,
-@c the escape 
-@c sequence continues until the first non-hexadecimal digit is seen.  However,
-@c using more than two hexadecimal digits produces undefined results.  (The
-@c @samp{\x} escape sequence is not allowed in @sc{posix} @code{awk}.)@refill
+@item \@var{nnn}
+Represents the octal value @var{nnn}, where @var{nnn} are one to three
+digits between 0 and 7.  For example, the code for the ASCII ESC
+(escape) character is @samp{\033}.
+
+@item \x@var{hh}@dots{}
+Represents the hexadecimal value @var{hh}, where @var{hh} are hexadecimal
+digits (@samp{0} through @samp{9} and either @samp{A} through @samp{F} or
+@samp{a} through @samp{f}).  Like the same construct in @sc{ansi} C,
+the escape sequence continues until the first non-hexadecimal digit is seen.
+However, using more than two hexadecimal digits produces undefined results.
 @end table
 
 In a single-quoted string there is only one escape sequence: you may insert a
--- a/libinterp/parse-tree/lex.ll	Wed Aug 14 00:13:40 2013 -0400
+++ b/libinterp/parse-tree/lex.ll	Wed Aug 14 02:07:07 2013 -0400
@@ -694,6 +694,21 @@
       curr_lexer->string_text += static_cast<unsigned char> (result);
   }
 
+<DQ_STRING_START>\\x[0-9a-fA-F]+ {
+    curr_lexer->lexer_debug ("<DQ_STRING_START>\\\\x[0-9a-fA-F]+");
+
+    curr_lexer->current_input_column += yyleng;
+
+    int result;
+    sscanf (yytext+2, "%x", &result);
+
+    // Truncate the value silently instead of checking the range like
+    // we do for octal above.  This is to match C/C++ where any number
+    // of digits is allowed but the value is implementation-defined if
+    // it exceeds the range of the character type.
+    curr_lexer->string_text += static_cast<unsigned char> (result);
+  }
+
 <DQ_STRING_START>"\\a" {
     curr_lexer->lexer_debug ("<DQ_STRING_START>\"\\\\a\"");