# HG changeset patch # User Rik # Date 1444079158 25200 # Node ID e34692daf663ec52bd88dcb2d3a21253527220fe # Parent 128414587af2645f0e04d8af54972ddea8798796 Extend parser to accept '_' in numbers. * NEWS: Announce change. * lex.ll: Define NUMBER to be a NUMREAL (real number) or NUMHEX (hex number). Define NUMREAL to begin with a digit (D) followed by more digits or an '_'. Define NUMHEX to begin with 0[xX], a hex digit (a-fA-F0-9), followed by '_' or more hex digits. Define EXPON to have a digit (D) followed by more digits or an '_'. * lex.ll (octave_base_lexer::handle_number): Strip out any underscores before processing number with sscanf. * parser.tst: Add tests for new behavior. diff -r 128414587af2 -r e34692daf663 NEWS --- a/NEWS Fri Oct 09 16:52:49 2015 -0400 +++ b/NEWS Mon Oct 05 14:05:58 2015 -0700 @@ -1,6 +1,12 @@ Summary of important user-visible changes for version 4.2: --------------------------------------------------------- + ** The parser has been extended to accept, but ignore, underscore characters + in numbers. This facilitates writing more legible code by using '_' as + a thousands separator or to group nibbles into bytes in hex constants. + + Examples: 1_000_000 == 1e6 or 0xDE_AD_BE_EF + ** The default colormap is now set to 'viridis' which is also the default colormap in matplotlib. This new colormap fixes some of the main issues with the old default colormap 'jet' diff -r 128414587af2 -r e34692daf663 libinterp/parse-tree/lex.ll --- a/libinterp/parse-tree/lex.ll Fri Oct 09 16:52:49 2015 -0400 +++ b/libinterp/parse-tree/lex.ll Mon Oct 05 14:05:58 2015 -0700 @@ -320,14 +320,17 @@ %} D [0-9] +D_ [0-9_] S [ \t] NL ((\n)|(\r)|(\r\n)) Im [iIjJ] CCHAR [#%] IDENT ([_$a-zA-Z][_$a-zA-Z0-9]*) FQIDENT ({IDENT}(\.{IDENT})*) -EXPON ([DdEe][+-]?{D}+) -NUMBER (({D}+\.?{D}*{EXPON}?)|(\.{D}+{EXPON}?)|(0[xX][0-9a-fA-F]+)) +EXPON ([DdEe][+-]?{D}{D_}*) +NUMREAL (({D}{D_}*\.?{D_}*{EXPON}?)|(\.{D}{D_}*{EXPON}?)) +NUMHEX (0[xX][0-9a-fA-F][0-9a-fA-F_]*) +NUMBER ({NUMREAL}|{NUMHEX}) ANY_EXCEPT_NL [^\r\n] ANY_INCLUDING_NL (.|{NL}) @@ -1124,9 +1127,9 @@ // the constant. %} -{D}+/\.[\*/\\^\'] | +{D}{D_}*/\.[\*/\\^\'] | {NUMBER} { - curr_lexer->lexer_debug ("{D}+/\\.[\\*/\\\\^\\']|{NUMBER}"); + curr_lexer->lexer_debug ("{D}{D_}*/\\.[\\*/\\\\^\\']|{NUMBER}"); if (curr_lexer->previous_token_may_be_command () && curr_lexer->space_follows_previous_token ()) @@ -2668,28 +2671,37 @@ char *yytxt = flex_yytext (); - if (looks_like_hex (yytxt, strlen (yytxt))) + // Strip any underscores + char *tmptxt = strsave (yytxt); + char *rptr = tmptxt; + char *wptr = tmptxt; + while (*rptr) + { + *wptr = *rptr++; + wptr += (*wptr != '_'); + } + *wptr = '\0'; + + if (looks_like_hex (tmptxt, strlen (tmptxt))) { unsigned long ival; - nread = sscanf (yytxt, "%lx", &ival); + nread = sscanf (tmptxt, "%lx", &ival); value = static_cast (ival); } else { - char *tmp = strsave (yytxt); - - char *idx = strpbrk (tmp, "Dd"); + char *idx = strpbrk (tmptxt, "Dd"); if (idx) *idx = 'e'; - nread = sscanf (tmp, "%lf", &value); - - delete [] tmp; + nread = sscanf (tmptxt, "%lf", &value); } + delete [] tmptxt; + // If yytext doesn't contain a valid number, we are in deep doo doo. assert (nread == 1); diff -r 128414587af2 -r e34692daf663 test/parser.tst --- a/test/parser.tst Fri Oct 09 16:52:49 2015 -0400 +++ b/test/parser.tst Mon Oct 05 14:05:58 2015 -0700 @@ -275,6 +275,13 @@ %! assert (a += b *= c += 1, 42); %! assert (b == 40 && c == 8); +## Test extended number format which allows '_' as NOP character +%!assert (123_456, 123456) +%!assert (.123_456, .123456) +%!assert (123_456.123_456, 123456.123456) +%!assert (0xAB_CD, 43981) +%!assert (2e0_1, 20); + ## Test creation of anonymous functions %!test