# HG changeset patch # User prnienhuis # Date 1395094535 0 # Node ID fa3fb3268caa6773484f551dc42ce83eaa3cd442 # Parent ae50074dd6ffe2f15cf4700b8a13c0aa1cdaf9c4 Enable write support for OOXML (lightly tested) diff -r ae50074dd6ff -r fa3fb3268caa main/io/inst/private/__OCT_oct2xlsx__.m --- a/main/io/inst/private/__OCT_oct2xlsx__.m Tue Mar 11 08:08:15 2014 +0000 +++ b/main/io/inst/private/__OCT_oct2xlsx__.m Mon Mar 17 22:15:35 2014 +0000 @@ -1,4 +1,5 @@ ## Copyright (C) 2013,2014 Markus Bergholz +## Parts Copyright (C) 2014 Philip Nienhuis ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software @@ -20,7 +21,7 @@ ## usage: ## __OCT_oct2xlsx__(filename, matrix, wsh, range, spsh_opts) ## -## matrix: have to be a 2D Matrix. NaN will be an empty cell in Excel. Strings are not supported! +## matrix: have to be a 2D cell array. NaN will be an empty cell in Excel. ## ## wsh: can be a 1, 2 or 3 for indexing the worksheet. >3 is not supported yet! ## wsh: can be string for naming the worksheet. the string has a limit length of 31! @@ -41,7 +42,8 @@ ## Elapsed time is 253.35110616684 seconds. ## @end deftypefn -## Author: Markus Bergholz +## Author: Markus Bergholz +## Amended by Philip Nienhuis ## ToDo ## - use english language as default (atm it's all german. "Tabelle1" instead of "Table1"). ## - fix write to defined range (rows still starts a row 1) @@ -67,110 +69,440 @@ ## Version 0.1 ## 2013/11/08 - Initial Release -function [xls, rstatus] = __OCT_oct2xlsx__ (filename, matrix, wsh=1, crange="", spsh_opts, obj_dims) +function [xls, rstatus] = __OCT_oct2xlsx__ (matrix, xls, wsh=1, crange="", spsh_opts, obj_dims) + + ## Analyze worksheet parameter & determine if new sheet is required + new_sh = 0; + if (ischar (wsh)) + if (31 < length (wsh)) + error ("Worksheet name longer than 31 characters is not supported by Excel"); + endif + wsh_number = strmatch (wsh, xls.sheets.sh_names); + if (isempty (wsh_number)) + ## Worksheet not in stack. We'll create a new one + new_sh = 1; + wsh_number = numel (xls.sheets.sh_names) + 1; + if (xls.changed == 3) + wsh_number--; + endif + xls.sheets.sh_names(end+1) = wsh; + endif + wsh_string = wsh; -% ####### Below code already in caller -%if (isempty (crange)) -% offset.row = 0; -% offset.col = 0; -%else -% offset.row = str2double (cell2mat (regexp (crange, '(\d*)', 'tokens'))) - 1; -% offset.col = __OCT_cc__ (cell2mat (cell2mat (regexp (crange, '([A-Z]*)', 'tokens')))) - 1; -%endif + elseif (isnumeric (wsh)) + if (wsh > numel (xls.sheets.sh_names)) + ## New worksheet + new_sh = 1; + ## Default sheet name + wsh_string = sprintf ("Sheet%d", wsh); + ## It may already exist... + while (! isempty (strmatch (wsh_string, xls.sheets.sh_names)) && length (wsh_string <= 31)) + wsh_string = strrep (wsh_string, "Sheet", "Sheet_"); + endwhile + if (length (string) > 31) + error ("oct2xls: cannot add worksheet with a unique name"); + endif + ## The sheet index number can't leave a gap in the stack, so: + wsh_number = numel (xls.sheets.sh_names) + 1; + xls.sheets.sh_names(end+1) = wsh_string; + else + wsh_number = wsh; + endif + endif + + if (spsh_opts.formulas_as_text == 0) + ## Provisionally only read/write strings, not formulas + ## FIXME actually a formula evaluator is required to process formulas + spsh_opts.formulas_as_text = 1; + endif -% ####### Below code already in oct2xls -%# Input matrix check -%# ================== -%if (ndims (matrix) != 2) -% error ("xlsxwrite only supports 2D matrix"); -%endif -% -%if (1048576 < rows (matrix)) -% error ("Too many rows. Excel is limited to 1,048,576 rows!"); -%endif -%if 16384 < columns (matrix) -% error ("Too many columns. Excel is limited to 16,384 colums!"); + if (new_sh) + rawarr = {}; + lims = []; + else + ## Get all data in sheet and row/column limits + [rawarr, xls] = __OCT_xlsx2oct__ (xls, wsh, "", spsh_opts); + lims = xls.limits; + endif + + ## Merge old and new data. Provisionally allow empty new data to wipe old data + [rawarr, lims, onr, onc] = __OCT_merge_data__ (rawarr, lims, matrix, obj_dims, spsh_opts); + +## FIXME - contains stuff that won't work with existing sheets +## (though I like the idea PRN) +%## something cool, that matlab doesn't support +%# xlswrite('myfile.xlsx',matrix,{'1','Sheetname'}) +%if (iscell (wsh)) +% # check size +% if (1 ~= rows (wsh) || 2 ~= columns (wsh)) +% error ("Too many input arguments for wsh"); +% endif +% # check first argument +% if (1 == ischar (wsh{1,1})) +% if (31 < length (wsh{1,1})) +% error ("Worksheet name longer than 31 characters is not supported by Excel"); +% endif +% wsh_string=wsh{1,1}; +% elseif (isnumeric (wsh{1,1})) +% if (1 ~= ismember (wsh{1,1} ,1:3)) +% error ("wsh index must be 1, 2 or 3"); +% endif +% wsh_number = wsh{1,1}; +% else +% error ("wsh should contain one numeric value (for indexing) and one string (for naming)"); +% endif +% +% # check second argument +% if (ischar (wsh{1,2})) +% if (31 < length(wsh{1,2})) +% error ("Worksheet name longer than 31 characters is not supported by Excel"); +% endif +% wsh_string = wsh{1,2}; +% elseif (isnumeric (wsh{1,2})) +% if (! ismember (wsh{1,2} ,1:3)) +% error ("wsh index must be 1, 2 or 3"); +% endif +% wsh_number = wsh{1,2}; +% else +% error ("wsh should contain one numeric value (for indexing) and one string (for naming)"); +% endif %endif -############## Nakijken, we hebben niet alles nodig +## What needs to be done: +## - Find out worksheet number (easy, wsh_number) +## - Write data to /xl/worksheets/sheet.xml +## * For each string, check (persistent var) if sharedStrings.xml exists +## > If not, create it. +## * For each string check if it is in /sharedStrings.xml +## > if YES, put pointer in new worksheet +## > if NO, add node in sharedStrings.xml and pointer in new worksheet +## - If needed (new file) update /workbook.xml w. sheet name & sheetId higher than any existing sheetId +## - Update workbook_rels.xml + + ## Write data to worksheet file + [xls, status] = __OCT_oct2xlsx_sh__ (xls, wsh_number, rawarr, lims, onc, onr, spsh_opts); + + ## Update worksheet bookkeeping + if (new_sh) ## !!!!! FIXME To be tested !!!!!! + ## Read xl/_rels/workbook.xml.rels + rid = fopen ([xls.workbook filesep "xl" filesep "_rels" filesep "workbook.xml.rels"], "r+"); + rxml = fread (rid, Inf, "char=>char").'; + fclose (rid); + ## Add worksheet entry. First find unique rId + rId = str2double (cell2mat (regexp (rxml, 'Id="rId(\d+)"', "tokens"))); + ## Assess rId (needed in [Content_Types].xml, below) + nwrId = sort (rId)(end) + 1; -## FIXME -# when file exist, it gets complicated -if (ischar (wsh)) - if (31 < length (wsh)) - error ("Worksheet name longer than 31 characters is not supported by Excel"); - endif - wsh_string = wsh; - wsh_number = 1; -elseif (isnumeric (wsh)) - if (1 == wsh) - wsh_number = wsh; - wsh_string = "Tabelle1"; - elseif (2 == wsh) - wsh_number = wsh; - wsh_string = ("Tabelle2"); - elseif 3 == wsh - wsh_number = wsh; - wsh_string = "Tabelle3"; - else - error('wsh index must be 1, 2 or 3'); - endif -endif + ## + wid = fopen ([xls.workbook filesep "xl" filesep "workbook.xml"], "r+"); + wxml = fread (wid, Inf, "char=>char").'; + fclose (wid); + [sheets, is, ie] = getxmlnode (wxml, "sheets"); + sheetids = str2double (cell2mat (regexp (sheets, ' sheetId="(\d+?)"', "tokens"))); + if (xls.changed == 3) + ## No new sheet, just update Sheet1 name + shnum = 1; + sheets = strrep (sheets, 'name="Sheet1"', ['name="' wsh_string '"']); + else + shnum = max(sheetids)+1; + wshtag = sprintf ('', ... + wsh_string, shnum, nwrId); + sheets = strrep (sheets, "/>", ["/>" wshtag ""]); + endif + ## Re(/over-)write workbook.xml; start at sheets node + wid = fopen ([xls.workbook filesep "xl" filesep "workbook.xml"], "w+"); + fprintf (wid, "%s", wxml(1:is-1)); + fprintf (wid, "%s", sheets); + fprintf (wid, "%s", wxml(ie+1:end)); + fclose (wid); -## something cool, that matlab doesn't support -# xlswrite('myfile.xlsx',matrix,{'1','Sheetname'}) -if (iscell (wsh)) - # check size - if (1 ~= rows (wsh) || 2 ~= columns (wsh)) - error ("Too many input arguments for wsh"); - endif - # check first argument - if (1 == ischar (wsh{1,1})) - if (31 < length (wsh{1,1})) - error ("Worksheet name longer than 31 characters is not supported by Excel"); + ## Write xl/_rels/workbook.xml.rels. Only needed for existing files/new sheets + if (xls.changed != 3) + ## workbook.xml.rels + entry = sprintf ('', nwrId, shnum); + rxml = strrep (rxml, "/>", ["/>" entry ""]); + rid = fopen ([xls.workbook filesep "xl" filesep "_rels" filesep "workbook.xml.rels"], "w"); + fprintf (rid, "%s", rxml); + fclose (rid); + ## [Content_Types].xml. Insert worksheet #n entry + tid = fopen ([xls.workbook filesep "[Content_Types].xml"], "r+"); + txml = fread (tid, Inf, "char=>char").'; + fclose (tid); + partname = ['' ]; + partname = sprintf (partname, nwrId); + srchstr = 'worksheet+xml"/>'; + ipos = strfind (txml, srchstr)(end) + length (srchstr); + tid = fopen ([xls.workbook filesep "[Content_Types].xml"], "w"); + fprintf (tid, "%s", txml(1:ipos-1)); + fprintf (tid, partname); + fprintf (tid, txml(ipos:end)); + fclose (tid); endif - wsh_string=wsh{1,1}; - elseif (isnumeric (wsh{1,1})) - if (1 ~= ismember (wsh{1,1} ,1:3)) - error ("wsh index must be 1, 2 or 3"); - endif - wsh_number = wsh{1,1}; - else - error ("wsh should contain one numeric value (for indexing) and one string (for naming)"); - endif - - # check second argument - if (ischar (wsh{1,2})) - if (31 < length(wsh{1,2})) - error ("Worksheet name longer than 31 characters is not supported by Excel"); - endif - wsh_string = wsh{1,2}; - elseif (isnumeric (wsh{1,2})) - if (! ismember (wsh{1,2} ,1:3)) - error ("wsh index must be 1, 2 or 3"); + + ## + aid = fopen ([xls.workbook filesep "docProps" filesep "app.xml"], "r+"); + axml = fread (wid, Inf, "char=>char").'; + fclose (aid); + wshnode = sprintf ('%s', wsh_string); + if (xls.changed == 3) + [vt, is, ie] = getxmlnode (axml, "TitlesOfParts"); + ## Just replace Sheet1 entry by new name + vt = strrep (vt, '>Sheet1<', ['>' wsh_string '<']); + else + ## 1. Update HeadingPairs node + [vt1, is, ie] = getxmlnode (axml, "HeadingPairs"); + ## Bump number of entries + nshts = str2double (getxmlnode (vt1, "vt:i4", [], 1)) + 1; + vt1 = regexprep (vt1, '(\d+)', ["" sprintf("%d", nshts) ""]); + ## 2. Update TitlesOfParts node + [vt2, ~, ie] = getxmlnode (axml, "TitlesOfParts", ie); + ## Bump number of entries + nshts = str2double (getxmlattv (vt2, "size")) + 1; + vt2 = regexprep (vt2, 'size="(\d+)"', ['size="' sprintf("%d", nshts) '"']); + ## Add new worksheet entry + vt2 = strrep (vt2, "", ["" wshnode ""]); + vt = [vt1 vt2]; endif - wsh_number = wsh{1,2}; - else - error ("wsh should contain one numeric value (for indexing) and one string (for naming)"); + ## Re(/over-)write workbook.xml; start at sheets node + aid = fopen ([xls.workbook filesep "docProps" filesep "app.xml"], "w+"); + fprintf (wid, "%s", axml(1:is-1)); + fprintf (wid, "%s", vt); + fprintf (wid, "%s", axml(ie+1:end)); + fclose (wid); endif -endif -############ Kan vervangen door één functie-call met [be|over]schrijven worksheet - -%if (! exist (filename, "file")) + ## If needed update sharedStrings entries xml descriptor files + if (status > 1) + ## workbook_rels.xml + rid = fopen ([xls.workbook filesep "xl" filesep "_rels" filesep "workbook.xml.rels"], "r+"); + rxml = fread (rid, Inf, "char=>char").'; + fclose (rid); + if (isempty (strmatch ("sharedStrings", rxml))) + ## Add sharedStrings.xml entry. First find unique rId + rId = str2double (cell2mat (regexp (rxml, 'Id="rId(\d+)"', "tokens"))); + nwrId = sort (rId)(end) + 1; + entry = sprintf ('', nwrId); + rxml = strrep (rxml, "/>", ["/>" entry ""]); + rid = fopen ([xls.workbook filesep "xl" filesep "_rels" filesep "workbook.xml.rels"], "w"); + fprintf (rid, "%s", rxml); + fclose (rid); + endif - __OCT_OOXML_create_file__ (filename, matrix, wsh_number, wsh_string, offset); + ## [Content_Types].xml + tid = fopen ([xls.workbook filesep "[Content_Types].xml"], "r+"); + txml = fread (tid, Inf, "char=>char").'; + fclose (tid); + if (isempty (strmatch ("sharedStrings", txml))) + ## Add sharedStrings.xml entry after styles.xml node. First find that one + [~, ~, ipos] = regexp (txml, '(?:styles\+xml)(?:.+)(>' ... + txml(ipos+1:end)]; + tid = fopen ([xls.workbook filesep "[Content_Types].xml"], "w"); + fprintf (tid, "%s", txml); + fclose (tid); + endif + + endif -%elseif (2 == exist (filename,"file")); -% -% error('sorry, edit an existing file is not implemented yet') -% %__OOXML_modify_file(filename, matrix, wsh_number, wsh_string); -% -%else -% -% error ("%s is not a file!", filename) -% -%endif + ## - /docProps/core.xml (user/modifier info & date/time) + cid = fopen ([xls.workbook filesep "docProps" filesep "core.xml"], "r+"); + cxml = fread (cid, Inf, "char=>char").'; + fclose (cid); + cxml = regexprep (cxml, 'dBy>(\w+)GNU Octave']; + xml = [ xml '']; + xml = [ xml '']; + xml = [ xml '']; + xml = [ xml '']; + xml = [ xml '' ]; + else + ## Read complete contents + xml = fread (fid, Inf, "char=>char").'; + fclose (fid); + endif + + ## Update "dimension" (=range) node + [dimnode, is1, ie1] = getxmlnode (xml, "dimension"); + ## Compute new limits + rng = sprintf ("%s:%s", calccelladdress (lims(2, 1), lims(1, 1)), ... + calccelladdress (lims(2, 2), lims(1, 2))); + + ## Open sheet file (new or old) in reset mode, write first part of worksheet + fid = fopen ([xls.workbook filesep "xl" filesep "worksheets" filesep ... + sprintf("sheet%d.xml", wsh_number)], "w+"); + fprintf (fid, "%s", xml(1:is1-1)); + ## Write updated dimension node + fprintf (fid, '', rng); + + ## Get Sheetdata node + [shtdata, is2, ie2] = getxmlnode (xml, "sheetData"); + ## Write second block of xml until start of sheetData + fprintf (fid, "%s", [xml(ie1+1:is2-1) ""]); + + ## Explore data types in matrix + typearr = spsh_prstype (matrix, onr, onc, [1:5], spsh_opts); + + if (all (typearr(:) == 1)) ## Numeric +# write matrix to sheet%%WSH%%.xml +# __OOXML_turbowrite__(sprintf("%s/xl/worksheets/sheet%d.xml",tmpdir,wsh_number), matrix); + for r=1:rows(matrix) + fprintf (fid, '', r , ... + 1+offset. row, columns(matrix)+offset.row); + for c = 1:columns(matrix) + if 0 == isnan (matrix(r,c)) + fprintf(fid, sprintf('%f', __OCT_cc__(c+offset.col), r, matrix(r,c))); + endif + endfor + fprintf(f, ''); + endfor + + else + ## Heterogeneous array. Write cell nodes depending on content + strings = {}; + str_cnt = uniq_str_cnt = 0; + ## Check if there are any strings + if (any (typearr(:) == 3)) + ## Yep. Read sharedStrings.xml + try + sid = fopen (sprintf ("%s/xl/sharedStrings.xml", xls.workbook), "r+"); + if (sid > 0) + ## File exists => there are already some strings in the sheet + shstr = fread (sid, "char=>char").'; + fclose (sid); + ## Extract string values. May be much more than present in current sheet + strings = cell2mat (regexp (shstr, '(.+?)|(.*)/>)', "tokens")); + uniq_str_cnt = str2double (getxmlattv (shstr, "uniqueCount")); + ## Make shstr a mueric value + shstr = 1; + else + ## File didn't exist yet + shstr = 0; + endif + catch + ## No sharedStrings.xml; implies no "fixed" strings (computed strings can still be there) + strings = {}; + str_cnt = uniq_str_cnt = 0; + end_try_catch + endif + ## Process data row by row + for ii=1:rows (matrix) + ## Row node opening tag + fprintf (fid, '', ii+lims(2, 1)-1, lims(1, 1), lims(1, 2)); + for jj=1:columns (matrix) + ## Init required attributes. Note leading space + addr = sprintf (' r="%s"', calccelladdress (ii+lims(2, 1)-1, jj+lims(1, 1)-1)); + ## Init optional atttributes + stag = ttag = form = ""; ## t: e = error, b = boolean, s/str = string + switch typearr(ii, jj) + case 1 ## Numeric + ## t tag ("type") is omitted for numeric data + val = ["" strtrim(sprintf ("%25.10f", matrix{ii, jj})) ""]; + case 2 ## Boolean + ttag = ' t="b"'; + if (matrix{ii, jj}) + val = ["1"]; + else + val = ["0"]; + endif + case 3 ## String + ttag = ' t="s"'; + ## FIXME s value provisionally set to 0 +%% stag = ' s="0"'; + sptr = strmatch (matrix{ii, jj}, strings, "exact"); + if (isempty (sptr)) + ## Add new string + strings = [strings matrix{ii, jj}]; + ++uniq_str_cnt; + ## New pointer into sharedStrings (0-based) + sptr = uniq_str_cnt; + endif + ## Val must be pointer (0-based) into sharedStrings.xml + val = sprintf ("%d", sptr - 1); + ++str_cnt; + case 4 ## Formula + form = sprintf ("%s", matrix{ii, jj}(2:end)); + #val = "?"; + val = " "; + otherwise ## (includes "case 5" + ## Empty value. Clear address + addr = ''; + endswitch + ## Append node to file, include tags + if (! isempty (addr)) + fprintf (fid, '', addr, stag, ttag); + if (! isempty (val)) + fprintf (fid, "%s%s", form, val); + endif + fprintf (fid, ""); + endif + endfor + fprintf(fid, ''); + endfor + endif + + ## Closing tag + fprintf (fid, ""); + ## Append rest of original xml to file and close it + fprintf (fid, "%s", xml(ie2+1:end)); + fclose (fid); + + ## Rewrite sharedStrings.xml, if required + if (any (typearr(:) == 3)) + ## (Re-)write xl/sharedStrings.xml + sid = fopen (sprintf ("%s/xl/sharedStrings.xml", xls.workbook), "w+"); + fprintf (sid, '\n'); + fprintf (sid, '', ... + str_cnt, uniq_str_cnt); + for ii=1:uniq_str_cnt + fprintf (sid, "%s", strings{ii}); + endfor + fprintf (sid, ""); + fclose (sid); + ## Check if new sharedStrings file entires are required + if (isnumeric (shstr) && (! shstr)) + rstatus = 2; + return; + endif + endif + + ## Return + rstatus = 1; + endfunction diff -r ae50074dd6ff -r fa3fb3268caa main/io/inst/private/__OCT_spsh_open__.m --- a/main/io/inst/private/__OCT_spsh_open__.m Tue Mar 11 08:08:15 2014 +0000 +++ b/main/io/inst/private/__OCT_spsh_open__.m Mon Mar 17 22:15:35 2014 +0000 @@ -1,4 +1,4 @@ -## Copyright (C) 2013 Philip Nienhuis +## Copyright (C) 2013,2014 Philip Nienhuis ## Copyright (C) 2013 Markus Bergholz (.xlsx & archive unzip stuff) ## ## This program is free software; you can redistribute it and/or modify @@ -40,7 +40,10 @@ ## '' Shuffled code around to file type order ## 2014-01-22 Open new files from template directory in io script directory ## 2014-01-23 Move confirm_recursive_rmdir to __OCT_spsh_close__.m -## 2014-01-29 Support for xlsx and gnumeric +## 2014-01-29 Support for xlsx and gnumeric +## 2014-02-08 Fix wrong function name in error msg (couldn't be unzipped) +## 2014-03-17 Simplify sheet names discovery +## '' Ignore SheetId (nowhere used) function [ xls, xlssupport, lastintf] = __OCT_spsh_open__ (xls, xwrite, filename, xlssupport, ftype) @@ -89,7 +92,7 @@ fid = fopen (sprintf ('%s/xl/workbook.xml', tmpdir)); if (fid < 0) ## File open error - warning ("xls2oct: file %s couldn't be unzipped", filename); + warning ("xls2open: file %s couldn't be unzipped", filename); xls = []; return else @@ -106,8 +109,9 @@ fclose (fid); ## Get sheet names and indices - xls.sheets.sh_names = cell2mat (regexp (xml, '