Re: XPA Excel .net communication


Rob Westland
 

Hi Peter,

you can alos rename the .xls to .zip. Unzip it. Look inside the zip and you see the xml-files which are the worksheets.

With a little bit of playing, you can change the xml-files ;-)

Rob


Op 19-3-2020 om 11:47 schreef Peter Ashworth:

Hi All

For a long time whenever we wanted some data output which would invariably be used in excel, We would just generate a tab delimited file (sometimes csv but mostly tab delimited), and just launch excel pointing at that file.

For the most part this is fine, although we have noticed things like telephone fields and a few others have leading 0's stripped off them. The only way to not have this happen was for us to manually open excel on its own, and then open the file from within excel which caused the import wizard to come into play, where you can then say treat this column as text.

We had a thought recently to try our hand at instead of just writing a pure text file and opening it in excel, could we just directly create an xlsx.

Test data: 572 columns 5000 rows

Method 1
Excel interop for c# was the first thing we tried. The first implementation was creating a List<List<string>> which is essentially a 2d array to store each cells contents. Then whilst doing a foreach around it setting each cells value through the excel interop. This turned out to be horrifically slow to the point where I never actually saw this successfully finish on my large dataset (it did finish on a 4 column 4 row test) as I had to bomb out, from further research I think this boils down because its an open instance of excel and there is a lot of overhead. I saw somewhere it suggested to set the application ScreenUpdating flag to false can improve performance but my dataset was large enough I couldnt see any improvement.

Method 2
I then found a suggested improved approach which seems like a .net equivalent of Heidis COM example from https://magicu-l.groups.io/g/main/files/OfficeCom%20for%20Magicxpa/OfficeMagic2.zip which I've also seen referenced to elsewhere on the forums, which is to by using vectors move the entire data in one call to a range of cells. However in my test of 572*5000 data this method failed to execute. It was fine in a 572*50 test. Maybe it was breaking some form of memory/size limit? I will also try and give Heidis COM example a try at somepoint, but I presume it will peform like c# interop and its just a different interface.

Method 3
I found a third party software called ClosedXML, currently hosted on github. This has an advantage in that its completely self contained and doesnt even need excel installed to work. however it seems to add an extra 50% processing time to perform, on the time having read all data into a c# array. so for my 5000 records it took 1 minute 40 seconds to process the data into a c# array. (not just one table, this is 5000 profiles in one table with data in supporting tables to be read). then about 50 seconds to decant that data through ClosedXML. Which is sort of alright but its still a pretty heavy time hit.

Method 4
Hyper focusing on our core issue of we dont want leading 0's to be stripped off fields. Whilst experimenting with all of the above I ran accross enough of the interop API to try just taking a tab dlimited file, and opening it in excel whilst trying to maintain a particular formatting.

object fieldInfo = new int[,] { { 1, 2}, { 2, 2 }, { 3, 2 }, { 4, 2 }, { 5, 2 }, { 6, 2 }, { 7, 2 }, { 8, 2 }, { 9, 2 }, { 10, 2 }, { 11, 2}, { 12, 2 }, { 13, 2 }, { 14, 2 }, { 15, 2 }, { 16, 2 }, { 17, 2 }, { 18, 2 }, { 19, 2 }, { 20, 2 }, { 21, 2}, { 22, 2 }, { 23, 2 }, { 24, 2 }, { 25, 2 }, { 26, 2 }, { 27, 2 }, { 28, 2 }, { 29, 2 }, { 30, 2 } };

xlApp.Workbooks.OpenText("c:\\selections\\petera\\18032020.TXT",

        false, 1,

        Excel.XlTextParsingType.xlDelimited,

        Excel.XlTextQualifier.xlTextQualifierDoubleQuote,

        Type.Missing, true, Type.Missing, Type.Missing,

        Type.Missing, Type.Missing, Type.Missing,

        fieldInfo, //field info

        Type.Missing, Type.Missing,

        Type.Missing, Type.Missing, Type.Missing);

The above lines of code defining all the collumns as format 2 which means text. and then using the open text function to pull all the data into the worksheet preformatted as text. And then you can use saveas to save it as an xlsx file if you want. For a pre-existing tab dlimited file of my 572*5000 data, it took 5 or 6 seconds to occur. which is pretty good.

Final thoughts
So maybe method 4 is the way to jump for us, but im still a bit stumped at how poorly the c# excel interop performed. Has anyone else tried to automate an excel file other than just raw text output to a delimited file, or Heidis COM example which I will give a try? Or is it just that im trying to output such large datasets thats hampering me?

Thanks for any thoughts

Peter

Join main@magicu-l.groups.io to automatically receive all group messages.