Topics

XPA Excel .net communication

Peter Ashworth
 

Hi All

For a long time, whenever we wanted some data output that would invariably end up in Excel, we would just generate a tab-delimited file (sometimes CSV, but mostly tab-delimited) and launch Excel pointing at that file.

For the most part this is fine, although we have noticed that fields such as telephone numbers have their leading 0's stripped off. The only way to prevent this was to open Excel manually and then open the file from within Excel, which brings the import wizard into play, where you can say "treat this column as text".

We recently thought we would try our hand at creating an xlsx directly, instead of just writing a pure text file and opening it in Excel.

Test data: 572 columns 5000 rows

Method 1
Excel interop for C# was the first thing we tried. The first implementation created a List<List<string>> (essentially a 2D array) to store each cell's contents, then looped over it with a foreach, setting each cell's value through the Excel interop. This turned out to be horrifically slow, to the point where I never saw it finish on my large dataset (it did finish on a 4-column, 4-row test) and had to kill it. From further research, I think this boils down to the fact that it is driving an open instance of Excel, so there is a lot of overhead per cell. I saw it suggested somewhere that setting the application's ScreenUpdating flag to false can improve performance, but my dataset was large enough that I couldn't see any improvement.
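For reference, the cell-by-cell loop looked roughly like this (a sketch, not our exact code; `LoadData` is a hypothetical stand-in for our own data-reading step):

```csharp
// Requires a reference to Microsoft.Office.Interop.Excel (Excel must be installed).
using System.Collections.Generic;
using Excel = Microsoft.Office.Interop.Excel;

var xlApp = new Excel.Application();
xlApp.ScreenUpdating = false;               // helps a little, but not at this scale
Excel.Workbook wb = xlApp.Workbooks.Add();
Excel.Worksheet ws = (Excel.Worksheet)wb.Worksheets[1];

List<List<string>> data = LoadData();       // hypothetical: our 572 x 5000 dataset

// Each assignment below is a separate cross-process COM call,
// which is why this never finishes on a large dataset.
for (int r = 0; r < data.Count; r++)
    for (int c = 0; c < data[r].Count; c++)
        ws.Cells[r + 1, c + 1] = data[r][c];
```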

Method 2
I then found a suggested improved approach, which looks like a .net equivalent of Heidi's COM example from https://magicu-l.groups.io/g/main/files/OfficeCom%20for%20Magicxpa/OfficeMagic2.zip (which I've also seen referenced elsewhere on the forums): use vectors to move the entire dataset in one call to a range of cells. However, on my 572x5000 data this method failed to execute; it was fine on a 572x50 test. Maybe it was hitting some form of memory/size limit? I will also give Heidi's COM example a try at some point, but I presume it will perform like the C# interop, just through a different interface.
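In .net interop terms, the "one call" approach amounts to assigning an object[,] to a Range in a single operation (again a sketch; `ws` and `data` are as in the previous example):

```csharp
using Excel = Microsoft.Office.Interop.Excel;

// Build a rectangular array first; the COM marshalling happens once, not per cell.
object[,] values = new object[5000, 572];
for (int r = 0; r < 5000; r++)
    for (int c = 0; c < 572; c++)
        values[r, c] = data[r][c];

Excel.Range topLeft = (Excel.Range)ws.Cells[1, 1];
Excel.Range bottomRight = (Excel.Range)ws.Cells[5000, 572];
ws.Range[topLeft, bottomRight].Value2 = values;  // one COM call for the whole block
```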

Method 3
I found a third-party library called ClosedXML, currently hosted on GitHub. It has the advantage of being completely self-contained; it doesn't even need Excel installed to work. However, it adds roughly 50% on top of the time already spent reading all the data into a C# array. For my 5000 records it took 1 minute 40 seconds to read the data into a C# array (not just one table: this is 5000 profiles in one table, with data in supporting tables to be read), then about 50 seconds to decant that data through ClosedXML. That is sort of alright, but it's still a pretty heavy time hit.
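A minimal sketch of the decanting step through ClosedXML, assuming the same `data` structure as the interop examples (the output path is illustrative):

```csharp
// NuGet: ClosedXML. No Excel installation required.
using ClosedXML.Excel;

using (var wb = new XLWorkbook())
{
    var ws = wb.Worksheets.Add("Export");
    // Cell(row, col) is 1-based; writing the values as strings keeps leading 0's.
    for (int r = 0; r < data.Count; r++)
        for (int c = 0; c < data[r].Count; c++)
            ws.Cell(r + 1, c + 1).SetValue(data[r][c]);
    wb.SaveAs("c:\\selections\\petera\\output.xlsx");
}
```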

Method 4
Hyper-focusing on our core issue, that we don't want leading 0's stripped off fields: whilst experimenting with all of the above I ran across enough of the interop API to try just taking a tab-delimited file and opening it in Excel whilst trying to maintain a particular formatting.

object fieldInfo = new int[,] {
        { 1, 2 }, { 2, 2 }, { 3, 2 }, { 4, 2 }, { 5, 2 }, { 6, 2 },
        { 7, 2 }, { 8, 2 }, { 9, 2 }, { 10, 2 }, { 11, 2 }, { 12, 2 },
        { 13, 2 }, { 14, 2 }, { 15, 2 }, { 16, 2 }, { 17, 2 }, { 18, 2 },
        { 19, 2 }, { 20, 2 }, { 21, 2 }, { 22, 2 }, { 23, 2 }, { 24, 2 },
        { 25, 2 }, { 26, 2 }, { 27, 2 }, { 28, 2 }, { 29, 2 }, { 30, 2 } };

xlApp.Workbooks.OpenText("c:\\selections\\petera\\18032020.TXT",
        false, 1,
        Excel.XlTextParsingType.xlDelimited,
        Excel.XlTextQualifier.xlTextQualifierDoubleQuote,
        Type.Missing, true, Type.Missing, Type.Missing,
        Type.Missing, Type.Missing, Type.Missing,
        fieldInfo, // field info
        Type.Missing, Type.Missing,
        Type.Missing, Type.Missing, Type.Missing);

The code above defines all the columns as format 2, which means text, and then uses the OpenText function to pull all the data into the worksheet pre-formatted as text. You can then use SaveAs to save it as an xlsx file if you want. For a pre-existing tab-delimited file of my 572x5000 data it took 5 or 6 seconds, which is pretty good.
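As an aside, the hard-coded 30-pair fieldInfo array could be generated for all 572 columns with a small loop (a sketch; 2 is the numeric value of Excel.XlColumnDataType.xlTextFormat):

```csharp
// Build { column, format } pairs for every column instead of hand-writing them.
const int columnCount = 572;
int[,] pairs = new int[columnCount, 2];
for (int i = 0; i < columnCount; i++)
{
    pairs[i, 0] = i + 1;  // 1-based column index
    pairs[i, 1] = 2;      // 2 == xlTextFormat: treat this column as text
}
object fieldInfo = pairs;
```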

Final thoughts
So maybe method 4 is the way to jump for us, but I'm still a bit stumped at how poorly the C# Excel interop performed. Has anyone else tried to automate an Excel file, beyond just raw text output to a delimited file, or tried Heidi's COM example (which I will give a go)? Or is it just that I'm trying to output such large datasets that's hampering me?

Thanks for any thoughts

Peter

lehoczki.viktor@...
 

Dear Peter,

I also had a problem with creating CSV files.
My solution for text fields with leading 0's was to use ="value" in the cell.
CSV file example:

 =0101;101
="0101";101
0101;101
="     0101";101

So I simply put =" before the field on the output form and a " character after the field.

Best regards,
Viktor

Peter Ashworth
 

Hi Viktor

Thanks for the response. I've used that method before, but it doesn't quite suit our intentions this time because in some cases the raw .txt/.csv could be used by other programs which wouldn't handle the ="" wrapper. We want as clean an approach as possible, one that requires no explanation to end users.

If we can drive an xlsx file, then they can always open it and do a Save As to generate a .txt/.csv if required, and all the formatting will behave.

Also, yes, our core concern at the moment is leading 0's, but we're also exploring the methods of writing to an Excel file and what's achievable with them.

Thanks

Peter

John Cummens
 

Excel interop will always be slow, as behind the scenes it is essentially opening a desktop Excel with no GUI and feeding it commands to do all the work.

I've used ClosedXML in the past in native .net projects, and found it to be not only massively faster than Excel interop, but also much, much easier to work with. As a bonus, you can also set formatting for any columns etc. quite easily. I'm a big fan of this library for Excel interaction, after having tried nearly all the other options out there.

You can also simply surround each field in your CSV output with quotation marks; when Excel opens the file it interprets quoted fields as strings, so it should keep the formatting. That should be just as fast as standard CSV, but obviously you can't do any other formatting.

(Greetings from a lurker from Montana)

Peter Ashworth
 

Hi John, Thanks for the input.
----
As a general update to the thread: I have just tried the COM method in Heidi's sample project. I had to tweak the cell-range calculator slightly, as it was only accounting for 2-character column titles; I presume because an earlier Excel limit was 256 columns (up to column IV), whereas now it can go up to XFD.

However, the main problem I ran into was building a vector of vectors of cells, essentially a 2D vector. My cell size was as large as my largest field (1500 characters). This worked for a 5-row output, but when I did my 5000-row output it had massive memory issues, not surprisingly. I could try writing something to spit out 10 rows at a time, but I imagine that wouldn't be very efficient either.
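One way to split the difference between "one giant call" and "a few rows at a time" would be to push a block of rows per call. A .net interop sketch of the idea (the chunk size is a guess to tune; `data` and `ws` are as in the earlier sketches):

```csharp
using System;
using Excel = Microsoft.Office.Interop.Excel;

const int chunkRows = 500;   // tune: bigger = fewer COM calls, more memory per call
int totalRows = data.Count, cols = data[0].Count;

for (int startRow = 0; startRow < totalRows; startRow += chunkRows)
{
    int rows = Math.Min(chunkRows, totalRows - startRow);
    object[,] block = new object[rows, cols];
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            block[r, c] = data[startRow + r][c];

    Excel.Range topLeft = (Excel.Range)ws.Cells[startRow + 1, 1];
    Excel.Range bottomRight = (Excel.Range)ws.Cells[startRow + rows, cols];
    ws.Range[topLeft, bottomRight].Value2 = block;   // one COM call per chunk
}
```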

Any other thoughts?

Thanks

Peter

Wes Hein
 

If you are using ClosedXML, you can populate a .net DataTable and use one command to convert it to an Excel sheet. It preserves formatting, though the data arrives in Excel as a table in this instance.
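For anyone following along, that one-command route looks roughly like this (a sketch; `BuildDataTable` is a hypothetical stand-in for your own data-loading code):

```csharp
using ClosedXML.Excel;
using System.Data;

DataTable table = BuildDataTable();          // hypothetical: your populated DataTable
using (var wb = new XLWorkbook())
{
    var ws = wb.Worksheets.Add("Export");
    ws.Cell(1, 1).InsertTable(table);        // one call; arrives in Excel as a table
    wb.SaveAs("output.xlsx");
}
```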

Wes

Rob Westland
 

Hi Peter,

you can also rename the .xlsx to .zip and unzip it. Look inside the zip and you will see the xml files which make up the worksheets.

With a little bit of playing, you can change the xml files ;-)
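Rob's point can be seen from .net with nothing but the standard library (a sketch; the file name is illustrative):

```csharp
// An .xlsx file is a zip archive of XML parts.
using System;
using System.IO.Compression;   // assembly: System.IO.Compression.FileSystem

using (ZipArchive zip = ZipFile.OpenRead("output.xlsx"))
{
    foreach (ZipArchiveEntry entry in zip.Entries)
        Console.WriteLine(entry.FullName);
    // Typical entries: [Content_Types].xml, xl/workbook.xml,
    // xl/worksheets/sheet1.xml, xl/sharedStrings.xml, ...
}
```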

Rob


On 19-3-2020 at 11:47, Peter Ashworth wrote:
