Skip to content

Conversation

@yocontra
Copy link

@yocontra yocontra commented Dec 17, 2020

Summary

We use the streaming parser and encountered a number of issues - sometimes files would stall out, sometimes they would always fail unless you opened and resaved the file, sometimes dates would just not be parsed for some files.

Major Changes

  • Change default arguments to match the docs (Closes docs: fix default options for streaming xlsx reader #1531)
  • Fixes a race condition where worksheets would start being read before the styles were parsed, causing dates to fail and show up as numbers (Closes [BUG] date parsing doesn't work in streaming mode #1430)
  • Fixes a stall (unreported?) in the BaseXForm parser - if closetag was encountered it would return, and the sax parser would fail to flush and just pause forever. The correct behavior should be to break so that it flushes the parser correctly.
  • Fixes a stall (unreported?) where worksheets would never be read.
    • I resolved this by implementing a basic queue to handle the waitingWorksheets logic - I think this cleans up the code a bit as well and makes it easier to understand.
  • Removes the iterateStream utility, which was causing stalls and race conditions (Closes [bug] iterate-stream causes race conditions in streaming mode #1558)
    • I read through all of the previous conversation on the unzipper project so I understand the initial reason why it was added.
    • All tests pass without it, I think this was just covering up for some of the previous bugs - file parser generators were stalling which bubbles up causing the unzipper generator to stall as well (and thus fail to autodrain, because that statement was never reached).

Minor Changes

  • Changes the getStream ducktype to match the other code in the project (just checking for a pipe function)
  • Use pipeline when writing files so that errors are properly managed
  • Removed some node 8 code - Node 8 has been deprecated/EOL since 2019 (https://blog.risingstack.com/update-nodejs-8-end-of-life-no-support/)
  • Switches to the promise version of autodrain, since your PR to unzipper making that work correctly landed and is published

Test plan

All of our edge case tests that use exceljs under the hood pass, and the existing exceljs tests pass. Let me know if you would like to see any additional tests.

Copy link
Member

@Siemienik Siemienik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @contra, Thank you for the great contribution :) Great job 🥇

Could I please you to do two points?

  1. Why test on MacOS with node v10 & v12 fails?
  2. Update this branch to the current master.

As these changes are major (min node version), I suppose we should separate major changes from another. To protect backwards-compatibility.

Additionally, I wan't to merge it without other contribution also approve major changes.


I'm contributing ExcelJS just for fun. If want you to know something more go into #1500

entries: 'ignore',
hyperlinks: 'cache',
styles: 'cache',
entries: 'emit',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that you have important reason here, so could I please you to explain, why changing these settings is obviusly? I'm little worried about losing performance here.

Imo, change defaults parameters should fails tests, what doesn't happen, hmm...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Siemienik I was asked to do this by @alubbe #1531 - the docs do not currently match the implementation, so this makes the implementation match the docs which is the intended behavior.

async *flushQueue(force) {
if (!force && !this.isDirectReady()) return; // not ready yet! do nothing
if (this.waitingWorkSheets.length === 0) return; // queue flushed!
const queue = this.waitingWorkSheets; // make a copy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I propose to pass waitingWorkSheets as an argument? It should be much safer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Siemienik flushQueue can be called multiple times asynchronously so there needs to be some mutable shared state with the worksheet queue on it - the alternative here would be passing an object in with the queue from the parent function. Based on everything else in this library it looked like mutable state living on the class was the preferred pattern.

if (!this.parseClose(value.name)) {
return this.model;
}
if (!this.parseClose(value.name)) break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which issue is it fix?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Fixes a stall (unreported?) in the BaseXForm parser - if closetag was encountered it would return, and the sax parser would fail to flush and just pause forever. The correct behavior should be to break so that it flushes the parser correctly."

let stream;
if (entryName.match(/xl\/media\//) ||
if (
entryName.match(/xl\/media\//) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which issue is it fix?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this was just from running the project linter, if not I can revert this

@yocontra
Copy link
Author

yocontra commented Feb 17, 2021

@Siemienik Most of these changes are not going to be easy to land in a node 8 compatible way so I would suggest releasing this as a major or a part of the next major if one is planned.

If anyone feels strongly that this needs to be done in a node 8 compatible way feel free to amend this PR but I don't have the capacity to take that on, this was already a significant amount of debugging and work to sort this out.

@yocontra
Copy link
Author

yocontra commented Feb 17, 2021

@Siemienik Rebased and addressed the PR feedback. For the Mac issue - not sure, would have to look into how github runs their mac tests. It is pretty weird for it to fail only on mac when this entire suite is pretty OS independent.

@Siemienik Siemienik self-requested a review February 20, 2021 08:12
CarlOlson added a commit to coursebase/exceljs that referenced this pull request Nov 2, 2022
@jcalderonzumba
Copy link

jcalderonzumba commented Nov 10, 2022

@yocontra @Siemienik is there any progress on this one?
We are happy users of your library and this PR will help us close an outstanding issue we have with some excel files that we process in our system.

Is there anything I can do on my side to get this moved and merged?

@Siemienik
Copy link
Member

I would like too. However I am overloaded with my regular job.

@yocontra
Copy link
Author

yocontra commented Nov 16, 2022

@jcalderonzumba Could you test my fork against your workload and see if it resolves your issues? That would be helpful.

It looks like https://github.com/coursebase/exceljs is also a fork that incorporates my changes in this PR and some other fixes (unzipper) that were causing streaming issues.

@lucalas
Copy link

lucalas commented Oct 3, 2023

@Siemienik do you think there is a chance to review this pr and merge it?
We tried this fix and worked for us.

@albanm
Copy link

albanm commented Feb 13, 2024

I fully understand that sometimes time is lacking to maintain a library and that everybody here is working for free. But honestly these fixes are pretty important, IMO either these are fixed or the library should be considered as abandoned. At least there should be a disclaimer in the doc about the streaming parser having known issues, and the default options should be documented properly (styles being 'ignore' while documented as 'cache' is pretty bad).

Is there any hope of seeing this merged anytime ? Or does someone know a maintained fork that includes these fixes ?

@yocontra
Copy link
Author

@albanm You can always stick "github:yocontra/exceljs#fix-streams" in your package.json as the version to use this PR's version.

@hcminh
Copy link

hcminh commented Feb 22, 2024

is there any update on this, I really need this fixed parser 😢

@enjogole
Copy link

enjogole commented May 6, 2024

@yocontra I tried using your fork, but it still has a problem unless you opened and resaved the file, was this resolved?

Thanks.

@mo-gad
Copy link

mo-gad commented Jun 11, 2024

@Siemienik: This fix solves the issue we face as well (verified with "github:yocontra/exceljs#fix-streams"). Would really appreciate a final review and merge. Thanks a lot!

@moonjoungyoung
Copy link

Hello. Any changes on this PR?

@yocontra
Copy link
Author

Would one of the maintainers want to take over this PR and get it across the finish line? It seems like it needs some very minor love. Due to a new role I'm not able to allocate any time to this - but would like to see it land eventually since we've been going back and forth on it (and me rebasing it) for a little over 4 years.

Totally understand if maintainers are busy and maybe somebody else wants to fork off my branch and resubmit instead.

@matteovivona
Copy link

@yocontra I used pnpm patch to create a local patch with your PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug] iterate-stream causes race conditions in streaming mode [BUG] date parsing doesn't work in streaming mode

10 participants