[SVCS-530] Xlsx duplicate header error fix #288
Open
AddisonSchiller wants to merge 4 commits into CenterForOpenScience:develop from
The tabular renderer will no longer overwrite the values for headers that have the same name. Instead, it will rename all duplicated headers in the format `name (1)`.
cslzchen
requested changes
Nov 15, 2017
Contributor
cslzchen
left a comment
In addition to our discussion, check the style as well.
    iteration = 0
    while increased_name in fields:
        iteration += 1
        if iteration > 5000:
Contributor
Set iteration cap as a default argument and use a lower number for testing.
    assert sheet[1][0] == {'Name': 1.0, 'Dup (1)': 2.0, 'Dup (2)': 3.0,
                           'Dup (3)': 4.0, 'Dup (4)': 5.0, 'Not Dup': 6.0}

    # After demo it was suggested the iteration cap be raised. The value ended up to be about 5,000
Contributor
As suggested above, use a default arg for iterations and set it lower. You can then use this for this test instead of having to make a file to iterate 5000 times.
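A minimal sketch of what this suggestion could look like, assuming the name search is factored into a helper. The helper name `next_free_name` and the test name are hypothetical and not taken from the PR; only `iteration`, `increased_name`, `fields`, and `max_iterations` appear in the diff. With the cap passed as an argument, the UUID fallback can be reached with three taken names rather than a file with 5000 duplicate columns.

```python
import uuid


def next_free_name(name, fields, max_iterations=5000):
    """Hypothetical helper: find an unused 'name (n)' or fall back to a UUID."""
    iteration = 1
    increased_name = '{} ({})'.format(name, iteration)
    while increased_name in fields:
        iteration += 1
        if iteration > max_iterations:
            # Cap reached: give up counting and use a UUID suffix instead.
            return '{} ({})'.format(name, uuid.uuid4())
        increased_name = '{} ({})'.format(name, iteration)
    return increased_name


def test_uuid_fallback_with_low_cap():
    # Three taken names and a cap of 3 are enough to force the fallback.
    fields = {'Dup (1)', 'Dup (2)', 'Dup (3)'}
    name = next_free_name('Dup', fields, max_iterations=3)
    assert name not in fields
    # The text between 'Dup (' and ')' must parse as a UUID.
    uuid.UUID(name[len('Dup ('):-1])
```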
…ular-file-renderer into feature/xlsx-duplicate-column-names-fix
Contributor
Author
@cslzchen, added max_iterations variable for testing. Re-enabled
cslzchen
approved these changes
Nov 21, 2017
Contributor
cslzchen
left a comment
Looks good and move to PCR 🎆 🎆
    def xlsx_xlrd(fp):
        """Read and convert a xlsx file to JSON format using the xlrd library
    def xlsx_xlrd(fp, max_iterations=5000):
cslzchen
requested changes
Jun 18, 2018
Contributor
cslzchen
left a comment
h/t @AddisonSchiller, PR looks good. I will take over and rebase it up-to-date.
Ticket
https://openscience.atlassian.net/browse/SVCS-530
Purpose
The xlsx renderer for the tabular renderer was overwriting column values when the header row contained duplicate column names.
Changes
The tabular renderer will no longer overwrite the values for headers that have the same name. Instead, it will rename all duplicated headers in the format `name (1)`.
In the very unlikely case that no free name is found after 5000 iterations, it will use a UUID instead of a count.
Added some tests (one is commented out because of how hard it is to test).
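The renaming described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the function name `dedupe_headers` is hypothetical, and the real change lives inside `xlsx_xlrd`. It renames every occurrence of a duplicated header to `name (n)` and falls back to a UUID suffix once `max_iterations` is exceeded.

```python
import uuid


def dedupe_headers(headers, max_iterations=5000):
    """Hypothetical sketch: return unique header names, renaming duplicates."""
    # Count occurrences so we know which names are duplicated.
    counts = {}
    for header in headers:
        counts[header] = counts.get(header, 0) + 1

    # Names that appear exactly once are kept as-is and are already taken.
    taken = {header for header, count in counts.items() if count == 1}
    result = []
    for header in headers:
        if counts[header] == 1:
            result.append(header)
            continue
        iteration = 1
        candidate = '{} ({})'.format(header, iteration)
        while candidate in taken:
            iteration += 1
            if iteration > max_iterations:
                # Cap reached: use a UUID instead of a count.
                candidate = '{} ({})'.format(header, uuid.uuid4())
                break
            candidate = '{} ({})'.format(header, iteration)
        taken.add(candidate)
        result.append(candidate)
    return result


print(dedupe_headers(['Name', 'Dup', 'Dup', 'Dup', 'Dup', 'Not Dup']))
```

For the header row used in the PR's test, this yields `Name`, `Dup (1)` through `Dup (4)`, and `Not Dup`, so each column keeps its own value when rows are converted to dictionaries.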
Side effects
None that I know of
QA Notes
There is a zip file on the JIRA ticket with files to test with.