Fix flaky-test technology.tabula.TestSpreadsheetExtractor#testRTL#1

Open
same8891 wants to merge 2 commits into master from fix-flaky

@same8891 (Owner) commented Oct 18, 2023

Test failure reproduction

mvn install -pl . -am -DskipTests -Dsign.skip
mvn -pl . edu.illinois:nondex-maven-plugin:2.1.1:nondex -Dtest=technology.tabula.TestSpreadsheetExtractor#testRTL

NonDex detected the flakiness and reported the following failure:

[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.436 s <<< FAILURE! - in technology.tabula.TestSpreadsheetExtractor
[ERROR] testRTL(technology.tabula.TestSpreadsheetExtractor)  Time elapsed: 0.434 s  <<< FAILURE!
org.junit.ComparisonFailure: expected:<[اسمي سلطان]> but was:<[]>
	at technology.tabula.TestSpreadsheetExtractor.testRTL(TestSpreadsheetExtractor.java:458)

Root cause and fix

The failing assertion is at line 458 of TestSpreadsheetExtractor.java:

assertEquals("اسمي سلطان", table.getRows().get(1).get(1).getText());

The flakiness is caused by the function findSpreadsheetsFromCells() in SpreadsheetExtractionAlgorithm.java (line 183). It stores its intermediate data in a HashSet and HashMaps, whose iteration order is unspecified, so the function can return its results in a different order from run to run.

public static List<Rectangle> findSpreadsheetsFromCells(List<? extends Rectangle> cells) {
    // via: http://stackoverflow.com/questions/13746284/merging-multiple-adjacent-rectangles-into-one-polygon
    List<Rectangle> rectangles = new ArrayList<>();
    Set<Point2D> pointSet = new HashSet<>();
    Map<Point2D, Point2D> edgesH = new HashMap<>();
    Map<Point2D, Point2D> edgesV = new HashMap<>();

This causes the flakiness. To fix it, I changed the HashSet and HashMap instances to LinkedHashSet and LinkedHashMap. The difference is that LinkedHashSet and LinkedHashMap iterate in insertion order, whereas HashSet and HashMap make no guarantee about iteration order, and NonDex deliberately shuffles it to expose code that depends on it. With this change the function is deterministic: it returns its results in a fixed order.
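To illustrate the ordering guarantee the fix relies on, here is a minimal standalone sketch. It is not the tabula-java code: the class name `IterationOrderSketch` and both helper methods are hypothetical, and only the `java.util` and `java.awt.geom` classes are real. It shows that a LinkedHashSet and a LinkedHashMap give elements back in the order they were inserted.

```java
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class, not part of tabula-java.
public class IterationOrderSketch {

    // Copies the points into a LinkedHashSet and returns them in iteration order.
    // LinkedHashSet iterates in insertion order, so the input order is preserved.
    static List<Point2D> iterateLinked(List<Point2D> input) {
        Set<Point2D> points = new LinkedHashSet<>(input);
        return new ArrayList<>(points);
    }

    // Maps each point to its successor in the list and returns the keys
    // in iteration order. LinkedHashMap also iterates in insertion order.
    static List<Point2D> edgeKeys(List<Point2D> input) {
        Map<Point2D, Point2D> edges = new LinkedHashMap<>();
        for (int i = 0; i + 1 < input.size(); i++) {
            edges.put(input.get(i), input.get(i + 1));
        }
        return new ArrayList<>(edges.keySet());
    }

    public static void main(String[] args) {
        List<Point2D> corners = List.of(
                new Point2D.Float(3f, 1f),
                new Point2D.Float(0f, 0f),
                new Point2D.Float(2f, 5f));
        // Both checks hold on every run, because insertion order is kept.
        System.out.println(iterateLinked(corners).equals(corners));
        System.out.println(edgeKeys(corners).equals(corners.subList(0, 2)));
    }
}
```

With HashSet and HashMap in place of the linked variants, the same checks could pass or fail depending on the hash layout, which is the kind of hidden assumption NonDex is designed to surface.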

@kevin952

  1. Could you explain which assertion is actually necessary here? I understand your fix, but the order of the elements may not need to be guaranteed.
    PS: I had a similar issue where the order didn't matter, but because the assertion was strict I used a LinkedHashSet as well. It turned out I didn't need it: I changed the assertion instead, and that was accepted in the real PR.
  2. If you could attach a link to the code, that would be great.
  3. If you could mention that you used NonDex and attach a link to its GitHub repo, that would be great!
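For reference, the alternative raised in point 1 (relaxing the assertion rather than fixing the collection order) could look roughly like the sketch below. It is a generic illustration only: the class `OrderInsensitiveCheck` and its methods are hypothetical and are not taken from tabula-java or from the other PR mentioned above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, shown only to illustrate an order-insensitive comparison.
public class OrderInsensitiveCheck {

    // Counts how many times each element occurs in the list.
    static <T> Map<T, Integer> counts(List<T> items) {
        Map<T, Integer> result = new HashMap<>();
        for (T item : items) {
            result.merge(item, 1, Integer::sum);
        }
        return result;
    }

    // True when both lists hold the same elements with the same multiplicities,
    // regardless of the order in which they appear.
    static <T> boolean sameElements(List<T> expected, List<T> actual) {
        return counts(expected).equals(counts(actual));
    }
}
```

A test could then assert `sameElements(expected, actual)` instead of comparing element by element. This only applies when the order really is irrelevant to the behavior under test.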

@same8891 (Owner, Author) commented Nov 1, 2023

@kevin952 @Carol7102 I just updated the comment. @kevin952 For sure, it is not necessary to use LinkedHashMap across the whole project. But this test needs LinkedHashMap to guarantee the order that the assertion relies on.
