fix: IndexOutOfBoundsException when processing redirects with invalid … #795
base: master
…namespace
Problem: substring() was being called on redirect titles that didn't start with the expected template namespace, causing IndexOutOfBoundsException.
Solution: Added filter to validate titles start with templateNamespace before substring operation. Invalid redirects are now logged as warnings and excluded from processing.
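The failure mode above can be shown in a minimal, self-contained sketch. It assumes `templateNamespace` is a prefix such as `"Template:"`. The object name and sample titles are hypothetical and not taken from `MappingStatsHolder`. The unguarded call only throws when the title is shorter than the prefix. A longer title without the prefix does not throw. It is silently cut into a wrong name instead, and the `startsWith` guard prevents that case as well.

```scala
// Hypothetical demo object; "Template:" is an assumed namespace prefix.
object SubstringGuardDemo {
  val templateNamespace = "Template:"

  // Unguarded: assumes every title carries the prefix.
  def unguardedName(title: String): String =
    title.substring(templateNamespace.length)

  // Guarded: strip the prefix only when the title actually starts with it.
  def guardedName(title: String): Option[String] =
    if (title.startsWith(templateNamespace)) Some(title.substring(templateNamespace.length))
    else None

  def main(args: Array[String]): Unit = {
    println(guardedName("Template:Infobox"))    // Some(Infobox)
    println(guardedName("Foo"))                 // None

    // Longer than the prefix but without it: no exception, just a wrong name.
    println(unguardedName("Vorlage:Infobox"))   // nfobox

    // Shorter than the prefix: substring's begin index exceeds the length and it throws.
    try {
      unguardedName("Foo")
      println("no exception")
    } catch {
      case e: IndexOutOfBoundsException =>
        println("unguarded call threw " + e.getClass.getSimpleName)
    }
  }
}
```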
Walkthrough
Modified the redirects mapping processing in MappingStatsHolder to validate that redirect titles start with the template namespace before filtering and swapping keys. Redirects with invalid titles are now logged as warnings and skipped, rather than being passed to substring.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala (2)
54-55: Remove commented-out code.
The commented-out "simple fix" implementation should be removed before merging. Version control preserves the history, so there's no need to keep it in the code.
Apply this diff:
```diff
-    // Simple fix (commented out): just filter out invalid redirects silently
-    // val redirects = wikiStats.redirects.filterKeys(title => title.startsWith(templateNamespace) && templateMappings.contains(title.substring(templateNamespace.length))).map(_.swap)
-
```
57-65: Core fix is correct; consider refactoring for efficiency and to avoid a deprecated method.
The validation logic properly prevents the `IndexOutOfBoundsException` by ensuring titles start with `templateNamespace` before calling `substring`. The logging is consistent with the pattern used for templates (lines 29-51).
However, consider these improvements:
- `filterKeys` is deprecated in Scala 2.13+
- The two-stage filtering could be combined into a single pass for better performance
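On the deprecation point: if the project builds on Scala 2.13 or later, the non-deprecated way to filter by key goes through a view and is materialized again with `toMap`. The following is a minimal sketch with hypothetical names and data. It shows the replacement spelling only and is not code from this PR.

```scala
// Assumes Scala 2.13+, where Map.filterKeys is deprecated in favour of .view.filterKeys.
object FilterKeysDemo {
  val redirects: Map[String, String] = Map(
    "Template:Infobox" -> "Template:Info",
    "Template:Other"   -> "Template:X"
  )

  // .view.filterKeys returns a lazy MapView; toMap turns it back into a strict immutable Map.
  def keepInfobox(m: Map[String, String]): Map[String, String] =
    m.view.filterKeys(_.endsWith("Infobox")).toMap

  def main(args: Array[String]): Unit =
    println(keepInfobox(redirects)) // Map(Template:Infobox -> Template:Info)
}
```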
Apply this diff to combine filters and avoid the deprecated method:
```diff
-    // Better fix: filter out invalid redirects with warning logging
-    val redirects = wikiStats.redirects.filter { case (title, _) =>
-      if (title.startsWith(templateNamespace)) {
-        true
-      } else {
-        logger.warning(language.wikiCode + " redirect '" + title + "' does not start with '" + templateNamespace + "'")
-        false
-      }
-    }.filterKeys(title => templateMappings.contains(title.substring(templateNamespace.length))).map(_.swap)
+    // Better fix: filter out invalid redirects with warning logging
+    val redirects = wikiStats.redirects.filter { case (title, target) =>
+      if (title.startsWith(templateNamespace)) {
+        templateMappings.contains(title.substring(templateNamespace.length))
+      } else {
+        logger.warning(language.wikiCode + " redirect '" + title + "' does not start with '" + templateNamespace + "'")
+        false
+      }
+    }.map(_.swap)
```

Optionally, for better readability, use string interpolation:

```diff
-        logger.warning(language.wikiCode + " redirect '" + title + "' does not start with '" + templateNamespace + "'")
+        logger.warning(s"${language.wikiCode} redirect '$title' does not start with '$templateNamespace'")
```
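Here is the combined single-pass filter in isolation, as a minimal sketch with stand-ins. A `ListBuffer` called `warnings` replaces `logger.warning`, and `wikiCode` replaces `language.wikiCode`. Plain `Map`/`Set` values replace `wikiStats.redirects` and `templateMappings`. The object name, parameter names and sample titles are all hypothetical and are not the project's API.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for the filtering step in MappingStatsHolder.
object RedirectFilterDemo {
  def filterRedirects(
      redirects: Map[String, String],   // stand-in for wikiStats.redirects
      templateMappings: Set[String],    // stand-in: mapped template names without the prefix
      templateNamespace: String,
      wikiCode: String,                 // stand-in for language.wikiCode
      warnings: ListBuffer[String]      // stand-in for logger.warning
  ): Map[String, String] =
    redirects.filter { case (title, _) =>
      if (title.startsWith(templateNamespace)) {
        // Safe: the prefix is known to be present, so substring cannot go out of bounds.
        templateMappings.contains(title.substring(templateNamespace.length))
      } else {
        warnings += s"$wikiCode redirect '$title' does not start with '$templateNamespace'"
        false
      }
    }.map(_.swap) // swap keys and values, as the original code does

  def main(args: Array[String]): Unit = {
    val warnings = ListBuffer.empty[String]
    val redirects = Map(
      "Template:Infobox"  -> "Template:Info",
      "Template:Unmapped" -> "Template:Other",
      "Foo"               -> "Template:Bar"
    )
    println(filterRedirects(redirects, Set("Infobox"), "Template:", "en", warnings))
    // Map(Template:Info -> Template:Infobox)
    warnings.foreach(println)
    // en redirect 'Foo' does not start with 'Template:'
  }
}
```

Because the `startsWith` check and the mapping lookup share one predicate, `substring` is only reached for prefixed titles, and each entry is visited once instead of twice.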
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala (1)
wiktionary/src/main/scala/org/dbpedia/extraction/mappings/WiktionaryPageExtractor.scala (1)
map(664-669)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build
- GitHub Check: build
- GitHub Check: build



Summary by CodeRabbit
Release Notes