Skip to content

Conversation

@haniyakonain
Copy link
Collaborator

@haniyakonain haniyakonain commented Oct 17, 2025

…namespace

Problem: substring() was being called on redirect titles that didn't start with the expected template namespace, causing IndexOutOfBoundsException.

Solution: Added filter to validate titles start with templateNamespace before substring operation. Invalid redirects are now logged as warnings and excluded from processing.

Summary by CodeRabbit

Release Notes

  • Refactor
    • Improved redirect mapping validation and logging for better system diagnostics.

…namespace

Problem: substring() was being called on redirect titles that didn't
start with the expected template namespace, causing IndexOutOfBoundsException.

Solution: Added filter to validate titles start with templateNamespace
before substring operation. Invalid redirects are now logged as warnings
and excluded from processing.
@coderabbitai
Copy link

coderabbitai bot commented Oct 17, 2025

Walkthrough

Modified the redirects mapping processing in MappingStatsHolder to validate that redirect titles start with the template namespace before filtering and swapping keys. Invalid redirects now log warnings instead of being silently processed.

Changes

Cohort / File(s) Summary
Redirects Mapping Validation
server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala
Refactored redirect mapping logic to add namespace validation and logging. Changed from direct filterKeys and swap to a two-stage approach: validate each redirect title against template namespace (log warnings for invalid), then filter by templateMappings presence and swap keys.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A rabbit hops through mappings, checking each redirect's way,
Validating namespaces with care, logging those that stray,
Templates aligned and filtered true, swapped with graceful care,
Warning whispers for the lost—transparency laid bare! 🔍✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "fix: IndexOutOfBoundsException when processing redirects with invalid" clearly and directly addresses the main objective of the changeset. The PR aims to fix an IndexOutOfBoundsException that occurred when substring() was called on redirect titles that didn't start with the expected templateNamespace, and the title accurately captures this bug fix. The title is specific, using the "fix:" prefix convention, and clearly identifies both the exception type and the context (processing redirects with invalid input), making it easy for developers scanning history to understand the primary change without being misleading or vague.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sonarqubecloud
Copy link

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala (2)

54-55: Remove commented-out code.

The commented-out "simple fix" implementation should be removed before merging. Version control preserves the history, so there's no need to keep it in the code.

Apply this diff:

-       // Simple fix (commented out): just filter out invalid redirects silently
-      // val redirects = wikiStats.redirects.filterKeys(title => title.startsWith(templateNamespace) && templateMappings.contains(title.substring(templateNamespace.length))).map(_.swap)
-

57-65: Core fix is correct; consider refactoring for efficiency and to avoid deprecated method.

The validation logic properly prevents the IndexOutOfBoundsException by ensuring titles start with templateNamespace before calling substring. The logging is consistent with the pattern used for templates (lines 29-51).

However, consider these improvements:

  1. filterKeys is deprecated in Scala 2.13+
  2. The two-stage filtering could be combined into a single pass for better performance

Apply this diff to combine filters and avoid the deprecated method:

-      // Better fix: filter out invalid redirects with warning logging
-      val redirects = wikiStats.redirects.filter { case (title, _) =>
-        if (title.startsWith(templateNamespace)) {
-          true
-        } else {
-          logger.warning(language.wikiCode + " redirect '" + title + "' does not start with '" + templateNamespace + "'")
-          false
-        }
-      }.filterKeys(title => templateMappings.contains(title.substring(templateNamespace.length))).map(_.swap)
+      // Better fix: filter out invalid redirects with warning logging
+      val redirects = wikiStats.redirects.filter { case (title, target) =>
+        if (title.startsWith(templateNamespace)) {
+          templateMappings.contains(title.substring(templateNamespace.length))
+        } else {
+          logger.warning(language.wikiCode + " redirect '" + title + "' does not start with '" + templateNamespace + "'")
+          false
+        }
+      }.map(_.swap)

Optionally, for better readability, use string interpolation:

-          logger.warning(language.wikiCode + " redirect '" + title + "' does not start with '" + templateNamespace + "'")
+          logger.warning(s"${language.wikiCode} redirect '$title' does not start with '$templateNamespace'")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4335e9 and 6ad5eca.

📒 Files selected for processing (1)
  • server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala (1)
wiktionary/src/main/scala/org/dbpedia/extraction/mappings/WiktionaryPageExtractor.scala (1)
  • map (664-669)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build
  • GitHub Check: build
  • GitHub Check: build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant