Wikidata integration + Refactoring Core to accept new formats #155
Merged
40 commits
- `908cfce` draft core refactoring for wikidata (hadyelsahar)
- `3124d7d` fixed type conflicts (hadyelsahar)
- `e668fff` added wikiPageExtractor trait to match extractors that accepts WikiPage (hadyelsahar)
- `16ee866` various changes (jimkont)
- `7ce3d27` Merge pull request #3 from jimkont/wikidata (hadyelsahar)
- `1b7f815` added wikipage extractor (hadyelsahar)
- `2811e1f` various changes by dimitris (hadyelsahar)
- `fed7493` changed updated ArticlePageExtractor in prop file (hadyelsahar)
- `dd6f3ca` fixed issues in Sever Module to deal with Parser -> Option (hadyelsahar)
- `b0671ab` fixed bugs in server module due to chainging Extractor[PageNode] to E… (hadyelsahar)
- `9d96186` import scala.language.reflectiveCalls (jimkont)
- `e9976d4` adapt for the new parser output (jimkont)
- `f5eb297` remove obsolete classes (jimkont)
- `909a741` throw exception on parsing error (jimkont)
- `edab2b6` merge with latest master (jimkont)
- `3c73aa8` skipped HomepageExtractor test on merge (jimkont)
- `d5b578f` comment unneeded constants (todo: remove all & reuse Namespace class) (jimkont)
- `6209a69` Adapt Live core to work with the refactored framework (jimkont)
- `563287b` Merge pull request #4 from jimkont/wikidata (hadyelsahar)
- `8984a53` rename Extractor with a more representative name (jimkont)
- `0724dbe` fix HomepageExtractorTest (regression) (jimkont)
- `848b195` Use revision URI as context in WikiPageExtractors (jimkont)
- `28ca4fe` Naming coherence (review) (jimkont)
- `e50eefe` whitespace consistency (jimkont)
- `115159e` fix comment (jimkont)
- `460f3d0` adjust extraction properties with the renamed ProvenanceExtractor (jimkont)
- `8f16d5f` Merge pull request #5 from jimkont/wikidata (hadyelsahar)
- `e535730` added comments to Datasets Destinations (hadyelsahar)
- `867d0aa` remove extra space after Wikipage (hadyelsahar)
- `6cf62af` remove extra space (hadyelsahar)
- `50da5c0` remove extra space after JsonNode (hadyelsahar)
- `eda16c1` removing extra spaces (hadyelsahar)
- `31bf421` removing extra spaces before WikiPage (hadyelsahar)
- `52b2f09` removal of extraspace (hadyelsahar)
- `8bdd1ef` deleted wikidataExtractor template file (hadyelsahar)
- `a67749b` aaa (hadyelsahar)
- `bdb4101` remove wikidataExtractor template (hadyelsahar)
- `54c6b7d` add owl:sameas as ontologyproperty instead of string (hadyelsahar)
- `bc2b269` added edited namespace of wikdata entities inside dbpedia to be wikid… (hadyelsahar)
- `6b661cd` page.wikiPage.sourceUri instead of page.wikiPage.title.pageIri (hadyelsahar)
core/src/main/scala/org/dbpedia/extraction/mappings/ArticlePageExtractor.scala
43 changes: 43 additions & 0 deletions

```scala
package org.dbpedia.extraction.mappings

import org.dbpedia.extraction.destinations.{DBpediaDatasets, Quad}
import org.dbpedia.extraction.wikiparser._
import org.dbpedia.extraction.ontology.Ontology
import org.dbpedia.extraction.util.Language
import scala.collection.mutable.ArrayBuffer
import scala.language.reflectiveCalls

/**
 * Extracts links to corresponding Articles in Wikipedia.
 */
class ArticlePageExtractor(
  context : {
    def ontology : Ontology
    def language : Language
  }
)
extends PageNodeExtractor
{
  // We used foaf:page here, but foaf:isPrimaryTopicOf is probably better.
  private val isPrimaryTopicOf = context.ontology.properties("foaf:isPrimaryTopicOf")
  private val primaryTopic = context.ontology.properties("foaf:primaryTopic")
  private val dcLanguage = context.ontology.properties("dc:language")
  private val typeOntProperty = context.ontology.properties("rdf:type")
  private val foafDocument = context.ontology.classes("foaf:Document")

  override val datasets = Set(DBpediaDatasets.LinksToWikipediaArticle)

  override def extract(page : PageNode, subjectUri : String, pageContext : PageContext): Seq[Quad] =
  {
    if(page.title.namespace != Namespace.Main) return Seq.empty

    val quads = new ArrayBuffer[Quad]()

    quads += new Quad(context.language, DBpediaDatasets.LinksToWikipediaArticle, subjectUri, isPrimaryTopicOf, page.title.pageIri, page.sourceUri)
    quads += new Quad(context.language, DBpediaDatasets.LinksToWikipediaArticle, page.title.pageIri, primaryTopic, subjectUri, page.sourceUri)
    quads += new Quad(context.language, DBpediaDatasets.LinksToWikipediaArticle, page.title.pageIri, dcLanguage, context.language.wikiCode, page.sourceUri)
    quads += new Quad(context.language, DBpediaDatasets.LinksToWikipediaArticle, page.title.pageIri, typeOntProperty, foafDocument.uri, page.sourceUri)

    quads
  }
}
```
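For a page in the main namespace, `ArticlePageExtractor` emits exactly four quads into the LinksToWikipediaArticle dataset: two that link the resource and the article in both directions, one with the article's language and one typing the article as a `foaf:Document`. The sketch below models that output with plain strings so it runs without the framework. `SimpleQuad`, `ArticleLinksSketch` and the example URIs are hypothetical stand-ins (the real code uses `Quad`, `OntologyProperty` objects and the full `foaf:Document` class URI), not the framework's API.

```scala
// Hypothetical stand-in for DBpedia's Quad: the predicate is reduced to its prefixed name
// and `context` plays the role of page.sourceUri.
final case class SimpleQuad(subject: String, predicate: String, value: String, context: String)

object ArticleLinksSketch {
  /** Mirrors the shape of ArticlePageExtractor.extract; a simplified sketch, not the real implementation. */
  def extract(isMainNamespace: Boolean, subjectUri: String, pageIri: String,
              wikiCode: String, sourceUri: String): Seq[SimpleQuad] =
    if (!isMainNamespace) Seq.empty // only articles (main namespace) produce links
    else Seq(
      SimpleQuad(subjectUri, "foaf:isPrimaryTopicOf", pageIri, sourceUri), // resource -> article
      SimpleQuad(pageIri, "foaf:primaryTopic", subjectUri, sourceUri),     // article -> resource
      SimpleQuad(pageIri, "dc:language", wikiCode, sourceUri),             // language of the article
      SimpleQuad(pageIri, "rdf:type", "foaf:Document", sourceUri)          // the article is a document
    )
}
```

Every quad carries the same context, which matches the PR's change to use the page's source URI as provenance for all statements extracted from one page.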
core/src/main/scala/org/dbpedia/extraction/mappings/CompositeExtractor.scala
5 changes: 3 additions & 2 deletions
core/src/main/scala/org/dbpedia/extraction/mappings/CompositeJsonNodeExtractor.scala
26 changes: 26 additions & 0 deletions

```scala
package org.dbpedia.extraction.mappings

import org.dbpedia.extraction.wikiparser.JsonNode

/**
 * Combines several JsonNode extractors into a single JsonNodeExtractor.
 */
class CompositeJsonNodeExtractor(extractors: Extractor[JsonNode]*)
extends CompositeExtractor[JsonNode](extractors: _*)
with JsonNodeExtractor

/**
 * Creates new extractors.
 */
object CompositeJsonNodeExtractor
{
  /**
   * Creates a new extractor.
   *
   * TODO: using reflection here loses compile-time type safety.
   *
   * @param classes List of extractor classes to be instantiated
   * @param context Any type of object that implements the required parameter methods for the extractors
   */
  def load(classes: Seq[Class[_ <: JsonNodeExtractor]], context: AnyRef): JsonNodeExtractor =
  {
    val extractors = classes.map(_.getConstructor(classOf[AnyRef]).newInstance(context))
    new CompositeJsonNodeExtractor(extractors: _*)
  }
}
```
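The composite extractors turn several extractors of the same input type into one. The body of `CompositeExtractor` is not shown in this diff, so the sketch below assumes it behaves the way `CompositeParseExtractor` computes its own datasets: it takes the union of the children's datasets and concatenates their quads. `MiniExtractor` and `MiniComposite` are hypothetical simplifications in which datasets are plain names and quads are strings.

```scala
// Hypothetical simplification of Extractor[T]: datasets are names, quads are strings.
trait MiniExtractor[T] {
  def datasets: Set[String]
  def extract(input: T): Seq[String]
}

// Composite over extractors that all accept the same input type T (assumed behaviour).
class MiniComposite[T](extractors: MiniExtractor[T]*) extends MiniExtractor[T] {
  // Union of the datasets written by the children.
  override val datasets: Set[String] = extractors.flatMap(_.datasets).toSet

  // Every child receives the same input; their outputs are concatenated in order.
  override def extract(input: T): Seq[String] = extractors.flatMap(_.extract(input))
}
```

Because the composite is itself an extractor of the same input type, it can be nested or mixed with a marker trait, which is how `CompositeJsonNodeExtractor` becomes a `JsonNodeExtractor`.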
core/src/main/scala/org/dbpedia/extraction/mappings/CompositeParseExtractor.scala
80 changes: 80 additions & 0 deletions

```scala
package org.dbpedia.extraction.mappings

import org.dbpedia.extraction.destinations.Dataset
import org.dbpedia.extraction.destinations.Quad
import org.dbpedia.extraction.sources.WikiPage
import scala.collection.mutable.ArrayBuffer

/**
 * Runs extractors of different input types on a WikiPage:
 * WikiPage extractors receive the page itself, PageNode extractors are wrapped
 * in a WikiParseExtractor and JsonNode extractors in a JsonParseExtractor.
 *
 * TODO: generic type may not be optimal.
 */
class CompositeParseExtractor(extractors: Extractor[_]*)
extends WikiPageExtractor
{
  override val datasets: Set[Dataset] = extractors.flatMap(_.datasets).toSet

  override def extract(input: WikiPage, subjectUri: String, context: PageContext): Seq[Quad] = {

    // group the extractors by the type of input they accept
    val wikiPageExtractors = new ArrayBuffer[Extractor[WikiPage]]()
    val pageNodeExtractors = new ArrayBuffer[PageNodeExtractor]()
    val jsonNodeExtractors = new ArrayBuffer[JsonNodeExtractor]()
    val finalExtractors = new ArrayBuffer[Extractor[WikiPage]]()

    extractors foreach {
      case e: PageNodeExtractor => pageNodeExtractors += e // wrapped in a WikiParseExtractor below
      case e: JsonNodeExtractor => jsonNodeExtractors += e // wrapped in a JsonParseExtractor below
      case e: WikiPageExtractor => wikiPageExtractors += e // wrapped in a CompositeWikiPageExtractor below
      case _ =>                                            // extractors of any other input type are ignored
    }

    // WikiPage extractors work on the page as-is
    if (wikiPageExtractors.nonEmpty)
      finalExtractors += new CompositeWikiPageExtractor(wikiPageExtractors: _*)

    // PageNode extractors need the output of the wiki parser
    if (pageNodeExtractors.nonEmpty)
      finalExtractors += new WikiParseExtractor(new CompositePageNodeExtractor(pageNodeExtractors: _*))

    // JsonNode extractors need the output of the JSON parser
    if (jsonNodeExtractors.nonEmpty)
      finalExtractors += new JsonParseExtractor(new CompositeJsonNodeExtractor(jsonNodeExtractors: _*))

    if (finalExtractors.isEmpty)
      Seq.empty
    else
      new CompositeExtractor[WikiPage](finalExtractors: _*).extract(input, subjectUri, context)
  }
}

/**
 * Creates new extractors.
 */
object CompositeParseExtractor
{
  /**
   * Creates a new CompositeParseExtractor from extractor classes of any supported
   * input type (WikiPage, PageNode or JsonNode).
   *
   * TODO: using reflection here loses compile-time type safety.
   *
   * @param classes List of extractor classes to be instantiated
   * @param context Any type of object that implements the required parameter methods for the extractors
   */
  def load(classes: Seq[Class[_ <: Extractor[_]]], context: AnyRef): WikiPageExtractor =
  {
    val extractors = classes.map(_.getConstructor(classOf[AnyRef]).newInstance(context))
    new CompositeParseExtractor(extractors: _*)
  }
}
```
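The routing that `CompositeParseExtractor` performs can be sketched independently of the framework. In the sketch below, `Page`, `ParsedWikitext`, `ParsedJson`, the three extractor wrappers and both parse functions are hypothetical stand-ins for `WikiPage`, `PageNode`, `JsonNode` and the real parsers. As in the PR, a parser is only run when at least one extractor needs its output, and a parser that returns `None` contributes nothing.

```scala
// Hypothetical stand-ins for WikiPage, PageNode and JsonNode.
final case class Page(title: String, format: String, text: String)
final case class ParsedWikitext(title: String)
final case class ParsedJson(text: String)

// A sealed hierarchy replaces the runtime type tests on Extractor[_].
sealed trait AnyExtractor
final case class PageExtractor(f: Page => Seq[String]) extends AnyExtractor
final case class WikitextExtractor(f: ParsedWikitext => Seq[String]) extends AnyExtractor
final case class JsonExtractor(f: ParsedJson => Seq[String]) extends AnyExtractor

object ParseDispatchSketch {
  // Hypothetical parsers: each returns None when the page is not in its format.
  def parseWikitext(p: Page): Option[ParsedWikitext] =
    if (p.format == "wikitext") Some(ParsedWikitext(p.title)) else None
  def parseJson(p: Page): Option[ParsedJson] =
    if (p.format == "json") Some(ParsedJson(p.text)) else None

  /** Groups extractors by input type, parses only when needed, concatenates all results. */
  def extract(extractors: Seq[AnyExtractor], page: Page): Seq[String] = {
    val pageFs = extractors.collect { case PageExtractor(f) => f }
    val wikiFs = extractors.collect { case WikitextExtractor(f) => f }
    val jsonFs = extractors.collect { case JsonExtractor(f) => f }

    val direct = pageFs.flatMap(f => f(page))
    val fromWiki =
      if (wikiFs.isEmpty) Seq.empty
      else parseWikitext(page).toSeq.flatMap(node => wikiFs.flatMap(f => f(node)))
    val fromJson =
      if (jsonFs.isEmpty) Seq.empty
      else parseJson(page).toSeq.flatMap(node => jsonFs.flatMap(f => f(node)))

    direct ++ fromWiki ++ fromJson
  }
}
```

With a sealed trait the compiler can check that every input type is handled, whereas the `case _ =>` branch in the PR silently drops extractors of unknown types. The trade-off is that new input types require touching the sealed hierarchy.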
core/src/main/scala/org/dbpedia/extraction/mappings/CompositeWikiPageExtractor.scala
9 changes: 9 additions & 0 deletions

```scala
package org.dbpedia.extraction.mappings

import org.dbpedia.extraction.sources.WikiPage

class CompositeWikiPageExtractor(extractors: Extractor[WikiPage]*)
extends CompositeExtractor[WikiPage](extractors: _*)
with WikiPageExtractor
```
core/src/main/scala/org/dbpedia/extraction/mappings/JsonNodeExtractor.scala
11 changes: 11 additions & 0 deletions

```scala
package org.dbpedia.extraction.mappings

import org.dbpedia.extraction.destinations.Quad
import org.dbpedia.extraction.wikiparser._

/**
 * JsonNodeExtractors are extractors that extract data from a JsonNode.
 * The trait is necessary to get some type safety in CompositeExtractor:
 * Class[_ <: JsonNodeExtractor] can be checked at runtime, but Class[_ <: Extractor[JsonNode]] cannot,
 * because the type argument of Extractor is erased.
 */
trait JsonNodeExtractor extends Extractor[JsonNode]
```
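The point of the marker trait is that JVM type erasure makes `Extractor[JsonNode]` and `Extractor[PageNode]` the same class at runtime, so a test such as `case e: Extractor[JsonNode]` cannot be checked. A trait without type parameters is a real runtime type. The sketch below shows both kinds of runtime check that this enables; `SketchExtractor`, `SketchJsonNodeExtractor`, `SketchPageNodeExtractor` and the two stub node types are hypothetical minimal versions, not the framework's classes.

```scala
// Hypothetical node types standing in for JsonNode and PageNode.
final case class JsonNodeStub(json: String)
final case class PageNodeStub(title: String)

// Hypothetical minimal versions of Extractor[T] and the two marker traits.
trait SketchExtractor[T] { def extract(input: T): Seq[String] }
trait SketchJsonNodeExtractor extends SketchExtractor[JsonNodeStub]
trait SketchPageNodeExtractor extends SketchExtractor[PageNodeStub]

object MarkerTraitSketch {
  // Instance check: reliable, because the marker trait has no erased type argument.
  def acceptsJson(e: SketchExtractor[_]): Boolean = e match {
    case _: SketchJsonNodeExtractor => true
    case _                          => false
  }

  // Class-level check, the kind a reflective loader can perform on Class objects.
  def isJsonExtractorClass(c: Class[_]): Boolean =
    classOf[SketchJsonNodeExtractor].isAssignableFrom(c)
}
```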
core/src/main/scala/org/dbpedia/extraction/mappings/JsonParseExtractor.scala
35 changes: 35 additions & 0 deletions

```scala
package org.dbpedia.extraction.mappings

import org.dbpedia.extraction.destinations.{Dataset, Quad}
import org.dbpedia.extraction.sources.WikiPage
import org.dbpedia.extraction.wikiparser.impl.json.JsonWikiParser

/**
 * User: hadyelsahar
 * Date: 11/19/13
 * Time: 12:43 PM
 *
 * JsonParseExtractor as explained in the design: https://f.cloud.github.com/assets/607468/363286/1f8da62c-a1ff-11e2-99c3-bb5136accc07.png
 *
 * Sends the page to the JsonWikiParser. If the parser returns None, nothing is extracted;
 * if the page is parsed correctly, the resulting JsonNode is passed on to the next-level extractors.
 *
 * @param extractors a CompositeJsonNodeExtractor wrapping the next-level JsonNode extractors
 */
class JsonParseExtractor(extractors: CompositeJsonNodeExtractor) extends Extractor[WikiPage]
{
  override val datasets: Set[Dataset] = extractors.datasets

  override def extract(input: WikiPage, subjectUri: String, context: PageContext): Seq[Quad] = {
    val parser = new JsonWikiParser()
    parser(input) match {
      case Some(node) => extractors.extract(node, subjectUri, context)
      case None => Seq.empty
    }
  }
}
```
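The parse-then-delegate step in `JsonParseExtractor` is a plain `Option` pattern: delegate on `Some`, emit nothing on `None`. The sketch below isolates it and shows that the `match` used in the PR is equivalent to `map` followed by `getOrElse`. `JsonParseSketch` and its `parse` function are hypothetical; the stand-in parser only checks that the text looks like a JSON object and does not parse JSON.

```scala
object JsonParseSketch {
  // Hypothetical stand-in for JsonWikiParser: accepts only text that looks like a JSON object.
  def parse(text: String): Option[String] = {
    val t = text.trim
    if (t.startsWith("{") && t.endsWith("}")) Some(t) else None
  }

  // Same shape as JsonParseExtractor.extract: delegate on Some, emit nothing on None.
  def extractWithMatch(text: String, next: String => Seq[String]): Seq[String] =
    parse(text) match {
      case Some(node) => next(node)
      case None       => Seq.empty
    }

  // Equivalent formulation with Option combinators.
  def extractWithMap(text: String, next: String => Seq[String]): Seq[String] =
    parse(text).map(next).getOrElse(Seq.empty)
}
```

Either form keeps non-JSON pages (for example ordinary wikitext articles) from reaching the JsonNode extractors.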
Can you add a comment to describe what this dataset is for?