@hadyelsahar
Contributor

This pull request is mainly the output of the GSoC 2013 project on Wikidata integration in DBpedia.

  • added JsonParser to parse JSON pages extracted from Wikidata
  • added multiple extractors to extract triples from Wikidata JSON dumps
  • updated the ontology file by adding Wikidata-to-DBpedia mappings for properties and types in the Mappings wiki
  • addressed the core refactoring issue "Refactor core to accept new formats" (#38), so that the core can easily accept new page formats, such as JSON in the case of Wikidata

Two main issues remain to be done after the merge:

1 - Two warnings are left for JsonLift: 'withFilter' method does not yet exist on net.liftweb.json.JsonAST.JValue, using `filter' method instead. Run mvn clean install to see them.

2 - The design of the Wikidata extractors and JsonWikiParser was temporary, made to cope with the old core. Now that the core has been redesigned, we should:

  • make JsonNode hold a JsonObject, not a list of Nodes
  • move all the extra code from JsonParser (the code that generates the AST) into the individual extractors. Every extractor should define, at the beginning, variables with the paths that are of interest (e.g. json \ "claims" \ ...) and reuse these variables later.
  • create org.dbpedia.extraction.util.JSONUtils to store all the custom JSON-related functions you create
  • adapt the Wikidata extractors to the new Wikidata JSON changes
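
The refactoring points above could look roughly like the following. This is only a sketch under stated assumptions: to stay dependency-free it swaps lift-json's JValue for a tiny stand-in AST (`Json`, `JObj`, `JStr`, `JNone`), and `JSONUtilsSketch`, `LabelExtractorSketch`, and the "labels" field are hypothetical names chosen for illustration, not existing framework code.

```scala
// Stand-in JSON AST (assumption): replaces net.liftweb.json.JValue
// so that the sketch compiles without any external dependency.
sealed trait Json {
  // Child lookup in the spirit of lift-json's `\`;
  // returns JNone when the field is absent or this is not an object.
  def \(name: String): Json = this match {
    case JObj(fields) => fields.getOrElse(name, JNone)
    case _            => JNone
  }
}
final case class JObj(fields: Map[String, Json]) extends Json
final case class JStr(value: String) extends Json
case object JNone extends Json

// Hypothetical JSONUtils-style object holding shared JSON helpers.
object JSONUtilsSketch {
  // Returns the string content of a JSON string node, otherwise None.
  def asString(json: Json): Option[String] = json match {
    case JStr(s) => Some(s)
    case _       => None
  }
}

// Hypothetical extractor: the paths of interest are declared once,
// at the beginning, and reused by the methods below.
class LabelExtractorSketch(json: Json) {
  private val labels = json \ "labels" // "labels" is an assumed field name
  private val claims = json \ "claims"

  def label(lang: String): Option[String] = JSONUtilsSketch.asString(labels \ lang)
  def hasClaims: Boolean = claims != JNone
}

object PathSketchDemo {
  // Made-up page content, for illustration only.
  val page: Json = JObj(Map("labels" -> JObj(Map("en" -> JStr("Berlin")))))
  val extractor = new LabelExtractorSketch(page)
}
```

With the real library the same shape would apply, with the stand-in types replaced by lift-json's own AST and the helper moved to org.dbpedia.extraction.util.JSONUtils.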

Contributor

Naming coherence:

val WikidataNamespaceSameAs = new Dataset("wikidata-namespace-sameas")

Contributor

Should use WikiPage.sourceIri (which does not exist).
WikiPage has a revision property though, so a valid sourceIri (coherent with the current framework) could be:

WikiPage.title.pageIri + "?oldid=" + revision

similar to PageNode.sourceUri

PageNode.title.pageIri does not give enough context to this quad.
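
A minimal sketch of the suggestion above, assuming simplified stand-ins for the framework classes: `WikiTitleSketch` and `WikiPageSketch` are hypothetical, and the example IRI and revision number are made up for illustration.

```scala
// Simplified stand-ins for the framework's title and page classes (assumptions).
final case class WikiTitleSketch(pageIri: String)

final case class WikiPageSketch(title: WikiTitleSketch, revision: Long) {
  // Proposed provenance IRI: the page IRI pinned to one specific revision.
  def sourceIri: String = title.pageIri + "?oldid=" + revision
}

object SourceIriDemo {
  // Made-up page IRI and revision id, for illustration only.
  val page = WikiPageSketch(WikiTitleSketch("http://www.wikidata.org/wiki/Q64"), 12345L)
  val iri: String = page.sourceIri
}
```

Pinning the revision in the IRI lets each quad record which version of the page it was extracted from.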

Contributor

Remove space after WikiPage

Contributor

Is this method doing anything?

Contributor Author

No, it's the dummy Wikidata extractor structure that Dimitris created at the beginning of the task. We can remove the whole file; it's obsolete anyway.

Member

Go ahead and remove it, Hady.

Contributor Author

done

@hadyelsahar
Contributor Author

Last edits:
1 - Handled Andreas's last two comments concerning the owl:sameAs property and page.wikiPage.SourceUri

2 - Wikidata URIs are now "http://www.wikidata.dbpedia.org/resource/" instead of "http://www.wikidata.org/entity/"
This was already implemented, but I think we overwrote it when merging or something.
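
For reference, the namespace change in point 2 amounts to a prefix rewrite. A minimal sketch, assuming a standalone helper (`WikidataUriSketch` and `toDBpediaUri` are hypothetical names, not the code in this pull request):

```scala
object WikidataUriSketch {
  // The two prefixes quoted in the comment above.
  val WikidataEntityPrefix = "http://www.wikidata.org/entity/"
  val DBpediaWikidataPrefix = "http://www.wikidata.dbpedia.org/resource/"

  // Rewrites a Wikidata entity URI into the DBpedia Wikidata namespace;
  // URIs outside the entity namespace are returned unchanged.
  def toDBpediaUri(uri: String): String =
    if (uri.startsWith(WikidataEntityPrefix))
      DBpediaWikidataPrefix + uri.substring(WikidataEntityPrefix.length)
    else uri
}
```

Keeping both prefixes as named constants in one place should make it harder for a merge to silently revert the namespace again.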

jimkont added a commit that referenced this pull request Mar 1, 2014
Wikidata integration + Refactoring Core to accept new formats
@jimkont jimkont merged commit fb89d3b into dbpedia:master Mar 1, 2014
@jimkont jimkont modified the milestone: pastReleases Mar 19, 2015
jimkont added a commit to jimkont/extraction-framework that referenced this pull request Mar 26, 2015
Wikidata integration + Refactoring Core to accept new formats