Wikipedia2Vec is a tool used for obtaining embeddings (or vector representations) of words and entities (i.e., concepts that have corresponding pages in Wikipedia) from Wikipedia.
It is developed and maintained by [Studio Ousia](http://www.ousia.jp).
@@ -14,7 +13,7 @@ This tool implements the [conventional skip-gram model](https://en.wikipedia.org
An empirical comparison between Wikipedia2Vec and existing embedding tools (i.e., FastText, Gensim, RDF2Vec, and Wiki2vec) is available [here](https://arxiv.org/abs/1812.06280).
-Documentation are available online at [http://wikipedia2vec.github.io/](http://wikipedia2vec.github.io/).
+Documentation are available online at [http://wikipedia2vec.github.io/](http://wikipedia2vec.github.io/).
## Basic Usage
@@ -24,15 +23,15 @@ Wikipedia2Vec can be installed via PyPI:
% pip install wikipedia2vec
```
-With this tool, embeddings can be learned by running a *train* command with a Wikipedia dump as input.
+With this tool, embeddings can be learned by running a _train_ command with a Wikipedia dump as input.
For example, the following commands download the latest English Wikipedia dump and learn embeddings from this dump:
-Then, the learned embeddings are written to *MODEL\_FILE*.
+Then, the learned embeddings are written to _MODEL_FILE_.
Note that this command can take many optional parameters.
Please refer to [our documentation](https://wikipedia2vec.github.io/wikipedia2vec/commands/) for further details.
@@ -44,21 +43,21 @@ Pretrained embeddings for 12 languages (i.e., English, Arabic, Chinese, Dutch, F
Wikipedia2Vec has been applied to the following tasks:
-* Entity linking: [Yamada et al., 2016](https://arxiv.org/abs/1601.01343), [Eshel et al., 2017](https://arxiv.org/abs/1706.09147), [Chen et al., 2019](https://arxiv.org/abs/1911.03834), [Poerner et al., 2020](https://arxiv.org/abs/1911.03681), [van Hulst et al., 2020](https://arxiv.org/abs/2006.01969).
-* Named entity recognition: [Sato et al., 2017](http://www.aclweb.org/anthology/I17-2017), [Lara-Clares and Garcia-Serrano, 2019](http://ceur-ws.org/Vol-2421/eHealth-KD_paper_6.pdf).
-* Question answering: [Yamada et al., 2017](https://arxiv.org/abs/1803.08652), [Poerner et al., 2020](https://arxiv.org/abs/1911.03681).
-* Entity typing: [Yamada et al., 2018](https://arxiv.org/abs/1806.02960).
-* Text classification: [Yamada et al., 2018](https://arxiv.org/abs/1806.02960), [Yamada and Shindo, 2019](https://arxiv.org/abs/1909.01259), [Alam et al., 2020](https://link.springer.com/chapter/10.1007/978-3-030-61244-3_9).
-* Relation classification: [Poerner et al., 2020](https://arxiv.org/abs/1911.03681).
-* Paraphrase detection: [Duong et al., 2018](https://ieeexplore.ieee.org/abstract/document/8606845).
-* Knowledge graph completion: [Shah et al., 2019](https://aaai.org/ojs/index.php/AAAI/article/view/4162), [Shah et al., 2020](https://www.aclweb.org/anthology/2020.textgraphs-1.9/).
-* Fake news detection: [Singh et al., 2019](https://arxiv.org/abs/1906.11126), [Ghosal et al., 2020](https://arxiv.org/abs/2010.10836).
-* Plot analysis of movies: [Papalampidi et al., 2019](https://arxiv.org/abs/1908.10328).
-* Novel entity discovery: [Zhang et al., 2020](https://arxiv.org/abs/2002.00206).
-* Entity retrieval: [Gerritse et al., 2020](https://link.springer.com/chapter/10.1007%2F978-3-030-45439-5_7).
-* Deepfake detection: [Zhong et al., 2020](https://arxiv.org/abs/2010.07475).
-* Conversational information seeking: [Rodriguez et al., 2020](https://arxiv.org/abs/2005.00172).
-* Query expansion: [Rosin et al., 2020](https://arxiv.org/abs/2012.12065).
+- Entity linking: [Yamada et al., 2016](https://arxiv.org/abs/1601.01343), [Eshel et al., 2017](https://arxiv.org/abs/1706.09147), [Chen et al., 2019](https://arxiv.org/abs/1911.03834), [Poerner et al., 2020](https://arxiv.org/abs/1911.03681), [van Hulst et al., 2020](https://arxiv.org/abs/2006.01969).
+- Named entity recognition: [Sato et al., 2017](http://www.aclweb.org/anthology/I17-2017), [Lara-Clares and Garcia-Serrano, 2019](http://ceur-ws.org/Vol-2421/eHealth-KD_paper_6.pdf).
+- Question answering: [Yamada et al., 2017](https://arxiv.org/abs/1803.08652), [Poerner et al., 2020](https://arxiv.org/abs/1911.03681).
+- Entity typing: [Yamada et al., 2018](https://arxiv.org/abs/1806.02960).
+- Text classification: [Yamada et al., 2018](https://arxiv.org/abs/1806.02960), [Yamada and Shindo, 2019](https://arxiv.org/abs/1909.01259), [Alam et al., 2020](https://link.springer.com/chapter/10.1007/978-3-030-61244-3_9).
+- Relation classification: [Poerner et al., 2020](https://arxiv.org/abs/1911.03681).
+- Paraphrase detection: [Duong et al., 2018](https://ieeexplore.ieee.org/abstract/document/8606845).
+- Knowledge graph completion: [Shah et al., 2019](https://aaai.org/ojs/index.php/AAAI/article/view/4162), [Shah et al., 2020](https://www.aclweb.org/anthology/2020.textgraphs-1.9/).
+- Fake news detection: [Singh et al., 2019](https://arxiv.org/abs/1906.11126), [Ghosal et al., 2020](https://arxiv.org/abs/2010.10836).
+- Plot analysis of movies: [Papalampidi et al., 2019](https://arxiv.org/abs/1908.10328).
+- Novel entity discovery: [Zhang et al., 2020](https://arxiv.org/abs/2002.00206).
+- Entity retrieval: [Gerritse et al., 2020](https://link.springer.com/chapter/10.1007%2F978-3-030-45439-5_7).
+- Deepfake detection: [Zhong et al., 2020](https://arxiv.org/abs/2010.07475).
+- Conversational information seeking: [Rodriguez et al., 2020](https://arxiv.org/abs/2005.00172).
+- Query expansion: [Rosin et al., 2020](https://arxiv.org/abs/2012.12065).