docs/index.md: 23 additions & 22 deletions
@@ -1,9 +1,10 @@
-<h1 id="main_title">Wikipedia2Vec</h1>
+# Wikipedia2Vec
+
 ---

 <a class="github-button" href="https://github.com/wikipedia2vec/wikipedia2vec" data-size="large" data-show-count="true" aria-label="Star wikipedia2vec/wikipedia2vec on GitHub">Star</a>

-###Introduction
+## Introduction

 Wikipedia2Vec is a tool used for obtaining embeddings (or vector representations) of words and entities (i.e., concepts that have corresponding pages in Wikipedia) from Wikipedia.
 It is developed and maintained by [Studio Ousia](http://www.ousia.jp).
@@ -15,31 +16,31 @@ This tool implements the [conventional skip-gram model](https://en.wikipedia.org

 An empirical comparison between Wikipedia2Vec and existing embedding tools (i.e., FastText, Gensim, RDF2Vec, and Wiki2vec) is available [here](https://arxiv.org/abs/1812.06280).

-###Pretrained Embeddings
+## Pretrained Embeddings

 Pretrained embeddings for 12 languages (i.e., English, Arabic, Chinese, Dutch, French, German, Italian, Japanese, Polish, Portuguese, Russian, and Spanish) can be downloaded from [this page](pretrained.md).

-###Use Cases
+## Use Cases

 Wikipedia2Vec has been applied to the following tasks:

26
-
* Entity linking: [Yamada et al., 2016](https://arxiv.org/abs/1601.01343), [Eshel et al., 2017](https://arxiv.org/abs/1706.09147), [Chen et al., 2019](https://arxiv.org/abs/1911.03834), [Poerner et al., 2020](https://arxiv.org/abs/1911.03681), [van Hulst et al., 2020](https://arxiv.org/abs/2006.01969).
27
-
* Named entity recognition: [Sato et al., 2017](http://www.aclweb.org/anthology/I17-2017), [Lara-Clares and Garcia-Serrano, 2019](http://ceur-ws.org/Vol-2421/eHealth-KD_paper_6.pdf).
28
-
* Question answering: [Yamada et al., 2017](https://arxiv.org/abs/1803.08652), [Poerner et al., 2020](https://arxiv.org/abs/1911.03681).
29
-
* Entity typing: [Yamada et al., 2018](https://arxiv.org/abs/1806.02960).
30
-
* Text classification: [Yamada et al., 2018](https://arxiv.org/abs/1806.02960), [Yamada and Shindo, 2019](https://arxiv.org/abs/1909.01259), [Alam et al., 2020](https://link.springer.com/chapter/10.1007/978-3-030-61244-3_9).
31
-
* Relation classification: [Poerner et al., 2020](https://arxiv.org/abs/1911.03681).
32
-
* Paraphrase detection: [Duong et al., 2018](https://ieeexplore.ieee.org/abstract/document/8606845).
33
-
* Knowledge graph completion: [Shah et al., 2019](https://aaai.org/ojs/index.php/AAAI/article/view/4162), [Shah et al., 2020](https://www.aclweb.org/anthology/2020.textgraphs-1.9/).
34
-
* Fake news detection: [Singh et al., 2019](https://arxiv.org/abs/1906.11126), [Ghosal et al., 2020](https://arxiv.org/abs/2010.10836).
35
-
* Plot analysis of movies: [Papalampidi et al., 2019](https://arxiv.org/abs/1908.10328).
36
-
* Novel entity discovery: [Zhang et al., 2020](https://arxiv.org/abs/2002.00206).
37
-
* Entity retrieval: [Gerritse et al., 2020](https://link.springer.com/chapter/10.1007%2F978-3-030-45439-5_7).
38
-
* Deepfake detection: [Zhong et al., 2020](https://arxiv.org/abs/2010.07475).
39
-
* Conversational information seeking: [Rodriguez et al., 2020](https://arxiv.org/abs/2005.00172).
40
-
* Query expansion: [Rosin et al., 2020](https://arxiv.org/abs/2012.12065).
41
-
42
-
###References
+- Entity linking: [Yamada et al., 2016](https://arxiv.org/abs/1601.01343), [Eshel et al., 2017](https://arxiv.org/abs/1706.09147), [Chen et al., 2019](https://arxiv.org/abs/1911.03834), [Poerner et al., 2020](https://arxiv.org/abs/1911.03681), [van Hulst et al., 2020](https://arxiv.org/abs/2006.01969).
+- Named entity recognition: [Sato et al., 2017](http://www.aclweb.org/anthology/I17-2017), [Lara-Clares and Garcia-Serrano, 2019](http://ceur-ws.org/Vol-2421/eHealth-KD_paper_6.pdf).
+- Question answering: [Yamada et al., 2017](https://arxiv.org/abs/1803.08652), [Poerner et al., 2020](https://arxiv.org/abs/1911.03681).
+- Entity typing: [Yamada et al., 2018](https://arxiv.org/abs/1806.02960).
+- Text classification: [Yamada et al., 2018](https://arxiv.org/abs/1806.02960), [Yamada and Shindo, 2019](https://arxiv.org/abs/1909.01259), [Alam et al., 2020](https://link.springer.com/chapter/10.1007/978-3-030-61244-3_9).
+- Relation classification: [Poerner et al., 2020](https://arxiv.org/abs/1911.03681).
+- Paraphrase detection: [Duong et al., 2018](https://ieeexplore.ieee.org/abstract/document/8606845).
+- Knowledge graph completion: [Shah et al., 2019](https://aaai.org/ojs/index.php/AAAI/article/view/4162), [Shah et al., 2020](https://www.aclweb.org/anthology/2020.textgraphs-1.9/).
+- Fake news detection: [Singh et al., 2019](https://arxiv.org/abs/1906.11126), [Ghosal et al., 2020](https://arxiv.org/abs/2010.10836).
+- Plot analysis of movies: [Papalampidi et al., 2019](https://arxiv.org/abs/1908.10328).
+- Novel entity discovery: [Zhang et al., 2020](https://arxiv.org/abs/2002.00206).
+- Entity retrieval: [Gerritse et al., 2020](https://link.springer.com/chapter/10.1007%2F978-3-030-45439-5_7).
+- Deepfake detection: [Zhong et al., 2020](https://arxiv.org/abs/2010.07475).
+- Conversational information seeking: [Rodriguez et al., 2020](https://arxiv.org/abs/2005.00172).
+- Query expansion: [Rosin et al., 2020](https://arxiv.org/abs/2012.12065).
+
+## References

 If you use Wikipedia2Vec in a scientific publication, please cite the following paper:

@@ -86,6 +87,6 @@ Ikuya Yamada, Hiroyuki Shindo, [Neural Attentive Bag-of-Entities Model for Text