[deprecated - too many problems w dataset] Kylel/semeval2017#16
Conversation
…generate NER data for conll2003
…bigger than mention
armancohan left a comment
A few comments. Otherwise looks good.
spacy_text = nlp(text)

# split sentences & tokenize
sent_token_spans = TokenSpan.find_sent_token_spans(text=text, sent_tokens=[[token.text for sent in spacy_text.sents for token in sent if token.text.strip() != '']])
BERT will probably complain about long sequences (entire abstract). So we need to do this at sentence level.
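A minimal sketch of what sentence-level splitting could look like here (variable and method names follow the diff above; this is illustrative, not the final change):

```python
# Keep one token list per spaCy sentence instead of one flat list for the whole
# abstract, so each sequence fed to BERT is a single sentence.
sent_tokens = [
    [token.text for token in sent if token.text.strip() != '']
    for sent in spacy_text.sents
]
sent_token_spans = TokenSpan.find_sent_token_spans(text=text, sent_tokens=sent_tokens)
```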
sci_bert/common/span.py
return spans

@classmethod
def find_sent_token_spans(cls, text: str, sent_tokens: List[List[str]]) -> List[List['TokenSpan']]:
Seems like a duplicate of the first function.
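One hedged option for deduplication, assuming the first function is something like `find_token_spans(text, tokens)` returning one span per token (the name and return shape are guesses, not the actual API): run it once on the flattened tokens and regroup by sentence.

```python
# Sketch only: `find_token_spans` stands in for the first function's real name,
# and is assumed to return one TokenSpan per input token, in order.
@classmethod
def find_sent_token_spans(cls, text: str, sent_tokens: List[List[str]]) -> List[List['TokenSpan']]:
    flat_spans = cls.find_token_spans(text=text, tokens=[t for sent in sent_tokens for t in sent])
    grouped, i = [], 0
    for sent in sent_tokens:
        grouped.append(flat_spans[i:i + len(sent)])
        i += len(sent)
    return grouped
```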
sci_bert/common/span.py
temp = []
index_sent += 1

# append remaining mentions
Add a comment that this handles the last sentence only.
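For context, a self-contained sketch of the grouping pattern with the requested comment in place (names follow the diff; the real implementation may differ):

```python
# Illustrative only -- groups character-offset mention spans by sentence, where each
# sentence is represented by its list of TokenSpans (with .start/.end offsets).
def group_mentions_by_sentence(sent_token_spans, mention_spans):
    sent_mention_spans, temp, index_sent = [], [], 0
    for mention in sorted(mention_spans, key=lambda m: m.start):
        # flush completed sentences until we reach the one containing this mention
        while mention.start >= sent_token_spans[index_sent][-1].end:
            sent_mention_spans.append(temp)
            temp = []
            index_sent += 1
        temp.append(mention)
    # append remaining mentions -- this flushes the last mention-bearing sentence only;
    # earlier sentences were already flushed inside the loop
    sent_mention_spans.append(temp)
    # trailing sentences with no mentions still need (empty) entries
    while len(sent_mention_spans) < len(sent_token_spans):
        sent_mention_spans.append([])
    assert len(sent_mention_spans) == len(sent_token_spans)
    return sent_mention_spans
```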
sci_bert/common/span.py
assert len(sent_mention_spans) == len(sent_token_spans)
return sent_mention_spans

#
Consider removing unused code.
You can comment these out as they are not related to entities.
…be too long for bert
I have trained the model on the semeval2017 data and evaluated it afterwards with the semeval2017 evaluation script. I'm confused why I get such different results. Results reported by the semeval2017 evaluation script on the test set:

Material      0.44  0.43  0.44   904
avg / total   0.40  0.37  0.39  2051

So F1 is 0.39. Do you maybe have a reasonable explanation for this difference?
hey @arthurbra thanks for your interest! i've been looking into this difference as well, and it looks like the task definitions are different between what we've implemented here & the original semeval2017 task. specifically, the 3 tasks in semeval2017 are (1) entity identification, (2) entity type classification, and (3) relation extraction. what we've implemented here combines both (1) and (2) into a single task (we're doing both extraction & tagging w/ the type at the same time). this will affect how f1 scoring is performed. instead of using the sequence tagging model we have here, my plan is to adapt an existing model made for the semeval2017 task, and substitute in the various bert variants to replace glove/w2v embeddings.
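To make the scoring difference concrete, a toy sketch (not the actual semeval2017 scorer): exact-match span F1 computed over untyped spans (identification only) vs. typed spans (identification + classification).

```python
# Toy sketch only: span-level F1 with and without entity types, to show why
# combining identification + typing into one task changes the score.
def span_f1(gold, pred):
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# (start, end, type) spans for one made-up sentence
gold = {(0, 2, 'Material'), (5, 7, 'Process')}
pred = {(0, 2, 'Material'), (5, 7, 'Material')}   # second span found, but mistyped

print(span_f1({g[:2] for g in gold}, {p[:2] for p in pred}))  # identification only -> 1.0
print(span_f1(gold, pred))                                    # identification + typing -> 0.5
```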
Hey @kyleclo, thanks for your reply and explanation. I am really impressed by your results and want to learn more. You're right that you perform entity identification and classification in one task; in my understanding this is task B in semeval 2017. In calculateMeasures of the semeval2017 evaluation script you can pass the parameter remove_anno="rel", which should ignore relations during evaluation (as far as I understand the code). I already used this parameter in the evaluation of my previous post, so I assume there must be another explanation. It would be great if you could apply the code of the semeval 2017 winner with BERT. Unfortunately I was not able to find it (the AI2 system of Ammar et al.).
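For reference, a sketch of the evaluation call being described; the import path, folder paths, and exact argument names are placeholders and should be checked against the script itself.

```python
# Assumed usage of the semeval2017 scorer with relation annotations ignored.
# `calculateMeasures` and remove_anno="rel" come from the discussion above;
# the module name and folder arguments are placeholders.
from eval import calculateMeasures

calculateMeasures(folder_gold='data/semeval2017/test/',
                  folder_pred='predictions/test/',
                  remove_anno='rel')  # ignore relations during evaluation
```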
Hi Kyle, I think I have found the problem: in the prediction script, model.eval() is not called, so dropout is active during prediction. Regards
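A minimal PyTorch sketch of the suggested fix (illustrative; the repo's prediction script and model interface may differ):

```python
import torch

def predict(model: torch.nn.Module, data_loader):
    """Run inference with dropout disabled, as suggested above."""
    model.eval()              # switch off dropout / use running batch-norm stats
    outputs = []
    with torch.no_grad():     # no gradient bookkeeping during prediction
        for batch in data_loader:
            logits = model(batch)
            outputs.append(logits.argmax(dim=-1))
    return outputs
```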
just the NER portion