[deprecated - too many problems w dataset] Kylel/semeval2017#16
Conversation
…generate NER data for conll2003
…bigger than mention
armancohan left a comment
A few comments. Otherwise looks good.
spacy_text = nlp(text)

# split sentences & tokenize
sent_token_spans = TokenSpan.find_sent_token_spans(text=text, sent_tokens=[[token.text for sent in spacy_text.sents for token in sent if token.text.strip() != '']])
BERT will probably complain about long sequences (entire abstract). So we need to do this at sentence level.
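A minimal sketch of what sentence-level splitting could look like here (variable and method names follow the diff above; this is illustrative, not the final change):

```python
# Keep one token list per spaCy sentence instead of one flat list for the whole
# abstract, so each sequence fed to BERT is a single sentence.
sent_tokens = [
    [token.text for token in sent if token.text.strip() != '']
    for sent in spacy_text.sents
]
sent_token_spans = TokenSpan.find_sent_token_spans(text=text, sent_tokens=sent_tokens)
```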
sci_bert/common/span.py
return spans

@classmethod
def find_sent_token_spans(cls, text: str, sent_tokens: List[List[str]]) -> List[List['TokenSpan']]:
Seems like a duplicate of the first function.
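One hedged option for deduplication, assuming the first function is something like `find_token_spans(text, tokens)` returning one span per token (the name and return shape are guesses, not the actual API): run it once on the flattened tokens and regroup by sentence.

```python
# Sketch only: `find_token_spans` stands in for the first function's real name,
# and is assumed to return one TokenSpan per input token, in order.
@classmethod
def find_sent_token_spans(cls, text: str, sent_tokens: List[List[str]]) -> List[List['TokenSpan']]:
    flat_spans = cls.find_token_spans(text=text, tokens=[t for sent in sent_tokens for t in sent])
    grouped, i = [], 0
    for sent in sent_tokens:
        grouped.append(flat_spans[i:i + len(sent)])
        i += len(sent)
    return grouped
```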
sci_bert/common/span.py
temp = []
index_sent += 1

# append remaining mentions
Add a comment that this handles the last sentence only.
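For context, a self-contained sketch of the grouping pattern with the requested comment in place (names follow the diff; the real implementation may differ):

```python
# Illustrative only -- groups character-offset mention spans by sentence, where each
# sentence is represented by its list of TokenSpans (with .start/.end offsets).
def group_mentions_by_sentence(sent_token_spans, mention_spans):
    sent_mention_spans, temp, index_sent = [], [], 0
    for mention in sorted(mention_spans, key=lambda m: m.start):
        # flush completed sentences until we reach the one containing this mention
        while mention.start >= sent_token_spans[index_sent][-1].end:
            sent_mention_spans.append(temp)
            temp = []
            index_sent += 1
        temp.append(mention)
    # append remaining mentions -- this flushes the last mention-bearing sentence only;
    # earlier sentences were already flushed inside the loop
    sent_mention_spans.append(temp)
    # trailing sentences with no mentions still need (empty) entries
    while len(sent_mention_spans) < len(sent_token_spans):
        sent_mention_spans.append([])
    assert len(sent_mention_spans) == len(sent_token_spans)
    return sent_mention_spans
```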
sci_bert/common/span.py
assert len(sent_mention_spans) == len(sent_token_spans)
return sent_mention_spans

#
Consider removing unused code.
You can comment these out as they are not related to entities.
…be too long for bert
I have trained the model on the semeval2017 data and evaluated it afterwards with the semeval2017 evaluation script. I'm confused why I get such different results. Results reported by the semeval2017 evaluation script on the test set:

Material      0.44  0.43  0.44   904
avg / total   0.40  0.37  0.39  2051

So F1 is 0.39. Do you maybe have a reasonable explanation for this difference?
hey @arthurbra thanks for your interest! i've been looking into this difference as well, and it looks like the task definitions are different between what we've implemented here & the original semeval2017 task. specifically, the 3 tasks in semeval2017 are (1) entity identification, (2) entity type classification, and (3) relation extraction. what we've implemented here combines both (1) and (2) into a single task (we're doing both extraction & tagging w/ the type at the same time). this will affect how f1 scoring is performed. instead of using the sequence tagging model we have here, my plan is to adapt an existing model made for the semeval2017 task, and substitute in the various bert variants to replace glove/w2v embeddings.
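To make the scoring difference concrete, a toy sketch (not the actual semeval2017 scorer): exact-match span F1 computed over untyped spans (identification only) vs. typed spans (identification + classification).

```python
# Toy sketch only: span-level F1 with and without entity types, to show why
# combining identification + typing into one task changes the score.
def span_f1(gold, pred):
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# (start, end, type) spans for one made-up sentence
gold = {(0, 2, 'Material'), (5, 7, 'Process')}
pred = {(0, 2, 'Material'), (5, 7, 'Material')}   # second span found, but mistyped

print(span_f1({g[:2] for g in gold}, {p[:2] for p in pred}))  # identification only -> 1.0
print(span_f1(gold, pred))                                    # identification + typing -> 0.5
```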
Hey @kyleclo, thanks for your reply and explanation. I am really impressed by your results and want to learn more. You're right that you perform entity identification and classification in one task; in my understanding this is task B in semeval 2017. In calculateMeasures of the semeval2017 evaluation script you can pass the parameter remove_anno="rel", which should ignore relations during evaluation (as far as I understand the code). I already used this parameter in the evaluation of my previous post, so I assume there must be another explanation. It would be great if you could apply the code of the semeval 2017 winner with BERT. Unfortunately I was not able to find it (the AI2 system of Ammar et al.).
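For reference, a sketch of the evaluation call being described; the import path, folder paths, and exact argument names are placeholders and should be checked against the script itself.

```python
# Assumed usage of the semeval2017 scorer with relation annotations ignored.
# `calculateMeasures` and remove_anno="rel" come from the discussion above;
# the module name and folder arguments are placeholders.
from eval import calculateMeasures

calculateMeasures(folder_gold='data/semeval2017/test/',
                  folder_pred='predictions/test/',
                  remove_anno='rel')  # ignore relations during evaluation
```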
Hi Kyle, I think I have found the problem: in the prediction script, model.eval() is not called, so dropout is active during prediction. Regards
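A minimal PyTorch sketch of the suggested fix (illustrative; the repo's prediction script and model interface may differ):

```python
import torch

def predict(model: torch.nn.Module, data_loader):
    """Run inference with dropout disabled, as suggested above."""
    model.eval()              # switch off dropout / use running batch-norm stats
    outputs = []
    with torch.no_grad():     # no gradient bookkeeping during prediction
        for batch in data_loader:
            logits = model(batch)
            outputs.append(logits.argmax(dim=-1))
    return outputs
```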
just the NER portion