In this assignment you will build and improve a named entity recognizer for tweets. This was the shared task for WNUT 2016. It is a very difficult task: human performance is in the high 60s (F1), and machine performance ranges from 20% to 50%.
Clone (don't fork) the HW4 repository. It contains the following files:

    train         Training data
    dev           Development data
    test          Test data: don't peek!
    conlleval.pl  Computes precision/recall
You may write code in any language you choose. It should build and run on student*.cse.nd.edu, but if this is not possible, please discuss it with the instructor before submitting. You may reuse code from your previous assignments, especially HW2.
Whenever the instructions below say to "report" something, it should be reported in the PDF file that you submit.
The data files look like this:
Greek B-other
Festival I-other
at O
St B-facility
Johns I-facility
before O
ASPEN B-geo-loc
Ew O
wait O
...
Blank lines mark sentence boundaries. Each nonblank line contains a word and its corresponding tag: B-type for the beginning of a named entity of type type, I-type for a word inside a named entity of type type, and O for a word outside any named entity.
Write code to read in the training data. Report how many tokens, word types, and tag types there are.
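A minimal sketch of this step in Python (the filename "train" comes from the repository; the reader is guarded so the sketch also runs where the file is absent):

```python
import os

def read_conll(path):
    """Read word/tag lines separated by blank lines.
    Returns a list of sentences, each a list of (word, tag) pairs."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                  # blank line: sentence boundary
                if current:
                    sentences.append(current)
                    current = []
            else:
                word, tag = line.split()  # one word and one tag per line
                current.append((word, tag))
    if current:                           # file may not end with a blank line
        sentences.append(current)
    return sentences

if os.path.exists("train"):               # "train" is the repository's filename
    train = read_conll("train")
    tokens = [pair for sent in train for pair in sent]
    print("tokens:    ", len(tokens))
    print("word types:", len({w for w, _ in tokens}))
    print("tag types: ", len({t for _, t in tokens}))
```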
Train a bigram tagging model (HMM) on the training data, handling unknown words with <unk>
and using add-0.1 smoothing. Report your model's tag bigram probabilities for:
p(B-person | O) =
p(B-person | B-person) =
p(I-person | B-person) =
p(B-person | I-person) =
p(I-person | I-person) =
p(O | I-person) =
and your model's (smoothed) tag-word probabilities for:
p(God | B-person) =
p(God | O) =
p(Justin | B-person) =
p(Justin | O) =
p(Lindsay | B-person) =
p(Lindsay | O) =
Thinking O O
about O O
Bowlounge B-facility B-person
on O O
Saturday O O
night O O
. O O
sur O O
mon O O
...
where the first column contains the words, the second column contains the correct tags, and the third column contains your system's tags. Report what your output looks like for the first five sentences of the development set.

To evaluate your output, run:

    perl conlleval.pl < dev.out

where dev.out is replaced with the name of your output file. You should see something like this:
processed 16261 tokens with 661 phrases; found: 931 phrases; correct: 77.
accuracy: 71.99%; precision: 8.27%; recall: 11.65%; FB1: 9.67
company: precision: 85.71%; recall: 15.38%; FB1: 26.09 7
facility: precision: 17.65%; recall: 7.89%; FB1: 10.91 17
geo-loc: precision: 60.00%; recall: 15.52%; FB1: 24.66 30
movie: precision: 0.00%; recall: 0.00%; FB1: 0.00 5
musicartist: precision: 0.00%; recall: 0.00%; FB1: 0.00 67
other: precision: 2.63%; recall: 8.33%; FB1: 3.99 419
person: precision: 10.05%; recall: 21.64%; FB1: 13.73 368
product: precision: 7.14%; recall: 2.70%; FB1: 3.92 14
sportsteam: precision: 33.33%; recall: 1.43%; FB1: 2.74 3
tvshow: precision: 0.00%; recall: 0.00%; FB1: 0.00 1
The important score is FB1 (balanced F-measure) over all types, which here is 9.67%.
Report your system's score on the development data. It should be at least 9%.
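The decoding step that produces the three-column output can be sketched as a log-space Viterbi search. This sketch assumes smoothed probability functions p_trans(tag, prev) and p_emit(word, tag) and the boundary markers "<s>"/"</s>" (illustrative names, not part of the assignment):

```python
import math

def viterbi(words, tags, p_trans, p_emit):
    """Best tag sequence under a bigram HMM, computed in log space."""
    # chart[i][t] = (best log-score of tagging words[:i+1] ending in t, back-pointer)
    chart = [{t: (math.log(p_trans(t, "<s>")) + math.log(p_emit(words[0], t)), None)
              for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, back = max((chart[i - 1][p][0] + math.log(p_trans(t, p)), p)
                              for p in tags)
            column[t] = (score + math.log(p_emit(words[i], t)), back)
        chart.append(column)
    # include the end-of-sentence transition, then follow back-pointers
    _, last = max((chart[-1][t][0] + math.log(p_trans("</s>", t)), t) for t in tags)
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(chart[i][path[-1]][1])
    return path[::-1]

def write_output(path, sentences, tag_sentence):
    """Write word / gold tag / predicted tag, with a blank line between sentences."""
    with open(path, "w", encoding="utf-8") as f:
        for sent in sentences:
            predicted = tag_sentence([w for w, _ in sent])
            for (word, gold), pred in zip(sent, predicted):
                f.write(f"{word} {gold} {pred}\n")
            f.write("\n")
```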
Please submit all of the following in a gzipped tar archive (.tar.gz or .tgz; not .zip or .rar) via Sakai. Your code should build and run on student*.cse.nd.edu. If this is not possible, please discuss with the instructor before submitting.