In this assignment you will build and improve a named entity recognizer for Tweets. This was the shared task for WNUT 2016. It is a very difficult task: human performance is in the upper 60% range, and machine performance ranges from 20% to 50%.
Clone (don't fork) the HW4 repository. It contains the following files:
| File | Description |
|------|-------------|
| `train` | Training data |
| `dev` | Development data |
| `test` | Test data: don't peek! |
| `conlleval.pl` | Compute precision/recall |
You may write code in any language you choose. It should build and run on student*.cse.nd.edu, but if this is not possible, please discuss it with the instructor before submitting. You may reuse code from your previous assignments, especially HW2.
Whenever the instructions below say to "report" something, it should be reported in the PDF file that you submit.
The data files look like this:

```
Greek     B-other
Festival  I-other
at        O
St        B-facility
Johns     I-facility
before    O
ASPEN     B-geo-loc

Ew        O
wait      O
...
```
Blank lines mark sentence boundaries. Each nonblank line contains a word and its corresponding tag: `B-type` for the beginning of a named entity of type *type*, `I-type` for a word inside a named entity of type *type*, and `O` for a word outside a named entity.
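To make the scheme concrete, here is a small helper (a sketch, not required by the assignment; the name `bio_spans` is mine) that turns a BIO tag sequence into entity spans:

```python
def bio_spans(tags):
    """Convert a BIO tag sequence into (type, start, end) spans, end exclusive.
    E.g. ["B-other", "I-other", "O"] -> [("other", 0, 2)]."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):      # sentinel flushes the last span
        if tag == "O" or tag.startswith("B-"):
            if etype is not None:               # close the open span
                spans.append((etype, start, i))
                etype = None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
        elif tag.startswith("I-") and etype is None:
            start, etype = i, tag[2:]           # tolerate I- without a B-
    return spans
```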
Write code to read in the training data. Report how many tokens, word types, and tag types there are.¹
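A minimal sketch of one way to do this (the helper name `read_conll` and the assumption that fields are whitespace-separated are mine, not part of the handout):

```python
def read_conll(path):
    """Read word/tag lines into a list of sentences, where each sentence
    is a list of (word, tag) pairs; blank lines separate sentences."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                 # sentence boundary
                if current:
                    sentences.append(current)
                    current = []
            else:
                word, tag = line.split()
                current.append((word, tag))
    if current:
        sentences.append(current)
    return sentences

train = read_conll("train")
tokens = [pair for sent in train for pair in sent]
print("tokens:", len(tokens))
print("word types:", len({w for w, _ in tokens}))
print("tag types:", len({t for _, t in tokens}))
```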
Write code to estimate tag bigram probabilities p(t'|t) and tag-word probabilities p(w|t) from the training data, mapping unknown words to `<unk>` and using add-0.1 smoothing. Report¹ your model's tag bigram probabilities for:
p(B-person | O) =
p(B-person | B-person) =
p(I-person | B-person) =
p(B-person | I-person) =
p(I-person | I-person) =
p(O | I-person) =
and your model's (smoothed) tag-word probabilities for:
p(God | B-person) =
p(God | O) =
p(Justin | B-person) =
p(Justin | O) =
p(Lindsay | B-person) =
p(Lindsay | O) =
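A sketch of how the smoothed estimates might be computed (reusing `train` and `tokens` from the reading sketch above; the exact event spaces used in the smoothing denominators are my assumption, so check them against what the assignment intends):

```python
from collections import Counter

# Counts from the training data.
tag_bigrams, tag_word, tag_counts = Counter(), Counter(), Counter()

for sent in train:
    tags = ["<s>"] + [t for _, t in sent] + ["</s>"]
    for t, t_next in zip(tags, tags[1:]):
        tag_bigrams[t, t_next] += 1
        tag_counts[t] += 1
    for w, t in sent:
        tag_word[t, w] += 1

tagset = {t for _, t in tokens} | {"</s>"}
vocab = {w for w, _ in tokens} | {"<unk>"}

def p_bigram(t_next, t, alpha=0.1):
    """Add-0.1 smoothed p(t'|t)."""
    return (tag_bigrams[t, t_next] + alpha) / (tag_counts[t] + alpha * len(tagset))

def p_word(w, t, alpha=0.1):
    """Add-0.1 smoothed p(w|t). Words outside the training vocabulary are
    mapped to <unk>; replacing rare training words with <unk> beforehand
    is left out here for brevity."""
    if w not in vocab:
        w = "<unk>"
    return (tag_word[t, w] + alpha) / (tag_counts[t] + alpha * len(vocab))

print("p(B-person | O) =", p_bigram("B-person", "O"))
print("p(God | B-person) =", p_word("God", "B-person"))
```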
Write code that uses the Viterbi algorithm to find the best tag sequence for each sentence of the development data, and that writes its output in the following format:

```
Thinking   O           O
about      O           O
Bowlounge  B-facility  B-person
on         O           O
Saturday   O           O
night      O           O
.          O           O

sur        O           O
mon        O           O
...
```
where the first column contains the words, the second column contains the correct tags, and the third column contains your system's tags. Report what your output looks like for the first five sentences of the development set.¹
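A sketch of one way to write that file, assuming the `read_conll` helper from earlier and a decoder like the `viterbi_tags` sketched after the pseudocode below:

```python
with open("dev.out", "w", encoding="utf-8") as out:
    for sent in read_conll("dev"):
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        pred = viterbi_tags(words, sorted(tagset - {"</s>"}))
        for w, g, p in zip(words, gold, pred):
            print(w, g, p, sep="\t", file=out)
        print(file=out)   # blank line between sentences
```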
Added 2018/11/14: Here are two strategies you can take for doing this. The first is to implement the Viterbi algorithm directly, following this pseudocode; the second is to build the tagger out of FSTs with the HW2 library (see the note further down about making composition fast).
```
# Assume w[1], ..., w[n] is the input string and w[n+1] = </s>.
# Assume that we have a model with
# - tag bigram probabilities P(t'|t)
# - tag-word probabilities P(w|t), with P(</s>|</s>) = 1.

# viterbi[i,t] is the probability of the best tagging of words 1...i
# in which word i is tagged t.
viterbi[i,t] = 0 for i = 0...n+1 and all tags t
viterbi[0,<s>] = 1
for i = 1, ..., n+1:
    for each tag t':
        for each tag t:
            if viterbi[i-1,t] * P(t'|t) * P(w[i]|t') > viterbi[i,t']:
                viterbi[i,t'] = viterbi[i-1,t] * P(t'|t) * P(w[i]|t')
                pointer[i,t'] = (i-1,t)
# Follow pointer[] back from (n+1, </s>) to recover the best tag sequence.
return viterbi[n+1,</s>]
```
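A runnable Python rendering of that pseudocode, as a sketch: it assumes the `p_bigram` and `p_word` functions sketched earlier, works in log space to avoid underflow, and the names are mine:

```python
import math

def viterbi_tags(words, tags):
    """Best tag sequence for `words` under the bigram HMM.
    `tags` holds only the real BIO tags (no <s> or </s>)."""
    n = len(words)
    best = [{"<s>": 0.0}] + [dict() for _ in range(n + 1)]  # best[i][t] = log prob
    back = [dict() for _ in range(n + 2)]                   # back[i][t] = previous tag
    for i in range(1, n + 2):
        # Position n+1 carries the end-of-sentence marker, with P(</s>|</s>) = 1.
        cands = ["</s>"] if i == n + 1 else tags
        for t_new in cands:
            emit = 0.0 if i == n + 1 else math.log(p_word(words[i - 1], t_new))
            for t_old, score in best[i - 1].items():
                s = score + math.log(p_bigram(t_new, t_old)) + emit
                if s > best[i].get(t_new, float("-inf")):
                    best[i][t_new] = s
                    back[i][t_new] = t_old
    # Follow the back-pointers from </s> to recover the best tag sequence.
    out, t = [], "</s>"
    for i in range(n + 1, 1, -1):
        t = back[i][t]
        out.append(t)
    return out[::-1]
```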
To score your output, run:

```
perl conlleval.pl < dev.out
```

where `dev.out` is replaced with the name of your output file. You should see something like this:
```
processed 16261 tokens with 661 phrases; found: 931 phrases; correct: 77.
accuracy:  71.99%; precision:   8.27%; recall:  11.65%; FB1:   9.67
          company: precision:  85.71%; recall:  15.38%; FB1:  26.09  7
         facility: precision:  17.65%; recall:   7.89%; FB1:  10.91  17
          geo-loc: precision:  60.00%; recall:  15.52%; FB1:  24.66  30
            movie: precision:   0.00%; recall:   0.00%; FB1:   0.00  5
      musicartist: precision:   0.00%; recall:   0.00%; FB1:   0.00  67
            other: precision:   2.63%; recall:   8.33%; FB1:   3.99  419
           person: precision:  10.05%; recall:  21.64%; FB1:  13.73  368
          product: precision:   7.14%; recall:   2.70%; FB1:   3.92  14
       sportsteam: precision:  33.33%; recall:   1.43%; FB1:   2.74  3
           tvshow: precision:   0.00%; recall:   0.00%; FB1:   0.00  1
```
The important score is FB1 (balanced F-measure) on all types, which is 9.67%.
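For reference, FB1 is the balanced ($\beta = 1$) F-measure, the harmonic mean of precision and recall: $F_1 = \frac{2PR}{P+R}$. Plugging in the overall precision and recall above gives $\frac{2 \cdot 8.27 \cdot 11.65}{8.27 + 11.65} \approx 9.67$.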
Report your system's score on the development data.¹ It should be at least 9%.³
Added 2018/11/14: When I wrote the initial solution, I implemented it from scratch rather than using the FST library. Unfortunately, the FST library's composition is slow, so if you're using it, here's the trick to make training fast (about 1 minute per pass through the training data). Run `git pull` on the HW2 repository to get a new version of fst.py. The new version has a function `compose_right` that works just like `compose`, but assumes that the FST on the right is smaller and optimizes accordingly. Use this function instead.

If `trans` is a transition, then `trans.q == ((t, _), i)`, `trans.a == (t', w)`, and `trans.r == ((t', _), i+1)`. You can use these to compute the transition's weight, which is $\lambda(t,t') + \mu(t',w)$.
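A sketch of reading a weight off one transition; `lam` and `mu` are hypothetical dictionaries holding $\lambda$ and $\mu$, and only the `trans.q`/`trans.a` structure above comes from the handout:

```python
def transition_weight(trans, lam, mu):
    """lambda(t,t') + mu(t',w) for one transition of the composed FST."""
    (t, _), i = trans.q      # source state ((t, _), i)
    t_new, w = trans.a       # label (t', w)
    return lam[t, t_new] + mu[t_new, w]
```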
Please submit all of the following in a gzipped tar archive (.tar.gz or .tgz; not .zip or .rar) via Sakai:

- A PDF file with everything you were asked to report above.
- All of your code. It should build and run on student*.cse.nd.edu. If this is not possible, please discuss with the instructor before submitting.