In this assignment you will build and improve a named entity recognizer for Tweets. This was the shared task for WNUT 2016. It is a very difficult task, with human performance in the upper 60s (F1) and machine performance ranging from 20% to 50%.
Clone (don't fork) the HW4 repository. It contains the following files:
| File | Description |
|------|-------------|
| | Training data |
| | Development data |
| | Test data: don't peek! |
| `conlleval.pl` | Compute precision/recall |
You may write code in any language you choose. It should build and run on student*.cse.nd.edu, but if this is not possible, please discuss it with the instructor before submitting. You may reuse code from your previous assignments, especially HW2.
Whenever the instructions below say to "report" something, it should be reported in the PDF file that you submit.
Blank lines mark sentence boundaries. Each nonblank line contains a word and its corresponding tag:
```
Greek B-other
Festival I-other
at O
St B-facility
Johns I-facility
before O
ASPEN B-geo-loc
Ew O
wait O
...
```

The tags are:

- `B-type` for the beginning of a named entity of type *type*,
- `I-type` for a word inside a named entity of type *type*, and
- `O` for a word outside a named entity.

Write code to read in the training data. Report how many tokens, word types, and tag types there are.
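Reading and counting could look like the following sketch. The whitespace-separated two-column format comes from the sample above; the function names and the idea of parsing from any iterable of lines (so it works on an open file) are my own.

```python
def read_sentences(lines):
    """Parse CoNLL-style tagged text: one "word tag" pair per line,
    blank lines separating sentences. `lines` is any iterable of
    strings (e.g. an open file). Returns a list of sentences, each a
    list of (word, tag) pairs."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
        else:
            word, tag = line.split()  # assumes two whitespace-separated columns
            current.append((word, tag))
    if current:
        sentences.append(current)
    return sentences

def corpus_stats(sentences):
    """Return (token count, word-type count, tag-type count)."""
    tokens = sum(len(s) for s in sentences)
    word_types = {w for s in sentences for w, _ in s}
    tag_types = {t for s in sentences for _, t in s}
    return tokens, len(word_types), len(tag_types)
```

To report the counts, open the training file (whatever it is named in the repository) and pass it through both functions.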
Write code to train a tagging model with tag bigram probabilities and tag-word probabilities, mapping unknown words to `<unk>` and using add-0.1 smoothing. Report your model's tag bigram probabilities for:
```
p(B-person | O) =
p(B-person | B-person) =
p(I-person | B-person) =
p(B-person | I-person) =
p(I-person | I-person) =
p(O | I-person) =
```

and your model's (smoothed) tag-word probabilities for:

```
p(God | B-person) =
p(God | O) =
p(Justin | B-person) =
p(Justin | O) =
p(Lindsay | B-person) =
p(Lindsay | O) =
```
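One way the smoothed estimates might be computed is sketched below. The function name `train_hmm`, the returned `(p_trans, p_emit)` closures, the choice to smooth both distributions with the same constant, and the treatment of `<unk>` as a zero-count vocabulary entry are all assumptions, not the required design.

```python
from collections import Counter

DELTA = 0.1  # add-0.1 smoothing constant

def train_hmm(sentences):
    """Estimate smoothed tag bigram and tag-word probabilities from a
    list of sentences, each a list of (word, tag) pairs. Returns two
    functions: p_trans(t_next, t) and p_emit(word, tag)."""
    tag_bigrams, tag_unigrams, tag_word = Counter(), Counter(), Counter()
    vocab, tags = set(), set()
    for sent in sentences:
        seq = ["<s>"] + [t for _, t in sent] + ["</s>"]
        for t, t_next in zip(seq, seq[1:]):
            tag_bigrams[t, t_next] += 1
            tag_unigrams[t] += 1
        for w, t in sent:
            tag_word[t, w] += 1
            vocab.add(w)
            tags.add(t)
    vocab.add("<unk>")  # unknown words get a zero-count vocabulary entry
    tags.add("</s>")    # </s> can follow any tag

    def p_trans(t_next, t):
        # add-0.1 smoothed P(t_next | t) over the set of possible next tags
        return (tag_bigrams[t, t_next] + DELTA) / (tag_unigrams[t] + DELTA * len(tags))

    def p_emit(w, t):
        # add-0.1 smoothed P(w | t), mapping unseen words to <unk>
        if w not in vocab:
            w = "<unk>"
        return (tag_word[t, w] + DELTA) / (tag_unigrams[t] + DELTA * len(vocab))

    return p_trans, p_emit
```

Note the argument order: the quantity written p(B-person | O) above would be `p_trans("B-person", "O")` here.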
Write your system's output in a three-column format, where the first column contains the words, the second column contains the correct tags, and the third column contains your system's tags. Report what your output looks like for the first five sentences of the development set.
```
Thinking O O
about O O
Bowlounge B-facility B-person
on O O
Saturday O O
night O O
. O O

sur O O
mon O O
...
```
Added 2018/11/14: Here are two strategies you can take for doing this.
```
# Assume w[1], ..., w[n] is the input string and w[n+1] = </s>
# Assume that we have a model with
#   - tag bigram probabilities P(t'|t)
#   - tag-word probabilities P(w|t), with P(</s>|</s>) = 1

# Initialize table of Viterbi probabilities.
# viterbi[i,t] is the probability of the best way of tagging
# words 1...i-1 with some tags and tagging word i as t.
viterbi[i,t] = 0 for i = 0...n+1 and all tags t
viterbi[0,<s>] = 1

for i = 1, ..., n+1
    for each tag t'
        for each tag t
            if viterbi[i-1,t] * P(t'|t) * P(w[i]|t') > viterbi[i,t']:
                viterbi[i,t'] = viterbi[i-1,t] * P(t'|t) * P(w[i]|t')
                pointer[i,t'] = (i-1,t)

return viterbi[n+1,</s>]
```
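The pseudocode above can be transcribed into Python fairly directly. This is a sketch, not the required implementation: the function names `p_trans`/`p_emit` and the chart-of-dicts representation are assumptions, and it also follows the back-pointers to recover the tag sequence.

```python
def viterbi_decode(words, tags, p_trans, p_emit):
    """Tag `words` with the most probable tag sequence under a bigram
    model. `tags` is the list of ordinary tags; p_trans(t_next, t) and
    p_emit(word, tag) return model probabilities, with
    p_emit("</s>", "</s>") == 1. Returns (probability, tag sequence)."""
    n = len(words)
    ext = [None] + list(words) + ["</s>"]  # ext[i] is word i (1-indexed)
    # chart[i][t] = (best probability of tagging words 1..i with word i
    #                tagged t, backpointer to the previous tag)
    chart = [dict() for _ in range(n + 2)]
    chart[0]["<s>"] = (1.0, None)
    for i in range(1, n + 2):
        cur_tags = ["</s>"] if i == n + 1 else tags
        for t_next in cur_tags:
            best, back = 0.0, None
            for t, (prob, _) in chart[i - 1].items():
                p = prob * p_trans(t_next, t) * p_emit(ext[i], t_next)
                if p > best:
                    best, back = p, t
            if back is not None:
                chart[i][t_next] = (best, back)
    if "</s>" not in chart[n + 1]:
        return 0.0, []
    # follow the backpointers from the end state
    seq, t = [], "</s>"
    for i in range(n + 1, 0, -1):
        t = chart[i][t][1]
        seq.append(t)
    seq.reverse()
    return chart[n + 1]["</s>"][0], seq[1:]  # drop the initial <s>
```

For real data you will likely want log probabilities to avoid underflow on long sentences; the multiplications above then become additions.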
```
perl conlleval.pl < dev.out
```

where `dev.out` is replaced with the name of your output file. You should see something like this:
```
processed 16261 tokens with 661 phrases; found: 931 phrases; correct: 77.
accuracy:  71.99%; precision:   8.27%; recall:  11.65%; FB1:   9.67
          company: precision:  85.71%; recall:  15.38%; FB1:  26.09  7
         facility: precision:  17.65%; recall:   7.89%; FB1:  10.91  17
          geo-loc: precision:  60.00%; recall:  15.52%; FB1:  24.66  30
            movie: precision:   0.00%; recall:   0.00%; FB1:   0.00  5
      musicartist: precision:   0.00%; recall:   0.00%; FB1:   0.00  67
            other: precision:   2.63%; recall:   8.33%; FB1:   3.99  419
           person: precision:  10.05%; recall:  21.64%; FB1:  13.73  368
          product: precision:   7.14%; recall:   2.70%; FB1:   3.92  14
       sportsteam: precision:  33.33%; recall:   1.43%; FB1:   2.74  3
           tvshow: precision:   0.00%; recall:   0.00%; FB1:   0.00  1
```

The important score is FB1 (balanced F-measure) on all types, which here is 9.67%. Report your system's score on the development data; it should be at least 9%.
Added 2018/11/14: When I wrote the initial solution, I implemented it from scratch rather than using the FST library. Unfortunately, the FST library's composition is slow, so if you're using it, here's the trick to get it to train fast (about 1 minute per pass through the training data).
Run `git pull` on the HW2 repository to get a new version of `fst.py`. The new version has a function `compose_right` that works just like `compose`, but assumes that the FST on the right is smaller and optimizes accordingly. Use this function instead.
If `trans` is a transition, then `trans.q == ((t, _), i)`, `trans.a == (t', w)`, and `trans.r == ((t', _), i+1)`. You can use these to compute the transition's weight, which is $\lambda(t,t') + \mu(t',w)$.
Please submit all of the following in a gzipped tar archive (.tar.gz or .tgz; not .zip or .rar) via Sakai:
All code should build and run on student*.cse.nd.edu. If this is not possible, please discuss with the instructor before submitting.