In this assignment you will build and improve a named entity recognizer for Tweets. This was the shared task for WNUT 2016. It is a very difficult task, with human performance in the upper 60s (F1) and machine performance ranging from 20% to 50%.
Clone (don't fork) the HW4 repository. It contains the following files:
| File | Description |
|------|-------------|
| | Training data |
| | Development data |
| | Test data: don't peek! |
| `conlleval.pl` | Compute precision/recall |
You may write code in any language you choose. It should build and run on student*.cse.nd.edu, but if this is not possible, please discuss it with the instructor before submitting. You may reuse code from your previous assignments, especially HW2.
Whenever the instructions below say to "report" something, it should be reported in the PDF file that you submit.
Blank lines mark sentence boundaries. Each nonblank line contains a word and its corresponding tag:
```
Greek B-other
Festival I-other
at O
St B-facility
Johns I-facility
before O
ASPEN B-geo-loc
Ew O
wait O
...
```

The tags are:

- `B-type` for the beginning of a named entity of type *type*,
- `I-type` for a word inside a named entity of type *type*, and
- `O` for a word outside a named entity.

Write code to read in the training data. Report how many tokens, word types, and tag types there are.
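Reading and counting could look like the following sketch. The whitespace-separated two-column format comes from the sample above; the function names and the idea of parsing from any iterable of lines (so it works on an open file) are my own.

```python
def read_sentences(lines):
    """Parse CoNLL-style tagged text: one "word tag" pair per line,
    blank lines separating sentences. `lines` is any iterable of
    strings (e.g. an open file). Returns a list of sentences, each a
    list of (word, tag) pairs."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
        else:
            word, tag = line.split()  # assumes two whitespace-separated columns
            current.append((word, tag))
    if current:
        sentences.append(current)
    return sentences

def corpus_stats(sentences):
    """Return (token count, word-type count, tag-type count)."""
    tokens = sum(len(s) for s in sentences)
    word_types = {w for s in sentences for w, _ in s}
    tag_types = {t for s in sentences for _, t in s}
    return tokens, len(word_types), len(tag_types)
```

To report the counts, open the training file (whatever it is named in the repository) and pass it through both functions.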
Write code to train a tagging model with tag bigram probabilities and tag-word probabilities, mapping unknown words to `<unk>` and using add-0.1 smoothing. Report your model's tag bigram probabilities for:
```
p(B-person | O) =
p(B-person | B-person) =
p(I-person | B-person) =
p(B-person | I-person) =
p(I-person | I-person) =
p(O | I-person) =
```

and your model's (smoothed) tag-word probabilities for:

```
p(God | B-person) =
p(God | O) =
p(Justin | B-person) =
p(Justin | O) =
p(Lindsay | B-person) =
p(Lindsay | O) =
```
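One way the smoothed estimates might be computed is sketched below. The function name `train_hmm`, the returned `(p_trans, p_emit)` closures, the choice to smooth both distributions with the same constant, and the treatment of `<unk>` as a zero-count vocabulary entry are all assumptions, not the required design.

```python
from collections import Counter

DELTA = 0.1  # add-0.1 smoothing constant

def train_hmm(sentences):
    """Estimate smoothed tag bigram and tag-word probabilities from a
    list of sentences, each a list of (word, tag) pairs. Returns two
    functions: p_trans(t_next, t) and p_emit(word, tag)."""
    tag_bigrams, tag_unigrams, tag_word = Counter(), Counter(), Counter()
    vocab, tags = set(), set()
    for sent in sentences:
        seq = ["<s>"] + [t for _, t in sent] + ["</s>"]
        for t, t_next in zip(seq, seq[1:]):
            tag_bigrams[t, t_next] += 1
            tag_unigrams[t] += 1
        for w, t in sent:
            tag_word[t, w] += 1
            vocab.add(w)
            tags.add(t)
    vocab.add("<unk>")  # unknown words get a zero-count vocabulary entry
    tags.add("</s>")    # </s> can follow any tag

    def p_trans(t_next, t):
        # add-0.1 smoothed P(t_next | t) over the set of possible next tags
        return (tag_bigrams[t, t_next] + DELTA) / (tag_unigrams[t] + DELTA * len(tags))

    def p_emit(w, t):
        # add-0.1 smoothed P(w | t), mapping unseen words to <unk>
        if w not in vocab:
            w = "<unk>"
        return (tag_word[t, w] + DELTA) / (tag_unigrams[t] + DELTA * len(vocab))

    return p_trans, p_emit
```

Note the argument order: the quantity written p(B-person | O) above would be `p_trans("B-person", "O")` here.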
Write your system's output in a three-column format, where the first column contains the words, the second column contains the correct tags, and the third column contains your system's tags. Report what your output looks like for the first five sentences of the development set.
```
Thinking O O
about O O
Bowlounge B-facility B-person
on O O
Saturday O O
night O O
. O O

sur O O
mon O O
...
```
Added 2018/11/14: Here are two strategies you can take for doing this.
```
# Assume w[1], ..., w[n] is the input string and w[n+1] = </s>
# Assume that we have a model with
#   - tag bigram probabilities P(t'|t)
#   - tag-word probabilities P(w|t), with P(</s>|</s>) = 1

# Initialize table of Viterbi probabilities.
# viterbi[i,t] is the probability of the best way of tagging
# words 1...i-1 with some tags and tagging word i as t.
viterbi[i,t] = 0 for i = 0...n+1 and all tags t
viterbi[0,<s>] = 1

for i = 1, ..., n+1
    for each tag t'
        for each tag t
            if viterbi[i-1,t] * P(t'|t) * P(w[i]|t') > viterbi[i,t']:
                viterbi[i,t'] = viterbi[i-1,t] * P(t'|t) * P(w[i]|t')
                pointer[i,t'] = (i-1,t)

return viterbi[n+1,</s>]
```
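The pseudocode above can be transcribed into Python fairly directly. This is a sketch, not the required implementation: the function names `p_trans`/`p_emit` and the chart-of-dicts representation are assumptions, and it also follows the back-pointers to recover the tag sequence.

```python
def viterbi_decode(words, tags, p_trans, p_emit):
    """Tag `words` with the most probable tag sequence under a bigram
    model. `tags` is the list of ordinary tags; p_trans(t_next, t) and
    p_emit(word, tag) return model probabilities, with
    p_emit("</s>", "</s>") == 1. Returns (probability, tag sequence)."""
    n = len(words)
    ext = [None] + list(words) + ["</s>"]  # ext[i] is word i (1-indexed)
    # chart[i][t] = (best probability of tagging words 1..i with word i
    #                tagged t, backpointer to the previous tag)
    chart = [dict() for _ in range(n + 2)]
    chart[0]["<s>"] = (1.0, None)
    for i in range(1, n + 2):
        cur_tags = ["</s>"] if i == n + 1 else tags
        for t_next in cur_tags:
            best, back = 0.0, None
            for t, (prob, _) in chart[i - 1].items():
                p = prob * p_trans(t_next, t) * p_emit(ext[i], t_next)
                if p > best:
                    best, back = p, t
            if back is not None:
                chart[i][t_next] = (best, back)
    if "</s>" not in chart[n + 1]:
        return 0.0, []
    # follow the backpointers from the end state
    seq, t = [], "</s>"
    for i in range(n + 1, 0, -1):
        t = chart[i][t][1]
        seq.append(t)
    seq.reverse()
    return chart[n + 1]["</s>"][0], seq[1:]  # drop the initial <s>
```

For real data you will likely want log probabilities to avoid underflow on long sentences; the multiplications above then become additions.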
```
perl conlleval.pl < dev.out
```

where `dev.out` is replaced with the name of your output file. You should see something like this:
```
processed 16261 tokens with 661 phrases; found: 931 phrases; correct: 77.
accuracy:  71.99%; precision:   8.27%; recall:  11.65%; FB1:   9.67
          company: precision:  85.71%; recall:  15.38%; FB1:  26.09  7
         facility: precision:  17.65%; recall:   7.89%; FB1:  10.91  17
          geo-loc: precision:  60.00%; recall:  15.52%; FB1:  24.66  30
            movie: precision:   0.00%; recall:   0.00%; FB1:   0.00  5
      musicartist: precision:   0.00%; recall:   0.00%; FB1:   0.00  67
            other: precision:   2.63%; recall:   8.33%; FB1:   3.99  419
           person: precision:  10.05%; recall:  21.64%; FB1:  13.73  368
          product: precision:   7.14%; recall:   2.70%; FB1:   3.92  14
       sportsteam: precision:  33.33%; recall:   1.43%; FB1:   2.74  3
           tvshow: precision:   0.00%; recall:   0.00%; FB1:   0.00  1
```

The important score is FB1 (balanced F-measure) on all types, which here is 9.67%. Report your system's score on the development data; it should be at least 9%.
Added 2018/11/14: When I wrote the initial solution, I implemented it from scratch rather than using the FST library. Unfortunately, the FST library's composition is slow, so if you're using it, here's the trick to get it to train fast (about 1 minute per pass through the training data).
Run `git pull` on the HW2 repository to get a new version of `fst.py`. The new version has a function `compose_right` that works just like `compose`, but assumes that the FST on the right is smaller and optimizes accordingly. Use this function instead.
If `trans` is a transition, then `trans.q == ((t, _), i)`, `trans.a == (t', w)`, and `trans.r == ((t', _), i+1)`. You can use these to compute the transition's weight, which is $\lambda(t,t') + \mu(t',w)$.
Please submit all of the following in a gzipped tar archive (.tar.gz or .tgz; not .zip or .rar) via Sakai:
All code should build and run on student*.cse.nd.edu. If this is not possible, please discuss with the instructor before submitting.