Note: We thank Cam Harvey and others who
suggested some of the modifications we’ve included in these lists. The
word lists are described in:
Tim Loughran and Bill McDonald, 2011, “When is a Liability not a
Liability? Textual Analysis, Dictionaries, and 10-Ks,” Journal of
Finance, 66:1, 35-65.
Andriy Bodnaruk, Tim Loughran and Bill McDonald, 2015, “Using 10-K Text
to Gauge Financial Constraints,” Journal of Financial and Quantitative
Analysis, 50:4.
All word lists are contained in the Master Dictionary described
immediately below. Each row in the Master Dictionary spreadsheet is a
word. Each sentiment word list has its own column, and a non-zero entry
in that column marks the word as a member of the list. The value of a
non-zero entry is the year in which the word was added to that
sentiment list.
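As a rough illustration of the layout just described, the sketch below reads a small CSV excerpt and collects the members of one sentiment list together with the year each word was added. The column names (`Word`, `Negative`, `Positive`, `Uncertainty`), the sample rows, and the helper name `sentiment_words` are assumptions made for the example; check them against the headers of the actual Master Dictionary file before use.

```python
import csv
import io

# Hypothetical excerpt: column names and values are illustrative only,
# not copied from the real Master Dictionary.
SAMPLE = """Word,Negative,Positive,Uncertainty
ABANDON,2009,0,0
ACHIEVE,0,2009,0
APPROXIMATE,0,0,2009
TABLE,0,0,0
"""

def sentiment_words(csv_file, category):
    """Return {word: year_added} for rows with a non-zero entry in `category`."""
    members = {}
    for row in csv.DictReader(csv_file):
        year = int(row[category])
        if year != 0:  # a non-zero entry marks membership in the list
            members[row["Word"]] = year
    return members

negative = sentiment_words(io.StringIO(SAMPLE), "Negative")
print(negative)
```

To read a downloaded file instead of the in-memory sample, pass an open file handle (for example `open(path, newline="")`) in place of the `io.StringIO` object.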
.cat and .NFO files
2014 Master Dictionary
Updated: March 2015
Derived from release 4.0 of 2of12inf and extended to include words
appearing in 10-K documents that are not found in the original 2of12inf
word list. In addition to providing a
master word list, the dictionary includes statistics for word frequencies
in all 10-Ks from 1994-2014 (including 10-X variants). The dictionary reports counts,
proportion of total, average proportion per document, standard deviation
of proportion per document, document count (i.e., number of documents
containing at least one occurrence of the word), nine sentiment category
identifiers (e.g., negative, positive, uncertainty, litigious, modal,
constraining), Harvard Word List identifier, number of syllables, and
source for each word. Detailed
documentation appears here.
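The frequency statistics listed above can be sketched for a toy corpus as follows. This is only an illustration of what each statistic means, not the procedure used to build the dictionary: the function name `word_stats`, the result keys, and the use of the population standard deviation (`pstdev`) are assumptions, and every document is assumed to contain at least one token.

```python
from collections import Counter
from statistics import mean, pstdev

def word_stats(documents):
    """Compute per-word statistics for a list of tokenized, non-empty documents."""
    doc_counts = [Counter(doc) for doc in documents]
    total_tokens = sum(len(doc) for doc in documents)
    vocabulary = set().union(*documents)
    stats = {}
    for word in vocabulary:
        # Share of each document's tokens accounted for by this word.
        per_doc = [c[word] / len(doc) for c, doc in zip(doc_counts, documents)]
        count = sum(c[word] for c in doc_counts)
        stats[word] = {
            "count": count,                          # occurrences across all documents
            "proportion": count / total_tokens,      # proportion of total tokens
            "avg_proportion": mean(per_doc),         # average proportion per document
            "std_proportion": pstdev(per_doc),       # std. dev. of proportion per document
            "doc_count": sum(1 for c in doc_counts if c[word] > 0),
        }
    return stats

docs = [["loss", "loss", "gain", "risk"], ["gain", "risk"]]
stats = word_stats(docs)
print(stats["loss"])
```

Note that the proportion of total and the average proportion per document generally differ, because the latter weights every document equally regardless of its length.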
Tim Loughran and Bill McDonald, 2013, “IPO First-Day Returns, Offer Price
Revisions, Volatility, and Form S-1 Language,” Journal of Financial Economics, 109:2, 307-326.
Word list based on the union of the negative, uncertainty, and weak modal word lists
of completed IPOs in STATA format
of withdrawn IPOs in STATA format
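The union-based word list mentioned above amounts to a set union, so a word that appears in more than one source list is kept only once. A minimal sketch, with made-up placeholder words rather than entries taken from the actual lists:

```python
# Placeholder words for illustration only; not the contents of the real lists.
negative = {"LOSS", "IMPAIRMENT"}
uncertainty = {"APPROXIMATE", "MAY"}
weak_modal = {"MAY", "COULD"}

# A word present in several lists appears once in the union.
combined = negative | uncertainty | weak_modal
print(sorted(combined))
```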
Readability – 10-K File Size
Tim Loughran and Bill McDonald, 2014, “Measuring Readability in Financial
Disclosures,” Journal of Finance.
See the 10-K file summaries below, which contain
both gross and net file sizes by filing date, CIK, and form type.