This page provides tools for textual analysis in financial applications. The underlying method
goes by various labels in other disciplines, such as content analysis,
natural language processing, information retrieval, or computational
linguistics. A growing literature finds significant relations between stock
price reactions and the sentiment of information releases, as measured by
word classifications such as those provided below.
· 2011 Master Dictionary
Derived from release 4.0 of 2of12inf and extended to include words appearing in
10-K documents that are not found in the original 2of12inf word list. In addition to providing a master word
list, the dictionary includes word-frequency statistics for all 10-Ks
from 1994-2011 (including 10-K variants, except amended documents). For each word, the dictionary reports the
count, proportion of total, average proportion per document, standard deviation of proportion
per document, document count (i.e., number of documents containing at least
one occurrence of the word), nine word category identifiers (e.g.,
negative, positive), a Harvard Word List identifier, number of
syllables, and source.
· Stop words
1. Generic
2. Names
3. Dates and numbers
4. Geographic
5. Currencies
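The per-word statistics listed for the Master Dictionary can be illustrated with a small sketch. This is not the authors' code: the tokenizer (upper-cased alphabetic runs), the function name `word_statistics`, the toy documents, and the use of a population standard deviation are all assumptions made for illustration.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Split text into upper-case alphabetic tokens (an assumed, simplified rule)."""
    return re.findall(r"[A-Z]+", text.upper())


def word_statistics(documents, word):
    """Compute corpus-level statistics for one word over a list of document strings."""
    word = word.upper()
    per_doc_counts = []
    per_doc_props = []
    total_tokens = 0
    for text in documents:
        tokens = tokenize(text)
        count = Counter(tokens)[word]
        per_doc_counts.append(count)
        # Proportion of this document's tokens accounted for by the word.
        per_doc_props.append(count / len(tokens) if tokens else 0.0)
        total_tokens += len(tokens)
    total_count = sum(per_doc_counts)
    return {
        "count": total_count,
        "proportion": total_count / total_tokens if total_tokens else 0.0,
        "average_proportion": mean(per_doc_props),
        # Population standard deviation; the dictionary may use a different convention.
        "std_dev": pstdev(per_doc_props),
        # Number of documents with at least one occurrence of the word.
        "doc_count": sum(1 for c in per_doc_counts if c > 0),
    }


# Hypothetical toy corpus standing in for parsed 10-K text.
docs = [
    "The company reported a loss and a further loss",
    "Revenue growth was strong",
    "Loss of a key customer",
]
stats = word_statistics(docs, "loss")
print(stats)
```

Note that "proportion" pools all tokens across the corpus, while "average_proportion" averages the per-document proportions, so the two generally differ when documents vary in length.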
· 1994-2011 10-X Summaries
A 91 MB file containing summary data for all 10-X variants (e.g., 10-K405,
10-Q, 10-KSB) for 1994-2011.
In addition to word counts for each of the Loughran/McDonald
dictionaries, it contains the filing date, fiscal year end (FYE),
form type, file name, SIC code, Fama/French 48-industry classification, total number of words,
total number of unique words (i.e., distinct words used one or more times), gross
file size, net file size (after pre-parsing for tables, HTML, etc.), and the
numbers of ASCII-encoded, HTML, XBRL, and table characters.
o Documentation for Stage One Parse
This document describes the process that strips the 10-X files down to text
files.
o Documentation of Master Dictionary and Document Dictionaries
This document describes the process used to parse
the stage one files into word counts and file attributes.
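As a rough illustration of turning a stage one text file into word counts, the sketch below computes a few of the per-document fields named above (total words, unique words, and a count per word list). The function name `summarize_document`, the tokenizing rule, and the two tiny word lists are hypothetical placeholders; the actual parsing rules are those given in the linked documentation.

```python
import re


def summarize_document(text, word_lists):
    """Return total words, unique words, and a hit count for each named word list."""
    # Assumed tokenizer: upper-cased runs of letters.
    tokens = re.findall(r"[A-Z]+", text.upper())
    summary = {
        "total_words": len(tokens),
        "unique_words": len(set(tokens)),
    }
    for name, words in word_lists.items():
        # Count every token occurrence that belongs to the word list.
        summary[name] = sum(1 for token in tokens if token in words)
    return summary


# Hypothetical miniature word lists, for illustration only.
toy_lists = {
    "negative": {"LOSS", "IMPAIRMENT"},
    "uncertainty": {"MAY", "UNCERTAIN"},
}
summary = summarize_document(
    "We may record a loss. The loss may be material.", toy_lists
)
print(summary)
```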
FMA 2012 Tutorial Session Slides
· Natural Language Processing and Textual Analysis in Finance and Accounting
“IPO First-Day Returns, Offer Price Revisions, Volatility, and Form S-1
Language”, Tim Loughran and Bill McDonald, 2013.
Aggregate word list based on the union of negative, uncertainty, and weak modal words:
· Loughran_McDonald_AggregateIPOWordList.txt
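Forming an aggregate list as the union of several word lists can be sketched as follows. The helper name `build_aggregate_list` and the sample words are assumptions for illustration; they are not the contents of the distributed file.

```python
def build_aggregate_list(*word_lists):
    """Return the sorted union of several word lists, normalized to upper case."""
    aggregate = set()
    for words in word_lists:
        # Strip whitespace, skip blank entries, and upper-case so duplicates collapse.
        aggregate |= {w.strip().upper() for w in words if w.strip()}
    return sorted(aggregate)


# Hypothetical sample entries for each category.
negative = ["loss", "adverse"]
uncertainty = ["may", "uncertain"]
weak_modal = ["may", "could"]

aggregate = build_aggregate_list(negative, uncertainty, weak_modal)
print(aggregate)
```

A word that appears in more than one category (here, "may") appears only once in the union.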
Data:
· Sample of completed IPOs in Stata format
· Sample of withdrawn IPOs in Stata format