

Bill McDonald
Professor of Finance

Thomas A. and James J. Bruder Chair in Administrative Leadership


E-Mail: mcdonald.1@nd.edu

Address: 335 Mendoza College of Business, University of Notre Dame, Notre Dame, IN  46556

Telephone: (574) 631-5137


Textual Analysis

This page contains some tools that are useful for textual analysis in financial applications.  In other disciplines, the essential method of textual analysis goes by various labels, such as content analysis, natural language processing, information retrieval, or computational linguistics.  A growing literature finds significant relations between stock price reactions and the sentiment of information releases, as measured by word classifications such as those provided below.

Loughran and McDonald Financial Sentiment Dictionaries

Updated: 2012

Note:  We thank Cam Harvey and others who suggested some of the modifications we've included in these lists.  The word lists are described in Loughran and McDonald (Journal of Finance, V66, pp. 35-65, 2011).  The dictionary files are comma-delimited, and each entry contains a word followed by the version year.  Not for commercial use without authorization.  Copyright 2009.

 

·         Negative Words

·         Positive Words

·         Uncertainty Words

·         Litigious Words

·         Modal Words Strong

·         Modal Words Weak

·         Download zip folder with all lists

·         Download zip folder in WordStat format (contains .cat and .NFO files)
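As an illustration of how the lists are typically applied, the sketch below reads a dictionary file in the comma-delimited word/version-year layout described above and computes the share of a document's words that fall in one category.  It is a minimal sketch, not the authors' parsing code: the tokenizer (runs of ASCII letters, upper-cased), the function names, and the example words are assumptions made for illustration.

```python
import re

# Assumption: a "word" is a run of ASCII letters; hyphenated terms are split.
TOKEN_RE = re.compile(r"[A-Za-z]+")


def load_word_list(lines):
    """Parse dictionary lines of the form WORD,YEAR into a set of upper-case words."""
    words = set()
    for line in lines:
        line = line.strip()
        if line:
            # Keep only the word; the version year after the comma is ignored.
            words.add(line.split(",")[0].strip().upper())
    return words


def category_proportion(text, category):
    """Fraction of tokens in `text` that appear in the word set `category`."""
    tokens = [t.upper() for t in TOKEN_RE.findall(text)]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in category)
    return hits / len(tokens)


# Toy entries standing in for lines of the negative word file.
negative = load_word_list(["LOSS,2009", "IMPAIRMENT,2009", "", "DECLINE,2011"])
print(category_proportion("The impairment caused a net loss this year.", negative))  # 0.25
```

With a downloaded list, `load_word_list` can be given an open file object directly (for example, `with open(path, encoding="utf-8") as f: negative = load_word_list(f)`), since iterating a text file yields its lines.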

Harvard-IV-4 Psychological Dictionary

TagNeg File with Inflections

·         Harvard IV Negative Word List_Inf.txt
Because of the inherent imprecision of stemming, we have expanded the Harvard list to include relevant inflections.

General Word Lists

·         2011 Master Dictionary
Derived from release 4.0 of 2of12inf and extended to include words that appear in 10-K documents but are not found in the original 2of12inf word list.  In addition to providing a master word list, the dictionary includes word frequency statistics for all 10-Ks from 1994 to 2011 (including 10-K variants but excluding amended documents).  For each word, the dictionary reports the count, proportion of total, average proportion per document, standard deviation of proportion per document, document count (i.e., number of documents containing at least one occurrence of the word), nine word category identifiers (e.g., negative and positive), a Harvard Word List identifier, number of syllables, and source.
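The frequency statistics listed for the Master Dictionary can be computed on a small corpus along the following lines.  This is a hedged sketch under stated assumptions: documents are already tokenized and upper-cased, every document is non-empty, the average and standard deviation of the per-document proportion are taken over all documents (including those where the word does not appear), and the population standard deviation is used.  The dictionary's own conventions on these points may differ, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def word_statistics(docs, word):
    """Summary statistics for `word` across `docs`, a list of non-empty token lists."""
    per_doc = [doc.count(word) for doc in docs]
    total_words = sum(len(doc) for doc in docs)
    # Per-document proportion: occurrences of the word / words in that document.
    proportions = [c / len(doc) for c, doc in zip(per_doc, docs)]
    return {
        "count": sum(per_doc),
        "proportion_of_total": sum(per_doc) / total_words,
        "avg_proportion": mean(proportions),
        # Assumption: population SD; statistics.stdev would give the sample SD.
        "sd_proportion": pstdev(proportions),
        # Number of documents containing at least one occurrence.
        "doc_count": sum(1 for c in per_doc if c > 0),
    }


docs = [["LOSS", "NET", "LOSS", "GAIN"], ["GAIN", "GAIN"], ["LOSS", "RISK"]]
print(word_statistics(docs, "LOSS"))
```

Note the difference between the two proportion measures: "proportion of total" pools all words, so long documents dominate it, while "average proportion per document" weights every document equally.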

·         Stop words

1.        Generic

2.        Names

3.        Dates and numbers

4.        Geographic

5.        Currencies
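Stop words are typically removed before counting so that generic terms, names, dates, and similar tokens do not inflate word totals.  The sketch below merges several lists and filters a token sequence.  It assumes, without confirmation from the files themselves, that the first whitespace-separated field on each line is the word, so any trailing annotation on a line is ignored; the function names are hypothetical.

```python
def load_stop_words(*files_lines):
    """Union several stop-word lists into one upper-case set.

    Assumption: the first whitespace-separated field on each line is the word.
    """
    stop = set()
    for lines in files_lines:
        for line in lines:
            fields = line.split()
            if fields:
                stop.add(fields[0].upper())
    return stop


def remove_stop_words(tokens, stop):
    """Keep only tokens whose upper-case form is not in `stop`."""
    return [t for t in tokens if t.upper() not in stop]


# Toy lines standing in for the generic, names, and dates lists.
stop = load_stop_words(["THE", "A", ""], ["SMITH | surname"], ["JANUARY"])
print(remove_stop_words(["The", "impairment", "in", "January", "by", "Smith"], stop))
# ['impairment', 'in', 'by']
```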

 

10-X File Summaries

·          1994-2011 10-X Summaries
A 91 MB file containing summary data for all 10-X variants (e.g., 10-K405, 10-Q, 10-KSB) for 1994-2011.  In addition to word counts for each of the Loughran/McDonald dictionaries, it contains the filing date, fiscal year end (FYE), form type, file name, SIC code, Fama/French industry (48-industry classification), total number of words, total number of unique words (i.e., words used one or more times), gross file size, net file size (after pre-parsing for tables, HTML, etc.), and the number of ASCII-encoded characters, HTML characters, XBRL characters, and table characters.

o    Documentation for Stage One Parse
This document describes the process that strips the 10-X files down to text files.

o    Documentation of Master Dictionary and Document Dictionaries
This document describes the process used to parse the stage one files into word counts and file attributes.
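The stage one parse itself is described in the linked documentation.  For readers who want to experiment, the following is a much simpler, hypothetical sketch of the same general idea using Python's standard `html.parser` module: it keeps visible text and discards anything inside table, script, and style elements.  It is not the documented procedure and does not attempt to handle the SEC header, embedded ASCII-encoded objects, or XBRL.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping everything inside <table>, <script>, <style>."""

    SKIP = {"table", "script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.depth = 0  # > 0 while inside a skipped element (handles nesting)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.parts.append(data)


def strip_markup(html):
    """Return the text of `html` with tables removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(" ".join(parser.parts).split())


sample = "<p>Net loss increased.</p><table><tr><td>1,234</td></tr></table><p>Risk factors</p>"
print(strip_markup(sample))  # Net loss increased. Risk factors
```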

 

FMA 2012 Tutorial Session Slides

·          Natural Language Processing and Textual Analysis in Finance and Accounting

 

IPO Data

 

“IPO First-Day Returns, Offer Price Revisions, Volatility, and Form S-1 Language”, Tim Loughran and Bill McDonald, 2013.

 

Aggregate word list based on the union of negative, uncertainty and weak modal words:

·          Loughran_McDonald_AggregateIPOWordList.txt
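Because the aggregate list is defined as a set union, it can be rebuilt from the individual category lists in a few lines.  The word sets below are toy examples, not excerpts from the actual files; in practice each set would be loaded from the corresponding dictionary file.  The function name is hypothetical.

```python
def aggregate_word_list(*categories):
    """Sorted union of several word sets, upper-cased so case differences do not create duplicates."""
    combined = set()
    for words in categories:
        combined |= {w.upper() for w in words}
    return sorted(combined)


negative = {"LOSS", "DECLINE"}
uncertainty = {"MAY", "UNCERTAIN"}
weak_modal = {"may", "COULD"}  # overlaps with uncertainty once upper-cased
print(aggregate_word_list(negative, uncertainty, weak_modal))
# ['COULD', 'DECLINE', 'LOSS', 'MAY', 'UNCERTAIN']
```

A word that appears in more than one category is counted once in the union, which matters if the aggregate list is later used to compute a single proportion per document.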

Data:

·          Sample of completed IPOs in STATA format

·          Sample of withdrawn IPOs in STATA format


© 2012 University of Notre Dame