* this example uses white and black non-hispanic
* full time (hours>=30) full year (weeks>=40)
* workers with no business (self employed)
* income from the 2005-2007 march cps

* the march cps asks people how much
* they earned in the previous year so
* this example uses data from 2004-2006
* the data is available from www.icpsr.org

* increase memory and matsize to handle lots of
* variables and observations
set memory 100m
set matsize 10000
set more off

* write output to march_cps_example.log
log using march_cps_example.log, replace

use march_cps_example

* keep full year workers, 40+ weeks in previous year
keep if wkswork1>=40

* keep full time workers, hours>=30
keep if uhrswork>=30

* keep only people with no business income or losses
keep if incbus==0

* hours worked last year
gen hours=uhrswork*wkswork1

* real wages
* cps in year t asks about income in year t-1
* convert all earnings to real 2007 dollars
gen y2007=year==2007
gen y2006=year==2006
gen y2005=year==2005
gen cpi=(1.913*y2005+1.9718*y2006+2.028*y2007)/2.112

* real hourly wage
gen wages=(incwage/cpi)/hours
sum wages, detail

* delete high and low values of wages
drop if wages>300
drop if wages<4

* generate hispanic indicator
* delete hispanics
gen hispanic=hispan>0
drop if hispanic==1

* generate black and white indicators
gen black=(race==200|race==801|race==805|race==806|race==807|race==810|race==811|race==814)
gen white=race==100

* only keep black and white
keep if black==1|white==1

* identify married people
gen married=marst==1

* get summary statistics
sum wages black married uhrswork wkswork1

* log wages
gen wagesl=ln(wages)

* quadratic and cubic terms in age
gen age2=age*age
gen age3=age*age*age

* get a complete set of dummy variables for education, industry, occupation
xi i.educrec i.indly i.occly

* get raw difference in means across race groups
reg wagesl black

* control for basic demographic variables
reg wagesl black married age* _Ied*

* control for industry
reg wagesl black married age* _Ied* _Iind*

* control for occupation
reg wagesl black married age* _Ied* _Iind* _Iocc*

log close