The Reddit post experiment dataset can be found here:

http://www3.nd.edu/~tweninge/data/reddit_post_manipulation_data.csv

The R-Markdown code can be found here:

http://www3.nd.edu/~tweninge/data/reddit_report.Rmd

These are the statistics generated from the Reddit data.

Basic dataset statistics:

Number of posts, N=

## [1] 93019

Control:

## [1] 31225

Positive Treatment:

## [1] 30998

Negative Treatment:

## [1] 30796
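As a quick consistency check, the three group sizes above should add up to N. The short Python sketch below (the report itself is R Markdown; the dictionary keys are just labels chosen for this illustration) sums the reported counts and computes each group's share of the dataset.

```python
# Group sizes copied from the report output above.
counts = {
    "Control": 31225,
    "Positive Treatment": 30998,
    "Negative Treatment": 30796,
}

# The groups should partition the full dataset.
total = sum(counts.values())

# Fraction of all posts that fall in each group.
shares = {group: n / total for group, n in counts.items()}

for group, share in shares.items():
    print(f"{group}: {counts[group]} posts ({share:.1%})")
print("Total:", total)
```

The total matches the reported N of 93019, and the three groups are close to equal in size.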

Post score (upvotes minus downvotes) summary statistics:

All scores:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     -67       1       2      30       8    4350

Control Scores:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     -67       1       2      29       8    4280

Positive Treatment Scores (without adjustment):

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     -59       1       2      32       9    4350

Negative Treatment Scores (without adjustment):

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     -26       0       1      28       7    3420
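For readers who want to reproduce tables of this shape outside R, here is a minimal Python sketch of a six-number summary. It assumes the quartiles follow R's default quantile rule (type 7), which corresponds to the `inclusive` method of Python's `statistics.quantiles`; the function name `six_number_summary` and the sample data are hypothetical.

```python
import statistics

def six_number_summary(values):
    """Return min, quartiles, mean and max, laid out like R's summary()."""
    # method="inclusive" interpolates the same way as R's default quantile type.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.mean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }

# Hypothetical scores: one very large value pulls the mean far above the median,
# the same pattern seen in the tables above.
example_scores = [1, 2, 3, 4, 100]
print(six_number_summary(example_scores))
```

In all four tables above the mean (28 to 32) is far larger than the median (1 or 2), which already signals a strongly right-skewed score distribution.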

Boxplot Comparison:

(Figure: boxplot of scores by treatment.)

Boxplot Comparison (log):

## [1] "Skewness (log) by Treatment:"
## [1] "Control =  1.14299655650167"
## [1] "Positive =  1.08944528778635"
## [1] "Negative =  1.09341028291602"
## [1] "Kurtosis (log) by Treatment:"
## [1] "Control =  3.97168065319083"
## [1] "Positive =  3.94333809954101"
## [1] "Negative =  3.88151202033085"
## [1] "Boxplot Comparison (log) of Scores by Treatment"

(Figure: boxplot of log scores by treatment.)
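The skewness and kurtosis values above can be illustrated with the standard moment-based definitions. The Python sketch below is an assumption about the estimator, not a copy of the report's code: it uses population central moments and non-excess kurtosis (a normal distribution scores 3), and it drops non-positive scores before taking logs, since the report does not say how those were handled. All function names are hypothetical.

```python
import math

def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness(values):
    """Third standardized moment; 0 for a symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Fourth standardized moment (not excess kurtosis)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

def log_positive(scores):
    """Natural log of the strictly positive scores (assumed handling)."""
    return [math.log(s) for s in scores if s > 0]

# Hypothetical scores with a long right tail.
scores = [1, 1, 2, 2, 3, 8, 40, 400]
print("raw skewness:", skewness(scores))
print("log skewness:", skewness(log_positive(scores)))
```

Taking logs compresses the long right tail, which is why the log-scale skewness reported above (about 1.1) is so much smaller than the raw skewness reported in the outlier section (about 11.2).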

Line Graph Comparison:

(Figure: line graph comparison.)

Outlier Elimination:

## [1] "Skewness for all scores =  11.2295865064024"
## [1] "Kurtosis for all scores =  149.824273669107"
## [1] "Skewness for all scores (with outliers eliminated) =  6.4830125926408"
## [1] "Kurtosis for all scores (with outliers eliminated) =  54.9451510876046"

(Figures: two plots from the outlier elimination step.)

Distribution Test:

(Figures: three distribution plots.)

Shapiro Test Control:

## 
##  Shapiro-Wilk normality test
## 
## data:  sample(nothingScore, 5000)
## W = 0.1511, p-value < 2.2e-16

Shapiro Test Positive Treatment:

## 
##  Shapiro-Wilk normality test
## 
## data:  sample(upvoteScore, 5000)
## W = 0.1693, p-value < 2.2e-16

Shapiro Test Negative Treatment:

## 
##  Shapiro-Wilk normality test
## 
## data:  sample(downvoteScore, 5000)
## W = 0.153, p-value < 2.2e-16

KS Test Control:

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  nothingScore
## D = 0.6226, p-value < 2.2e-16
## alternative hypothesis: two-sided

KS Test Positive Treatment:

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  upvoteScore
## D = 0.6523, p-value < 2.2e-16
## alternative hypothesis: two-sided

KS Test Negative Treatment:

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  downvoteScore
## D = 0.5128, p-value < 2.2e-16
## alternative hypothesis: two-sided
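The D statistic in a one-sample Kolmogorov-Smirnov test is the largest vertical gap between the empirical CDF of the sample and a reference CDF. The Python sketch below shows that computation. The report does not state which reference distribution was used, so comparing against a normal distribution fitted to the sample is an assumption here, and `ks_statistic` and the example data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of sample and the reference cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x,
        # so check the gap on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical heavy-tailed scores compared against a fitted normal.
scores = [1, 1, 2, 2, 3, 8, 40, 400]
fitted = NormalDist(mean(scores), stdev(scores))
print("D =", ks_statistic(scores, fitted.cdf))
```

D always lies between 0 and 1, so values of 0.51 to 0.65 as reported above mean the sample and reference distributions differ over most of their range.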

(Figures: three plots.)

Fit to distribution:

## [1] "Normal:"
## [1] "Control"
## [1] "AIC:  325792.703578005"
## [1] "BIC:  325808.907844949"
## [1] "LL:  -162894.351789002"
## [1] "Positive"
## [1] "AIC:  337728.745256123"
## [1] "BIC:  337745.009527972"
## [1] "LL:  -168862.372628062"
## [1] "Negative"
## [1] "AIC:  278863.29341751"
## [1] "BIC:  278879.167068734"
## [1] "LL:  -139429.646708755"
## [1] "Exponential:"
## [1] "Control"
## [1] "AIC:  226366.207162895"
## [1] "BIC:  226374.309296367"
## [1] "LL:  -113182.103581447"
## [1] "Positive"
## [1] "AIC:  236484.318007565"
## [1] "BIC:  236492.450143489"
## [1] "LL:  -118241.159003782"
## [1] "Negative"
## [1] "AIC:  196098.068459806"
## [1] "BIC:  196106.005285419"
## [1] "LL:  -98048.0342299032"
## [1] "Pareto:"
## [1] "Control"
## [1] "AIC:  69915.6180434678"
## [1] "BIC:  69923.7201769401"
## [1] "LL:  -34956.8090217339"
## [1] "Positive"
## [1] "AIC:  76304.3996584613"
## [1] "BIC:  76312.5317943858"
## [1] "LL:  -38151.1998292307"
## [1] "Negative"
## [1] "AIC:  61716.7897895901"
## [1] "BIC:  61724.7266152022"
## [1] "LL:  -30857.394894795"
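The three criteria above are linked by the usual formulas AIC = 2k - 2LL and BIC = k ln(n) - 2LL, where k is the number of fitted parameters and n the number of observations. The Python sketch below checks this against the reported Control values, assuming k = 2 for the normal fit and k = 1 for the exponential fit (these parameter counts are inferred from the numbers, not stated in the report). The function names are hypothetical.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_lik

def implied_n(bic_value, log_lik, k):
    """Sample size implied by a reported BIC and log-likelihood."""
    return math.exp((bic_value + 2 * log_lik) / k)

# Control group, normal fit (values copied from the output above).
normal_ll = -162894.351789002
print("AIC:", aic(normal_ll, k=2))
print("implied n:", implied_n(325808.907844949, normal_ll, k=2))

# Control group, exponential fit.
exp_ll = -113182.103581447
print("AIC:", aic(exp_ll, k=1))
print("implied n:", implied_n(226374.309296367, exp_ll, k=1))
```

Both Control fits imply a sample of roughly 24,400 observations, fewer than the 31225 Control posts, which suggests the fits used a subset of the scores (possibly only the positive ones). Within each treatment the Pareto fit has by far the lowest AIC and BIC, so it is the preferred model of the three.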

Fit to Pareto:

## 
## Call:
## glm(formula = y ~ log(1:max(i)), family = quasipoisson())
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -5.923  -0.629  -0.423  -0.332   6.939  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.08635    0.00687    1322   <2e-16 ***
## log(1:max(i)) -1.45422    0.00289    -503   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 0.6669)
## 
##     Null deviance: 251865.1  on 4284  degrees of freedom
## Residual deviance:   2793.1  on 4283  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5
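Because the quasi-Poisson family uses a log link, the fitted model above has the form log E[y] = intercept + slope * log(i), which is a power law E[y] = exp(intercept) * i^slope. The Python sketch below evaluates that curve with the coefficients from this first fit. It assumes that y is the number of posts observed at score i, which the formula suggests but the report does not state; the function names are hypothetical.

```python
import math

# Coefficients copied from the first fit above.
INTERCEPT = 9.08635
SLOPE = -1.45422

def expected_count(score, intercept=INTERCEPT, slope=SLOPE):
    """Fitted mean of y at a given score under the log-link model."""
    return math.exp(intercept + slope * math.log(score))

def deviance_explained(null_deviance, residual_deviance):
    """Share of the null deviance accounted for by the model."""
    return 1.0 - residual_deviance / null_deviance

# Doubling the score multiplies the fitted count by 2 ** slope.
ratio = expected_count(2) / expected_count(1)
print("count ratio when score doubles:", ratio)
print("deviance explained:", deviance_explained(251865.1, 2793.1))
```

With a slope near -1.45, each doubling of the score cuts the fitted count to a bit more than a third, and the drop from the null deviance to the residual deviance indicates that the power-law curve captures nearly all of the variation in y.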

(Figure: plot of the fit.)

## 
## Call:  glm(formula = y ~ log(1:max(i)), family = quasipoisson())
## 
## Coefficients:
##   (Intercept)  log(1:max(i))  
##          9.09          -1.45  
## 
## Degrees of Freedom: 4284 Total (i.e. Null);  4283 Residual
## Null Deviance:       252000 
## Residual Deviance: 2790  AIC: NA
## 
## Call:
## glm(formula = y ~ log(1:max(i)), family = quasipoisson())
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -9.674  -0.714  -0.493  -0.390   6.215  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    8.87178    0.00818    1085   <2e-16 ***
## log(1:max(i)) -1.42186    0.00331    -430   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 0.7778)
## 
##     Null deviance: 198956.5  on 3418  degrees of freedom
## Residual deviance:   2687.5  on 3417  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

(Figure: plot of the fit.)

## 
## Call:  glm(formula = y ~ log(1:max(i)), family = quasipoisson())
## 
## Coefficients:
##   (Intercept)  log(1:max(i))  
##          8.87          -1.42  
## 
## Degrees of Freedom: 3418 Total (i.e. Null);  3417 Residual
## Null Deviance:       199000 
## Residual Deviance: 2690  AIC: NA
## 
## Call:
## glm(formula = y ~ log(1:max(i)), family = quasipoisson())
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -22.757   -0.707   -0.481   -0.381   11.060  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.05276    0.00757    1196   <2e-16 ***
## log(1:max(i)) -1.41477    0.00299    -474   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 0.8083)
## 
##     Null deviance: 251384.9  on 4345  degrees of freedom
## Residual deviance:   3851.8  on 4344  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5