The Reddit Post Experiment dataset can be found here:

http://www3.nd.edu/~tweninge/data/reddit_post_manipulation_data.csv

The R-Markdown code can be found here:

http://www3.nd.edu/~tweninge/data/reddit_report.Rmd

These are the statistics generated from the Reddit data.

Basic dataset statistics:

Number of posts, N=

## [1] 93019

Control:

## [1] 31225

Positive Treatment:

## [1] 30998

Negative Treatment:

## [1] 30796
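As a quick consistency check, the three group sizes reported above should add up to N. The following is a minimal Python sketch, not the authors' R code; the dictionary name is purely illustrative.

```python
# Group sizes copied from the report output above.
group_sizes = {"control": 31225, "positive": 30998, "negative": 30796}

# The three groups should partition the full set of posts.
total = sum(group_sizes.values())
shares = {name: n / total for name, n in group_sizes.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The groups are close to equal in size, so comparisons between them are not driven by a large imbalance in sample size.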

Post score (upvotes minus downvotes) summary statistics:

All scores:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     -67       1       2      30       8    4350

Control Scores:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     -67       1       2      29       8    4280

Positive Treatment Scores (without adjustment):

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     -59       1       2      32       9    4350

Negative Treatment Scores (without adjustment):

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     -26       0       1      28       7    3420
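The summaries above are six-number summaries of the post score, defined as upvotes minus downvotes. The sketch below shows one way to compute the same quantities in Python. It is an illustration only: the vote pairs are made up, the function name is hypothetical, and the "inclusive" quantile method is assumed to correspond to the default interpolation used by R's `summary()`.

```python
import statistics

def six_number_summary(scores):
    """Return min, 1st quartile, median, mean, 3rd quartile and max."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(scores),
        "q3": q3,
        "max": max(scores),
    }

# Made-up (ups, downs) pairs for illustration; NOT taken from the dataset.
votes = [(5, 1), (2, 2), (10, 3), (1, 4), (120, 20)]
scores = [ups - downs for ups, downs in votes]

summary = six_number_summary(scores)
print(summary)
```

Even in this toy sample a single high-scoring post pulls the mean well above the median, the same pattern visible in the reported summaries (median 2, mean about 30).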

Boxplot Comparison:

[Figure: boxplot comparison of scores by treatment (knitr chunk unnamed-chunk-10)]

Boxplot Comparison (log):

## [1] "Skewness (log) by Treatment:"

## [1] "Control =  1.14299655650167"

## [1] "Positive =  1.08944528778635"

## [1] "Negative =  1.09341028291602"

## [1] "Kurtosis (log) by Treatment:"

## [1] "Control =  3.97168065319083"

## [1] "Positive =  3.94333809954101"

## [1] "Negative =  3.88151202033085"
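Skewness and kurtosis of the log-transformed scores can be computed from central moments. The sketch below is a Python illustration under stated assumptions: it uses the moment-based definitions (kurtosis of a normal distribution equals 3, not 0), it applies the logarithm only to strictly positive scores, and it runs on made-up data. The report does not say which definitions or which handling of non-positive scores the original R code used.

```python
import math

def central_moment(xs, k):
    """k-th central moment with a 1/n divisor."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)

def skewness(xs):
    """Moment-based skewness: m3 / m2^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    """Moment-based (non-excess) kurtosis: m4 / m2^2."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

# Made-up scores for illustration; NOT taken from the dataset.
scores = [1, 1, 2, 2, 3, 8, 40, 900]

# Assumption: only strictly positive scores are log-transformed.
log_scores = [math.log(s) for s in scores if s > 0]

print("raw skewness:", skewness(scores))
print("log skewness:", skewness(log_scores))
print("log kurtosis:", kurtosis(log_scores))
```

The log transform reduces the skew of heavy-tailed data without removing it, which is consistent with the reported log-scale skewness values of about 1.1.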

## [1] "Boxplot Comparison (log) of Scores by Treatment"

[Figure: boxplot comparison (log) of scores by treatment (knitr chunk unnamed-chunk-11)]

Line graph Comparison:

[Figure: line graph comparison (knitr chunk unnamed-chunk-12)]

Outlier Elimination:

## [1] "Skewness for all scores =  11.2295865064024"

## [1] "Kurtosis for all scores =  149.824273669107"

## [1] "Skewness for all scores (with outliers eliminated) =  6.4830125926408"

## [1] "Kurtosis for all scores (with outliers eliminated) =  54.9451510876046"

[Figure: outlier elimination plot (knitr chunk "outlier elimination")]

Distribution Test

[Figure: distribution test plot (knitr chunk unnamed-chunk-13)]

Shapiro Test Control:

## 
##  Shapiro-Wilk normality test
## 
## data:  sample(nothingScore, 5000)
## W = 0.1511, p-value < 2.2e-16

Shapiro Test Positive Treatment:

## 
##  Shapiro-Wilk normality test
## 
## data:  sample(upvoteScore, 5000)
## W = 0.1693, p-value < 2.2e-16

Shapiro Test Negative Treatment:

## 
##  Shapiro-Wilk normality test
## 
## data:  sample(downvoteScore, 5000)
## W = 0.153, p-value < 2.2e-16

KS Test Control:

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  nothingScore
## D = 0.6226, p-value < 2.2e-16
## alternative hypothesis: two-sided

KS Test Positive Treatment:

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  upvoteScore
## D = 0.6523, p-value < 2.2e-16
## alternative hypothesis: two-sided

KS Test Negative Treatment:

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  downvoteScore
## D = 0.5128, p-value < 2.2e-16
## alternative hypothesis: two-sided
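The D values above are one-sample Kolmogorov-Smirnov statistics: the largest vertical gap between the empirical CDF of the scores and a reference CDF. The Python sketch below illustrates the computation on made-up data. It assumes the reference is a normal distribution with the sample mean and standard deviation; the report does not state which reference distribution or parameters the original tests used, and the function name is hypothetical.

```python
from statistics import NormalDist, fmean, pstdev

def ks_statistic_normal(xs):
    """One-sample KS statistic D against a normal fitted to the sample."""
    dist = NormalDist(fmean(xs), pstdev(xs))
    ordered = sorted(xs)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        cdf = dist.cdf(x)
        # Compare the reference CDF with the empirical CDF just after
        # and just before the jump at x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Made-up heavy-tailed "scores"; NOT taken from the dataset.
skewed = [1] * 30 + [2] * 10 + [5] * 5 + [50, 100, 500, 1000, 4000]

# Evenly spaced normal quantiles, as a roughly normal comparison sample.
normal_like = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]

d_skewed = ks_statistic_normal(skewed)
d_normal = ks_statistic_normal(normal_like)
print(d_skewed, d_normal)
```

A large D, as reported for all three groups (between 0.51 and 0.65), means the score distribution is far from the reference; a sample that really is close to normal gives a D near zero.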

[Figure: knitr chunk unnamed-chunk-20]

Fit to distribution:

## [1] "Normal:"

## [1] "Control"

## [1] "AIC:  325792.703578005"

## [1] "BIC:  325808.907844949"

## [1] "LL:  -162894.351789002"

## [1] "Positive"

## [1] "AIC:  337728.745256123"

## [1] "BIC:  337745.009527972"

## [1] "LL:  -168862.372628062"

## [1] "Negative"

## [1] "AIC:  278863.29341751"

## [1] "BIC:  278879.167068734"

## [1] "LL:  -139429.646708755"

## [1] "Exponential:"

## [1] "Control"

## [1] "AIC:  226366.207162895"

## [1] "BIC:  226374.309296367"

## [1] "LL:  -113182.103581447"

## [1] "Positive"

## [1] "AIC:  236484.318007565"

## [1] "BIC:  236492.450143489"

## [1] "LL:  -118241.159003782"

## [1] "Negative"

## [1] "AIC:  196098.068459806"

## [1] "BIC:  196106.005285419"

## [1] "LL:  -98048.0342299032"

## [1] "Pareto:"

## [1] "Control"

## [1] "AIC:  69915.6180434678"

## [1] "BIC:  69923.7201769401"

## [1] "LL:  -34956.8090217339"

## [1] "Positive"

## [1] "AIC:  76304.3996584613"

## [1] "BIC:  76312.5317943858"

## [1] "LL:  -38151.1998292307"

## [1] "Negative"

## [1] "AIC:  61716.7897895901"

## [1] "BIC:  61724.7266152022"

## [1] "LL:  -30857.394894795"
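The AIC and BIC values above follow from the log-likelihood (LL) via AIC = 2k - 2·LL and BIC = k·ln(n) - 2·LL, where k is the number of fitted parameters and n the number of observations. The Python sketch below checks the control-group figures against these formulas. The values of k are assumptions (2 for the normal fit, 1 for the exponential and Pareto fits), and n is not printed in the report, so it is backed out from the BIC.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_lik

def implied_n(bic_value, log_lik, k):
    """Sample size implied by a reported BIC and log-likelihood."""
    return math.exp((bic_value + 2 * log_lik) / k)

# Control-group values copied from the output above.
# k is an ASSUMPTION about the number of fitted parameters.
fits = {
    "normal": {"k": 2, "ll": -162894.351789002,
               "aic": 325792.703578005, "bic": 325808.907844949},
    "exponential": {"k": 1, "ll": -113182.103581447,
                    "aic": 226366.207162895, "bic": 226374.309296367},
    "pareto": {"k": 1, "ll": -34956.8090217339,
               "aic": 69915.6180434678, "bic": 69923.7201769401},
}

results = {}
for name, f in fits.items():
    results[name] = {
        "aic_check": aic(f["ll"], f["k"]),
        "n": implied_n(f["bic"], f["ll"], f["k"]),
    }
    print(name, results[name])
```

Lower AIC and BIC indicate a better fit, so among the three candidates the Pareto distribution fits each group best by a wide margin. The implied n is the same for all three control fits, which can be compared against the group sizes reported earlier.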

Fit to Pareto:

## 
## Call:
## glm(formula = y ~ log(1:max(i)), family = quasipoisson())
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -5.923  -0.629  -0.423  -0.332   6.939  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.08635    0.00687    1322   <2e-16 ***
## log(1:max(i)) -1.45422    0.00289    -503   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 0.6669)
## 
##     Null deviance: 251865.1  on 4284  degrees of freedom
## Residual deviance:   2793.1  on 4283  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

[Figure: Pareto fit plot (knitr chunk unnamed-chunk-22)]

## 
## Call:  glm(formula = y ~ log(1:max(i)), family = quasipoisson())
## 
## Coefficients:
##   (Intercept)  log(1:max(i))  
##          9.09          -1.45  
## 
## Degrees of Freedom: 4284 Total (i.e. Null);  4283 Residual
## Null Deviance:       252000 
## Residual Deviance: 2790  AIC: NA

## 
## Call:
## glm(formula = y ~ log(1:max(i)), family = quasipoisson())
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -9.674  -0.714  -0.493  -0.390   6.215  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    8.87178    0.00818    1085   <2e-16 ***
## log(1:max(i)) -1.42186    0.00331    -430   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 0.7778)
## 
##     Null deviance: 198956.5  on 3418  degrees of freedom
## Residual deviance:   2687.5  on 3417  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

[Figure: Pareto fit plot (knitr chunk unnamed-chunk-22)]

## 
## Call:  glm(formula = y ~ log(1:max(i)), family = quasipoisson())
## 
## Coefficients:
##   (Intercept)  log(1:max(i))  
##          8.87          -1.42  
## 
## Degrees of Freedom: 3418 Total (i.e. Null);  3417 Residual
## Null Deviance:       199000 
## Residual Deviance: 2690  AIC: NA

## 
## Call:
## glm(formula = y ~ log(1:max(i)), family = quasipoisson())
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -22.757   -0.707   -0.481   -0.381   11.060  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.05276    0.00757    1196   <2e-16 ***
## log(1:max(i)) -1.41477    0.00299    -474   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 0.8083)
## 
##     Null deviance: 251384.9  on 4345  degrees of freedom
## Residual deviance:   3851.8  on 4344  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

## 
## Call:  glm(formula = y ~ log(1:max(i)), family = quasipoisson())
## 
## Coefficients:
##   (Intercept)  log(1:max(i))  
##          9.05          -1.41  
## 
## Degrees of Freedom: 4345 Total (i.e. Null);  4344 Residual
## Null Deviance:       251000 
## Residual Deviance: 3850  AIC: NA
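Because the quasi-Poisson family uses a log link, each fitted model `y ~ log(i)` is a power law on the original scale: E[y] = exp(intercept) · i^slope. The Python sketch below turns the three coefficient pairs reported above into predicted values. It assumes, without confirmation from the report, that `y` is the number of posts observed at index `i`; the labels "first", "second" and "third" only reflect the order of the output, since the report does not say which fit belongs to which group.

```python
import math

def expected_count(i, intercept, slope):
    """Fitted mean under a log link: exp(intercept + slope * log(i))."""
    return math.exp(intercept + slope * math.log(i))

# (intercept, slope) pairs copied from the three glm summaries above.
coefficients = {
    "first fit": (9.08635, -1.45422),
    "second fit": (8.87178, -1.42186),
    "third fit": (9.05276, -1.41477),
}

predictions = {}
for name, (b0, b1) in coefficients.items():
    predictions[name] = [expected_count(i, b0, b1) for i in (1, 10, 100)]
    print(name, [round(p, 1) for p in predictions[name]])
```

With slopes near -1.4, every tenfold increase in `i` divides the expected count by roughly 10^1.4, or about 25 to 28. The three slopes are close to one another.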

Comparison

## [1] "KS Test -- Positive Treatment vs Control p= 0"

## [1] "KS Test -- Negative Treatment vs Control p= 0"

## [1] "Sanity Check -- KS Test -- Control vs Control p= 0.99588096530338"

## [1] "Wilcox Test -- Positive Treatment vs Control p= 5.86264455955917e-53"

## [1] "Wilcox Test -- Negative Treatment vs Control p= 7.83462479119778e-73"

## [1] "Sanity Check -- Wilcox Test -- Control vs Control p= 0.621666799285642"

## [1] "T Test -- Log Positive Treatment vs Log Control p= 1.68975176107589e-20"

## [1] "T Test -- Log Negative Treatment vs Log Control p= 1.68557267325307e-09"

## [1] "Sanity Check -- T Test -- Log Control vs Log Control p= 0.506586669634357"
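The KS comparisons above are two-sample tests: the statistic D is the largest gap between the empirical CDFs of the two groups. The Python sketch below computes D on made-up data and includes a comparison of a sample with itself, in the spirit of the sanity checks. It computes only the statistic, not the p-value, and the function name is hypothetical.

```python
from bisect import bisect_right

def ks_two_sample(a, b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in set(a_sorted) | set(b_sorted):
        ecdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        ecdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d

# Made-up scores for illustration; NOT taken from the dataset.
control = [0, 1, 1, 2, 3, 8]
treated = [1, 2, 2, 3, 5, 9]

d_treated = ks_two_sample(treated, control)
d_same = ks_two_sample(control, control)
print(d_treated, d_same)
```

With roughly 31,000 posts per group, even a modest D such as the values near 0.08 to 0.12 reported for the individual wait times produces a p-value that prints as 0.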

## [1] "KS Test -- Positive Treatment vs Control at  0  sec - D =  0.086927836982747 p = 0"
## [1] "KS Test -- Positive Treatment vs Control at  30  sec - D =  0.0868038249225742 p = 0"
## [1] "KS Test -- Positive Treatment vs Control at  60  sec - D =  0.0875402520359694 p = 0"
## [1] "KS Test -- Positive Treatment vs Control at  300  sec - D =  0.0825614363984992 p = 0"
## [1] "KS Test -- Positive Treatment vs Control at  600  sec - D =  0.0821767185437847 p = 0"
## [1] "KS Test -- Positive Treatment vs Control at  1800  sec - D =  0.0865049398796312 p = 0"
## [1] "KS Test -- Positive Treatment vs Control at  3600  sec - D =  0.0782516860489065 p = 0"

## [1] "KS Test -- Negative Treatment vs Control at  0  sec - D =  0.118812991237424 p = 0"
## [1] "KS Test -- Negative Treatment vs Control at  30  sec - D =  0.110131960763721 p = 0"
## [1] "KS Test -- Negative Treatment vs Control at  60  sec - D =  0.110522258386419 p = 0"
## [1] "KS Test -- Negative Treatment vs Control at  300  sec - D =  0.11244828603096 p = 0"
## [1] "KS Test -- Negative Treatment vs Control at  600  sec - D =  0.119906905396027 p = 0"
## [1] "KS Test -- Negative Treatment vs Control at  1800  sec - D =  0.0969316008492061 p = 0"
## [1] "KS Test -- Negative Treatment vs Control at  3600  sec - D =  0.0992090647728554 p = 0"

## [1] "Wilcox Test -- Positive Treatment vs Control at  0  sec - p = 6.09216002326674e-14"
## [1] "Wilcox Test -- Positive Treatment vs Control at  30  sec - p = 1.36078023236168e-18"
## [1] "Wilcox Test -- Positive Treatment vs Control at  60  sec - p = 6.23268794568268e-18"
## [1] "Wilcox Test -- Positive Treatment vs Control at  300  sec - p = 1.83359295515348e-13"
## [1] "Wilcox Test -- Positive Treatment vs Control at  600  sec - p = 2.18364887990707e-11"
## [1] "Wilcox Test -- Positive Treatment vs Control at  1800  sec - p = 6.22235743880408e-15"
## [1] "Wilcox Test -- Positive Treatment vs Control at  3600  sec - p = 8.55656029924071e-12"

## [1] "Wilcox Test -- Negative Treatment vs Control at  0  sec - p = 3.28974328705637e-22"
## [1] "Wilcox Test -- Negative Treatment vs Control at  30  sec - p = 2.53183805069274e-17"
## [1] "Wilcox Test -- Negative Treatment vs Control at  60  sec - p = 4.52130963522907e-24"
## [1] "Wilcox Test -- Negative Treatment vs Control at  300  sec - p = 2.57964251980208e-21"
## [1] "Wilcox Test -- Negative Treatment vs Control at  600  sec - p = 9.97773954203552e-27"
## [1] "Wilcox Test -- Negative Treatment vs Control at  1800  sec - p = 4.26017895022146e-15"
## [1] "Wilcox Test -- Negative Treatment vs Control at  3600  sec - p = 1.52093006947079e-11"

## [1] "T Test -- Log Positive Treatment vs Log Control at  0  sec - p = 1.26564116427947e-07"
## [1] "T Test -- Log Positive Treatment vs Log Control at  30  sec - p = 1.20667230619452e-06"
## [1] "T Test -- Log Positive Treatment vs Log Control at  60  sec - p = 1.62710683848065e-07"
## [1] "T Test -- Log Positive Treatment vs Log Control at  300  sec - p = 4.37299441131072e-06"
## [1] "T Test -- Log Positive Treatment vs Log Control at  600  sec - p = 0.000561252607330452"
## [1] "T Test -- Log Positive Treatment vs Log Control at  1800  sec - p = 2.52817614638093e-07"
## [1] "T Test -- Log Positive Treatment vs Log Control at  3600  sec - p = 2.77337105484995e-05"

## [1] "T Test -- Log Negative Treatment vs Log Control at  0  sec - p = 0.00515517796907591"
## [1] "T Test -- Log Negative Treatment vs Log Control at  30  sec - p = 0.00115130587497559"
## [1] "T Test -- Log Negative Treatment vs Log Control at  60  sec - p = 0.0768726272685686"
## [1] "T Test -- Log Negative Treatment vs Log Control at  300  sec - p = 0.00180717685086556"
## [1] "T Test -- Log Negative Treatment vs Log Control at  600  sec - p = 0.0414153750128376"
## [1] "T Test -- Log Negative Treatment vs Log Control at  1800  sec - p = 0.0303008389080154"
## [1] "T Test -- Log Negative Treatment vs Log Control at  3600  sec - p = 3.20358247499746e-07"

[Figures: two plots from knitr chunk unnamed-chunk-26]

Score deciles by wait time and action

[Figure: score deciles by wait time and action (knitr chunk unnamed-chunk-27)]

Total votes as a function of action

[Figure: total votes as a function of action (knitr chunk unnamed-chunk-28)]

Results for the top 10 most frequently occurring subreddits

Summary Statistics:

##      action       subreddit    N  score    sd     se     ci
## 1  Downvote       AskReddit 1150  1.790 12.02 0.3545 0.6956
## 2   Nothing       AskReddit 1131  3.200 25.76 0.7658 1.5026
## 3    Upvote       AskReddit 1186  5.347 32.00 0.9293 1.8232
## 4  Downvote           funny  878  7.481 46.50 1.5694 3.0803
## 5   Nothing           funny  984  7.842 41.54 1.3244 2.5989
## 6    Upvote           funny  942 10.510 52.68 1.7164 3.3685
## 7  Downvote   AdviceAnimals  651  8.542 37.02 1.4510 2.8492
## 8   Nothing   AdviceAnimals  734  9.811 44.45 1.6406 3.2208
## 9    Upvote   AdviceAnimals  698 13.910 58.55 2.2163 4.3514
## 10 Downvote            pics  550  6.055 31.80 1.3558 2.6632
## 11  Nothing            pics  561 10.282 49.71 2.0987 4.1223
## 12   Upvote            pics  545  9.251 35.08 1.5025 2.9514
## 13 Downvote leagueoflegends  554  3.859 42.35 1.7993 3.5342
## 14  Nothing leagueoflegends  517  4.965 44.06 1.9377 3.8067
## 15   Upvote leagueoflegends  531  6.414 40.46 1.7560 3.4496
## 16 Downvote          gaming  406 11.852 60.31 2.9932 5.8841
## 17  Nothing          gaming  405 14.146 59.01 2.9325 5.7648
## 18   Upvote          gaming  403 14.538 67.60 3.3672 6.6195
## 19 Downvote          videos  424  4.349 34.11 1.6567 3.2564
## 20  Nothing          videos  407  5.902 36.12 1.7905 3.5199
## 21   Upvote          videos  360 12.522 60.45 3.1862 6.2660
## 22 Downvote             aww  323 16.653 54.66 3.0415 5.9837
## 23  Nothing             aww  305 24.492 74.60 4.2718 8.4061
## 24   Upvote             aww  304 20.178 53.93 3.0931 6.0867
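In the table above, `se` is the standard error of the mean, sd / sqrt(N), and `ci` appears to be the half-width of a 95% confidence interval, that is `se` multiplied by a critical value close to 1.96. The Python sketch below checks both relationships on the three AskReddit rows. Reading `ci` as a 95% half-width is an inference from the numbers, not something the report states.

```python
import math

# (action, subreddit, N, sd, se, ci) copied from the table above.
rows = [
    ("Downvote", "AskReddit", 1150, 12.02, 0.3545, 0.6956),
    ("Nothing", "AskReddit", 1131, 25.76, 0.7658, 1.5026),
    ("Upvote", "AskReddit", 1186, 32.00, 0.9293, 1.8232),
]

checks = []
for action, subreddit, n, sd, se, ci in rows:
    se_check = sd / math.sqrt(n)   # standard error recomputed from sd and N
    multiplier = ci / se           # implied critical value
    checks.append((action, se_check, multiplier))
    print(f"{action:8s} se={se_check:.4f} (reported {se}) ci/se={multiplier:.3f}")
```

The recomputed standard errors agree with the table up to rounding of `sd`, and the implied multiplier sits near 1.96 in every row.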

[Figure: plot of the subreddit results (knitr chunk unnamed-chunk-29)]

Reddit Post Experiment

Tim Weninger, Thomas J. Johnston, Maria Glenski

2/6/2015