R海拾遗-三因素重复

时间:2022-07-25
本文章向大家介绍R海拾遗-三因素重复,主要内容包括其使用实例、应用技巧、基本知识点总结和需要注意事项,具有一定的参考价值,需要的朋友可以参考一下。

三因素重复测量方差分析

sunqi

2020/7/26

概述

三因素重复测量资料方差分析,在这项研究中,研究人员想要评估饮食和运动对10个久坐的人减肥的影响。参与者参与了四项试验:(1)不节食,不锻炼;(2)节食;(3)锻炼;(4)节食和锻炼相结合。

10个参与者完成了所有四项试验,每次试验持续9周,在每次试验的开始(t1)、中间(t2)和结束(t3)测量体重减轻评分。

为了确定饮食、运动和时间对减肥评分是否存在显著的交互作用,可以进行三向重复测量方差分析。

代码

导入数据

set.seed(123)
library(tidyverse)
library(ggpubr)
library(rstatix)
data("weightloss", package = "datarium")
weightloss %>% sample_n_by(diet, exercises, size = 1)
## # A tibble: 4 x 6
##   id    diet  exercises    t1    t2    t3
##   <fct> <fct> <fct>     <dbl> <dbl> <dbl>
## 1 3     no    no         11.4  11.1  11.4
## 2 3     no    yes        12.0  13.7  17.8
## 3 10    yes   no         11.1  15.3  10.9
## 4 2     yes   yes        11.4  14.8  15.1
# 转换长数据
weightloss <- weightloss %>%
  gather(key = "time", value = "score", t1, t2, t3) %>%
  convert_as_factor(id, time)
# 对每个变量抽样查看
set.seed(123)
weightloss %>% sample_n_by(diet, exercises, time, size = 1)
## # A tibble: 12 x 5
##    id    diet  exercises time  score
##    <fct> <fct> <fct>     <fct> <dbl>
##  1 3     no    no        t1    11.4
##  2 3     no    no        t2    11.1
##  3 10    no    no        t3    11.4
##  4 2     no    yes       t1     9.96
##  5 6     no    yes       t2    13.0
##  6 11    no    yes       t3    19
##  7 5     yes   no        t1    11.6
##  8 4     yes   no        t2     9.73
##  9 6     yes   no        t3    12.7
## 10 9     yes   yes       t1    11.8
## 11 10    yes   yes       t2    14.1
## 12 11    yes   yes       t3    15.1
# 描述
weightloss %>%
  group_by(diet, exercises, time) %>%
  get_summary_stats(score, type = "mean_sd")
## # A tibble: 12 x 7
##    diet  exercises time  variable     n  mean    sd
##    <fct> <fct>     <fct> <chr>    <dbl> <dbl> <dbl>
##  1 no    no        t1    score       12  10.9 0.868
##  2 no    no        t2    score       12  11.6 1.30
##  3 no    no        t3    score       12  11.4 0.935
##  4 no    yes       t1    score       12  10.8 1.27
##  5 no    yes       t2    score       12  13.4 1.01
##  6 no    yes       t3    score       12  16.8 1.53
##  7 yes   no        t1    score       12  11.7 0.938
##  8 yes   no        t2    score       12  12.4 1.42
##  9 yes   no        t3    score       12  13.8 1.43
## 10 yes   yes       t1    score       12  11.4 1.09
## 11 yes   yes       t2    score       12  13.2 1.22
## 12 yes   yes       t3    score       12  14.7 0.625
# 可视化
bxp <- ggboxplot(
  weightloss, x = "exercises", y = "score",
  color = "time", palette = "jco",
  facet.by = "diet", short.panel.labs = FALSE
  )
bxp
# 对假设进行检验
# 异常值,无极端值
weightloss %>%
  group_by(diet, exercises, time) %>%
  identify_outliers(score)
## # A tibble: 5 x 7
##   diet  exercises time  id    score is.outlier is.extreme
##   <fct> <fct>     <fct> <fct> <dbl> <lgl>      <lgl>
## 1 no    no        t3    2      13.2 TRUE       FALSE
## 2 yes   no        t1    1      10.2 TRUE       FALSE
## 3 yes   no        t1    3      13.2 TRUE       FALSE
## 4 yes   no        t1    4      10.2 TRUE       FALSE
## 5 yes   no        t2    10     15.3 TRUE       FALSE
# 正态性;全部符合
weightloss %>%
  group_by(diet, exercises, time) %>%
  shapiro_test(score)
## # A tibble: 12 x 6
##    diet  exercises time  variable statistic     p
##    <fct> <fct>     <fct> <chr>        <dbl> <dbl>
##  1 no    no        t1    score        0.917 0.264
##  2 no    no        t2    score        0.957 0.743
##  3 no    no        t3    score        0.965 0.851
##  4 no    yes       t1    score        0.922 0.306
##  5 no    yes       t2    score        0.912 0.229
##  6 no    yes       t3    score        0.953 0.674
##  7 yes   no        t1    score        0.942 0.528
##  8 yes   no        t2    score        0.982 0.989
##  9 yes   no        t3    score        0.931 0.395
## 10 yes   yes       t1    score        0.914 0.241
## 11 yes   yes       t2    score        0.947 0.598
## 12 yes   yes       t3    score        0.937 0.464
# qq图
ggqqplot(weightloss, "score", ggtheme = theme_bw()) +
  facet_grid(diet + exercises ~ time, labeller = "label_both")
# 计算
res.aov <- anova_test(
  data = weightloss, dv = score, wid = id,
  within = c(diet, exercises, time)
  )
# 饮食、锻炼和时间三者的交互作用具有统计学意义,F(2,22) = 14.24, p = 0.00011
get_anova_table(res.aov)
## ANOVA Table (type III tests)
##
##                Effect  DFn   DFd       F        p p<.05   ges
## 1                diet 1.00 11.00   6.021 3.20e-02     * 0.028
## 2           exercises 1.00 11.00  58.928 9.65e-06     * 0.284
## 3                time 2.00 22.00 110.942 3.22e-12     * 0.541
## 4      diet:exercises 1.00 11.00  75.356 2.98e-06     * 0.157
## 5           diet:time 1.38 15.17   0.603 5.01e-01       0.013
## 6      exercises:time 2.00 22.00  20.826 8.41e-06     * 0.274
## 7 diet:exercises:time 2.00 22.00  14.246 1.07e-04     * 0.147
# 进行事后检验

## 如果3个因素存在交互作用

# 1. Simple two-way interaction: run two-way interaction at each level of third variable,
# 2. Simple simple main effect: run one-way model at each level of second variable
# 3. simple simple pairwise comparisons: run pairwise or other post-hoc comparisons if necessary.

# 按饮食分层,对剩下两因素分析
two.way <- weightloss %>%
  group_by(diet) %>%
  anova_test(dv = score, wid = id, within = c(exercises, time))
two.way
## # A tibble: 2 x 2
##   diet  anova
##   <fct> <list>
## 1 no    <anov_tst [3]>
## 2 yes   <anov_tst [3]>
# 提取方差分析结果
# 可以看出,在非饮食组,运动和时间存在交互,而饮食组不存在
get_anova_table(two.way)
## # A tibble: 6 x 8
##   diet  Effect           DFn   DFd     F        p `p<.05`   ges
##   <fct> <chr>          <dbl> <dbl> <dbl>    <dbl> <chr>   <dbl>
## 1 no    exercises          1    11 72.8  3.53e- 6 "*"     0.526
## 2 no    time               2    22 71.7  2.32e-10 "*"     0.587
## 3 no    exercises:time     2    22 28.9  6.92e- 7 "*"     0.504
## 4 yes   exercises          1    11 13.4  4.00e- 3 "*"     0.038
## 5 yes   time               2    22 20.5  9.30e- 6 "*"     0.49
## 6 yes   exercises:time     2    22  2.57 9.90e- 2 ""      0.06
# 计算时间效应
time.effect <- weightloss %>%
  group_by(diet, exercises) %>%
  anova_test(dv = score, wid = id, within = time)
time.effect
## # A tibble: 4 x 3
##   diet  exercises anova
##   <fct> <fct>     <list>
## 1 no    no        <anov_tst [3]>
## 2 no    yes       <anov_tst [3]>
## 3 yes   no        <anov_tst [3]>
## 4 yes   yes       <anov_tst [3]>
get_anova_table(time.effect) %>%
  filter(diet == "no")
## # A tibble: 2 x 9
##   diet  exercises Effect   DFn   DFd     F        p `p<.05`   ges
##   <fct> <fct>     <chr>  <dbl> <dbl> <dbl>    <dbl> <chr>   <dbl>
## 1 no    no        time       2    22  1.32 2.86e- 1 ""      0.075
## 2 no    yes       time       2    22 78.8  9.30e-11 "*"     0.801
# 多重比较
pwc <- weightloss %>%
  group_by(diet, exercises) %>%
  pairwise_t_test(score ~ time, paired = TRUE, p.adjust.method = "bonferroni") %>%
  select(-df, -statistic)
# 筛选非饮食和锻炼
pwc %>% filter(diet == "no", exercises == "yes") %>%
  select(-p)
## # A tibble: 3 x 9
##   diet  exercises .y.   group1 group2    n1    n2        p.adj p.adj.signif
##   <fct> <fct>     <chr> <chr>  <chr>  <int> <int>        <dbl> <chr>
## 1 no    yes       score t1     t2        12    12 0.000741     ***
## 2 no    yes       score t1     t3        12    12 0.0000000121 ****
## 3 no    yes       score t2     t3        12    12 0.000257     ***
# 可视化

pwc <- pwc %>% add_xy_position(x = "exercises")
pwc.filtered <- pwc %>%
  filter(diet == "no", exercises == "yes")
bxp +
  stat_pvalue_manual(pwc.filtered, tip.length = 0, hide.ns = TRUE) +
  labs(
    subtitle = get_test_label(res.aov, detailed = TRUE),
    caption = get_pwc_label(pwc)
  )

结束语

无论是三因素还是两因素,总体的思路就是如果存在交互作用,就不断的分层分层,如果不存在交互,那就直接进行事后比较。 love&peace