Ice Cream Survey analysis
This work is a preliminary data analysis of a survey data set on ice cream brands. Its purpose is to clean and prepare the data set so that analysts can use it easily. The data set accompanies the “Seven Summits of Marketing Research” text by Greg Allenby and Jeff Brazell and is used with their permission.
knitr::include_graphics("./ice_cream.png")
knitr::include_graphics("./survey.png")
Reading the Ice Cream data set
The first step is to load the data and required packages.
library(dplyr)
library(psych)
library(DT)
library(tidyverse)
ic<- read.csv("./IceCream_raw.csv")
#ic<- read.csv("http://bus-sawtooth.mcmaster.ca/M733_ONLINE_F2020/IceCream_raw.csv")
Then we take a first look at what the data set contains.
headTail(ic) %>% datatable( rownames = TRUE, filter="top", options = list(pageLength = 10, scrollX=T))
Reducing the raw data to essential variables
In this analysis we focus only on the Dreyer’s brand. Based on the questionnaire, only certain columns are needed.
d_names <- c(grep("^D.$", names(ic), value = TRUE), "D10")
q11 <- grep("^Q11_", names(ic), value = TRUE)
q11_1 <- grep("^Q11_1_", q11, value = TRUE) ##Only for the Dreyer's brand
s7 <- grep("^S7_", names(ic), value = TRUE)
s8 <- grep("^S8_", names(ic), value = TRUE)
col_names <- c("ID",s7, s8, "Q1_1","Q2_1","Q3_1", q11_1, d_names)
The data and its selected variables are shown below. S7 and S8 are chosen from the screener questions to identify the participants who have heard of our brand or have purchased it in the past 6 months. The behavioral questions needed for this analysis (Q1, Q2, Q3 and Q11) are selected for the Dreyer’s brand only. Each question name contains a number that identifies the ice cream brand. The number assigned to Dreyer’s is 1, so we choose Q1_1, Q2_1, etc.
ic_sub <- ic[col_names]
headTail(ic_sub,5,5) %>% datatable( rownames = TRUE, filter="top", options = list(pageLength = 12, scrollX=T))
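The name-based column selection above can be tried out on a small toy data frame. This is only a sketch: `toy`, its columns and its values are made up to mimic the survey's naming scheme and are not part of the real data.

```r
# Toy data frame with made-up columns that mimic the survey's naming scheme
toy <- data.frame(
  ID = 1:3,
  S7_1 = c(1, 0, 1), S7_2 = c(0, 1, 1),
  Q1_1 = c(6, NA, 7), Q1_2 = c(NA, 5, 4),
  Q11_1_1 = c(4, NA, 5), Q11_2_1 = c(NA, 3, 2),
  D1 = c(1, 2, 2), D10 = c(5, 44, 48)
)

# "^D.$" matches "D" followed by exactly one character (D1..D9),
# so the two-digit D10 has to be added by hand
d_cols <- c(grep("^D.$", names(toy), value = TRUE), "D10")

# "^Q11_1_" keeps only the Q11 items whose brand code is 1
q11_brand1 <- grep("^Q11_1_", names(toy), value = TRUE)

# Subset by name, in the order the names are listed
toy_sub <- toy[c("ID", "S7_1", "Q1_1", q11_brand1, d_cols)]
print(names(toy_sub))
```

Selecting by name pattern rather than by position keeps the subset stable if the column order of the raw file changes.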
Filtering the participants
We start by inspecting the Dreyer’s brand. Many NA values are present in the S8 questions. The S7 question (having heard of the brand) shows that these belong to people who have not heard of the brand and who should therefore be omitted from this analysis.
Heard of Dreyer’s Brand
S7_1 | Freq
0 | 247
1 | 354
NA | 0
Purchased Dreyer’s brand in last 6 months
S8_1 | Freq
0 | 234
1 | 120
NA | 247
The following table shows the participants (rows) whose S8_1 value is missing. It also shows that these are the people who have not heard of our brand (S7_1 == 0).
ic[is.na(ic["S8_1"]),c("ID", "S7_1", "S8_1","Q1_1","Q2_1","Q3_1" )] %>% datatable( rownames = TRUE, filter="top", options = list(pageLength = 10, scrollX=T))
Filtering rows
The variable S7_1 has no missing values, so it reliably identifies the people who have not heard of, and therefore not purchased, the Dreyer’s brand. We exclude these people from the analysis and save the remaining subset to ic_sub_1.
ic_sub_1 <- ic_sub %>% filter(!(S7_1 == 0))
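One detail of this filter is worth keeping in mind: `dplyr::filter()` keeps only rows where the condition is TRUE, so rows with a missing S7_1 would be dropped as well. That has no effect here because S7_1 has no missing values. A minimal base R sketch of the same behaviour, using a made-up `toy` data frame:

```r
# Made-up data: two people have not heard of the brand, one value is missing
toy <- data.frame(ID = 1:5, S7_1 = c(1, 0, 1, 0, NA))

# which() returns only the positions where the condition is TRUE,
# so rows with S7_1 == 0 and rows with a missing S7_1 are both dropped
heard <- toy[which(toy$S7_1 != 0), ]
print(heard$ID)
```

If missing screener answers should be kept for inspection instead, the condition would need an explicit `is.na()` branch.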
Checking the S7_1 and S8_1 in the new filtered data
sjmisc::frq(ic_sub_1$S7_1, out = 'v', title = "Heard of Dreyer's Brand (S7_1)")
Heard of Dreyer’s Brand (S7_1)
val | label | frq | raw.prc | valid.prc | cum.prc
1 | | 354 | 100 | 100 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=1.00 · σ=0.00
sjmisc::frq(ic_sub_1$S8_1, out = 'v', title = "Purchased Dreyer's brand in last 6 months (S8_1)")
Purchased Dreyer’s brand in last 6 months (S8_1)
val | label | frq | raw.prc | valid.prc | cum.prc
0 | | 234 | 66.1 | 66.1 | 66.1
1 | | 120 | 33.9 | 33.9 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=0.34 · σ=0.47
You can see that there are no missing values left in these two variables.
Missing values analyses
We filtered our subset of the data for the Dreyer’s brand analysis. Now let’s look at missing values in this subset.
options(max.print = 100000)
library(inspectdf)
ic_sub_1 %>% inspect_na() %>% datatable( rownames = TRUE, filter="top", options = list(pageLength = 10, scrollX=T))
The following figure shows the pattern of the missing data.
library(VIM)
ic_sub_1[, c(2:47)] %>% aggr(col=c('navyblue','red'), numbers=TRUE, sortVars=TRUE, cex.axis=.7, gap=3, ylab=c("Histogram of missing data","Pattern"))
Variables sorted by number of missings:
Variable Count
Q1_1 0.66101695
Q2_1 0.66101695
Q3_1 0.66101695
S8_3 0.45762712
Q11_1_1 0.28248588
Q11_1_2 0.28248588
Q11_1_3 0.28248588
Q11_1_4 0.28248588
Q11_1_7 0.28248588
Q11_1_8 0.28248588
Q11_1_9 0.28248588
Q11_1_10 0.28248588
Q11_1_11 0.28248588
Q11_1_13 0.28248588
Q11_1_15 0.28248588
Q11_1_5 0.27966102
Q11_1_6 0.27966102
Q11_1_12 0.27966102
Q11_1_14 0.27966102
Q11_1_16 0.27966102
Q11_1_17 0.27966102
S8_2 0.25423729
S8_7 0.17796610
S8_4 0.03389831
S8_5 0.01412429
S8_6 0.01412429
S7_1 0.00000000
S7_2 0.00000000
S7_3 0.00000000
S7_4 0.00000000
S7_5 0.00000000
S7_6 0.00000000
S7_7 0.00000000
S7_99 0.00000000
S8_1 0.00000000
S8_99 0.00000000
D1 0.00000000
D2 0.00000000
D3 0.00000000
D4 0.00000000
D5 0.00000000
D6 0.00000000
D7 0.00000000
D8 0.00000000
D9 0.00000000
D10 0.00000000
First, the S7 and S8 variables for the other brands are removed. Then we look at the remaining variables and their missing counts.
ic_sub_2 <- ic_sub_1[c("ID", "S7_1", "S8_1", "Q1_1", "Q2_1", "Q3_1", q11_1, d_names)]
library(inspectdf)
ic_sub_2 %>% inspect_na %>% datatable( rownames = TRUE, filter="top", options = list(pageLength = 10, scrollX=T))
The demographic variables have no missing values, and neither do S7_1 and S8_1. About 28% of the values are missing in the Q11 questions and 66% in Q1-Q3. The latter makes sense: these are the people who have not purchased the product in the last 6 months and were therefore not asked Q1-Q3. These questions are missing in 234 rows, which equals the number of people who have not purchased the brand according to S8_1. The code below shows this.
ic_sub_2[c("ID", "S7_1", "S8_1", "Q1_1", "Q2_1", "Q3_1")] %>% filter(S8_1 ==0) %>% nrow()
[1] 234
Therefore, the number of missing values in Q1-Q3 equals the number of people who have not purchased the brand in the last 6 months. These values are missing by design and should not be imputed.
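The reasoning above rests on two checks: the share of missing values per column, and whether Q1-Q3 are missing exactly for the non-purchasers. A minimal base R sketch of both checks follows; the `toy` data frame and its values are hypothetical and only mirror the column names used in this report.

```r
# Made-up data: Q1_1 is skipped for non-purchasers (S8_1 == 0),
# Q11_1_1 has one ordinary missing value
toy <- data.frame(
  S8_1    = c(1, 0, 0, 1, 0),
  Q1_1    = c(6, NA, NA, 7, NA),
  Q11_1_1 = c(4, NA, 5, 3, 2)
)

# Share of missing values per column
na_share <- colMeans(is.na(toy))
print(na_share)

# Missing by design: Q1_1 is NA exactly when the brand was not purchased
skip_pattern_holds <- all(is.na(toy$Q1_1) == (toy$S8_1 == 0))
print(skip_pattern_holds)
```

Comparing the two logical vectors row by row is a stricter check than comparing counts, because equal counts alone would not prove that the same respondents are involved.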
Imputing the valid variables using mice
The only variables that require imputation are the seventeen Q11 items, so only this subset of ic_sub_2 is passed to the mice function. The following code uses mice to generate m=5 imputed data sets with the random forest imputation method.
library(mice)
set.seed(456)
ic_to_impute <- ic_sub_2[q11_1]
tempData <- mice(ic_to_impute, m = 5, maxit = 50, method = "rf", seed = 500, printFlag = FALSE) # Generates m = 5 imputed data sets with the 'rf' method; seed = 500 makes the run reproducible.
Looking at the imputed subset, we can see that no missing values remain.
compData <- mice::complete(tempData, 1)
headTail(compData, 20) %>% datatable( rownames = TRUE, filter="top", options = list(pageLength = 10, scrollX=T))
# No missing point!
compData %>% inspect_na() %>% datatable( rownames = TRUE, filter="top", options = list(pageLength = 10, scrollX=T))
Using the ‘sjmisc’ package to merge the 5 datasets into 1
The merge_imputations() function from the sjmisc package merges the 5 imputed data sets into one with the original number of rows (i.e., not in long format) and draws comparison plots between the original variables and their imputed versions.
library(sjmisc)
mice_mrg <- merge_imputations(
ic_to_impute,
tempData,
summary = c("hist" ),
filter = NULL
)
mice_mrg
Printing the merge of all 5 imputed data sets
head( mice_mrg$data, 20 ) %>% datatable(rownames = TRUE, filter="top", options = list(pageLength = 10, scrollX=T))
Comparing the imputed and original data sets statistically
A t-test compares the means of the imputed Q11 items with those of the original subset, and a second table compares their variances.
1st mice imp. using rf: Comparing means
variable | fs n | fs micerf n | fs mean | micerf mean | p-value
Q11_1_1 | 254 | 354 | 4.15 | 4.16 | 0.94
Q11_1_2 | 254 | 354 | 4.17 | 4.17 | 0.97
Q11_1_3 | 254 | 354 | 4.07 | 4.05 | 0.83
Q11_1_4 | 254 | 354 | 3.93 | 3.95 | 0.85
Q11_1_5 | 255 | 354 | 4.58 | 4.62 | 0.75
Q11_1_6 | 255 | 354 | 4.04 | 4.04 | 0.97
Q11_1_7 | 254 | 354 | 3.76 | 3.79 | 0.78
Q11_1_8 | 254 | 354 | 4.64 | 4.69 | 0.65
Q11_1_9 | 254 | 354 | 4.11 | 4.10 | 0.91
Q11_1_10 | 254 | 354 | 4.45 | 4.48 | 0.80
Q11_1_11 | 254 | 354 | 4.83 | 4.90 | 0.52
Q11_1_12 | 255 | 354 | 4.78 | 4.88 | 0.37
Q11_1_13 | 254 | 354 | 4.26 | 4.25 | 0.89
Q11_1_14 | 255 | 354 | 3.59 | 3.61 | 0.85
Q11_1_15 | 254 | 354 | 3.47 | 3.52 | 0.66
Q11_1_16 | 255 | 354 | 4.77 | 4.82 | 0.65
Q11_1_17 | 255 | 354 | 4.70 | 4.81 | 0.33
Comparing variances
variable | fs var | micerf var | p-value
Q11_1_1 | 2.34 | 1.75 | 0.01
Q11_1_2 | 2.02 | 1.50 | 0.01
Q11_1_3 | 2.11 | 1.54 | 0.01
Q11_1_4 | 2.18 | 1.62 | 0.01
Q11_1_5 | 2.18 | 1.66 | 0.02
Q11_1_6 | 2.08 | 1.52 | 0.01
Q11_1_7 | 2.65 | 2.02 | 0.02
Q11_1_8 | 2.28 | 1.71 | 0.01
Q11_1_9 | 1.95 | 1.45 | 0.01
Q11_1_10 | 2.07 | 1.58 | 0.02
Q11_1_11 | 1.93 | 1.49 | 0.03
Q11_1_12 | 2.04 | 1.56 | 0.02
Q11_1_13 | 1.85 | 1.39 | 0.01
Q11_1_14 | 2.31 | 1.78 | 0.02
Q11_1_15 | 1.94 | 1.47 | 0.02
Q11_1_16 | 2.14 | 1.61 | 0.01
Q11_1_17 | 2.35 | 1.79 | 0.02
We compare the p-values to a significance level of 0.05. Imputation does well on the means but rather poorly on the variances: in the variance table, all seventeen questions have a p-value below 0.05. Evidently the merge of the five imputations reduces the variance too much, and the difference from the original subset is statistically significant.
This was surprising to me, so I looked at each of the 5 imputed data sets to see how they perform in terms of variance.
3rd mice imp. using rf: Comparing means
variable | fs n | fs micerf n | fs mean | micerf mean | p-value
Q11_1_1 | 254 | 354 | 4.15 | 4.14 | 0.95
Q11_1_2 | 254 | 354 | 4.17 | 4.16 | 0.93
Q11_1_3 | 254 | 354 | 4.07 | 4.06 | 0.89
Q11_1_4 | 254 | 354 | 3.93 | 3.91 | 0.87
Q11_1_5 | 255 | 354 | 4.58 | 4.62 | 0.75
Q11_1_6 | 255 | 354 | 4.04 | 4.04 | 0.97
Q11_1_7 | 254 | 354 | 3.76 | 3.76 | 0.99
Q11_1_8 | 254 | 354 | 4.64 | 4.71 | 0.59
Q11_1_9 | 254 | 354 | 4.11 | 4.12 | 0.93
Q11_1_10 | 254 | 354 | 4.45 | 4.47 | 0.88
Q11_1_11 | 254 | 354 | 4.83 | 4.88 | 0.69
Q11_1_12 | 255 | 354 | 4.78 | 4.88 | 0.41
Q11_1_13 | 254 | 354 | 4.26 | 4.23 | 0.79
Q11_1_14 | 255 | 354 | 3.59 | 3.55 | 0.74
Q11_1_15 | 254 | 354 | 3.47 | 3.47 | 0.98
Q11_1_16 | 255 | 354 | 4.77 | 4.82 | 0.69
Q11_1_17 | 255 | 354 | 4.70 | 4.84 | 0.26
Comparing variances
variable | fs var | micerf var | p-value
Q11_1_1 | 2.34 | 1.88 | 0.06
Q11_1_2 | 2.02 | 1.64 | 0.07
Q11_1_3 | 2.11 | 1.68 | 0.05
Q11_1_4 | 2.18 | 1.78 | 0.08
Q11_1_5 | 2.18 | 1.89 | 0.22
Q11_1_6 | 2.08 | 1.61 | 0.03
Q11_1_7 | 2.65 | 2.55 | 0.72
Q11_1_8 | 2.28 | 1.92 | 0.14
Q11_1_9 | 1.95 | 1.56 | 0.05
Q11_1_10 | 2.07 | 1.74 | 0.13
Q11_1_11 | 1.93 | 1.72 | 0.33
Q11_1_12 | 2.04 | 1.74 | 0.17
Q11_1_13 | 1.85 | 1.52 | 0.09
Q11_1_14 | 2.31 | 2.08 | 0.35
Q11_1_15 | 1.94 | 1.75 | 0.36
Q11_1_16 | 2.14 | 1.85 | 0.22
Q11_1_17 | 2.35 | 2.02 | 0.19
As seen in the above tables, a single imputed data set performs much better in terms of not changing the variance significantly. Moreover, the 4th imputed data set performs better than the other four and the merged one.
The histograms below for one of the imputed variables (Q11_1_7) show that the merged imputed data set tends to set the missing values equal to the mode and thus decreases the variance significantly, whereas the single imputed data set distributes the missing values over all 7 levels in line with the original mean and variance.
pacman::p_load(epiDisplay)
i <- q11_1[7] # the seventh Q11 item, Q11_1_7
tab1(ic_to_impute[i], main = "Q11_1_7 distribution in the original data set" )
ic_to_impute[i] :
Frequency %(NA+) %(NA-)
1 33 9.3 13.0
2 28 7.9 11.0
3 34 9.6 13.4
4 77 21.8 30.3
5 50 14.1 19.7
6 19 5.4 7.5
7 13 3.7 5.1
<NA> 100 28.2 0.0
Total 354 100.0 100.0
tab1(mice_mrg$data[i], main = "Q11_1_7 distribution in the merged-of-5 imputed data set" )
mice_mrg$data[i] :
Frequency Percent Cum. percent
1 33 9.3 9.3
2 30 8.5 17.8
3 55 15.5 33.3
4 141 39.8 73.2
5 63 17.8 91.0
6 19 5.4 96.3
7 13 3.7 100.0
Total 354 100.0 100.0
tab1(compData[i], main = "Q11_1_7 distribution in the 4th imputed data set")
compData[i] :
Frequency Percent Cum. percent
1 43 12.1 12.1
2 40 11.3 23.4
3 51 14.4 37.9
4 103 29.1 66.9
5 77 21.8 88.7
6 23 6.5 95.2
7 17 4.8 100.0
Total 354 100.0 100.0
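The variance shrinkage seen in the merged data can be illustrated with a small simulation. This is a sketch under a stated assumption: it treats merging as taking the rounded mean of the m imputed values per missing cell, which is a simplification and not necessarily what `merge_imputations()` does internally. The imputations here are plain random draws from the observed values rather than random forest predictions, and all names (`observed`, `draws`, `single`, `merged`) are made up for the illustration.

```r
set.seed(1)
n_obs <- 254  # observed answers, as for Q11_1_7
n_mis <- 100  # missing answers
m     <- 5    # number of imputations

# Simulated 7-point answers for the observed part
observed <- sample(1:7, n_obs, replace = TRUE)

# m independent "imputations" per missing cell, drawn from the observed values
draws <- matrix(sample(observed, n_mis * m, replace = TRUE), nrow = n_mis)

# Single imputation: use the first draw only
single <- c(observed, draws[, 1])

# Merged imputation (assumption): rounded mean of the m draws per cell
merged <- c(observed, round(rowMeans(draws)))

variances <- c(observed = var(observed), single = var(single), merged = var(merged))
print(round(variances, 2))

# One possible test of equal variances; the report does not state which test it used
f_test <- var.test(observed, merged)
print(f_test$p.value)
```

Averaging several draws pulls each filled-in value toward the centre of the scale, so the merged column has fewer extreme answers than any single imputation, which is the pattern visible in the frequency tables above.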
Combining the imputed data
The best of the five imputed data sets is the fourth one. Therefore, it is chosen to be combined with the original data set.
ic_sub_2[q11_1] <- compData
ic_sub_2 %>% datatable( rownames = TRUE, filter="top", caption = "The imputed and complete data set", options = list(pageLength = 10, scrollX=T))
Renaming the variables
The variables in the final subset of the ice cream data are renamed to make further analysis easier.
demo_names <- c("Gender", "Age", "Marital", "Children_less_18", "Household_size", "Education", "Employment", "Ethnicity",
"Household_income", "Residence_state")
names(ic_sub_2) <- c(c("ID", "Heard_of_brand", "Purchased_last6mo", "Satisfaction", "Buying_likelihood", "Recommend_likelihood",
"is_relaxing", "is_wholesome", "is_fun",
"is_exciting","is_premium_quality", "is_memorable", "is_treat", "is_good_for_regular",
"is_interesting", "taste_better_other_brands", "has_many_flavors", "is_enjoyable", "has_best_value/price",
"is_organic", "is_low_cal", "is_great_for_family", "is_great_for_guests") , demo_names)
ic_sub_2 %>% datatable( rownames = TRUE, filter="top", options = list(pageLength = 10, scrollX=T))
Recoding the demographic variables
Next we recode the values of the variables into a more meaningful format. Let’s first look at the demographic variables and their current values.
library(wrapr)
tt<- function(x) { table( x, useNA = "ifany" ) }
24:33 %.>% ( function(x) { sapply( ic_sub_2[, (x)], tt ) } ) (.)
$Gender
x
1 2
127 227
$Age
x
2 3 4 5 6 7
18 75 49 60 58 94
$Marital
x
1 2 3 4 5 6 7
71 36 185 43 2 14 3
$Children_less_18
x
1 2
61 293
$Household_size
x
1 2 3 4 5 6 7 11
73 177 60 31 10 1 1 1
$Education
x
1 2 3 4 5 6
20 10 114 110 28 72
$Employment
x
1 2 3 4 5 6
159 35 16 23 101 20
$Ethnicity
x
1 2 3 4 5 6
285 19 13 22 9 6
$Household_income
x
1 2 3 4 5 6 7 8 9
126 106 43 16 16 7 5 1 34
$Residence_state
x
1 2 3 4 5 6 7 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
3 7 9 1 63 9 6 9 3 5 5 16 6 3 3 3 1 2 2 4 16 8 2 6 1 3
29 30 31 32 33 34 36 37 38 39 41 42 43 44 45 46 47 48 50
10 1 8 2 8 2 10 3 12 13 4 1 6 38 3 1 5 22 9
Using the provided questionnaire, the demographic variables are recoded as follows, with a frequency table after each step.
ic_sub_2$Gender <- ifelse( ic_sub_2$Gender == 1, "Male", "Female")
frq(ic_sub_2$Gender, out="v", show.na=TRUE, title= "Gender of the participants")
Gender of the participants
val | label | frq | raw.prc | valid.prc | cum.prc
Female | | 227 | 64.12 | 64.12 | 64.12
Male | | 127 | 35.88 | 35.88 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=1.36 · σ=0.48
ic_sub_2$Age <- recode_factor(ic_sub_2$Age,`1` = "Under 18",
`2` = "18-24",
`3` = "25-34",
`4` = "35-44",
`5` = "45-54",
`6` = "55-64",
`7` = "65 or older")
frq(ic_sub_2$Age, out="v", show.na=TRUE, title= "Age of the participants")
Age of the participants
val | label | frq | raw.prc | valid.prc | cum.prc
18-24 | | 18 | 5.08 | 5.08 | 5.08
25-34 | | 75 | 21.19 | 21.19 | 26.27
35-44 | | 49 | 13.84 | 13.84 | 40.11
45-54 | | 60 | 16.95 | 16.95 | 57.06
55-64 | | 58 | 16.38 | 16.38 | 73.45
65 or older | | 94 | 26.55 | 26.55 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=3.98 · σ=1.64
ic_sub_2$Marital <- recode_factor(ic_sub_2$Marital,`1` = "Single, not living with domestic partner",
`2` = "Single, living with domestic partner",
`3` = "Married",
`4` = "Divorced",
`5` = "Separated",
`6` = "Widowed",
`7` = "Prefer not to say")
frq(ic_sub_2$Marital, out="v",sort.frq = c("desc"), show.na=TRUE, title= "Marital status of the participants")
Marital status of the participants
val | label | frq | raw.prc | valid.prc | cum.prc
Married | | 185 | 52.26 | 52.26 | 52.26
Single, not living with domestic partner | | 71 | 20.06 | 20.06 | 72.32
Divorced | | 43 | 12.15 | 12.15 | 84.46
Single, living with domestic partner | | 36 | 10.17 | 10.17 | 94.63
Widowed | | 14 | 3.95 | 3.95 | 98.59
Prefer not to say | | 3 | 0.85 | 0.85 | 99.44
Separated | | 2 | 0.56 | 0.56 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=2.78 · σ=1.22
ic_sub_2$Children_less_18 <- ifelse(ic_sub_2$Children_less_18 == 1, "Yes", "No")
frq(ic_sub_2$Children_less_18, out="v", show.na=TRUE, title= "Do participants have children under the age of 18 living with them")
Do participants have children under the age of 18 living with them
val | label | frq | raw.prc | valid.prc | cum.prc
No | | 293 | 82.77 | 82.77 | 82.77
Yes | | 61 | 17.23 | 17.23 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=1.17 · σ=0.38
No need to recode the household size as it is a numerical variable
frq(ic_sub_2$Household_size, out="v", show.na=TRUE, title= "Number of people living in participants household")
Number of people living in participants household
val | label | frq | raw.prc | valid.prc | cum.prc
1 | | 73 | 20.62 | 20.62 | 20.62
2 | | 177 | 50 | 50 | 70.62
3 | | 60 | 16.95 | 16.95 | 87.57
4 | | 31 | 8.76 | 8.76 | 96.33
5 | | 10 | 2.82 | 2.82 | 99.15
6 | | 1 | 0.28 | 0.28 | 99.44
7 | | 1 | 0.28 | 0.28 | 99.72
11 | | 1 | 0.28 | 0.28 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=2.27 · σ=1.12
ic_sub_2$Education <- recode_factor(ic_sub_2$Education,`1` = "High school or less",
`2` = "Trade/Technical school",
`3` = "Some college or Associate's Degree",
`4` = "Graduated College/Bachelor's Degree",
`5` = "Attended Graduate School",
`6` = "Advanced Degree (Master's, Ph.D.)")
frq(ic_sub_2$Education, out="v", show.na=TRUE, title= "Education of the participants")
Education of the participants
val | label | frq | raw.prc | valid.prc | cum.prc
High school or less | | 20 | 5.65 | 5.65 | 5.65
Trade/Technical school | | 10 | 2.82 | 2.82 | 8.47
Some college or Associate’s Degree | | 114 | 32.2 | 32.2 | 40.68
Graduated College/Bachelor’s Degree | | 110 | 31.07 | 31.07 | 71.75
Attended Graduate School | | 28 | 7.91 | 7.91 | 79.66
Advanced Degree (Master’s, Ph.D.) | | 72 | 20.34 | 20.34 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=3.94 · σ=1.36
# The ethnicity labels are defined in the questionnaire and are not reproduced here,
# so the numeric codes (1-6) are kept as factor levels. The Education labels must not be reused for this variable.
ic_sub_2$Ethnicity <- factor(ic_sub_2$Ethnicity)
frq(ic_sub_2$Ethnicity, out="v", show.na=TRUE, title= "Ethnicity of the participants")
Ethnicity of the participants
val | label | frq | raw.prc | valid.prc | cum.prc
1 | | 285 | 80.51 | 80.51 | 80.51
2 | | 19 | 5.37 | 5.37 | 85.88
3 | | 13 | 3.67 | 3.67 | 89.55
4 | | 22 | 6.21 | 6.21 | 95.76
5 | | 9 | 2.54 | 2.54 | 98.31
6 | | 6 | 1.69 | 1.69 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=1.50 · σ=1.16
ic_sub_2$Employment <- recode_factor(ic_sub_2$Employment,`1` = "Employed full-time (30+ hrs)",
`2` = "Employed part time",
`3` = "Not Currently Employed",
`4` = "Student",
`5` = "Retired",
`6` = "Homemaker")
frq(ic_sub_2$Employment, out="v", show.na=TRUE, title= "Employment status of the participants")
Employment status of the participants
val | label | frq | raw.prc | valid.prc | cum.prc
Employed full-time (30+ hrs) | | 159 | 44.92 | 44.92 | 44.92
Employed part time | | 35 | 9.89 | 9.89 | 54.8
Not Currently Employed | | 16 | 4.52 | 4.52 | 59.32
Student | | 23 | 6.5 | 6.5 | 65.82
Retired | | 101 | 28.53 | 28.53 | 94.35
Homemaker | | 20 | 5.65 | 5.65 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=2.81 · σ=1.89
ic_sub_2$Household_income <- recode_factor(ic_sub_2$Household_income,`1` = "under $50,000",
`2` = "$50,000 just under $75,000",
`3` = "$75,000 just under $100,000",
`4` = "$100,000 just under $125,000",
`5` = "$125,000 just under $150,000",
`6` = "$150,000 just under $175,000",
`7` = "$175,000 just under $200,000",
`8` = "$200,000 or more",
`9` = "Prefer not to say")
frq(ic_sub_2$Household_income, out="v", show.na=TRUE, title= "Household income of the participants")
Household income of the participants
val | label | frq | raw.prc | valid.prc | cum.prc
under $50,000 | | 126 | 35.59 | 35.59 | 35.59
$50,000 just under $75,000 | | 106 | 29.94 | 29.94 | 65.54
$75,000 just under $100,000 | | 43 | 12.15 | 12.15 | 77.68
$100,000 just under $125,000 | | 16 | 4.52 | 4.52 | 82.2
$125,000 just under $150,000 | | 16 | 4.52 | 4.52 | 86.72
$150,000 just under $175,000 | | 7 | 1.98 | 1.98 | 88.7
$175,000 just under $200,000 | | 5 | 1.41 | 1.41 | 90.11
$200,000 or more | | 1 | 0.28 | 0.28 | 90.4
Prefer not to say | | 34 | 9.6 | 9.6 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=2.83 · σ=2.42
The states are coded as numbers and the corresponding state names are not evident, so this variable is not recoded. There are 50 states in total.
frq(ic_sub_2$Residence_state, out="v", show.na=TRUE, title= "Residence state of the participants")
Residence state of the participants
val | label | frq | raw.prc | valid.prc | cum.prc
1 | | 3 | 0.85 | 0.85 | 0.85
2 | | 7 | 1.98 | 1.98 | 2.82
3 | | 9 | 2.54 | 2.54 | 5.37
4 | | 1 | 0.28 | 0.28 | 5.65
5 | | 63 | 17.8 | 17.8 | 23.45
6 | | 9 | 2.54 | 2.54 | 25.99
7 | | 6 | 1.69 | 1.69 | 27.68
10 | | 9 | 2.54 | 2.54 | 30.23
11 | | 3 | 0.85 | 0.85 | 31.07
12 | | 5 | 1.41 | 1.41 | 32.49
13 | | 5 | 1.41 | 1.41 | 33.9
14 | | 16 | 4.52 | 4.52 | 38.42
15 | | 6 | 1.69 | 1.69 | 40.11
16 | | 3 | 0.85 | 0.85 | 40.96
17 | | 3 | 0.85 | 0.85 | 41.81
18 | | 3 | 0.85 | 0.85 | 42.66
19 | | 1 | 0.28 | 0.28 | 42.94
20 | | 2 | 0.56 | 0.56 | 43.5
21 | | 2 | 0.56 | 0.56 | 44.07
22 | | 4 | 1.13 | 1.13 | 45.2
23 | | 16 | 4.52 | 4.52 | 49.72
24 | | 8 | 2.26 | 2.26 | 51.98
25 | | 2 | 0.56 | 0.56 | 52.54
26 | | 6 | 1.69 | 1.69 | 54.24
27 | | 1 | 0.28 | 0.28 | 54.52
28 | | 3 | 0.85 | 0.85 | 55.37
29 | | 10 | 2.82 | 2.82 | 58.19
30 | | 1 | 0.28 | 0.28 | 58.47
31 | | 8 | 2.26 | 2.26 | 60.73
32 | | 2 | 0.56 | 0.56 | 61.3
33 | | 8 | 2.26 | 2.26 | 63.56
34 | | 2 | 0.56 | 0.56 | 64.12
36 | | 10 | 2.82 | 2.82 | 66.95
37 | | 3 | 0.85 | 0.85 | 67.8
38 | | 12 | 3.39 | 3.39 | 71.19
39 | | 13 | 3.67 | 3.67 | 74.86
41 | | 4 | 1.13 | 1.13 | 75.99
42 | | 1 | 0.28 | 0.28 | 76.27
43 | | 6 | 1.69 | 1.69 | 77.97
44 | | 38 | 10.73 | 10.73 | 88.7
45 | | 3 | 0.85 | 0.85 | 89.55
46 | | 1 | 0.28 | 0.28 | 89.83
47 | | 5 | 1.41 | 1.41 | 91.24
48 | | 22 | 6.21 | 6.21 | 97.46
50 | | 9 | 2.54 | 2.54 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=24.56 · σ=16.31
Recoding the Perception and Attitude variables
Let’s look at the current values of these variables, which are Q1-Q3 and Q11_1_1 to Q11_1_17.
frq(ic_sub_2$Satisfaction, out="v", show.na=TRUE, title= "Different values in the Behavioral questions")
Different values in the Behavioral questions
val | label | frq | raw.prc | valid.prc | cum.prc
1 | | 1 | 0.28 | 0.83 | 0.83
2 | | 1 | 0.28 | 0.83 | 1.67
4 | | 7 | 1.98 | 5.83 | 7.5
5 | | 22 | 6.21 | 18.33 | 25.83
6 | | 43 | 12.15 | 35.83 | 61.67
7 | | 46 | 12.99 | 38.33 | 100
NA | NA | 234 | 66.1 | NA | NA
total N=354 · valid N=120 · x̄=6.01 · σ=1.07
The values span from 1 to 7, so we recode them with a 7-point Likert scale, with “1” being extremely negative and “7” extremely positive.
As these variables will be used in further analyses, their numerical format is kept and the recoded categorical columns are appended to the end of the data set.
ic_sub_2$Satisfaction_cat <- dplyr::recode_factor(ic_sub_2$Satisfaction,
`1` = "Extremely Dissatisfied",
`2` = "Dissatisfied",
`3` = "Somewhat Dissatisfied",
`4` = "Neither Dissatisfied nor Satisfied",
`5` = "Somewhat Satisfied",
`6` = "Satisfied",
`7` = "Extremely Satisfied"
)
y <- c("Extremely Satisfied","Satisfied", "Somewhat Satisfied", "Neither Dissatisfied nor Satisfied",
"Somewhat Dissatisfied", "Dissatisfied", "Extremely Dissatisfied")
ic_sub_2$Satisfaction_cat <- factor(ic_sub_2$Satisfaction_cat, levels = y )
frq(ic_sub_2$Satisfaction_cat, out="v", show.na=TRUE, title= "How satisfied with Ice Cream Brand")
How satisfied with Ice Cream Brand
val | label | frq | raw.prc | valid.prc | cum.prc
Extremely Satisfied | | 46 | 12.99 | 38.33 | 38.33
Satisfied | | 43 | 12.15 | 35.83 | 74.17
Somewhat Satisfied | | 22 | 6.21 | 18.33 | 92.5
Neither Dissatisfied nor Satisfied | | 7 | 1.98 | 5.83 | 98.33
Somewhat Dissatisfied | | 0 | 0 | 0 | 98.33
Dissatisfied | | 1 | 0.28 | 0.83 | 99.17
Extremely Dissatisfied | | 1 | 0.28 | 0.83 | 100
NA | NA | 234 | 66.1 | NA | NA
total N=354 · valid N=120 · x̄=1.99 · σ=1.07
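The recoding above writes the seven labels twice, once in the recode step and once in the level order. A possible alternative, sketched here in base R with a made-up `scores` vector, keeps a single label vector and lets `factor()` do both the mapping and the ordering, so the two lists cannot drift apart.

```r
# Made-up numeric answers on the 7-point scale, including a missing value
scores <- c(7, 6, 3, NA, 5)

# One label vector, ordered from code 1 to code 7 (labels taken from this report)
labels_1_to_7 <- c(
  "Extremely Dissatisfied", "Dissatisfied", "Somewhat Dissatisfied",
  "Neither Dissatisfied nor Satisfied",
  "Somewhat Satisfied", "Satisfied", "Extremely Satisfied"
)

# levels = 7:1 puts the most positive answer first for display;
# rev() keeps each label attached to its own code
score_cat <- factor(scores, levels = 7:1, labels = rev(labels_1_to_7))

print(table(score_cat, useNA = "ifany"))
```

The numeric column stays untouched, and only genuinely missing answers end up as NA in the categorical copy.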
Now we look at Q2 or Buying_likelihood variable in the data set.
ic_sub_2$Buying_likelihood_cat <- dplyr::recode_factor(ic_sub_2$Buying_likelihood,
`1` = "Not at all Likely to purchase again",
`2` = "Not Likely to purchase again",
`3` = "Somewhat Not Likely to purchase again", # must match the level name in y exactly
`4` = "Neither Likely nor Unlikely to purchase again",
`5` = "Somewhat Likely to purchase again",
`6` = "Likely to purchase again",
`7` = "Extremely Likely to purchase again"
)
y <- c("Extremely Likely to purchase again", "Likely to purchase again", "Somewhat Likely to purchase again",
"Neither Likely nor Unlikely to purchase again", "Somewhat Not Likely to purchase again",
"Not Likely to purchase again", "Not at all Likely to purchase again")
ic_sub_2$Buying_likelihood_cat <- factor(ic_sub_2$Buying_likelihood_cat, levels = y )
frq(ic_sub_2$Buying_likelihood_cat, out="v", show.na=TRUE, title= "How Likely to purchase brand again")
How Likely to purchase brand again
val | label | frq | raw.prc | valid.prc | cum.prc
Extremely Likely to purchase again | | 74 | 20.9 | 61.67 | 61.67
Likely to purchase again | | 29 | 8.19 | 24.17 | 85.83
Somewhat Likely to purchase again | | 11 | 3.11 | 9.17 | 95
Neither Likely nor Unlikely to purchase again | | 6 | 1.69 | 5 | 100
Somewhat Not Likely to purchase again | | 0 | 0 | 0 | 100
Not Likely to purchase again | | 0 | 0 | 0 | 100
Not at all Likely to purchase again | | 0 | 0 | 0 | 100
NA | NA | 234 | 66.1 | NA | NA
total N=354 · valid N=120 · x̄=1.57 · σ=0.86
Now we look at Q3 or Recommendation likelihood variable in the data set.
ic_sub_2$Recommend_likehood_cat <- dplyr::recode_factor(ic_sub_2$Recommend_likelihood,
`1` = "Not at all Likely to recommend",
`2` = "Not Likely to recommend",
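`3` = "Somewhat Not Likely to recommend", # must match the level name in y exactly; a mismatch makes factor() turn these responses into NA and inflates the missing count beyond the 234 non-purchasers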
`4` = "Neither Likely nor Unlikely to recommend",
`5` = "Somewhat Likely to recommend",
`6` = "Likely to recommend",
`7` = "Extremely Likely to recommend"
)
y <- c("Extremely Likely to recommend", "Likely to recommend", "Somewhat Likely to recommend",
"Neither Likely nor Unlikely to recommend", "Somewhat Not Likely to recommend",
"Not Likely to recommend", "Not at all Likely to recommend")
ic_sub_2$Recommend_likehood_cat <- factor(ic_sub_2$Recommend_likehood_cat, levels = y )
frq(ic_sub_2$Recommend_likehood_cat, out="v", show.na=TRUE, title= "How Likely to recommend brand again")
How Likely to recommend brand again
val | label | frq | raw.prc | valid.prc | cum.prc
Extremely Likely to recommend | | 57 | 16.1 | 48.72 | 48.72
Likely to recommend | | 40 | 11.3 | 34.19 | 82.91
Somewhat Likely to recommend | | 13 | 3.67 | 11.11 | 94.02
Neither Likely nor Unlikely to recommend | | 6 | 1.69 | 5.13 | 99.15
Somewhat Not Likely to recommend | | 0 | 0 | 0 | 99.15
Not Likely to recommend | | 1 | 0.28 | 0.85 | 100
Not at all Likely to recommend | | 0 | 0 | 0 | 100
NA | NA | 237 | 66.95 | NA | NA
total N=354 · valid N=117 · x̄=1.76 · σ=0.94
Recoding the brand perception and opinion variables
All 17 variables in this area (Q11_1_1 to Q11_1_17) can be coded with the same format. Again, to keep their numerical format for further analyses, they are recoded into new variables that are added to the data set. First we create the new categorical variables and then recode them. Tables of the 17 recoded perception questions for the Dreyer’s brand appear next.
library(forcats)
ftt <- function(x) { ordered( x, levels = c("1", "2", "3", "4", "5", "6", "7")) }
ic_sub_2[, 37:53] <- 7:23 %.>% ( function(x) { lapply( ic_sub_2[, (x)], ftt ) } ) (.) # columns 7:23 hold the 17 Q11 items, matching the 17 target columns
rr <- function(x) { dplyr::recode_factor( x,
`1` = "Does not describe at all",
`2` = "Does not describe",
`3` = "Not Somewhat does describe",
`4` = "Neither does nor does not describe",
`5` = "Somewhat does describe well",
`6` = "Does describe well",
`7` = "Describes extremely well"
) }
ic_sub_2[, 37:53] <- 37:53 %.>% ( function(x) { lapply( ic_sub_2[, (x)], rr ) } ) (.)
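Positional indices such as 7:23 and 37:53 are easy to get out of step with the data. A possible alternative, sketched here in base R on a made-up `toy` data frame, addresses source and target columns by name so that both sides always have the same length. The helper `to_label` and the `_cat` suffix are hypothetical choices for this illustration.

```r
# Made-up data with two perception items on the 7-point scale
toy <- data.frame(ID = 1:4, is_fun = c(1, 4, 7, 4), is_treat = c(2, 2, 6, 5))
item_cols <- c("is_fun", "is_treat")

# Labels for codes 1..7, as used in this report
labels_1_to_7 <- c(
  "Does not describe at all", "Does not describe", "Not Somewhat does describe",
  "Neither does nor does not describe",
  "Somewhat does describe well", "Does describe well", "Describes extremely well"
)

# Hypothetical helper: numeric code -> ordered factor with text labels
to_label <- function(x) factor(x, levels = 1:7, labels = labels_1_to_7, ordered = TRUE)

# Target names are derived from the source names, so the counts always agree
cat_cols <- paste0(item_cols, "_cat")
toy[cat_cols] <- lapply(toy[item_cols], to_label)

str(toy)
```

The numeric columns remain available for later analyses, and the labelled copies sit next to them under predictable names.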
frq( fct_rev(ic_sub_2[,37]), out="v", show.na=TRUE, title= "is relaxing")
is relaxing
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 15 | 4.24 | 4.24 | 4.24
Does describe well | | 39 | 11.02 | 11.02 | 15.25
Somewhat does describe well | | 67 | 18.93 | 18.93 | 34.18
Neither does nor does not describe | | 157 | 44.35 | 44.35 | 78.53
Not Somewhat does describe | | 31 | 8.76 | 8.76 | 87.29
Does not describe | | 25 | 7.06 | 7.06 | 94.35
Does not describe at all | | 20 | 5.65 | 5.65 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=3.86 · σ=1.37
frq( fct_rev(ic_sub_2[,38]), out="v", show.na=TRUE, title= "is wholesome" )
is wholesome
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 13 | 3.67 | 3.67 | 3.67
Does describe well | | 39 | 11.02 | 11.02 | 14.69
Somewhat does describe well | | 66 | 18.64 | 18.64 | 33.33
Neither does nor does not describe | | 161 | 45.48 | 45.48 | 78.81
Not Somewhat does describe | | 37 | 10.45 | 10.45 | 89.27
Does not describe | | 26 | 7.34 | 7.34 | 96.61
Does not describe at all | | 12 | 3.39 | 3.39 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=3.84 · σ=1.28
frq( fct_rev(ic_sub_2[,39]), out="v", show.na=TRUE, title= "is fun" )
is fun
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 14 | 3.95 | 3.95 | 3.95
Does describe well | | 28 | 7.91 | 7.91 | 11.86
Somewhat does describe well | | 57 | 16.1 | 16.1 | 27.97
Neither does nor does not describe | | 181 | 51.13 | 51.13 | 79.1
Not Somewhat does describe | | 37 | 10.45 | 10.45 | 89.55
Does not describe | | 14 | 3.95 | 3.95 | 93.5
Does not describe at all | | 23 | 6.5 | 6.5 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=3.94 · σ=1.30
frq( fct_rev(ic_sub_2[,40]), out="v", show.na=TRUE, title= "is exciting" )
is exciting
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 13 | 3.67 | 3.67 | 3.67
Does describe well | | 27 | 7.63 | 7.63 | 11.3
Somewhat does describe well | | 44 | 12.43 | 12.43 | 23.73
Neither does nor does not describe | | 176 | 49.72 | 49.72 | 73.45
Not Somewhat does describe | | 38 | 10.73 | 10.73 | 84.18
Does not describe | | 36 | 10.17 | 10.17 | 94.35
Does not describe at all | | 20 | 5.65 | 5.65 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=4.09 · σ=1.33
frq( fct_rev(ic_sub_2[,41]), out="v", show.na=TRUE, title= "has premium quality")
has premium quality
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 31 | 8.76 | 8.76 | 8.76
Does describe well | | 71 | 20.06 | 20.06 | 28.81
Somewhat does describe well | | 70 | 19.77 | 19.77 | 48.59
Neither does nor does not describe | | 130 | 36.72 | 36.72 | 85.31
Not Somewhat does describe | | 29 | 8.19 | 8.19 | 93.5
Does not describe | | 13 | 3.67 | 3.67 | 97.18
Does not describe at all | | 10 | 2.82 | 2.82 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=3.38 · σ=1.37
frq( fct_rev(ic_sub_2[,42]), out="v", show.na=TRUE, title= "is memorable" )
is memorable
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 17 | 4.8 | 4.8 | 4.8
Does describe well | | 20 | 5.65 | 5.65 | 10.45
Somewhat does describe well | | 61 | 17.23 | 17.23 | 27.68
Neither does nor does not describe | | 171 | 48.31 | 48.31 | 75.99
Not Somewhat does describe | | 48 | 13.56 | 13.56 | 89.55
Does not describe | | 21 | 5.93 | 5.93 | 95.48
Does not describe at all | | 16 | 4.52 | 4.52 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=3.96 · σ=1.27
frq( fct_rev(ic_sub_2[,43]), out="v", show.na=TRUE, title= "Bought as special treat but not regularly")
Bought as special treat but not regularly
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 17 | 4.8 | 4.8 | 4.8
Does describe well | | 23 | 6.5 | 6.5 | 11.3
Somewhat does describe well | | 77 | 21.75 | 21.75 | 33.05
Neither does nor does not describe | | 103 | 29.1 | 29.1 | 62.15
Not Somewhat does describe | | 51 | 14.41 | 14.41 | 76.55
Does not describe | | 40 | 11.3 | 11.3 | 87.85
Does not describe at all | | 43 | 12.15 | 12.15 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=4.24 · σ=1.60
frq( fct_rev(ic_sub_2[,44]), out="v", show.na=TRUE, title= "is good for regular consumption")
is good for regular consumption
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 37 | 10.45 | 10.45 | 10.45
Does describe well | | 60 | 16.95 | 16.95 | 27.4
Somewhat does describe well | | 100 | 28.25 | 28.25 | 55.65
Neither does nor does not describe | | 113 | 31.92 | 31.92 | 87.57
Not Somewhat does describe | | 20 | 5.65 | 5.65 | 93.22
Does not describe | | 11 | 3.11 | 3.11 | 96.33
Does not describe at all | | 13 | 3.67 | 3.67 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=3.29 · σ=1.39
frq( fct_rev(ic_sub_2[,45]), out="v", show.na=TRUE, title= "is interesting")
is interesting
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 13 | 3.67 | 3.67 | 3.67
Does describe well | | 32 | 9.04 | 9.04 | 12.71
Somewhat does describe well | | 62 | 17.51 | 17.51 | 30.23
Neither does nor does not describe | | 174 | 49.15 | 49.15 | 79.38
Not Somewhat does describe | | 41 | 11.58 | 11.58 | 90.96
Does not describe | | 16 | 4.52 | 4.52 | 95.48
Does not describe at all | | 16 | 4.52 | 4.52 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=3.88 · σ=1.25
frq( fct_rev(ic_sub_2[,46]), out="v", show.na=TRUE, title= "Tastes better than other brands" )
Tastes better than other brands
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 24 | 6.78 | 6.78 | 6.78
Does describe well | | 51 | 14.41 | 14.41 | 21.19
Somewhat does describe well | | 85 | 24.01 | 24.01 | 45.2
Neither does nor does not describe | | 135 | 38.14 | 38.14 | 83.33
Not Somewhat does describe | | 35 | 9.89 | 9.89 | 93.22
Does not describe | | 13 | 3.67 | 3.67 | 96.89
Does not describe at all | | 11 | 3.11 | 3.11 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=3.53 · σ=1.32
frq( fct_rev(ic_sub_2[,47]), out="v", show.na=TRUE, title= "Has many flavors")
Has many flavors
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 43 | 12.15 | 12.15 | 12.15
Does describe well | | 70 | 19.77 | 19.77 | 31.92
Somewhat does describe well | | 98 | 27.68 | 27.68 | 59.6
Neither does nor does not describe | | 109 | 30.79 | 30.79 | 90.4
Not Somewhat does describe | | 17 | 4.8 | 4.8 | 95.2
Does not describe | | 11 | 3.11 | 3.11 | 98.31
Does not describe at all | | 6 | 1.69 | 1.69 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=3.12 · σ=1.31
frq( fct_rev(ic_sub_2[,48]), out="v", show.na=TRUE, title= "Is enjoyable" )
Is enjoyable
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 36 | 10.17 | 10.17 | 10.17
Does describe well | | 82 | 23.16 | 23.16 | 33.33
Somewhat does describe well | | 102 | 28.81 | 28.81 | 62.15
Neither does nor does not describe | | 97 | 27.4 | 27.4 | 89.55
Not Somewhat does describe | | 21 | 5.93 | 5.93 | 95.48
Does not describe | | 6 | 1.69 | 1.69 | 97.18
Does not describe at all | | 10 | 2.82 | 2.82 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=3.12 · σ=1.32
frq( fct_rev(ic_sub_2[,49]), out="v", show.na=TRUE, title= "Has best value for price" )
Has best value for price
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 18 | 5.08 | 5.08 | 5.08
Does describe well | | 31 | 8.76 | 8.76 | 13.84
Somewhat does describe well | | 69 | 19.49 | 19.49 | 33.33
Neither does nor does not describe | | 170 | 48.02 | 48.02 | 81.36
Not Somewhat does describe | | 41 | 11.58 | 11.58 | 92.94
Does not describe | | 13 | 3.67 | 3.67 | 96.61
Does not describe at all | | 12 | 3.39 | 3.39 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=3.77 · σ=1.23
frq( fct_rev(ic_sub_2[,50]), out="v", show.na=TRUE, title= "is natural/organic")
is natural/organic
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 8 | 2.26 | 2.26 | 2.26
Does describe well | | 18 | 5.08 | 5.08 | 7.34
Somewhat does describe well | | 45 | 12.71 | 12.71 | 20.06
Neither does nor does not describe | | 151 | 42.66 | 42.66 | 62.71
Not Somewhat does describe | | 42 | 11.86 | 11.86 | 74.58
Does not describe | | 47 | 13.28 | 13.28 | 87.85
Does not describe at all | | 43 | 12.15 | 12.15 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=4.45 · σ=1.44
frq( fct_rev(ic_sub_2[,51]), out="v", show.na=TRUE, title= "is low calorie")
is low calorie
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 6 | 1.69 | 1.69 | 1.69
Does describe well | | 9 | 2.54 | 2.54 | 4.24
Somewhat does describe well | | 42 | 11.86 | 11.86 | 16.1
Neither does nor does not describe | | 153 | 43.22 | 43.22 | 59.32
Not Somewhat does describe | | 61 | 17.23 | 17.23 | 76.55
Does not describe | | 45 | 12.71 | 12.71 | 89.27
Does not describe at all | | 38 | 10.73 | 10.73 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=4.53 · σ=1.32
frq( fct_rev(ic_sub_2[,52]), out="v", show.na=TRUE, title= "Great for whole family")
Great for whole family
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 44 | 12.43 | 12.43 | 12.43
Does describe well | | 69 | 19.49 | 19.49 | 31.92
Somewhat does describe well | | 85 | 24.01 | 24.01 | 55.93
Neither does nor does not describe | | 117 | 33.05 | 33.05 | 88.98
Not Somewhat does describe | | 22 | 6.21 | 6.21 | 95.2
Does not describe | | 8 | 2.26 | 2.26 | 97.46
Does not describe at all | | 9 | 2.54 | 2.54 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=3.18 · σ=1.36
frq( fct_rev(ic_sub_2[,53]), out="v", show.na=TRUE, title= "Great for guests")
Great for guests
val | label | frq | raw.prc | valid.prc | cum.prc
Describes extremely well | | 42 | 11.86 | 11.86 | 11.86
Does describe well | | 75 | 21.19 | 21.19 | 33.05
Somewhat does describe well | | 100 | 28.25 | 28.25 | 61.3
Neither does nor does not describe | | 94 | 26.55 | 26.55 | 87.85
Not Somewhat does describe | | 20 | 5.65 | 5.65 | 93.5
Does not describe | | 9 | 2.54 | 2.54 | 96.05
Does not describe at all | | 14 | 3.95 | 3.95 | 100
NA | NA | 0 | 0 | NA | NA
total N=354 · valid N=354 · x̄=3.16 · σ=1.42
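As a sanity check on how to read these frequency tables, the columns of the "is wholesome" table above can be reproduced in base R from its counts alone. This is only a sketch: the counts are copied from that table, and the object names (counts, freq_tab, codes) are made up for illustration. It also assumes that the reported x̄ and σ are the mean and standard deviation of the integer codes 1 to 7 of the reversed factor, with 1 being the first displayed level; the figures for this table are consistent with that reading.

```r
# Counts copied from the "is wholesome" table, in displayed order
counts <- c(
  "Describes extremely well"           = 13,
  "Does describe well"                 = 39,
  "Somewhat does describe well"        = 66,
  "Neither does nor does not describe" = 161,
  "Not Somewhat does describe"         = 37,
  "Does not describe"                  = 26,
  "Does not describe at all"           = 12
)

n <- sum(counts)                               # total number of respondents
raw_prc <- round(100 * counts / n, 2)          # share of each level in percent
cum_prc <- round(100 * cumsum(counts) / n, 2)  # running (cumulative) share

# Assemble a small table; row names come from the names of `counts`
freq_tab <- data.frame(frq = counts, raw.prc = raw_prc, cum.prc = cum_prc)
print(freq_tab)

# Assumption: summary statistics are taken over the integer codes 1..7
codes <- rep(seq_along(counts), times = counts)
round(mean(codes), 2)
round(sd(codes), 2)
```

Because no values are missing in these columns, raw.prc and valid.prc coincide in every table.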
Next, the new categorical variables are renamed, and the suffix _cat is added to indicate their categorical format.
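# Short names for the 17 brand-perception items (columns 37 to 53)
# Note: "has_best_value/price" contains a slash, so the resulting column
# must be quoted with backticks when it is referenced by name
q <- c("is_relaxing", "is_wholesome", "is_fun",
       "is_exciting", "is_premium_quality", "is_memorable", "is_treat", "is_good_for_regular",
       "is_interesting", "taste_better_other_brands", "has_many_flavors", "is_enjoyable", "has_best_value/price",
       "is_organic", "is_low_cal", "is_great_for_family", "is_great_for_guests")
# Append the "_cat" suffix to every name
colnames(ic_sub_2)[37:53] <- paste0(q, "_cat")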
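The renaming step can be tried on a small stand-in before it is applied to the real data. The sketch below uses a hypothetical three-column data frame named toy (not part of the survey data) to show that appending a suffix with gsub("$", ...) and with paste0() gives the same names, and how a column whose name contains a slash has to be accessed afterwards.

```r
# Toy stand-in for ic_sub_2 (hypothetical data, for illustration only)
toy <- data.frame(a = 1:2, b = 3:4, c = 5:6)
q <- c("is_fun", "has_best_value/price")

# gsub("$", "_cat", q) appends the suffix by matching the end of each string;
# paste0() produces the same result more directly
stopifnot(identical(gsub("$", "_cat", q), paste0(q, "_cat")))

# Rename only the second and third columns
colnames(toy)[2:3] <- paste0(q, "_cat")
names(toy)

# A name containing "/" is non-syntactic, so it needs backticks or [[ ]]
toy$`has_best_value/price_cat`
toy[["has_best_value/price_cat"]]
```

If the slash turns out to be inconvenient downstream, a name such as has_best_value_for_price would avoid the quoting, but that is a naming choice and not something the original analysis does.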
Next, the Heard_of_brand and Purchased_last6mo variables are recoded from numeric codes to 'yes'/'no' labels.
ic_sub_2$Heard_of_brand <- ifelse(ic_sub_2$Heard_of_brand ==1, 'yes', 'no')
ic_sub_2$Purchased_last6mo <- ifelse(ic_sub_2$Purchased_last6mo ==1, 'yes', 'no')
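One way to check a recode of this kind is to cross-tabulate the original codes against the new labels. The sketch below does this on a hypothetical vector named heard. The values 0 and NA are assumptions made for illustration, since the actual non-1 codes in the survey data are not shown here. Note that ifelse() leaves missing values as NA instead of labelling them 'no'.

```r
# Toy stand-in for a screener column (hypothetical values)
heard <- c(1, 0, 1, NA, 1)

# Same recode as above: 1 becomes "yes", any other non-missing code becomes "no"
heard_lab <- ifelse(heard == 1, "yes", "no")

# Cross-tabulate original codes against the new labels, keeping NA visible
table(original = heard, recoded = heard_lab, useNA = "ifany")
```

On the real data the same call would take the column before and after recoding, so the original column would have to be kept in a temporary variable first.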
Saving the final data set
The recoded, imputed, and filtered data set is saved to the working directory with the code below.
write.csv(ic_sub_2, "./final_data.csv", row.names = FALSE)
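When the saved file is read back, column names may not survive unchanged. The sketch below illustrates this with a hypothetical two-column data frame written to a temporary file: by default read.csv() passes names through make.names(), which replaces the slash with a dot, while check.names = FALSE keeps the names as written.

```r
# Toy data frame with one non-syntactic column name (hypothetical data)
toy <- data.frame(ID = 1:2, x = c("yes", "no"))
names(toy)[2] <- "has_best_value/price_cat"

# Write to a temporary file in place of ./final_data.csv
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

back_default <- read.csv(path)                       # check.names = TRUE by default
back_exact   <- read.csv(path, check.names = FALSE)  # keep names as written

names(back_default)
names(back_exact)
```

Factor columns are also written as plain text, so any level ordering set earlier would need to be re-applied after reading the CSV.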