Visualizing Likert Responses Using R and ggplot
In my role as a grader for a class, I had to conduct a survey. Each question had six options: “N/A”, “Strongly Disagree”, “Disagree”, “Neutral”, “Agree” and “Strongly Agree”. I find the clearest way to present this kind of survey data in a single plot is a set of horizontal stacked bars, one bar per question. It makes the overall response to the survey easy to see at a glance.
Below is a step-by-step description, with code snippets, of how to produce such a chart.
I use ggplot2 for the visualization and sqldf (together with dplyr) for the aggregation. The reports were captured in separate CSV files (one per respondent) and had to be combined into a single data frame.
Each file looks something like the one below: the questions are stacked one below another, and for each category the report gives the total number of responses in that category.
library(ggplot2)
library(dplyr)
library(sqldf)
#list all the csv files in the working directory
#(pattern is a regular expression, so match names ending in ".csv")
temp = list.files(pattern="\\.csv$")
#read all the files from temp and rbind them to form a final dataframe
survey <- do.call(rbind,lapply(temp,read.csv))
#keeping only relevant columns
survey <- survey[,c("Answer","Answer.Match","X..Responses")]
colnames(survey) <- c("question","category","responses")
The list.files command collected the names of all the CSV files in my working directory into the variable temp. Since I only cared about the data as a whole, I stacked the tables on top of each other with the rbind function.
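To see the read-and-stack step in isolation, here is a minimal sketch that uses only base R. The temporary directory, the file names and the values are invented for illustration and are not the real survey reports.

```r
# Sketch: combine per-respondent CSV files into one data frame.
# The directory, file names and values are made up for this example.
demo_dir <- file.path(tempdir(), "survey_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Write two tiny example reports that share the same columns
r1 <- data.frame(question = c("Q1", "Q1"),
                 category = c("Agree", "Neutral"),
                 responses = c(1, 0))
r2 <- data.frame(question = c("Q1", "Q1"),
                 category = c("Agree", "Neutral"),
                 responses = c(0, 1))
write.csv(r1, file.path(demo_dir, "respondent1.csv"), row.names = FALSE)
write.csv(r2, file.path(demo_dir, "respondent2.csv"), row.names = FALSE)

# pattern is a regular expression; "\\.csv$" matches names ending in ".csv"
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into a data frame, then stack the data frames row-wise
combined <- do.call(rbind, lapply(files, read.csv))
print(combined)
```

rbind only works here because every file has the same column names; reports with differing columns would need to be aligned first.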
Each question had six categories, so I wanted the percentage of responses in each category within each question.
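To make the target concrete, here is a small sketch of the same calculation on made-up counts, using only base R. The toy data frame is an assumption for illustration; the column names total, countT and per mirror the ones used in this post.

```r
# Made-up counts: two questions, three categories each
toy <- data.frame(
  question = rep(c("Q1", "Q2"), each = 3),
  category = rep(c("Disagree", "Neutral", "Agree"), times = 2),
  total    = c(1, 1, 2, 0, 5, 5)
)

# ave() returns, for every row, the sum of `total` within that row's question
toy$countT <- ave(toy$total, toy$question, FUN = sum)

# Percentage share of each category within its question
toy$per <- round(100 * toy$total / toy$countT, 2)
print(toy)
```

The percentages within each question add up to 100, which is what lets every bar in the final chart span the same length.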
#aggregating all the responses into a df that contains only the question wise, category wise total for all the students
agg_table <- sqldf::sqldf("select question, category, SUM(responses) as total from survey group by question, category")
#question wise sum and percentage calculation for each category
summarized_table <- agg_table %>%
  group_by(question) %>%
  mutate(countT = sum(total)) %>%
  group_by(category, .add = TRUE) %>%
  mutate(per = round(100*total/countT, 2)) %>%
  ungroup()
#relevel() only works on factors, and recent R versions read text columns as character
summarized_table$category <- factor(summarized_table$category)
#we have to reorder the factors to have a color palette that makes intuitive sense
#each relevel() call moves the named level to the front, so the last call ends up first
summarized_table$category <- relevel(summarized_table$category,"N/A")
summarized_table$category <- relevel(summarized_table$category,"Strongly Disagree")
summarized_table$category <- relevel(summarized_table$category,"Disagree")
summarized_table$category <- relevel(summarized_table$category,"Neutral")
summarized_table$category <- relevel(summarized_table$category,"Agree")
summarized_table$category <- relevel(summarized_table$category,"Strongly Agree")
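The effect of the repeated relevel calls is easy to check on a standalone factor. The sketch below uses a made-up vector containing one value per category, and also shows a single factor() call that should give the same order; treat that second form as an alternative, not as the method used in this post.

```r
# Made-up factor with one value per answer category
cats <- factor(c("Agree", "N/A", "Neutral", "Strongly Agree",
                 "Disagree", "Strongly Disagree"))
levels(cats)  # sorted order by default

# Each relevel() call moves the named level to the first position,
# so the level passed last ends up at the front
for (lv in c("N/A", "Strongly Disagree", "Disagree",
             "Neutral", "Agree", "Strongly Agree")) {
  cats <- relevel(cats, lv)
}
levels(cats)

# Alternative: set the full level order in one step
direct <- factor(as.character(cats),
                 levels = c("Strongly Agree", "Agree", "Neutral",
                            "Disagree", "Strongly Disagree", "N/A"))
identical(levels(cats), levels(direct))
```

The level order matters because a manual fill scale given an unnamed vector of colors assigns them to the levels in order, so the first level gets the first color.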
The resulting summarized table looks something like this:
Now that the data is in the required format, we let ggplot work its magic. To get a horizontal bar for every question in the survey, we build an ordinary bar chart with the questions on the x axis, fill each bar by category (N/A, Disagree, Agree and so on), and then flip the coordinates.
#define the colors on the scale, in the same order as the factor levels
myColors <- c("darkgreen","green","orange","red","darkred","black")
#actual plot creation; each + has to end a line so that R continues the expression
ggplot(data = summarized_table, aes(x = question, y = per, fill = category)) +
  geom_bar(stat = "identity", width = 0.7) +
  scale_fill_manual(values = myColors) +
  coord_flip() +
  ylab("Percentage") +
  xlab("Question") +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14, face = "bold")) +
  ggtitle("Project Survey Results") +
  theme(plot.title = element_text(size = 20, face = "bold", hjust = 0.5))
The output looks like the chart below, which is easy to read and understand.
This seems like a nice way to summarize the results of a survey!