Tweet Data Summary

Summarizing your Data

Let's make sure we have our village data file. You can load the file from your computer.

For easier reproducibility, I am again downloading the file from the website.

download.file(url="",destfile = "df.csv",cacheOK=TRUE)#Save the downloaded file as "df.csv"

village<-read.csv("df.csv",header=TRUE,sep="\t",stringsAsFactors = FALSE)#Import the tab-separated file into your R session
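
Before summarizing, it can help to confirm that the import worked. The sketch below is a minimal, self-contained illustration: it writes a small made-up tab-separated file to a temporary location (the column names and values are assumptions, not the real village data) and reads it back with the same read.csv arguments as the import above.

```r
# Toy data: hypothetical columns and values, only for illustration
toy <- data.frame(Gender  = c("Male", "Female", "Male"),
                  Income  = c(3500, 6300, 9900),
                  HH_size = c(4, 6, 7),
                  stringsAsFactors = FALSE)

# Write it as a tab-separated file in a temporary location
tmp <- tempfile(fileext = ".csv")
write.table(toy, tmp, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back with the same arguments as the village import
toy_in <- read.csv(tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

str(toy_in)   # column names and types
head(toy_in)  # first rows
dim(toy_in)   # number of rows and columns
```

The same three calls (str, head, dim) can be run on village to check that the columns were split correctly and that text columns stayed as character.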

Let's use some basic statistical summary operations.

mean(village$Income)#mean of Income
## [1] 7901.724
median(village$Income)#median of Income
## [1] 6300
Freq<-table(village$Gender)#one-variable frequency table
#Not readable? Let's change it into a data frame
freq_table<-data.frame(Freq)#notice the data.frame function
names(freq_table)[names(freq_table) == 'Var1'] <- 'Gender'#Recall renaming
#Let's see a two-variable table
table(village$Land_own, village$Gender)
##                   Female Male
##   Marginal farmer     15   44
##   Small farmer         3   25
freq_table2<-data.frame(table(village$Land_own, village$Gender))#convert to a data frame so the columns can be renamed
names(freq_table2)[names(freq_table2) == 'Var1'] <- 'Land ownership'
names(freq_table2)[names(freq_table2) == 'Var2'] <- 'Gender'
#Using Summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3500    6300    7902    9900   55000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   30.00   41.00   55.00   53.09   62.00   85.00
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   4.000   6.000   6.241   7.000  21.000
#Try summary with a qualitative variable
##    Length     Class      Mode 
##        87 character character
#Not the expected result! Takeaway: summary is most useful for quantitative variables; for a character variable it only reports length, class and mode.
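
For a qualitative variable, a frequency table is usually more informative than summary on raw character data. The following is a minimal sketch on a made-up vector (the values are assumptions, not the village data); it shows table, prop.table, and summary after converting the vector to a factor.

```r
# Hypothetical gender values, only for illustration
gender <- c("Male", "Female", "Male", "Male", "Female")

summary(gender)            # character vector: only length, class and mode
counts <- table(gender)    # counts per category
counts
prop.table(counts)         # share of each category
summary(factor(gender))    # as a factor, summary returns counts per level
```

Converting a column with factor is a small change, but it tells R that the values are categories, which is why summary then behaves like a frequency count.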

Visualizing your results

plot(village$HH_size, village$Income)

#change the plot type
plot(village$HH_size, village$Income, type="l")#does not make sense? try type="p"

#Add color
plot(village$HH_size, village$Income, type="p", col="blue")

#customize scale
plot(village$HH_size, village$Income, type="p", col="blue", log="y")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 1 y value <= 0 omitted from
## logarithmic plot
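
The warning appears because a logarithmic axis cannot show values of zero or below, so R drops that point from the plot. One way to make this explicit, sketched here on made-up numbers (the vectors are assumptions, not the village data), is to filter out non-positive values before plotting.

```r
# Hypothetical values; one income is 0, which cannot be shown on a log axis
hh_size <- c(2, 4, 6, 7, 21)
income  <- c(0, 3500, 6300, 9900, 55000)

keep <- income > 0   # TRUE for points that can be drawn on a log scale
sum(!keep)           # number of points that will be left out

# Plot only the positive values, so no warning is raised
plot(hh_size[keep], income[keep], type = "p", col = "blue", log = "y",
     xlab = "HH_size", ylab = "Income (log scale)")
```

Filtering first makes the omission a visible decision in the code rather than a side effect reported in a warning.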

To explore more options, search for “plot” in the RStudio help search bar or run ?plot in the console.

Let's try a bar plot with categorical variables.

barplot(table(village$Land_own, village$Gender))#by default it is stacked

#change the appearance
barplot(table(village$Land_own, village$Gender),beside=TRUE)

#add colour
barplot(table(village$Land_own, village$Gender),beside=TRUE,col=c("green","blue"))

#Adding labels (the bar height is a count, so the y-axis is labelled "Frequency")
barplot(table(village$Land_own, village$Gender),beside=TRUE,col=c("green","blue"),
        ylab="Frequency", xlab="Gender")

#Adding legend (the legend shows the land ownership categories)
barplot(table(village$Land_own, village$Gender),beside=TRUE,col=c("green","blue"),
        ylab="Frequency", xlab="Gender",legend.text=TRUE,args.legend=list(x="bottomright"))

#Add title
barplot(table(village$Land_own, village$Gender),beside=TRUE,col=c("green","blue"),
        ylab="Frequency", xlab="Gender",legend.text=TRUE,args.legend=list(x="bottomright"),
        main="Land Ownership")

You can do more visualization and customization with the package “ggplot2”. It is the successor to the original “ggplot” package.
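
As a rough illustration of what that looks like, the sketch below draws a grouped bar chart with ggplot2. It assumes the ggplot2 package is installed (the code is skipped otherwise) and uses a small made-up data frame (the values are assumptions, not the village data); the column names Land_own and Gender mirror those used above.

```r
# Hypothetical data, only for illustration
toy <- data.frame(
  Land_own = c("Marginal farmer", "Marginal farmer", "Small farmer",
               "Small farmer", "Marginal farmer", "Small farmer"),
  Gender   = c("Female", "Male", "Male", "Male", "Male", "Female")
)

# Only run the plotting code if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = Gender, fill = Land_own)) +
    geom_bar(position = "dodge") +   # side-by-side bars, like beside=TRUE
    labs(title = "Land Ownership", x = "Gender", y = "Frequency",
         fill = "Land ownership")
  print(p)
}
```

Here geom_bar counts the rows itself, so there is no need to build the table first as with barplot.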

Happy learning!