We need a place to store results and a place from which to import data. The computer calls such a place a directory.
read.csv(file="path to your file in computer",header=TRUE)
You can also store the imported data in an object by giving it a name (e.g. village) in the following way.
village<-read.csv(file="path to your file in computer",header=TRUE)
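To see these calls run without a real file on disk, the sketch below first writes a small CSV to a temporary location and then reads it back. The data, the column names (Village, Income) and the object name village_demo are made up for illustration, and tempfile() stands in for the path to your own file.

```r
# Hypothetical data, written to a temporary file so the example is self-contained
sample_data <- data.frame(Village = c("A", "B", "C"),
                          Income  = c(1200, 950, 1430))
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_data, file = csv_path, row.names = FALSE)

# Import the file and store it in an object
village_demo <- read.csv(file = csv_path, header = TRUE)

str(village_demo)   # structure of the imported data frame
getwd()             # the current working directory, where relative paths are resolved
```

If you pass only a file name instead of a full path, R looks for the file in the directory reported by getwd().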
Alternatively, use the File menu in RStudio (File > Import Dataset).
download.file(url="https://dataverse.harvard.edu/api/access/datafile/4805891",destfile = "df.csv",cacheOK=TRUE)#Downloading the file and saving it as "df.csv"
village<-read.csv("df.csv",header=TRUE,sep="\t",stringsAsFactors = FALSE)#Importing the file into your RStudio environment
Notice the options used with the 'download.file' and 'read.csv' functions.
destfile: the location where the file downloaded from the website will be stored.
cacheOK: servers sometimes keep a cached copy of web data that is not live. Depending on your need, this can be set to TRUE (a cached copy is acceptable) or FALSE. Read the documentation to explore more with relevant examples.
stringsAsFactors: when set to FALSE, text variables are kept as character strings instead of being converted to the 'factor' mode.
sep: the character that separates the fields in the file. As we saw in the above examples, sep can take a semicolon, a comma, or a tab ("\t", as in the code above).
Question: Why do we set stringsAsFactors to FALSE when importing files?
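A small sketch may help in thinking about this question. It reads the same made-up text twice, once with stringsAsFactors = TRUE and once with FALSE, and compares the class of a text column. The column names and values are hypothetical, and the text argument of read.csv is used only so that no file is needed.

```r
# Hypothetical tab-separated data supplied as text instead of a file
raw_text <- "Gender\tIncome\nFemale\t1200\nMale\t950\nFemale\t1430"

as_factor <- read.csv(text = raw_text, header = TRUE, sep = "\t",
                      stringsAsFactors = TRUE)
as_text   <- read.csv(text = raw_text, header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE)

class(as_factor$Gender)   # "factor": values are stored as categories (levels)
class(as_text$Gender)     # "character": values stay as plain text
```

Keeping text as character makes it easier to clean or edit values first; a column can still be converted later with factor() when categories are needed.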
Suppose we want to import multiple files (in ".csv" format) into our RStudio environment. You can try the list.files function.
A hypothetical example is as follows.
Let's assume we want to import multiple village files. The following command will do.
villages<-list.files(path="path to your directory where files are stored",pattern="\\.csv$",full.names=TRUE)#pattern is a regular expression; full.names keeps the directory in each file name
We created an object called "villages" in our RStudio global environment. It holds the names of all the files with the .csv extension, not yet their contents.
To use the files themselves, we need to apply another set of commands.
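One common way to do this, sketched here under the assumption that every file is comma-separated with a header row, is to apply read.csv to each file name with lapply. The temporary directory and the two sample files are hypothetical and only make the example self-contained.

```r
# Hypothetical setup: two small CSV files in a temporary directory
demo_dir <- file.path(tempdir(), "village_files")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Income = c(10, 20)),
          file.path(demo_dir, "village1.csv"), row.names = FALSE)
write.csv(data.frame(Income = c(30, 40)),
          file.path(demo_dir, "village2.csv"), row.names = FALSE)

# File names with the .csv extension; full.names = TRUE keeps the directory in each name
villages <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file; the result is a list with one data frame per file
myfiles <- lapply(villages, read.csv)
names(myfiles) <- basename(villages)

myfiles[["village1.csv"]]   # extract one data frame from the list
```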
The step above ensures that all the files are now stored in the "myfiles" object, from which each one can be extracted for further use.
There are a few popular packages that help us work with different file types in R.
Packages? What are those?
Packages are collections of functions, developed by advanced R users, that perform specific tasks.
In order to use those functions, we first install the package and then load it with library(), which makes all of its functions available.
This is the usual way of using a package and its associated functions in R.
Let's install a package and call one of its functions.
install.packages("haven")#installing
library(haven)#loading
spss_sample<-read_sav(file="path to your file")
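To save memory, we can also unload packages that we no longer need. Go to the Packages tab and uncheck the packages you want to unload. Alternatively, use the detach command. Note that both of these only unload a package from the current session; they do not uninstall it. See more about a function by typing its name in the search bar of the Help tab in the bottom right pane of RStudio.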
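As a rough illustration of attaching and detaching, the sketch below uses the splines package, chosen only because it ships with R and needs no installation. The search() function lists what is currently attached.

```r
library(splines)                           # attach a package that ships with R
"package:splines" %in% search()            # TRUE while the package is attached

detach("package:splines", unload = TRUE)   # same effect as unchecking it in the Packages tab
"package:splines" %in% search()            # FALSE after detaching
```

To delete a package from your computer altogether, base R provides remove.packages().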
View(village)#Try using table icon on the right in Environment pane
head(village,n=5L)#Check with top five rows
tail(village,n=3L)#Check with last three rows
#More views
tail(village, c(6L, 2L))
tail(village, c(2L,5L))
#Locating your column, row and variable
income<-village$Income#Income variable
gender<-village$Gender#Gender variable
village[1,1]#First row, first column.
village[2,3]#second row, third column
village[,1]#All rows, first column
names(village)
colnames(village)
rownames(village)
#Rename your variable
names(village)[names(village)=='Block_Name']<-'Blocks'
rownames(village)[rownames(village)=='1']<-'firstrow'
#sorting
sort(village$Income,decreasing=TRUE)
Notice the [ ] symbol. [ ] extracts elements from an object; for a data frame, village[row, column] picks out rows and columns. So, a point to remember: we use round brackets ( ) to apply functions and square brackets [ ] to index into an object such as a data frame (which we understand as a data file).
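The difference between the two kinds of brackets can be tried on a tiny data frame. The object demo and its values below are hypothetical and stand in for the village data.

```r
# Hypothetical data frame with three rows and two columns
demo <- data.frame(Gender = c("Female", "Male", "Female"),
                   Income = c(1200, 950, 1430))

demo[1, 2]                   # first row, second column
demo[, "Income"]             # all rows of the Income column, as a vector
demo[demo$Income > 1000, ]   # only the rows where Income exceeds 1000

income <- demo$Income        # $ extracts a single column by name
mean(income)                 # round brackets call the function mean()
```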
Want to try a more user-friendly, English-like way instead of using bracket symbols? Install the "dplyr" package. The package works like a grammar of data wrangling. It is highly helpful for selecting variables, renaming them and filtering data from your data file.
To do so, use the following code.
Unlike in some other applications, the functions we want to use (select, rename etc.) do not become available automatically. We first install the package and then load it.
install.packages("dplyr")#installing
library(dplyr)#loading
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
Let's try a few simple commands, such as selecting a variable and renaming one.
library(dplyr)
#selecting income variable
Income<-select(village, Income)
#renaming Edu_highest variable
rename(village,education="Edu_highest")
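The same verbs can be tried on a small made-up data frame, with filter added since it was mentioned earlier. This is only a sketch: it assumes dplyr is already installed, and the object demo with its values is hypothetical, reusing column names from the village examples.

```r
library(dplyr)

# Hypothetical data frame standing in for the village data
demo <- data.frame(Block_Name  = c("North", "South", "East"),
                   Edu_highest = c("Primary", "Graduate", "Secondary"),
                   Income      = c(1200, 950, 1430))

income_only <- select(demo, Income)                    # keep only the Income column
renamed     <- rename(demo, education = Edu_highest)   # new name = old name
high_income <- filter(demo, Income > 1000)             # keep rows that meet the condition

names(renamed)
nrow(high_income)
```

These functions return a new data frame and leave the original unchanged, so assign the result to an object if you want to keep it.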
You can explore more about the functions and the package from the Help tab.
You can now go to the next section of the tutorial, on summarizing your data.
You can go back to the previous section, about handling your data.