Handling Data files

Handling data files

We need a place to store results and a place from which we can import data. On a computer, such a place is called a directory (folder).
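As a small sketch of this idea, the lines below show how R reports and uses directories. The folder name "results", the file name "summary.csv", and the object names are illustrative assumptions, and a temporary directory stands in for your own project folder.

```r
# Where is R currently looking for files? (the working directory)
current_dir <- getwd()
print(current_dir)

# A temporary folder stands in for your own project folder (assumption)
project_dir <- tempdir()

# Create a sub-folder for results if it does not exist yet
results_dir <- file.path(project_dir, "results")
if (!dir.exists(results_dir)) {
  dir.create(results_dir)
}

# Build a path to a file inside that folder without typing slashes by hand
out_file <- file.path(results_dir, "summary.csv")
write.csv(data.frame(x = 1:3), out_file, row.names = FALSE)

# List what is stored in the results folder
list.files(results_dir)
```

To make a folder of your own the default location, setwd("path to your directory") changes the working directory for the current session.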

Importing file(s)

Importing single file from your computer

read.csv(file="path to your file in computer",header=TRUE)

You can also store the imported data in an object by giving it a name (e.g. village) in the following way.

village<-read.csv(file="path to your file in computer",header=TRUE)

Alternatively, use the File menu in RStudio (File > Import Dataset).

Downloading from the website

download.file(url="https://dataverse.harvard.edu/api/access/datafile/4805891",destfile = "df.csv",cacheOK=TRUE)#Downloading the file and saving it as "df.csv"

village<-read.csv("df.csv",header=T,sep="\t",stringsAsFactors = FALSE)#Importing the file into your RStudio environment

Notice the options used with the 'download.file' and 'read.csv' functions.

destfile: the location where the file downloaded from the website will be stored.

cacheOK: servers sometimes store web data in a cache that is not live. Based on your need, it can be set to TRUE or FALSE. Read the documentation to explore more with relevant examples.

stringsAsFactors: when set to FALSE, text columns are kept as character strings instead of being converted to the 'factor' mode.

sep: the character that separates the columns in the file. It can be a semicolon, a comma, or a tab ("\t") as in the example above.
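To see what sep and stringsAsFactors actually change, here is a minimal sketch that does not need the downloaded file. The small semicolon-separated file and the object names (tmp, as_text, as_factor, wrong_sep) are made up for illustration.

```r
# Write a small semicolon-separated file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("Name;Gender;Income",
             "A;Female;1200",
             "B;Male;900"), tmp)

# sep tells read.csv how the columns are separated
as_text   <- read.csv(tmp, header = TRUE, sep = ";", stringsAsFactors = FALSE)
as_factor <- read.csv(tmp, header = TRUE, sep = ";", stringsAsFactors = TRUE)

class(as_text$Gender)    # "character"
class(as_factor$Gender)  # "factor"

# With the wrong separator, every line lands in a single column
wrong_sep <- read.csv(tmp, header = TRUE, sep = ",")
ncol(wrong_sep)          # 1
```

If an imported file shows up with only one column, a wrong sep value is a likely cause.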

Question: Why do we set stringsAsFactors to FALSE while importing files?

Importing multiple files from your computer

Suppose we want to import multiple files (in ".csv" format) into our RStudio environment. We can use the list.files function.

A hypothetical example is as follows.

Let's assume we want to import multiple village files. The following lines of code will do.

villages<-list.files(path="path to your directory where files are stored",pattern="\\.csv$",full.names=TRUE)#pattern is a regular expression; full.names=TRUE returns complete paths

We created an object called "villages" in our RStudio global environment. It holds the names of all files with the .csv extension as a character vector; the files themselves are not read yet.

To read the files, we need one more line of code.

myfiles<-lapply(villages, read.delim)#read.delim expects tab-separated files; use read.csv for comma-separated files

The above line reads every file into a list called "myfiles". Each element of the list is one data file, which can be extracted for further use, e.g. myfiles[[1]] for the first file.
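The two steps can be tried end to end with the sketch below. It creates two tiny CSV files in a temporary folder first, so it runs without any files of your own; the folder name, file names, and Income values are made up for illustration, and read.csv is used because these sample files are comma-separated.

```r
# Create a temporary folder with two small CSV files (made-up data)
data_dir <- file.path(tempdir(), "village_files")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(Income = c(100, 200)),
          file.path(data_dir, "village_a.csv"), row.names = FALSE)
write.csv(data.frame(Income = c(300, 400, 500)),
          file.path(data_dir, "village_b.csv"), row.names = FALSE)

# pattern is a regular expression: "\\.csv$" means "ends in .csv"
# full.names = TRUE returns complete paths, so the files can be read from anywhere
villages <- list.files(path = data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file; the result is a list of data frames
myfiles <- lapply(villages, read.csv)

# Name each list element after its file so it is easy to pick out
names(myfiles) <- basename(villages)
myfiles[["village_a.csv"]]

# Optionally stack all files into one data frame (columns must match)
all_villages <- do.call(rbind, myfiles)
nrow(all_villages)
```

Stacking with rbind only works when all files share the same column names; otherwise keep the files as separate list elements.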

Importing different file formats from your computer

There are a few popular packages that help us work with different file types in R.

Package? What are those?

Packages are collections of functions, developed by advanced R users, that perform specific tasks.

In order to use those functions, we first install the package and then load the library of all the functions of the package.

This is the usual way of using a package and its functions in R.

Let's install a package and call its functions as follows.

install.packages("haven")#installing
library(haven)#loading
spss_sample<-read_sav(file="path to your file")

To save memory, we can also unload (detach) packages we no longer need. Go to the Packages tab and uncheck the packages you want to unload. Alternatively, use the detach command, e.g. detach("package:haven"). To learn more about a function, use the search bar of the Help tab in the bottom-right pane of RStudio.
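The install, load, and detach cycle can be sketched as follows. To keep the example runnable without an internet connection, it uses "stats4", a package that ships with R, as a stand-in; replace it with "haven" or any other package. The object names are illustrative.

```r
pkg <- "stats4"  # stand-in package that ships with R (assumption); try "haven"

# Install only if the package is not already available on this computer
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package; character.only = TRUE lets us pass the name as a string
library(pkg, character.only = TRUE)
loaded_after_library <- paste0("package:", pkg) %in% search()

# Detach it again when it is no longer needed
detach(paste0("package:", pkg), character.only = TRUE)
loaded_after_detach <- paste0("package:", pkg) %in% search()

loaded_after_library  # TRUE
loaded_after_detach   # FALSE
```

search() lists everything currently attached, so it is a quick way to check which packages are loaded.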

Explore your Data

View(village)#Try using table icon on the right in Environment pane
head(village,n=5L)#Check with top five rows
tail(village,n=3L)#Check with last three rows
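#More views (a two-number n needs R 4.0 or later)
tail(village, c(6L, 2L))#last six rows of the last two columns
tail(village, c(2L,5L))#last two rows of the last five columns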
#Locating your column, row and variable
income<-village$Income#Income variable
gender<-village$Gender#Gender variable
village[1,1]#First row, first column. 
village[2,3]#second row, third column
village[,1]#All rows, first column
names(village)
colnames(village)
rownames(village)
#Rename your variable
names(village)[names(village)=='Block_Name']<-'Blocks'
rownames(village)[rownames(village)=='1']<-'firstrow'
#sorting
sort(village$Income,decreasing=TRUE)

Notice the [ ] symbol. Square brackets pick out parts of an object such as a data frame, in the form village[row, column]. So, a point to remember: we use round brackets ( ) to apply functions and square brackets [ ] to select rows and columns of a data frame (that we understand as a data file).
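The bracket rules can be practised on a tiny data frame before applying them to a real file. The data frame toy and its values are made up for illustration; only the column names Gender and Income echo the village file.

```r
# A small made-up data frame standing in for `village`
toy <- data.frame(
  Name   = c("A", "B", "C"),
  Gender = c("Female", "Male", "Female"),
  Income = c(1200, 900, 1500),
  stringsAsFactors = FALSE
)

toy[1, 1]                  # first row, first column
toy[2, ]                   # second row, all columns
toy[, "Income"]            # all rows, the Income column (selected by name)
toy[toy$Income > 1000, ]   # only the rows where Income exceeds 1000

# sort() orders a single column; order() gives row positions,
# so it can be used inside [ ] to sort the whole data frame
sorted <- toy[order(toy$Income, decreasing = TRUE), ]
sorted$Name
```

Leaving the row or column position empty, as in toy[2, ], means "take all of them".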

Want to try a more user-friendly, English-like way instead of using bracket symbols? Install the "dplyr" package. The package works like a grammar of data wrangling. It is very helpful for selecting variables, renaming them, and filtering data in your data file.

To do so, use the following code.

#install.packages("dplyr")

Unlike in many other applications, the functions we want to use (select, rename, etc.) do not become available automatically once the package is installed. We first need to load the package.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Let's try a few simple commands such as selecting a variable and renaming it.

library(dplyr)
#selecting income variable
Income<-select(village, Income)
#renaming Edu_highest variable to education (new name = old name); assign the result to keep the change
village<-rename(village,education=Edu_highest)
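The same verbs can be tried on a small made-up data frame, which is handy when the village file is not loaded. This sketch assumes dplyr is already installed; the data frame toy, its values, and the object names are illustrative, while the column names Income and Edu_highest follow the example above.

```r
library(dplyr)

# A small made-up data frame standing in for `village`
toy <- data.frame(
  Gender      = c("Female", "Male", "Female"),
  Income      = c(1200, 900, 1500),
  Edu_highest = c("Primary", "Secondary", "Graduate"),
  stringsAsFactors = FALSE
)

# select() keeps the chosen columns and returns a data frame
income_only <- select(toy, Income)

# rename() uses new_name = old_name; assign the result to keep the change
toy2 <- rename(toy, education = Edu_highest)

# filter() keeps the rows that meet a condition
high_income <- filter(toy2, Income > 1000)

names(toy2)
nrow(high_income)
```

None of these functions change the original data frame; each returns a new one, which is why the results are stored with <-.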

You can explore more about the functions and package from the help tab.

You can now go to the next section of the tutorial on summarizing your data.

Basics of data summary

You can go back to the previous section about handling your data.

Beginning with R

Happy learning!