# R Introduction

### by "Blag" - Senior Developer Evangelist

## What is R?

• R is an open-source programming language used to compute statistics, create graphics, and manipulate data.
• R is based on the "S" programming language.
• R can be used on Windows, Linux and Mac OS systems.
• R has a great and vibrant community with thousands of libraries ready to be installed and used.

## Who uses R?

• Facebook - For behaviour analysis related to status updates and profile pictures.
• Twitter - For data visualization and semantic clustering.
• Microsoft - Acquired Revolution Analytics, the company behind Revolution R, and uses R for a variety of purposes.
• Uber - For statistical analysis.
• Airbnb - To scale data science.
• IBM - Joined the R Consortium.

## Data types

• Vectors
• Sequences
• Factors
• Matrices
• Arrays
• Data.Frames
• Time/Dates
• Lists

## Vectors

### The most common and primitive type. All elements of a vector must be of the same type

```
my_vector<-c(1,2,3,4,5)
my_vector
## [1] 1 2 3 4 5
```
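To see what "same type" means in practice, here is a minimal sketch of how `c()` coerces mixed input to a single type. The variable names are made up for illustration.

```r
# Mixing types in c() coerces every element to the most general type
mixed <- c(1, "a", TRUE)       # number and logical become strings
mixed_class <- class(mixed)    # "character"

numbers <- c(1, 2.5, TRUE)     # the logical TRUE becomes the number 1
numbers_class <- class(numbers)  # "numeric"

# Arithmetic is applied element by element
doubled <- c(1, 2, 3) * 2
```

If you need to hold values of different types together, a list or a data frame is the better fit.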

## Sequences

### Generate a sequence of numbers

```
my_sequence<-seq(1, 10)
my_sequence
## [1]  1  2  3  4  5  6  7  8  9 10

my_sequence<-seq(1, 10, by = 2)
my_sequence
## [1] 1 3 5 7 9
```
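A few more ways to build sequences, sketched here with made-up variable names: the `:` shortcut, a descending step, and `length.out` when you know how many values you want rather than the step size.

```r
# The colon operator is a shortcut for steps of 1
one_to_ten <- 1:10

# A negative step counts down
countdown <- seq(10, 1, by = -3)               # 10 7 4 1

# length.out fixes the number of values instead of the step
quarters <- seq(0, 1, length.out = 5)          # 0.00 0.25 0.50 0.75 1.00
```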

## Factors

### Factors help us create levels for vectors

```
letters<-c("a", "b", "c", "b", "b", "b", "a")
letters
## [1] "a" "b" "c" "b" "b" "b" "a"

fact<-factor(letters)
fact
## [1] a b c b b b a
## Levels: a b c
```
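Once a vector is a factor, we can ask for its levels, count the values per level, and look at the integer codes R stores internally. A small sketch, using the hypothetical name `answers` to avoid shadowing R's built-in `letters` constant:

```r
answers <- c("a", "b", "c", "b", "b", "b", "a")
fact <- factor(answers)

fact_levels <- levels(fact)     # "a" "b" "c"
counts <- table(fact)           # how many times each level appears
codes <- as.integer(fact)       # position of each value in the levels
```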

## Matrices

### Matrices are vectors with more than one dimension

```
var<-(1:4)
dim(var)<-c(2, 2)
var
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
```

### Another syntax, which is actually shorter

```
var<-matrix(1:4, 2, 2)
var
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
```

## More on Matrices

### We can also create matrices using vectors: `cbind` binds them as columns and `rbind` binds them as rows

```
var<-cbind(c(1, 2, 3), c(4, 5, 6))
var
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
```

```
var<-rbind(c(1, 2, 3), c(4, 5, 6))
var
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
```
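The two results are transposes of each other, which a short sketch can confirm. It also shows how to read a single cell or a whole row; the names `by_cols` and `by_rows` are only for illustration.

```r
by_cols <- cbind(c(1, 2, 3), c(4, 5, 6))   # 3 rows, 2 columns
by_rows <- rbind(c(1, 2, 3), c(4, 5, 6))   # 2 rows, 3 columns

shape_cols <- dim(by_cols)                 # 3 2
shape_rows <- dim(by_rows)                 # 2 3

# Transposing one gives the other
same <- all(t(by_cols) == by_rows)         # TRUE

# Index with [row, column]; leave one side empty to take everything
cell <- by_rows[2, 3]                      # 6
first_row <- by_rows[1, ]                  # 1 2 3
```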

## Arrays

### Arrays generalize vectors and matrices to any number of dimensions...a two-dimensional array is the same thing as a matrix, with different syntax

```
var<-array(1:4, dim=c(2, 2))
var
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
```
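Arrays become more interesting with a third dimension. A minimal sketch (the name `cube` is made up) showing that values are filled column by column, then layer by layer:

```r
# 2 rows, 3 columns, 4 layers: 24 values in total
cube <- array(1:24, dim = c(2, 3, 4))

cube_dim <- dim(cube)            # 2 3 4
first_of_layer2 <- cube[1, 1, 2] # 7, because layer 1 holds the values 1..6
last_value <- cube[2, 3, 4]      # 24

# A two-dimensional array is a matrix
flat <- array(1:4, dim = c(2, 2))
flat_is_matrix <- is.matrix(flat)  # TRUE
```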

## Data.Frames

### Data.Frames are like matrices, but their columns can hold different types. It's the most used type in R. We can add names to columns. If you have ever used ABAP...they're your Internal Tables ;-)

```
names<-c("Mr. A", "Mr. B", "Mr. C")
ages<-c(37, 32, 34)
people<-data.frame(name = names, age = ages)
people
##    name age
## 1 Mr. A  37
## 2 Mr. B  32
## 3 Mr. C  34
```
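Here is a small sketch of the everyday operations on a data frame: picking a column, filtering rows, and adding a new column. The `senior` column and its cut-off of 35 are invented for the example.

```r
people <- data.frame(name = c("Mr. A", "Mr. B", "Mr. C"),
                     age = c(37, 32, 34))

ages <- people$age                    # a column is just a vector
older <- people[people$age > 33, ]    # keep the rows that match a condition

# Adding a column is a plain assignment (hypothetical column and threshold)
people$senior <- people$age >= 35

shape <- dim(people)                  # 3 rows, 3 columns
```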

## Time/Dates

### Time/Dates allow us to create...yep...Time and Date variable types

```
Year_2015<-ts(data=c(1:12), start=c(2015), frequency = 12)
Year_2015
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2015   1   2   3   4   5   6   7   8   9  10  11  12

MyBirthday<-c("22/11/1977")
MyBirthday<-as.Date(MyBirthday, "%d/%m/%Y")
MyBirthday<-format(MyBirthday, format = "%B %d, %Y")
MyBirthday
## [1] "November 22, 1977"
```
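Once a string has been parsed with `as.Date`, it behaves like a number of days, so we can subtract dates and build date sequences. A minimal sketch; the end date is an arbitrary choice for illustration.

```r
start <- as.Date("22/11/1977", format = "%d/%m/%Y")
end <- as.Date("2015-11-22")        # ISO dates need no format string

# Subtracting two dates gives a time difference in days
days_between <- as.numeric(end - start)

# Adding days to a date gives another date
back_again <- format(start + days_between, "%Y-%m-%d")

# Sequences work on dates as well
next_years <- seq(start, by = "year", length.out = 3)
third_year <- format(next_years[3], "%Y-%m-%d")
```

Numeric format codes such as `%d`, `%m` and `%Y` give the same result everywhere, while month names from `%B` depend on the locale of the machine.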

## Lists

### Lists are basically hashed arrays, where each element has a key and a value. Unlike vectors, the values can be of different types

```
Person<-list(name="Blag", age=39,
             title="Developer Evangelist")
Person
## $name
## [1] "Blag"
##
## $age
## [1] 39
##
## $title
## [1] "Developer Evangelist"
```
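Reading and extending a list can be sketched as follows. The `languages` element is a hypothetical addition, not part of the original example.

```r
person <- list(name = "Blag", age = 39, title = "Developer Evangelist")

who <- person$name            # access by key with $
years <- person[["age"]]      # or with double brackets
sub_list <- person["age"]     # single brackets return a list of length one

# New keys are created on assignment (hypothetical element)
person$languages <- c("R", "ABAP")
keys <- names(person)
```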

## Functions

### Every command in R is a function, so we can create our own functions as well

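```
#The function name and the call arguments are assumed...
#the original lines are missing from this copy
add_numbers <- function(num1, num2) {
  result <- num1 + num2
  return(result)
}
add_numbers(10, 15)
## [1] 25
```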
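Functions can also have default values and be called with named arguments. A short sketch, where `add_numbers` and its default of 10 are hypothetical choices for illustration:

```r
# num2 falls back to 10 when the caller does not supply it
add_numbers <- function(num1, num2 = 10) {
  num1 + num2     # the last evaluated expression is returned
}

with_default <- add_numbers(15)               # 25
named <- add_numbers(num2 = 1, num1 = 2)      # 3, order does not matter

# Functions are values, so they can be passed to other functions
mapped <- sapply(1:3, add_numbers)            # 11 12 13
```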

## Working with files

```
Text_File
##      Name Language Number_of_Projects
## 1    Blag        R                 10
## 2   Rocky    Rails                 30
## 3 Juergen     Ruby                  5

Text_File$Language
## [1] R     Rails Ruby
## Levels: R Rails Ruby

Text_File$Number_of_Projects
## [1] 10 30  5
```
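One way a table like this could be loaded is `read.table`. The sketch below is an assumption, not the original code: it writes the same three rows to a temporary file first so that it runs anywhere, and it asks for factors explicitly because R 4.0 and later read text columns as plain strings by default.

```r
# Create a small whitespace-separated file to read back (temporary path)
path <- tempfile(fileext = ".txt")
writeLines(c("Name Language Number_of_Projects",
             "Blag R 10",
             "Rocky Rails 30",
             "Juergen Ruby 5"), path)

# header = TRUE uses the first line as column names
Text_File <- read.table(path, header = TRUE, stringsAsFactors = TRUE)

languages <- levels(Text_File$Language)
total_projects <- sum(Text_File$Number_of_Projects)

unlink(path)   # clean up the temporary file
```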

## Data Manipulation

### We can read and modify the values of our variables by using, for example, subscripts

```
vector<-c("R", "ABAP", "C++", "Python")
vector[2]
## [1] "ABAP"

vector[1:3]
## [1] "R"    "ABAP" "C++"

vector[2]<-"Haskell"
vector
## [1] "R"       "Haskell" "C++"     "Python"
```
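Subscripts accept more than positions. A minimal sketch of negative, logical and multi-position indexing, using the made-up name `langs`:

```r
langs <- c("R", "ABAP", "C++", "Python")

without_first <- langs[-1]              # a negative index drops that element
no_abap <- langs[langs != "ABAP"]       # a logical test keeps the matches
ends <- langs[c(1, 4)]                  # several positions at once

# Assigning past the end makes the vector grow
langs[5] <- "Haskell"
new_length <- length(langs)
```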

### We can also use some interesting functions

```
vector<-c(1, 2, 3, 4, 5)
length(vector)
## [1] 5

sum(vector)
## [1] 15

prod(vector)
## [1] 120
```

```
max(vector)
## [1] 5

min(vector)
## [1] 1

mean(vector)
## [1] 3
```

### Of course...we have more...

```
vector<-c(4, 5, 1, 3, 2)
sort(vector)
## [1] 1 2 3 4 5

rev(sort(vector))
## [1] 5 4 3 2 1

vector<-c(4, 4, 5, 5, 1, 3, 2, 1)
duplicated(vector)
## [1] FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE
```

```
unique(vector)
## [1] 4 5 1 3 2

diff(vector)
## [1]  0  1  0 -4  2 -1 -1
```

## Putting it all together

```
vector<-c(1, 2, 3, 4, 5)
summary(vector)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##       1       2       3       3       4       5

var(vector) #Variance
## [1] 2.5

sd(vector) #Standard Deviation
## [1] 1.581139
```
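The numbers from `summary`, `var` and `sd` can be reproduced by hand, which makes clear what each one means. A small sketch with illustrative variable names:

```r
x <- c(1, 2, 3, 4, 5)
n <- length(x)

# The quartiles reported by summary()
quartiles <- quantile(x, c(0.25, 0.5, 0.75))

# var() is the sample variance: squared deviations divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)

# sd() is the square root of that variance
manual_sd <- sqrt(manual_var)
```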

## Aggregation on Data Frames

### Data.Frames allow us to use Aggregates, which are really nice...

```
airlines<-c("AA", "AA", "CA", "CA")
flights<-c(123, 50, 250, 180)
planes<-data.frame(Airlines=airlines, Flights=flights)
Planes_Sum<-aggregate(Flights~Airlines, data=planes,
                      FUN=sum)
Planes_Sum
##   Airlines Flights
## 1       AA     173
## 2       CA     430
```
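`FUN` can be any summarising function, and `tapply` offers a lighter alternative when a named vector is enough. A sketch on the same airline data, with made-up result names:

```r
planes <- data.frame(Airlines = c("AA", "AA", "CA", "CA"),
                     Flights = c(123, 50, 250, 180))

# Same formula, different summary function
planes_mean <- aggregate(Flights ~ Airlines, data = planes, FUN = mean)

# tapply groups a vector by another and returns a named vector
totals <- tapply(planes$Flights, planes$Airlines, sum)
aa_total <- totals[["AA"]]     # 173
ca_total <- totals[["CA"]]     # 430
```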

## Fun with Statistics

### R is mainly used for statistics...so let's see a couple of examples...

```
people<-c("Mr. A", "Mr. B", "Mrs. C", "Mr. D")
salary<-c(12000, 10000, 15000, 8000)
#People is stored as a factor so plot() can draw it on R 4.0 and later
df_people<-data.frame(People=people, Salary=salary,
                      stringsAsFactors=TRUE)
n<-nrow(df_people) #Number of rows
df<-((n - 1) / n) #Correction factor from sample to population variance
pvar<-df * var(df_people$Salary) #Population variance
psd<-round(sqrt(pvar)) #Population standard deviation
pmean<-mean(df_people$Salary) #Mean
plot(df_people)
box()
abline(h=pmean, col="green")
abline(h=pmean + psd, col="blue")
abline(h=pmean - psd, col="blue")
```
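The `(n - 1) / n` factor is there because `var` divides by `n - 1`, while the population variance divides by `n`. A quick sketch that checks both routes give the same number; the variable names are illustrative.

```r
salary <- c(12000, 10000, 15000, 8000)
n <- length(salary)

# Route 1: rescale the sample variance
pvar_scaled <- ((n - 1) / n) * var(salary)

# Route 2: average squared deviation from the mean
pvar_direct <- mean((salary - mean(salary))^2)

routes_agree <- isTRUE(all.equal(pvar_scaled, pvar_direct))
psd <- sqrt(pvar_direct)    # population standard deviation
```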

```
people<-c("Mr. A", "Mr. B", "Mrs. C", "Mr. D")
passed<-c(60, 65, 90, 20)
failed<-c(40, 35, 10, 80)
data<-data.frame(People=people, Passed=passed,
                 Failed=failed)
boxplot(data$Passed, data$Failed,
        horizontal = TRUE,
        names=c("Passed", "Failed"),
        col=c("turquoise", "tomato"),
        xlab="Testing", main="Testing Passed and Failed")
```

```
summary(data$Failed)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   10.00   28.75   37.50   41.25   50.00   80.00

summary(data$Passed)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   20.00   50.00   62.50   58.75   71.25   90.00
```

```
people<-c("Mr. A", "Mr. B", "Mrs. C", "Mr. D",
          "Mrs. E")
salary<-c(12000, 10000, 15000, 8000)
years<-c(10, 8, 15, 11)
df_people<-data.frame(Salary=salary, Years=years)
res<-lm(Salary~Years, data=df_people)  #Linear Model
newdata<-data.frame(Salary=0, Years=13)
pred_salary<-predict(res, newdata, interval="prediction")
newdata["Salary"]<-pred_salary[1] #Fitted value
df_people<-rbind(df_people, newdata)
plot(df_people)
text(df_people$Salary, df_people$Years,
     labels=people, cex=0.9, pos=3)
points(newdata, col="red", pch=19)
```
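What `predict` does for a simple linear model can be checked by hand: take the intercept and slope from `coef` and plug in the new value. A minimal sketch on the same four data points, with illustrative names:

```r
df_people <- data.frame(Salary = c(12000, 10000, 15000, 8000),
                        Years = c(10, 8, 15, 11))
res <- lm(Salary ~ Years, data = df_people)

# The fitted line is Salary = intercept + slope * Years
co <- coef(res)
manual <- co[["(Intercept)"]] + co[["Years"]] * 13

# predict() evaluates the same line for new data
pred <- unname(predict(res, data.frame(Years = 13)))

same_answer <- isTRUE(all.equal(manual, pred))
```

With only four observations the line is a rough guide at best, so the prediction interval returned by `interval = "prediction"` is worth a look before trusting the single fitted value.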

```
df_people$Salary[5]
## [1] 12634.62
```

```
#install.packages("ggplot2")
library("ggplot2")
people<-c("Mr. A", "Mr. B", "Mrs. C", "Mr. D",
          "Mrs. E")
salary<-c(12000, 10000, 15000, 8000, 12634.62)
years<-c(10, 8, 15, 11, 13)
df_people<-data.frame(People=people, Salary=salary,
                      Years=years)
ggplot(df_people,aes(x=Salary,y=Years,fill=People)) +
  geom_bar(position="dodge",stat="identity")
```

```
#install.packages("forecast")
library("forecast")
units<-c(1200,2000,1500,2500,5000,1560,1234,5123,4000,
         2000,1100,2300,2300,4000,3245,1000,3020,1260,
         2300,1300,1400,1000,4000,1280,2000,1200,5000,
         2340,1900)
result_ts<-ts(units,frequency=12,start=c(2013,1))
fit <- nnetar(result_ts)
fcast <- forecast(fit,h=7)
plot(fcast)
```

```
#install.packages(c("shiny", "plotrix"))
library("shiny")
library("plotrix")

runApp(list(
  ui = bootstrapPage(
    pageWithSidebar(
      headerPanel("R on the Web with Shiny"),
      sidebarPanel(sliderInput("n","Salary:",min=1000,
                               max=15000,value=12600)),
      mainPanel(plotOutput('plot', width="100%",
                           height="600px"))
    )),
  server = function(input, output) {
    output$plot <- renderPlot({
      #input$n is the salary selected with the slider
      people<-c("Mr. A","Mr. B","Mrs. C","Mr. D",
                "Mrs. E")
      salary<-c(12000,10000,15000,8000,input$n)
      merged<-data.frame(People=people,Salary=salary)
      salary_sum<-sum(merged$Salary)
      #Share of the total salary, as a whole percentage
      merged$Percentage<-mapply(function(x)
                                  floor(x*100/salary_sum),
                                merged$Salary)
      labels<-paste(merged$People," ",
                    merged$Percentage,"%",sep="")
      pie3D(merged$Salary,labels=labels)
    })
  }
))
```