R Introduction

by "Blag" - Senior Developer Evangelist

What is R?


  • R is an open source programming language for statistics, graphics, and data manipulation.
  • R is based on the "S" programming language.
  • R runs on Windows, Linux, and macOS.
  • R has a great and vibrant community with thousands of libraries ready to be installed and used.

Who uses R?


  • Facebook - For behaviour analysis related to status updates and profile pictures.
  • Google - For advertising effectiveness and economic forecasting.
  • Twitter - For data visualization and semantic clustering.
  • Microsoft - Acquired Revolution Analytics (maker of Revolution R) and uses R for a variety of purposes.
  • Uber - For statistical analysis.
  • Airbnb - For scaling data science.
  • IBM - Joined the R Consortium.

How to install R?


R can be downloaded from CRAN


While it's not mandatory, it's better to install RStudio as well


RStudio is an awesome R IDE that will help us to work better.

First steps


R can be run from a terminal or command-line interface

Variables

  • Vectors
  • Sequences
  • Factors
  • Matrices
  • Arrays
  • Data.Frames
  • Time/Dates
  • Lists

    Vectors


    The most common and most basic type. All elements of a vector must be of the same type

    				
    my_vector<-c(1,2,3,4,5)
    
    
    my_vector
    
    
    ## [1] 1 2 3 4 5 
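
The "same type" rule is easiest to see when we break it on purpose. This is only a small sketch (the variable names `mixed` and `nums` are made up for illustration): R does not raise an error, it silently converts every element to one common type.

```r
# Mixing types in c() does not fail; R coerces everything to one common type.
mixed <- c(1, "two", TRUE)
class(mixed)   # "character"
mixed          # "1" "two" "TRUE"

# Logical values become numbers when combined with numbers.
nums <- c(1, TRUE, FALSE)
class(nums)    # "numeric"
nums           # 1 1 0
```

So if a numeric vector suddenly prints with quotes around its values, a stray string probably got mixed in.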
    				

    Sequences


    Generate a sequence of numbers

    				
    my_sequence<-seq(1, 10)
    
    
    my_sequence
    
    
    ## [1] 1 2 3 4 5 6 7 8 9 10
    
    
    my_sequence<-seq(1, 10, by = 2)
    
    
    my_sequence
    
    
    ## [1] 1 3 5 7 9
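
Besides `by`, `seq()` has a few relatives worth knowing. The sketch below uses made-up variable names and only base R functions; take it as one possible illustration, not the full story.

```r
# Ask for a number of points instead of a step size.
quarters <- seq(0, 1, length.out = 5)
quarters    # 0.00 0.25 0.50 0.75 1.00

# A negative step counts down.
countdown <- seq(10, 1, by = -3)
countdown   # 10 7 4 1

# seq_len(n) is a safe way to build 1..n, even when n is 0.
first_three <- seq_len(3)
first_three # 1 2 3
empty <- seq_len(0)
length(empty)  # 0, whereas 1:0 would give the two values 1 and 0
```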
    				

    Factors


    Factors represent categorical data: they store a vector together with its set of distinct values (the levels)

    				
    letters<-c("a", "b", "c", "b", "b", "b", "a")
    
    
    letters
    
    
    ## [1] "a" "b" "c" "b" "b" "b" "a"
    
    
    fact<-factor(letters)
    
    
    
    fact
    
    
    ## [1] a b c b b b a
    
    
    ## Levels: a b c
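
Once we have a factor, we can ask for its levels, count how often each one appears, and look at the integer codes stored behind the scenes. A minimal sketch, using the hypothetical name `answers` so we don't overwrite R's built-in `letters`:

```r
answers <- c("a", "b", "c", "b", "b", "b", "a")
fact <- factor(answers)

lv <- levels(fact)
lv       # "a" "b" "c"

counts <- table(fact)
counts   # a: 2, b: 4, c: 1

codes <- as.integer(fact)
codes    # 1 2 3 2 2 2 1  (position of each value in the levels)
```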
    				

    Matrices

    Matrices are vectors with two dimensions (rows and columns)

    				
    var<-(1:4)
    
    
    dim(var)<-c(2, 2)
    
    
    var
    
    
    ##      [,1] [,2]

    ## [1,]    1    3

    ## [2,]    2    4
    				

    Another syntax, which is actually shorter

    				
    var<-matrix(1:4, 2, 2)
    
    
    var
    
    
    ##      [,1] [,2]

    ## [1,]    1    3

    ## [2,]    2    4
    				

    More on Matrices

    We can also create matrices from vectors: cbind() binds them as columns and rbind() binds them as rows

    				
    var<-cbind(c(1, 2, 3), c(4, 5, 6))
    
    
    var
    
    
    ##      [,1] [,2]

    ## [1,]    1    4

    ## [2,]    2    5

    ## [3,]    3    6
    				

    				
    var<-rbind(c(1, 2, 3), c(4, 5, 6))
    
    
    var
    
    
    ##      [,1] [,2] [,3]

    ## [1,]    1    2    3

    ## [2,]    4    5    6
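
Matrices are filled column by column unless we say otherwise, and we read elements with `[row, column]`. A small sketch (the name `m` is just for illustration):

```r
# byrow = TRUE fills the matrix row by row instead of column by column.
m <- matrix(1:6, nrow = 2, byrow = TRUE)

dim(m)     # 2 3  (rows, columns)
m[2, 3]    # 6    (row 2, column 3)
m[, 1]     # 1 4  (the whole first column)

# t() transposes: rows become columns.
tm <- t(m)
dim(tm)    # 3 2
```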
    				

    Arrays

    Arrays generalize vectors and matrices to any number of dimensions...a two-dimensional array is the same thing as a matrix, just with a different syntax

    				
    var<-array(1:4, dim=c(2, 2))
    
    var
    
    
    ##      [,1] [,2]

    ## [1,]    1    3

    ## [2,]    2    4
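
Where arrays earn their keep is with more than two dimensions. The following is a sketch with a hypothetical 2 x 3 x 4 array called `arr`:

```r
# 24 values arranged as 2 rows, 3 columns and 4 "slices".
arr <- array(1:24, dim = c(2, 3, 4))

dim(arr)        # 2 3 4
arr[1, 2, 3]    # 15 (row 1, column 2, slice 3)

# Dropping the last index gives an ordinary matrix.
first_slice <- arr[, , 1]
dim(first_slice)   # 2 3

# A two-dimensional array is a matrix.
is.matrix(array(1:4, dim = c(2, 2)))   # TRUE
```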
    				

    Data.Frames

    Data.Frames are like matrices, but each column can hold a different type. It's the most used type in R. We can add names to columns. If you have ever used ABAP...they're your Internal Tables ;-)

    				
    names<-c("Mr. A", "Mr. B", "Mr. C")
    
    
    ages<-c(37, 32, 34)
    
    
    people<-data.frame(name = names, age = ages)
    
    
    people
    
    
    ##    name age
    
    ## 1 Mr. A  37
    
    ## 2 Mr. B  32
    
    ## 3 Mr. C  34
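
Once the data frame exists, most day-to-day work is picking columns, filtering rows and adding new columns. A minimal sketch; the `senior` column and the age thresholds are invented for this example:

```r
people <- data.frame(name = c("Mr. A", "Mr. B", "Mr. C"),
                     age  = c(37, 32, 34))

people$age                 # one column as a vector: 37 32 34

# Keep only the rows where the condition is TRUE.
over_33 <- people[people$age > 33, ]
over_33$name               # Mr. A and Mr. C

# Assigning to a new name adds a column.
people$senior <- people$age >= 35
ncol(people)               # 3
```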
    				

    Time/Dates

    Time series and Date types let us work with...yep...times and dates

    				
    Year_2015<-ts(data=1:12, start=c(2015, 1), frequency=12)
    
    Year_2015
    
    
    ##       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
    
    ## 2015    1    2    3    4    5    6    7    8    9   10   11   12
    
    
    
    MyBirthday<-c("22/11/1977")
    
    MyBirthday<-as.Date(MyBirthday, "%d/%m/%Y")
    
    MyBirthday<-format(MyBirthday, format = "%B %d, %Y")
    
    MyBirthday
    
    
    ## [1] "November 22, 1977"
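
As long as we keep the value as a Date (and don't turn it back into text with `format()`), we can do arithmetic with it. A small sketch; the second date, 2015-11-22, is just an arbitrary example, and numeric formats are used so the output does not depend on the locale:

```r
birthday <- as.Date("22/11/1977", format = "%d/%m/%Y")

iso <- format(birthday, "%Y-%m-%d")
iso                       # "1977-11-22"

# Adding a number adds that many days.
later <- birthday + 30
format(later, "%Y-%m-%d") # "1977-12-22"

# Subtracting two dates gives the number of days between them.
days_between <- as.numeric(as.Date("2015-11-22") - birthday)
days_between              # 13879
```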
    				

    Lists

    Lists are ordered collections whose elements can be of different types. Elements can be named, so a list can work like a set of key/value pairs

    				
    Person<-list(name="Blag", age=39, 
    
                 title="Developer Evangelist")
    
    Person
    
    
    ## $name
    
    ## [1] "Blag"
    
    ## 
    
    ## $age
    
    ## [1] 39
    
    ## 
    
    ## $title
    
    ## [1] "Developer Evangelist"
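
Reading and changing list entries works by key. A minimal sketch (the `languages` entry is invented for illustration):

```r
person <- list(name = "Blag", age = 39, title = "Developer Evangelist")

person$name       # "Blag"
person[["age"]]   # 39, same idea with the key as a string

# Assigning to a new key adds an entry; values can be any type.
person$languages <- c("R", "ABAP")

names(person)     # "name" "age" "title" "languages"
length(person)    # 4
```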
    				

    Functions

    Every command in R is a function, so we can create our own functions as well

    				
    add_numbers<-function(num1, num2) {
    
    	result <- num1 + num2
    
    	return(result)
    
    }
    
    
    add_numbers(10, 15)
    
    
    ## [1] 25
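
Two things are handy to know about functions: arguments can have default values, and arithmetic inside a function works on whole vectors. The variation below is only a sketch; the default `num2 = 1` is an assumption made for the example.

```r
add_numbers <- function(num1, num2 = 1) {
  # The value of the last expression is returned, so return() is optional.
  num1 + num2
}

add_numbers(10)               # 11, uses the default for num2
add_numbers(10, 15)           # 25
add_numbers(c(1, 2, 3), 10)   # 11 12 13, works element by element
```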
    				

    Working with files

    Text files should be delimited (here by "Tab") and end with a newline...otherwise read.table() warns about an incomplete final line

    We use read.table() to read a delimited text file.

    We can use read.csv() to read a CSV file.

    				
    Text_File<-read.table("files/Text_File.dat", header=TRUE, stringsAsFactors=TRUE)
    
    Text_File
    
    
    ##      Name  Language   Number_of_Projects
    
    ## 1    Blag         R                   10
     
    ## 2   Rocky     Rails                   30
     
    ## 3  Juergen     Ruby                    5
    
    
    
    Text_File$Language
    
    
    ## [1] R  Rails  Ruby
    
    ## Levels: R Rails Ruby
    
    
    
    Text_File$Number_of_Projects
    
    
    ## [1] 10  30  5
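
If you don't have `files/Text_File.dat` at hand, the round trip can be tried with temporary files. This is a sketch under that assumption: the paths come from `tempfile()` and the file contents are simply retyped from the output above.

```r
# Write a small tab-separated file to a temporary location.
path <- tempfile(fileext = ".dat")
writeLines(c("Name\tLanguage\tNumber_of_Projects",
             "Blag\tR\t10",
             "Rocky\tRails\t30",
             "Juergen\tRuby\t5"), path)

# stringsAsFactors = TRUE turns text columns into factors (not the default since R 4.0).
text_file <- read.table(path, header = TRUE, sep = "\t",
                        stringsAsFactors = TRUE)
levels(text_file$Language)          # "R" "Rails" "Ruby"
sum(text_file$Number_of_Projects)   # 45

# The same data as CSV, written and read back.
csv_path <- tempfile(fileext = ".csv")
write.csv(text_file, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)
nrow(from_csv)                      # 3

unlink(c(path, csv_path))           # clean up the temporary files
```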
    				

    Data Manipulation

    We can read and modify the elements of our variables using, for example, subscripts (indexes start at 1)

    				
    vector<-c("R", "ABAP", "C++", "Python")
    
    vector[2]
    
    ## [1] "ABAP"
    
    
    
    vector[1:3]
    
    ## [1] "R"   "ABAP"   "C++"
    
    
    
    vector[2]<-"Haskell"
    
    vector
    
    ## [1] "R"   "Haskell"   "C++"   "Python"
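
Subscripts don't have to be positions. Negative numbers drop elements and logical conditions filter them. A short sketch, using the hypothetical name `langs` so we don't shadow R's own `vector()` function:

```r
langs <- c("R", "ABAP", "C++", "Python")

langs[-1]                  # everything except the first: "ABAP" "C++" "Python"

# A logical condition keeps the elements where it is TRUE.
long_names <- langs[nchar(langs) > 3]
long_names                 # "ABAP" "Python"

# which() gives the position(s) where a condition holds.
pos <- which(langs == "C++")
pos                        # 3
```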
    				

    We can also use some interesting functions

    				
    vector<-c(1, 2, 3, 4, 5)
    
    
    length(vector)
    
    ## [1] 5
    
    
    
    sum(vector)
    
    ## [1] 15
    
    
    
    prod(vector)
    
    
    ## [1] 120
    				
    				
    max(vector)
    
    ## [1] 5
    
    
    
    min(vector)
    
    ## [1] 1
    
    
    
    mean(vector)
    
    
    ## [1] 3
    				

    Of course...we have more...

    				
    vector<-c(4, 5, 1, 3, 2)
    
    
    sort(vector)
    
    ## [1] 1 2 3 4 5
    
    
    
    rev(sort(vector))
    
    ## [1] 5 4 3 2 1
    
    
    
    vector<-c(4, 4, 5, 5, 1, 3, 2, 1)
    
    
    duplicated(vector)
    
    ## [1] FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE
    				
    				
    unique(vector)
    
    ## [1] 4 5 1 3 2
    
    
    
    diff(vector)
    
    ## [1]  0  1  0 -4  2 -1 -1
    				

    Putting it all together

    				
    vector<-c(1, 2, 3, 4, 5)
    
    summary(vector)
    
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.

    ##       1       2       3       3       4       5
    
    
    
    var(vector) #Variance
    
    ## [1] 2.5
    
    
    
    sd(vector) #Standard Deviation
    
    ## [1] 1.581139
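
It can help to see what `var()` and `sd()` compute. They are the sample versions, dividing by n - 1. The sketch below redoes the calculation by hand; `x` and `manual_var` are names made up for the example.

```r
x <- c(1, 2, 3, 4, 5)
n <- length(x)

# Sample variance: squared distances from the mean, divided by n - 1.
manual_var <- sum((x - mean(x))^2) / (n - 1)
manual_var                            # 2.5

all.equal(manual_var, var(x))         # TRUE
all.equal(sqrt(manual_var), sd(x))    # TRUE

# The quartiles reported by summary() come from quantile().
q <- quantile(x, c(0.25, 0.75))
unname(q)                             # 2 4
```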
    				

    Aggregation on Data Frames

    Data.Frames allow us to use Aggregates, which are really nice...aggregate() groups the rows by one column and summarizes another

    				
    airlines<-c("AA", "AA", "CA", "CA")
    
    flights<-c(123, 50, 250, 180)
    
    planes<-data.frame(Airlines=airlines, Flights=flights)
    
    Planes_Sum<-aggregate(Flights~Airlines, data=planes, 
    
                          FUN=sum)
    
    Planes_Sum
    
    
    
    ##   Airlines Flights
    
    ## 1       AA     173
    
    ## 2       CA     430
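
The formula `Flights~Airlines` reads as "Flights grouped by Airlines", and `FUN` can be any summary function. A sketch with the same data, using `mean` instead of `sum`, plus `tapply()` as a lighter alternative when a plain named result is enough:

```r
planes <- data.frame(Airlines = c("AA", "AA", "CA", "CA"),
                     Flights  = c(123, 50, 250, 180))

# Average flights per airline; the result is again a data frame.
planes_mean <- aggregate(Flights ~ Airlines, data = planes, FUN = mean)
planes_mean$Flights    # 86.5 215.0

# tapply() returns one value per group, named by the group.
totals <- tapply(planes$Flights, planes$Airlines, sum)
totals[["AA"]]         # 173
totals[["CA"]]         # 430
```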
    				

    Fun with Statistics


    R is mainly used for statistics...so let's see a couple of examples...


    				
    people<-c("Mr. A", "Mr. B", "Mrs. C", "Mr. D")
    
    salary<-c(12000, 10000, 15000, 8000)
    
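    df_people<-data.frame(People=factor(people), Salary=salary) #Factor so plot() works in R >= 4.0
    
    n<-nrow(df_people) #Number of rows
    
    df<-((n - 1) / n) #Correction factor: sample variance to population variance
    
    pvar<-df * var(df_people$Salary) #Population variance
    
    psd<-round(sqrt(pvar)) #Population standard deviation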
    
    pmean<-mean(df_people$Salary) #Mean
    
    plot(df_people)
    
    box()
    
    abline(h=pmean, col="green")
    
    abline(h=pmean + psd, col="blue")
    
    abline(h=pmean - psd, col="blue")
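
Why multiply by (n - 1) / n? `var()` divides by n - 1 (sample variance), while the population variance divides by n. The sketch below checks that both routes agree for the salaries above; `pop_var_direct` and `pop_var_scaled` are illustrative names.

```r
salary <- c(12000, 10000, 15000, 8000)
n <- length(salary)

# Population variance computed directly: average squared distance from the mean.
pop_var_direct <- mean((salary - mean(salary))^2)

# The same value obtained by rescaling R's sample variance.
pop_var_scaled <- (n - 1) / n * var(salary)

pop_var_direct                              # 6687500
all.equal(pop_var_direct, pop_var_scaled)   # TRUE

pop_sd <- sqrt(pop_var_direct)
round(pop_sd)                               # 2586
```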
    				

    				
    people<-c("Mr. A", "Mr. B", "Mrs. C", "Mr. D")
    
    passed<-c(60, 65, 90, 20)
    
    failed<-c(40, 35, 10, 80)
    
    data<-data.frame(People=people, Passed=passed, 
    
                     Failed=failed)
    
    
    boxplot(data$Passed, data$Failed,
    
            horizontal = TRUE,
            
            names=c("Passed", "Failed"),
            
            col=c("turquoise", "tomato"),
            
            xlab="Testing", main="Testing Passed and Failed")
    				

    				
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.      #Failed
    
      10.00   28.75   37.50   41.25   50.00   80.00					
    					
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.      #Passed
      
      20.00   50.00   62.50   58.75   71.25   90.00
                    

    				
    people<-c("Mr. A", "Mr. B", "Mrs. C", "Mr. D", 
    
              "Mrs. E")
    
    salary<-c(12000, 10000, 15000, 8000)
    
    years<-c(10, 8, 15, 11)
    
    df_people<-data.frame(Salary=salary, Years=years)
    
    res<-lm(Salary~Years, data=df_people)  #Linear Model
    
    newdata = data.frame(Salary=0, Years=13)
    
    pred_salary<-predict(res, newdata, interval="prediction")
    
    newdata["Salary"]<-pred_salary[[1]][1]
    
    df_people<-rbind(df_people, newdata)
    
    plot(df_people)
    
    text(df_people$Salary, df_people$Years, 
    
         labels=people, cex =0.9, pos=3)
         
    points(newdata, col="red", pch=19)
    				

    				
    df_people$Salary[5]
    
    
    ## [1] 12634.62
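
Where does 12634.62 come from? A linear model is just an intercept plus a slope times Years. The sketch below rebuilds the model from the four original rows and does the prediction by hand; `b` and `manual` are names invented for the example.

```r
df_people <- data.frame(Salary = c(12000, 10000, 15000, 8000),
                        Years  = c(10, 8, 15, 11))

res <- lm(Salary ~ Years, data = df_people)

# The fitted line: Salary = intercept + slope * Years
b <- coef(res)
b

# Prediction for 13 years, computed by hand...
manual <- b[["(Intercept)"]] + b[["Years"]] * 13
round(manual, 2)   # 12634.62

# ...and with predict(), which gives the same number.
auto <- predict(res, data.frame(Years = 13))
all.equal(manual, unname(auto))   # TRUE
```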
                    

    				
    #install.packages("ggplot2")
    
    library("ggplot2")
    
    people<-c("Mr. A", "Mr. B", "Mrs. C", "Mr. D", 
    
              "Mrs. E")
              
    salary<-c(12000, 10000, 15000, 8000, 12634.62)
    
    years<-c(10, 8, 15, 11, 13)
    
    df_people<-data.frame(People=people, Salary=salary, 
    
                          Years=years)
    
    ggplot(df_people,aes(x=Salary,y=Years,fill=People)) +
      
           geom_bar(position="dodge",stat="identity")
    				

    				
    #install.packages("forecast")
    
    library("forecast")
    
    units<-c(1200,2000,1500,2500,5000,1560,1234,5123,4000,
    
             2000,1100,2300,2300,4000,3245,1000,3020,1260,
             
             2300,1300,1400,1000,4000,1280,2000,1200,5000,
             
             2340,1900)
    
    
    result_ts<-ts(units,frequency=12,start=c(2013,1))
    
    fit <- nnetar(result_ts)
    
    fcast <- forecast(fit,h=7)
    
    plot(fcast)
    				

    				
    #install.packages(c("shiny", "plotrix"))
    
    library("shiny")
    
    library("plotrix")
    
    
    runApp(list(
    
      ui = bootstrapPage(
    
        pageWithSidebar(
    
          headerPanel("R on the Web with Shiny"),
    
          sidebarPanel(sliderInput("n","Salary:",min=1000,
          
                                   max=15000,value=12600)),
    
          mainPanel(plotOutput('plot', width="100%", 
          
                               height="600px"))
    
        )),
    				
    				
      server = function(input, output) {
    
        output$plot <- renderPlot({
    
          #input$n
    
          people<-c("Mr. A","Mr. B","Mrs. C","Mr. D",
          
                    "Mrs. E")
    
          salary<-c(12000,10000,15000,8000,input$n)
    
          merged<-data.frame(People=people,Salary=salary)
    
          salary_sum<-sum(merged$Salary)
    
          merged$Percentage<-mapply(function(x) 
          
                             floor(x*100/salary_sum),
                             
                             merged$Salary)
    				
    				
          labels<-paste(merged$People," ",
          
                        merged$Percentage,"%",sep="")
    
          pie3D(merged$Salary,labels=labels)
    
        }) 
    
      }
    
    ))
    				

    That's it for now


    R is a very powerful language


    R can be used for Machine Learning as it has many awesome libraries


    Go ahead and learn more R!...


    Contact Information


    Blag --> blag@blagarts.com

    @Blag on Twitter