Hello! This page is meant to help you grasp the fundamentals of R. I have curated some essential concepts that are crucial for understanding how to use R. I hope you enjoy!
Download R: https://cran.r-project.org/
Download RStudio: https://www.rstudio.com/products/rstudio/
RStudio has four main panels:
The source editor
The source editor is where you write most (if not all) of your code. You can have multiple tabs open in the source editor for different .R and Markdown files.
The Console
If you type code here it will be executed immediately and not saved. This is normally only used for short commands.
Mainly, you will use the console to see the output of your code.
Workspace Browser
Here you find your “environment”, which is where you can see all of your objects.
Files/plots/viewer/help
There are many different tabs here. The most important one is the plots tab, which is where any plots you create will be displayed.
Additionally, you have the files tab, which shows you the files in your current working directory.
Basic Functionality
Although R is frequently used to accomplish amazing feats of statistical analysis and data visualization, when starting out it is best to learn to use R as though it were a fancy calculator.
R can do everything that a calculator can do:
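As a small sketch, here are a few calculator-style commands you could type into the console. The numbers are arbitrary and chosen only for illustration:

```r
2 + 3        # addition
10 - 4       # subtraction
6 * 7        # multiplication
20 / 8       # division
2^5          # exponentiation (2 to the power of 5)
(1 + 2) * 3  # parentheses control the order of operations
17 %% 5      # remainder left over after dividing 17 by 5
```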
Furthermore, it has the ability to place the results of these calculations into an "object", allowing you to save the value for later.
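A minimal sketch of saving a result into an object with the assignment arrow `<-`. The object names total_cost and total_with_tip are made up for this example:

```r
# Store the result of a calculation in an object called total_cost
total_cost <- 3 * 4.50

# Typing the name of an object prints the value stored in it
total_cost

# Objects can be reused in later calculations
total_with_tip <- total_cost + 2
total_with_tip
```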
Object-Oriented Programming
These objects are extremely important because nearly everything you work with in R is an object, and R supports object-oriented programming. In short, object-oriented programming provides a way to organize code by bundling data together with the functions that operate on that data. This gives the code a clear and organized structure, making it easier to understand, maintain, and reuse.
Data Types
There are a handful of fundamental data types in R that you should be aware of. These are the ones you will meet most often:
1.) Logical: TRUE, FALSE
Logical values can also be written as T or F. They can be used in conditional statements and to filter or index data in a data frame or vector.
2.) Double: 12.333, 40.3913, 1.0
In R, a double is one of the most common data types you will encounter. It is used to represent any real number with a decimal part, also known as a floating-point number (float).
Relatedly, "numeric" is R's general label for numbers and covers both doubles and integers. The integer data type represents whole numbers stored without a decimal part, written with an L suffix such as 5L.
Mostly, however, R will store any number you type as a double.
3.) Character: “Hello”, “1”, “FALSE”
The character data type is used to represent text or strings of data and is enclosed in either double or single quotes.
It is important to note that if you place a logical or numeric value in quotes, it becomes a character.
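To check what type a value has, you can pass it to typeof(). This is a short sketch using the example values from this section:

```r
typeof(TRUE)      # "logical"
typeof(12.333)    # "double"
typeof(1)         # "double": plain numbers are stored as doubles
typeof(1L)        # "integer": the L suffix asks for an integer
class(1)          # "numeric": the general label for numbers
typeof("Hello")   # "character"
typeof("FALSE")   # "character": the quotes turn a logical into text
typeof("1")       # "character": the quotes turn a number into text
```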
Data Structures
Vectors
A vector is an ordered collection of values that all share the same data type. You can use the c() function to create a vector.
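A minimal sketch of creating and using vectors. The object names and values are invented for illustration:

```r
# A numeric vector and a character vector
cup_sizes <- c(8, 12, 16)
cup_names <- c("small", "medium", "large")

cup_sizes[2]       # pick out the second element
length(cup_names)  # how many elements the vector holds

# A vector can only hold one data type, so mixing types
# converts every element to a common type (here: character)
mixed <- c(1, "two", TRUE)
typeof(mixed)
```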
Matrices
A matrix in R is a 2-dimensional data structure that contains elements of the same data type (e.g. numeric, character, logical).
A quick note about notation...
Matrices in R follow the format where the row number is listed first, followed by the column number, so [2, 3] refers to row 2, column 3.
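A short sketch of building a matrix with matrix() and indexing it with the row-then-column notation. The values 1 to 6 are arbitrary:

```r
# Fill a 2-row, 3-column matrix with the numbers 1 to 6
# (by default R fills a matrix one column at a time)
m <- matrix(1:6, nrow = 2, ncol = 3)
m

dim(m)   # number of rows, then number of columns
m[2, 3]  # the element in row 2, column 3
m[1, ]   # leave the column blank to get all of row 1
m[, 2]   # leave the row blank to get all of column 2
```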
Data Frames
A data frame in R is a two-dimensional data structure that is used to store data, with rows representing observations and columns representing variables. It can handle a variety of data types including numeric, character, and logical values, and is one of the most common types of data structures in social science research. A few notes about data frames:
They are similar to matrices in that they have rows and columns, but unlike matrices, data frames can have a different data type for each column.
They can be created from a variety of sources, such as CSV files, spreadsheets, or databases.
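As a minimal sketch, a small data frame can also be built by hand with data.frame(). The people and values below are made up for illustration:

```r
# Each argument becomes a column; every column has its own data type
survey <- data.frame(
  name    = c("Ana", "Ben", "Cleo"),
  age     = c(23, 35, 41),
  student = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE  # keep the names as plain character strings
)

str(survey)        # shows the type of each column
survey$age         # pull out one column with $
survey[2, "name"]  # row 2 of the name column
nrow(survey)       # number of observations (rows)
```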
Understanding functions (what they are, how to use them, how to make them) is one of the most important aspects of learning R.
What are Functions?!
Functions in R are self-contained blocks of code that perform specific tasks, and are designed to accept inputs and return outputs. We have already used quite a few functions today. Some examples include:
c()
typeof()
as.data.frame()
matrix()
Functions accept an input and produce an output.
Inputs: Functions accept inputs, also known as arguments, which can be used to control the behavior of the function.
Outputs: Functions return outputs, which can be used in other parts of the code or assigned to objects.
How do you use functions?
To use a function, you have to call it. All this means is that you type in the name of the function followed by parentheses (for example, sum()). Next, you type in the arguments that the function asks for inside the parentheses. If you want to get information on a specific function, type help(function name). For example, if I wanted help understanding the sum function, I would type help(sum).
This brings up a description of the function and its usage in the files/plots/viewer/help panel of RStudio.
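Here is a small sketch of calling functions with arguments. The scores vector is a made-up example, and the help calls are left as comments so that nothing opens when you run the block:

```r
scores <- c(2.5, 4, 3.75)

# Call sum() by writing its name, then the argument in parentheses
total <- sum(scores)
total

# Arguments can be given by name...
round(3.14159, digits = 2)
# ...or by position, in the order the function expects them
round(3.14159, 2)

# To read the documentation for a function, run one of these:
# help(sum)
# ?sum
```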
R has many functions, so many that I couldn't list them all here. However, here are some of the more useful ones:
c() - Concatenates elements into a vector.
mean() - Calculates the mean of a vector of numbers.
sum() - Calculates the sum of a vector of numbers.
sort() - Sorts a vector of numbers in ascending or descending order.
unique() - Returns unique values of a vector.
max() - Returns the maximum value of a vector.
min() - Returns the minimum value of a vector.
sqrt() - Returns the square root of a number.
round() - Rounds a number to a specified number of decimal places.
head() - Returns the first n rows of a data frame or matrix.
tail() - Returns the last n rows of a data frame or matrix.
str() - Displays the structure of an object in R.
table() - Creates a contingency table of the counts of unique values.
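To see several of these in action, here is a short sketch that applies them to one small vector. The numbers are arbitrary:

```r
x <- c(4, 2, 9, 2, 7)

mean(x)                     # average of the values
sum(x)                      # total of the values
sort(x)                     # ascending order by default
sort(x, decreasing = TRUE)  # descending order
unique(x)                   # each distinct value once
max(x)                      # largest value
min(x)                      # smallest value
sqrt(16)                    # square root
round(3.14159, 2)           # round to 2 decimal places
head(x, 3)                  # head() and tail() also work on vectors
tail(x, 2)
str(x)                      # compact description of the object
table(x)                    # how many times each value appears
```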
Writing Functions
Even though there are many, many functions in R, sometimes you may want to write your own. Knowing how to do this is important because 1.) it helps you understand how R works a little bit better, and 2.) it gives you the ability to create functions for your specific needs.
Even though writing a function sounds rather intimidating, actually doing it is pretty simple. Here is an example:
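A minimal sketch of a user-written function. The name coffee_cost and its arguments are hypothetical and chosen only to illustrate the pattern of name, arguments, body, and returned value:

```r
# coffee_cost() takes a number of cups and a price per cup.
# price_per_cup has a default value, so it can be left out when calling.
coffee_cost <- function(cups, price_per_cup = 3) {
  total <- cups * price_per_cup
  return(total)  # the value handed back to whoever called the function
}

coffee_cost(2)                       # uses the default price of 3
coffee_cost(4, price_per_cup = 2.5)  # overrides the default price
```

Once defined, the function is called exactly like the built-in ones, and its output can be assigned to an object.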
Most of what you do in R will involve manipulating external data. That is why reading in external files is so important! However, this can also become really confusing without knowing some basics about how files are stored on your computer.
** QUICK INTERLUDE FOR AN EXPLANATION OF DIRECTORIES **
File directories, also known as folders, are a way to organize and store files and other directories on a computer's file system. A directory acts like a container, holding files and other directories in a hierarchical structure. At the top of the hierarchy is the root directory, which serves as the starting point for the file system and is represented by a forward slash (/) on Unix-based systems and by a drive letter followed by a colon and a backslash (e.g. C:\) on Windows-based systems. From the root directory, you can navigate through the file system to access other directories and files.
** BACK TO R **
When you go to read an external file into R, the easiest way to do this is to make sure that the file you want is in what is known as your "working directory." The working directory is simply the folder that R is currently working in, which is where R looks for files by default. You can check it with getwd() and change it with setwd().
There are many ways to read in a file, but the most common way is to use the read.csv() function, which reads comma-separated values (CSV) files.
To do this, type read.csv("") with a pair of quotes inside the parentheses. Next, with your cursor inside the quotes, press the Tab key and you will be able to scroll through all the files in your working directory. Click on the one you want to use.
Next, make sure you assign the output of this function to an object so that you can save it for later.
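Here is a self-contained sketch of the whole process. Because I cannot know which files are on your computer, the example first writes a tiny made-up CSV file (coffee_sales.csv, a hypothetical name) into a temporary folder and then reads it back in:

```r
# Show the current working directory
getwd()

# Create a small example CSV file in a temporary folder
csv_path <- file.path(tempdir(), "coffee_sales.csv")
example_data <- data.frame(
  day       = c("Mon", "Tue", "Wed"),
  cups_sold = c(12, 18, 15)
)
write.csv(example_data, csv_path, row.names = FALSE)

# Read the file and assign the result to an object for later use
coffee_sales <- read.csv(csv_path)
head(coffee_sales)
```

If the file were sitting in your working directory, the file name alone, such as read.csv("coffee_sales.csv"), would be enough.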
Next time we will start looking at how to gather insight from data we have read into R!
What is an R library? Put simply, a library (more formally called a package) is a collection of functions, documentation (and sometimes data!) written by others to expand what R can do. For example, say you write a function that you find really useful and you want to share it with the world. If you publish it as a package on the Comprehensive R Archive Network (CRAN), others can download and use your function.
Installing libraries
To install a library, type install.packages() with the name of the library you want in quotes, like this: install.packages("dplyr"). You only need to install a library once, but you need to load it in every new R session before you can use its functions. To load a library, type library() with the name of the library, which does not need quotes, like this: library(dplyr).
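A cautious sketch of the install-then-load pattern. The install line is commented out because it needs an internet connection, and the requireNamespace() check is an addition of mine so that the block runs whether or not dplyr is already installed:

```r
# Run this once to download and install the package:
# install.packages("dplyr")

# requireNamespace() reports whether a package is installed
# without attaching it to the session
dplyr_installed <- requireNamespace("dplyr", quietly = TRUE)

if (dplyr_installed) {
  library(dplyr)  # load the package so its functions can be used
} else {
  message("dplyr is not installed yet; run install.packages(\"dplyr\") first.")
}
```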
An "if-else" statement in R allows you to specify a condition, and execute different code based on whether the condition is true or false. Take the following example:
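Here is a minimal sketch of such a statement. The variable name and the cut-offs of 2 and 4 follow the description on this page, while the message wording and the choice to count a price of exactly 2 or 4 as "between" are assumptions made for illustration:

```r
coffee_price <- 3

if (coffee_price < 2) {
  price_message <- "This coffee is cheap!"
} else if (coffee_price <= 4) {
  # reached only when the price is at least 2 and at most 4
  price_message <- "This coffee is reasonably priced."
} else {
  price_message <- "This coffee is expensive!"
}

print(price_message)
```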
In this example, the value of coffee_price is set to 3, and the if-else statement checks whether it is less than 2, between 2 and 4, or greater than 4. Depending on the outcome, a different message is printed to the console.
Question!
What would the above code do if the coffee price was set to 1? What if it were set to 5?
Exercise!
Can you write R code that creates a variable representing the weight of coffee beans, and then uses an if-else statement to determine whether the coffee beans are considered light, average, or heavy based on their weight?
What R (are) loops?
In computer programming, a loop is a structure that allows you to repeat a block of code multiple times. Loops are useful when you need to perform a specific task repeatedly, such as processing a list of items or repeating an action until a certain condition is met.
While loops:
A "while" loop is a type of loop that allows you to repeatedly execute a block of code while a certain condition is true. The loop will continue to run until the condition becomes false. For example:
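A minimal sketch of a while loop along these lines. The variable names and starting values follow the description on this page, and the message text is an assumption:

```r
coffee_cups <- 0
coffee_limit <- 5

# Keep going as long as the condition in parentheses is TRUE
while (coffee_cups < coffee_limit) {
  coffee_cups <- coffee_cups + 1  # without this line the loop would never end
  print(paste("Filling coffee cup number", coffee_cups))
}
```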
In this example, the variable coffee_cups is set to 0, and the variable coffee_limit is set to 5. The while loop runs as long as coffee_cups is less than coffee_limit. Within the loop, coffee_cups is incremented by 1 each time, and a message is printed to the console indicating that a coffee cup is being filled.
Exercise!
Can you write R code that creates a variable representing the total weight of coffee beans used to fill a number of coffee cups, and then uses a while loop to add a certain amount of coffee beans to each cup, printing a message each time a cup is filled, until a target total weight of coffee beans has been reached?
What R (are) for loops?
In computer programming, a for loop is a structure that allows you to repeat a block of code a specific number of times. For loops are useful when you have a collection of items that you want to process one at a time, or if you want to run the same operation over and over again on some data.
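As a minimal sketch, here is a for loop that walks through a vector of cup names. The cup names and the message text are illustrative assumptions:

```r
coffee_cups <- c("cup 1", "cup 2", "cup 3", "cup 4", "cup 5")

# The loop body runs once for every element of coffee_cups
for (cup in coffee_cups) {
  print(paste("Filling", cup))
}
```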
In this example, the variable coffee_cups is a vector of five strings representing the names of coffee cups. The for loop runs through each element in the coffee_cups vector, assigning each element in turn to the variable cup. Within the loop, a message is printed to the console indicating that a coffee cup is being filled.
A note on syntax...
What is happening when you write for (cup in coffee_cups) is not immediately obvious. The code looks at each element in the coffee_cups vector one at a time. The object cup that we have created in the loop is a temporary placeholder that holds the current element of the vector. So the first time the loop runs, cup will represent "cup 1"; the second time it runs, cup will represent "cup 2"; and so on. The fact that we named this placeholder cup does not actually matter. We could have named it anything we like (such as i, or COFFEE) and the loop would run just the same.
Here is another, slightly more complicated example:
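A sketch of that idea, looping over index positions so that two vectors can be read side by side. The weights are made-up values, and seq_along(coffee_cups) is used here as a safer way of counting from 1 to the length of the vector:

```r
coffee_cups <- c("cup 1", "cup 2", "cup 3")
coffee_weights <- c(90, 110, 130)  # weights in grams

for (i in seq_along(coffee_cups)) {
  cup <- coffee_cups[i]        # the i-th cup name
  weight <- coffee_weights[i]  # the matching i-th weight

  if (weight < 100) {
    category <- "light"
  } else if (weight < 120) {
    # reached only when the weight is at least 100 and below 120
    category <- "average"
  } else {
    category <- "heavy"
  }

  print(paste("The coffee weight for", cup, "is", category))
}
```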
In this example, there are two vectors, coffee_cups and coffee_weights. The for loop runs through each index of the vectors, from 1 to the length of the coffee_cups vector. At each iteration, the corresponding elements of the two vectors are assigned to the variables cup and weight, respectively. Within the loop, an if-else statement is used to categorize the coffee weight for each cup. If the weight is less than 100g, a message indicating that the coffee weight is light is printed. If the weight is greater than or equal to 100g and less than 120g, a message indicating that the coffee weight is average is printed. Otherwise, a message indicating that the coffee weight is heavy is printed.
Breaking out of a loop!
If you want to stop your loop, simply include a break statement. A break statement is a control flow construct in programming that allows you to prematurely exit a loop. When a break statement is executed, the loop terminates and the program continues with the next statement after the loop. For example:
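A minimal sketch of a break statement inside a while loop. The variable names, the target of 200 beans, and the messages are assumptions made for illustration:

```r
beans_in_jar <- 0
beans_needed <- 200

while (beans_in_jar < beans_needed) {
  beans_in_jar <- beans_in_jar + 50  # add 50 beans each time around
  print(paste("Beans in jar:", beans_in_jar))

  if (beans_in_jar >= beans_needed) {
    print("Enough beans for a cup of coffee!")
    break  # leave the loop immediately
  }
}
```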
In this example, we have a while loop that runs while the number of coffee beans in the jar is less than the number of coffee beans needed to make a cup of coffee. The loop increments the number of coffee beans in the jar by 50 each time it runs. However, if the number of coffee beans in the jar ever becomes equal to or greater than the number of coffee beans needed, the break statement is executed and the loop terminates.
Exercise!
Can you write R code that calculates the minimum number of people that need to visit a cafe for it to make a profit? Assume that each person orders a cup of coffee for $5 and the cafe's overhead costs are $100. The program should use a for loop to increment the number of customers from 1 to 100 and determine the profit or loss for each iteration. When the profit is greater than or equal to $0, the loop should print the number of customers and stop.
Most of what I have put here is a compilation of what I have learned from excellent teachers and online searches. If you want to find more R guides here are some that I use frequently.