How to Perform Data Analysis with RStudio

RStudio is an integrated development environment (IDE) for R, a language for statistical computing that is widely used in academia and industry for statistical modeling, visualization, and machine learning. Below is a step-by-step guide to performing data analysis in RStudio.


📌 Step 1: Install & Set Up RStudio

1. Install R and RStudio

  • Download R from CRAN (the Comprehensive R Archive Network).
  • Download RStudio Desktop from Posit (the company formerly named RStudio).
  • Install R first, then RStudio, and open RStudio.

2. Install Necessary Packages

Packages add functionality to R. You can install them from the RStudio console using:

r
install.packages("tidyverse") # Includes dplyr, ggplot2, readr, and more
install.packages("readr") # For importing data (already part of tidyverse)
install.packages("ggplot2") # For data visualization (already part of tidyverse)
install.packages("caret") # For machine learning models
install.packages("psych") # For descriptive statistics

✅ Tip: Load a package before use with:

r
library(tidyverse)
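
Because install.packages() downloads a package again every time it runs, it can help to check what is already installed first. The following is a minimal sketch; the helper name missing_packages is hypothetical and not part of any package.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tidyverse", "caret", "psych"))
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```

Note that requireNamespace() loads a package's namespace as a side effect but does not attach it, so library() is still needed before use.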

📌 Step 2: Import Data into RStudio

1. Load a Built-in Dataset

r
data(mtcars) # Loads the mtcars dataset
head(mtcars) # Displays the first 6 rows

2. Import a CSV File

r
data <- read.csv("C:/Users/YourName/Documents/data.csv", header = TRUE)
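
To try the CSV import without a file of your own, the sketch below writes a small, made-up data frame to a temporary file and reads it back. The column names match those used later in this guide; the values and the csv_path and csv_data names are illustrative assumptions.

```r
# Write a small example file to a temporary location (stand-in for data.csv).
csv_path <- tempfile(fileext = ".csv")
example <- data.frame(
  ID = 1:4,
  Age = c(23, 35, NA, 41),
  Score = c(78, 85, 90, 66),
  Gender = c("F", "M", "F", "M")
)
write.csv(example, csv_path, row.names = FALSE)

# Read it back, treating empty strings and "NA" as missing values.
csv_data <- read.csv(csv_path, header = TRUE, na.strings = c("", "NA"))
str(csv_data)
```

In R 4.0 and later, read.csv() keeps text columns as character vectors by default; pass stringsAsFactors = TRUE if you want factors instead.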

3. Import an Excel File

First, install the necessary package:

r
install.packages("readxl")
library(readxl)

Then, load the file:

r
data <- read_excel("C:/Users/YourName/Documents/data.xlsx", sheet = 1)

4. Import an SPSS (.sav) File

r
install.packages("haven")
library(haven)
data <- read_sav("C:/Users/YourName/Documents/data.sav")

📌 Step 3: Exploring & Cleaning Data

1. View Dataset Structure

r
str(data) # Shows data structure
summary(data) # Summary statistics
head(data) # First few rows
dim(data) # Dimensions (rows and columns)

2. Rename Columns

r
# Supply exactly one name per column, in column order
colnames(data) <- c("ID", "Age", "Score", "Gender")

3. Remove Missing Values

r
data <- na.omit(data) # Removes all rows with NA values
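
na.omit() drops a row if any column is missing, which can discard more data than intended. The sketch below, using a small made-up data frame (survey is a hypothetical name), counts missing values per column first and then keeps rows that are complete only for the columns the analysis needs.

```r
# Small example data frame with missing values (illustrative only).
survey <- data.frame(
  Age = c(23, NA, 31, 45),
  Score = c(80, 75, NA, 90),
  Notes = c(NA, "late", NA, NA)
)

# Count missing values per column before removing anything.
colSums(is.na(survey))

# Keep rows that are complete for the columns the analysis needs.
survey_clean <- survey[complete.cases(survey[, c("Age", "Score")]), ]
nrow(survey_clean)

# For comparison: na.omit() also drops rows where only Notes is missing.
nrow(na.omit(survey))
```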

4. Create New Variables

r
data$NewVariable <- data$Score * 2
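
Two other common derived variables are a standardized score and a grouped version of a numeric column. This is a sketch on made-up values; the scores data frame, the cut points, and the labels are assumptions chosen for illustration.

```r
scores <- data.frame(Age = c(19, 34, 52, 67), Score = c(55, 72, 88, 91))

# Standardize Score to mean 0 and standard deviation 1 (z-scores).
scores$ScoreZ <- as.numeric(scale(scores$Score))

# Bin Age into groups; right = FALSE makes each interval include its lower bound.
scores$AgeGroup <- cut(
  scores$Age,
  breaks = c(0, 30, 60, Inf),
  labels = c("Under 30", "30-59", "60+"),
  right = FALSE
)
scores
```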

📌 Step 4: Perform Descriptive Statistics

r
mean(data$Age) # Mean of Age
median(data$Age) # Median of Age
sd(data$Age) # Standard deviation
summary(data) # Quick overview of dataset

For categorical data (e.g., Gender distribution):

r
table(data$Gender)
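
Descriptive statistics are often more useful per group than overall. The sketch below uses the built-in mtcars dataset (so it runs without your own file) and base R's aggregate(); with your data, a formula such as Score ~ Gender would play the same role.

```r
data(mtcars)

# Mean and standard deviation of mpg for each number of cylinders.
group_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
group_sds <- aggregate(mpg ~ cyl, data = mtcars, FUN = sd)
group_means
group_sds

# Proportions instead of raw counts for a categorical variable.
cyl_share <- prop.table(table(mtcars$cyl))
round(cyl_share, 2)
```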

📌 Step 5: Perform Statistical Tests

1. Correlation Analysis

r
cor(data$Age, data$Score)
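
cor() returns only the coefficient. If you also need a p-value and confidence interval, cor.test() provides them. A brief sketch on the built-in mtcars dataset:

```r
data(mtcars)

# Pearson correlation test (the default method) between weight and mpg.
ct <- cor.test(mtcars$wt, mtcars$mpg)
ct$estimate  # correlation coefficient
ct$p.value   # p-value for the null hypothesis of zero correlation
ct$conf.int  # 95% confidence interval
```

If either variable contains missing values, cor() returns NA unless you pass use = "complete.obs".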

2. T-Test (Compare Two Groups)

r
# The grouping variable (here Gender) must have exactly two levels
t.test(Score ~ Gender, data = data)
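
By default, t.test() runs Welch's test, which does not assume equal variances in the two groups. The sketch below, on the built-in mtcars dataset, contrasts it with the pooled-variance version and shows how to pull out individual results.

```r
data(mtcars)

# Welch's t-test (default): mpg for automatic (am = 0) vs manual (am = 1) cars.
welch <- t.test(mpg ~ am, data = mtcars)

# Student's t-test, which assumes equal variances in both groups.
pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

welch$p.value     # p-value
welch$conf.int    # confidence interval for the difference in means
welch$parameter   # Welch degrees of freedom (usually not a whole number)
pooled$parameter  # pooled degrees of freedom: n1 + n2 - 2
```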

3. ANOVA (Compare More Than Two Groups)

r
# Use a grouping variable with three or more levels; Gender is only a placeholder here
anova_model <- aov(Score ~ Gender, data = data)
summary(anova_model)
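
A significant ANOVA says that at least one group mean differs, but not which one. A common follow-up is Tukey's HSD test, sketched here on the built-in mtcars dataset; the cars and cyl_group names are illustrative.

```r
data(mtcars)
cars <- mtcars
cars$cyl_group <- factor(cars$cyl)  # grouping variable with three levels

anova_fit <- aov(mpg ~ cyl_group, data = cars)
summary(anova_fit)

# Pairwise comparisons between all groups, adjusted for multiple testing.
tukey <- TukeyHSD(anova_fit)
tukey$cyl_group
```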

4. Chi-Square Test (For Categorical Data)

r
# Category is assumed to be a second categorical column in your dataset
chisq.test(table(data$Gender, data$Category))
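
The chi-square approximation is usually considered unreliable when expected cell counts are small (a common rule of thumb is below 5). The sketch below, on the built-in mtcars dataset, inspects the expected counts and falls back to Fisher's exact test in that case.

```r
data(mtcars)
tab <- table(Cylinders = mtcars$cyl, Transmission = mtcars$am)

# chisq.test() warns when expected counts are small; inspect them directly.
chi <- suppressWarnings(chisq.test(tab))
chi$expected

# Fall back to Fisher's exact test if any expected count is below 5.
if (any(chi$expected < 5)) {
  exact <- fisher.test(tab)
  print(exact$p.value)
}
```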

📌 Step 6: Regression Analysis

1. Simple Linear Regression

r
model <- lm(Score ~ Age, data = data)
summary(model)

2. Multiple Linear Regression

r
model <- lm(Score ~ Age + Gender, data = data)
summary(model)
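
Beyond summary(), a fitted lm object can give coefficient confidence intervals and predictions for new observations. A sketch on the built-in mtcars dataset; the values in new_cars are made up for illustration.

```r
data(mtcars)
fit <- lm(mpg ~ wt + hp, data = mtcars)

coef(fit)     # estimated coefficients
confint(fit)  # 95% confidence intervals for the coefficients

# Predict mpg for two hypothetical cars, with confidence intervals.
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))
pred <- predict(fit, newdata = new_cars, interval = "confidence")
pred
```

The columns in newdata must use the same names as the predictors in the model formula.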

3. Logistic Regression (For Binary Outcomes)

r
# The outcome must be binary: a two-level factor or 0/1 values
log_model <- glm(factor(Gender) ~ Age + Score, data = data, family = binomial)
summary(log_model)
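
Logistic regression coefficients are on the log-odds scale, so they are often converted to odds ratios and predicted probabilities. A sketch on the built-in mtcars dataset, where am is a 0/1 column; the 0.5 cutoff is an assumption, not a rule.

```r
data(mtcars)
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Odds ratios: exponentiate the log-odds coefficients.
exp(coef(logit_fit))

# Predicted probabilities that am = 1 for each car.
probs <- predict(logit_fit, type = "response")

# Classify with a 0.5 cutoff and compare against the observed values.
predicted_class <- ifelse(probs > 0.5, 1, 0)
table(Predicted = predicted_class, Actual = mtcars$am)
```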

📌 Step 7: Data Visualization with ggplot2

1. Histogram

r
library(ggplot2)
ggplot(data, aes(x = Score)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black")

2. Scatter Plot

r
ggplot(data, aes(x = Age, y = Score)) +
  geom_point() +
  geom_smooth(method = "lm", color = "red")

3. Boxplot

r
ggplot(data, aes(x = Gender, y = Score, fill = Gender)) +
  geom_boxplot()
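
Plots can be stored in a variable, labeled, and saved to disk with ggsave(). A sketch on the built-in mtcars dataset; the file name, size, and resolution are arbitrary choices, and the file goes to a temporary folder.

```r
library(ggplot2)
data(mtcars)

# Build the plot and keep it in a variable so it can be saved later.
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(x = "Cylinders", y = "Miles per gallon", title = "Fuel economy by cylinder count")

# Save as a PNG; width and height are in inches by default.
plot_path <- file.path(tempdir(), "mpg_by_cyl.png")
ggsave(plot_path, plot = p, width = 6, height = 4, dpi = 300)
```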

📌 Step 8: Machine Learning (Basic Example with caret Package)

1. Split Data into Training & Testing Sets

r
library(caret) # Installed in Step 1
set.seed(123) # Set random seed for reproducibility
trainIndex <- createDataPartition(data$Score, p = 0.7, list = FALSE)
trainData <- data[trainIndex, ]
testData <- data[-trainIndex, ]

2. Train a Decision Tree Model

r
# rpart ships with most R installations; run install.packages("rpart") only if library() fails
library(rpart)
model <- rpart(Score ~ Age + Gender, data = trainData, method = "anova")

3. Make Predictions

r
predictions <- predict(model, testData)

4. Evaluate Model Performance

r
mean((predictions - testData$Score)^2) # Mean Squared Error (MSE)
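
MSE is in squared units, so it is often reported alongside RMSE and MAE and compared with a simple baseline. The sketch below is self-contained on the built-in mtcars dataset and uses base R's sample() for the split instead of caret; all variable names are illustrative.

```r
library(rpart)
data(mtcars)

set.seed(123)  # reproducible split
train_rows <- sample(nrow(mtcars), size = round(0.7 * nrow(mtcars)))
train <- mtcars[train_rows, ]
test <- mtcars[-train_rows, ]

tree <- rpart(mpg ~ wt + hp, data = train, method = "anova")
pred <- predict(tree, test)

rmse <- sqrt(mean((pred - test$mpg)^2))  # same units as mpg
mae <- mean(abs(pred - test$mpg))        # less sensitive to large errors

# Baseline: always predict the training mean.
baseline_rmse <- sqrt(mean((mean(train$mpg) - test$mpg)^2))

c(RMSE = rmse, MAE = mae, Baseline_RMSE = baseline_rmse)
```

A model is only useful if its error is clearly below the baseline; with a dataset this small, the comparison can change noticeably with the seed.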

📌 Step 9: Exporting Results

1. Save Processed Data to CSV

r
write.csv(data, "C:/Users/YourName/Documents/cleaned_data.csv", row.names = FALSE)

2. Save a Model for Future Use

r
saveRDS(model, "model.rds")

To load the model later:

r
loaded_model <- readRDS("model.rds")
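
A quick way to confirm that a saved model survived the round trip is to compare predictions before and after loading. A sketch using a linear model on the built-in mtcars dataset and a temporary file:

```r
data(mtcars)
fit <- lm(mpg ~ wt, data = mtcars)

# Save to a temporary file and load it back.
model_path <- tempfile(fileext = ".rds")
saveRDS(fit, model_path)
restored <- readRDS(model_path)

# The restored model should give the same predictions as the original.
all.equal(predict(fit, mtcars), predict(restored, mtcars))
```

Remember to load the package that created the model (for example rpart) before calling predict() on a restored object.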

📌 Summary: RStudio Data Analysis Workflow

✅ Install & Set Up → ✅ Import Data → ✅ Explore & Clean Data → ✅ Descriptive Statistics → ✅ Statistical Tests → ✅ Regression Models → ✅ Visualize Data → ✅ Machine Learning → ✅ Export Results
