We’ll simulate data to demonstrate the relationship between independent and dependent variables.
1.1 Example: Hours of Study and Test Scores
# Simulate dataset.seed(123)hours_of_study <-c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) # Independent Variabletest_scores <-50+5* hours_of_study +rnorm(10, mean =0, sd =5) # Dependent Variable# Combine into a data framedata <-data.frame(Hours = hours_of_study, Scores = test_scores)# Visualize the relationshiplibrary(ggplot2)ggplot(data, aes(x = Hours, y = Scores)) +geom_point(color ="blue", size =3) +geom_smooth(method ="lm", se =FALSE, color ="red") +labs(title ="Effect of Hours of Study on Test Scores",x ="Hours of Study (Independent Variable)",y ="Test Scores (Dependent Variable)" ) +theme_minimal()
`geom_smooth()` using formula = 'y ~ x'
Interpretation of the Graph
X-axis (Independent Variable): Hours of study.
Y-axis (Dependent Variable): Test scores.
As hours of study increase, test scores generally increase, showing a positive relationship.
1.2 Inverse Relationship Example
An inverse relationship occurs when one variable increases while the other decreases. For instance, the more time spent on social media, the lower the grades a student might achieve.
# Simulate dataset.seed(42)social_media_hours <-seq(1, 10, by =1) # Independent Variablegrades <-100-5* social_media_hours +rnorm(10, mean =0, sd =2) # Dependent Variable# Combine into a data frameinverse_data <-data.frame(SocialMediaHours = social_media_hours, Grades = grades)# Visualize the inverse relationshiplibrary(ggplot2)ggplot(inverse_data, aes(x = SocialMediaHours, y = Grades)) +geom_point(color ="red", size =3) +geom_smooth(method ="lm", se =FALSE, color ="blue") +labs(title ="Inverse Relationship: Social Media Hours vs. Grades",x ="Social Media Hours (Independent Variable)",y ="Grades (Dependent Variable)" ) +theme_minimal()
`geom_smooth()` using formula = 'y ~ x'
1.3 No-Relation Example
A no-relation scenario means that changes in one variable do not systematically affect the other. For instance, the number of books read by a person and their shoe size typically have no relationship.
# Simulate dataset.seed(123)books_read <-seq(1, 10, by =1) # Independent Variableshoe_size <-rnorm(10, mean =9, sd =1) # Random data with no relation to books_read# Combine into a data frameno_relation_data <-data.frame(BooksRead = books_read, ShoeSize = shoe_size)# Visualize the no-relation scenarioggplot(no_relation_data, aes(x = BooksRead, y = ShoeSize)) +geom_point(color ="green", size =3) +labs(title ="No Relationship: Books Read vs. Shoe Size",x ="Books Read (Independent Variable)",y ="Shoe Size (Dependent Variable)" ) +theme_minimal()
2 NHST
2.1 One-Sample t-Test
Scenario: A teacher claims that the average test score in a class is 70. You want to test if the actual average is different from 70.
: The mean test score is 70 ( = 70).
: The mean test score is not 70 ( 70).
# Simulated data: Test scores of studentsset.seed(123)test_scores <-rnorm(30, mean =72, sd =5)# Perform a one-sample t-testt_test_result <-t.test(test_scores, mu =70)# Print the resultprint(t_test_result)
One Sample t-test
data: test_scores
t = 1.9703, df = 29, p-value = 0.05842
alternative hypothesis: true mean is not equal to 70
95 percent confidence interval:
69.93287 73.59610
sample estimates:
mean of x
71.76448
Interpretation:
If the p-value is less than 0.05, reject . This means the average test score is significantly different from 70.
2.2 Two-Sample t-Test
Scenario: Compare the average test scores of two groups: one that received tutoring and one that did not.
: The mean test scores the two groups are equal ().
: The mean test scores of the two groups are not equal ().
# Simulated dataset.seed(456)group_A <-rnorm(30, mean =75, sd =5) # Tutored groupgroup_B <-rnorm(30, mean =70, sd =5) # Non-tutored group# Perform a two-sample t-testtwo_sample_test <-t.test(group_A, group_B, alternative ="two.sided")# Print the resultprint(two_sample_test)
Welch Two Sample t-test
data: group_A and group_B
t = 4.1181, df = 53.018, p-value = 0.0001344
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
2.758615 7.997380
sample estimates:
mean of x mean of y
76.15872 70.78072
Interpretation:
If the p-value is less than 0.05, reject . This indicates a significant difference between the two groups.
2.3 Chi-Square Test
Scenario: Test if there is an association between gender and preference for two types of beverages.
: Gender and beverage preference are independent.
: Gender and beverage preference are not independent.