Project Assignment 2

Data Preparation and Merging

Overview

In this assignment, you will merge your datasets into a single clean, rectangular dataset (one row per observation, one column per variable) that is ready for analysis. This is a hands-on assignment: the main deliverable is your cleaned, merged dataset, accompanied by brief documentation of your process.

Building on Project Assignment 1, you will now implement your data merging plan and prepare the specific variables you will use to test your hypotheses.

Main Deliverable: A cleaned, merged dataset saved as a CSV file that is ready for analysis in your final presentation.

Data Merging and Preparation

Step 1: Load and Examine Your Datasets

Start by loading your datasets and examining their structure:

# Load your primary dataset (from class)
# Load your secondary dataset(s)
# Examine structure and key variables
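
As a rough illustration of this step, the sketch below uses two small made-up data frames in place of real files. The names `primary`, `secondary`, `country`, `year`, `outcome`, and `gdp` are all hypothetical; with real data you would load each file with something like `read.csv()` and substitute your own identifier columns.

```r
# Hypothetical stand-ins for your datasets. With real files you would use
# something like: primary <- read.csv("your_primary_file.csv")
primary <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  outcome = c(1.2, 1.5, NA, 0.9)
)
secondary <- data.frame(
  country = c("A", "A", "B", "C"),
  year    = c(2000, 2001, 2000, 2000),
  gdp     = c(10.1, 10.4, 8.7, 5.2)
)

# Structure: column names, types, and the first few values
str(primary)
str(secondary)

# Number of observations and time coverage of each dataset
n_primary      <- nrow(primary)
n_secondary    <- nrow(secondary)
span_primary   <- range(primary$year)
span_secondary <- range(secondary$year)

# Check whether the planned identifier (assumed here: country + year)
# is unique in each dataset; 0 means no duplicated keys
dup_primary   <- anyDuplicated(primary[, c("country", "year")])
dup_secondary <- anyDuplicated(secondary[, c("country", "year")])
```

Duplicated keys or mismatched time coverage found at this stage are typical candidates for the "key challenges" entry in your description.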

Briefly describe what you found:

  • Primary dataset: [Name, number of observations, time coverage]
  • Secondary dataset: [Name, number of observations, time coverage]
  • Key challenges identified:

Step 2: Merge Your Datasets

Implement your merging strategy. Remember that you must successfully merge at least two datasets, one of which must come from our class datasets.

# Your merging code here
# Include any data cleaning needed for successful merge
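
One possible shape for the merge is sketched below using base R's `merge()`. It assumes a left join on a hypothetical `country` + `year` identifier; the data frames and column names are invented for illustration, and your own join type and keys may differ.

```r
# Hypothetical example data; note the stray space in "B " in the secondary data
primary <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  outcome = c(1.2, 1.5, NA, 0.9)
)
secondary <- data.frame(
  country = c("A", "A", "B ", "C"),
  year    = c(2000, 2001, 2000, 2000),
  gdp     = c(10.1, 10.4, 8.7, 5.2)
)

# Clean the identifier so that matching values are spelled identically
primary$country   <- trimws(primary$country)
secondary$country <- trimws(secondary$country)

# Left join: keep every row of the primary dataset
merged <- merge(primary, secondary, by = c("country", "year"), all.x = TRUE)

# Inner join, used here only to count how many rows actually matched
matched <- merge(primary, secondary, by = c("country", "year"))

# Numbers for the documentation (valid because the keys are unique here)
n_merged          <- nrow(merged)
unmatched_primary <- nrow(primary) - nrow(matched)    # kept, but gdp is NA
dropped_secondary <- nrow(secondary) - nrow(matched)  # not in merged data
```

Comparing the row counts before and after the merge is a simple way to fill in the "data lost in merging" entry.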

Document your merging process:

  • Common identifier used:
  • Type of join performed:
  • Observations in final merged dataset:
  • Any data lost in merging:

Step 3: Create Your Analysis Variables

Select and prepare the specific variables you will use to test your hypotheses:

Your final variables (list the exact column names in your dataset):

Dependent variable:

  • Variable name:
  • Description:
  • Source dataset:

Independent variables:

  • Variable 1 name:
  • Description:
  • Source dataset:

  • Variable 2 name:
  • Description:
  • Source dataset:

  • Variable 3 name (if applicable):
  • Description:
  • Source dataset:

Control variables (if any):

  • Variable name(s):
  • Description:
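
To keep the dataset limited to the variables listed above, one option is to subset the merged data by column name. The sketch below is only an illustration: `merged`, `analysis_vars`, and all column names (including the renamed `gdp_per_capita`) are hypothetical.

```r
# Hypothetical merged dataset with one column that is not needed for analysis
merged <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  outcome = c(1.2, 1.5, NA, 0.9),
  gdp     = c(10.1, 10.4, 8.7, NA),
  notes   = c("", "revised", "", "")
)

# Exact column names of the identifiers and analysis variables
analysis_vars <- c("country", "year", "outcome", "gdp")

# Stop early if any listed column is missing (catches typos in names)
stopifnot(all(analysis_vars %in% names(merged)))

# Keep only those columns
analysis <- merged[, analysis_vars]

# Optionally give a column a more descriptive name
names(analysis)[names(analysis) == "gdp"] <- "gdp_per_capita"
```

The names returned by `names(analysis)` are then the exact column names to report in the list above.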

Step 4: Data Cleaning

Clean your dataset to prepare it for analysis:

# Handle missing values
# Address any outliers
# Create derived variables if needed
# Filter to your analysis sample
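
The four cleaning tasks could look roughly like the sketch below. It assumes listwise deletion for missing values and a 1.5 × IQR rule for flagging outliers; both are illustrative choices rather than requirements, and the data and column names are made up.

```r
# Hypothetical analysis dataset
analysis <- data.frame(
  country = c("A", "A", "A", "B", "B", "B", "C", "C"),
  year    = c(1999, 2000, 2001, 1999, 2000, 2001, 2000, 2001),
  outcome = c(1.2, 1.5, 1.4, NA, 0.9, 1.1, 2.0, 1.8),
  gdp     = c(10.1, 10.4, 10.6, 8.7, NA, 9.0, 9.5, 250.0)
)

# Handle missing values: count them per column, then drop incomplete rows
# (listwise deletion is an assumption; imputation is another option)
missing_counts <- colSums(is.na(analysis))
clean <- analysis[complete.cases(analysis[, c("outcome", "gdp")]), ]

# Address outliers: flag values outside 1.5 * IQR instead of deleting them
q     <- unname(quantile(clean$gdp, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
clean$gdp_outlier <- clean$gdp < q[1] - fence | clean$gdp > q[2] + fence

# Create a derived variable: natural log of gdp
clean$log_gdp <- log(clean$gdp)

# Filter to the analysis sample (assumed here: years 2000 and later)
final <- clean[clean$year >= 2000, ]
```

Flagging outliers in a separate column, rather than removing them, keeps the decision visible and reversible when you document your cleaning choices.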

Cleaning decisions made:

  • How you handled missing values:
  • Time period of final dataset:
  • Final sample size:
  • Any transformations applied:

Final Dataset Validation

Before submitting, validate that your dataset is analysis-ready:

Basic descriptive statistics: Provide summary statistics for your key variables (mean, median, min, max, and number of missing values):

# summary(your_data)
# Or use your preferred summary function
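
If you want the statistics in a single table, a minimal base R sketch is shown below. The data frame `final`, the vector `key_vars`, and the helper `stat()` are hypothetical names introduced only for this example.

```r
# Hypothetical cleaned dataset
final <- data.frame(
  country = c("A", "A", "B", "C", "C"),
  year    = c(2000, 2001, 2001, 2000, 2001),
  outcome = c(1.5, 1.4, 1.1, 2.0, 1.8),
  gdp     = c(10.4, 10.6, 9.0, 9.5, NA)
)

key_vars <- c("outcome", "gdp")

# Helper: apply one statistic to every key variable, ignoring NAs
stat <- function(f) unname(sapply(final[key_vars], f, na.rm = TRUE))

stats <- data.frame(
  variable = key_vars,
  mean     = stat(mean),
  median   = stat(median),
  min      = stat(min),
  max      = stat(max),
  missing  = unname(sapply(final[key_vars], function(x) sum(is.na(x))))
)
print(stats)

# Rectangularity check: each country-year should appear only once
n_duplicate_keys <- sum(duplicated(final[, c("country", "year")]))
```

A nonzero `n_duplicate_keys` would suggest the merge produced repeated rows that should be resolved before analysis.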

Submission

Export your cleaned dataset as a CSV file and submit it, together with this completed QMD document, via Blackboard by the due date. If your dataset is large, put both files in a zipped folder and submit the zip file. Each group member should submit the files so that we have a submission in Blackboard to grade for each of you.
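
The CSV export can be sketched as follows. The file name `final_dataset.csv` and the data frame `final` are hypothetical, and the example writes to a temporary folder only so that it runs anywhere; in practice you would write to your project folder.

```r
# Hypothetical cleaned dataset
final <- data.frame(
  country = c("A", "A", "B"),
  year    = c(2000, 2001, 2001),
  outcome = c(1.5, 1.4, 1.1)
)

# Temporary location for illustration; replace with a path in your project
out_path <- file.path(tempdir(), "final_dataset.csv")

# row.names = FALSE avoids writing an extra unnamed index column
write.csv(final, out_path, row.names = FALSE)

# Read the file back to confirm it has the expected shape
check <- read.csv(out_path)
same_shape <- nrow(check) == nrow(final) && identical(names(check), names(final))
```

Reading the file back once is a cheap way to confirm that the exported CSV has the rows and columns you expect before uploading it.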