Final Report — Airbnb Pricing Analysis
Introduction
The short-term rental market has grown rapidly, and nightly prices vary widely across listings. This project investigates which listing attributes are most strongly associated with the nightly price of Airbnb listings.
Hypothesis
Primary hypothesis: Listings with more bedrooms, higher ratings, and more amenities have significantly higher nightly prices.
- H₀: Bedrooms, rating, and amenities count have no relationship with nightly price.
- H₁: Bedrooms, rating, and amenities count positively predict nightly price.
Data
The primary dataset is a Kaggle Airbnb listings CSV containing thousands of listing records with fields for price, bedrooms, review scores, amenities, and neighbourhood.
As a supplement, the AirROI API was used to demonstrate raw data collection from a live source.
Methods
Cleaning
Raw data was cleaned by:
- Removing dollar signs and commas from price
- Converting ratings from a 0–100 scale to 0–5
- Extracting numeric bedroom counts
- Dropping duplicates and null prices
- Filtering to a $20–$2,000 range to exclude extreme outliers
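The cleaning steps above could be sketched in pandas roughly as follows. This is a minimal illustration and not necessarily what scripts/run_analysis.py does: the function name clean_listings and the raw column names price, review_scores_rating, and bedrooms are assumptions about the CSV layout.

```python
import pandas as pd


def clean_listings(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed cleaning steps to a raw listings frame.

    Column names are assumed for illustration; adjust to the actual CSV.
    """
    df = raw.copy()

    # Strip "$" and "," from price, then convert to a number (bad values -> NaN).
    df["price"] = pd.to_numeric(
        df["price"].astype(str).str.replace(r"[$,]", "", regex=True),
        errors="coerce",
    )

    # Rescale ratings from a 0-100 scale to 0-5.
    df["rating"] = pd.to_numeric(df["review_scores_rating"], errors="coerce") / 20.0

    # Pull the first number out of a text field such as "2 bedrooms".
    df["bedrooms"] = pd.to_numeric(
        df["bedrooms"].astype(str).str.extract(r"(\d+(?:\.\d+)?)")[0],
        errors="coerce",
    )

    # Drop exact duplicate rows and rows without a usable price.
    df = df.drop_duplicates().dropna(subset=["price"])

    # Keep nightly prices between $20 and $2,000 (inclusive).
    return df[df["price"].between(20, 2000)].reset_index(drop=True)
```

Coercing unparseable values to NaN keeps the function from failing on malformed rows; those rows are then removed by the null-price filter.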
Feature Engineering
- amenities_count: Parsed from a comma-separated text field
- log_price: Natural log of nightly price (stabilises variance for regression)
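The two engineered features can be derived along these lines. The helper names count_amenities and add_features and the input column name amenities are assumptions for illustration. The naive comma split will over-count any amenity whose own name contains a comma.

```python
import numpy as np
import pandas as pd


def count_amenities(text) -> int:
    """Count non-empty items in a comma-separated amenities string."""
    if not isinstance(text, str):
        return 0  # missing values (None / NaN) count as zero amenities
    # Strip wrapping braces, brackets, and quotes that some exports include.
    items = [part.strip(" \"'{}[]") for part in text.split(",")]
    return sum(1 for item in items if item)


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add amenities_count and log_price columns to a cleaned frame."""
    out = df.copy()
    out["amenities_count"] = out["amenities"].apply(count_amenities)
    # Natural log; prices are assumed strictly positive after cleaning.
    out["log_price"] = np.log(out["price"])
    return out
```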
Statistical Model
We fit an OLS regression:
log_price ~ bedrooms + rating + amenities_count
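A specification like this can be fitted with the statsmodels formula API. The sketch below runs on synthetic data generated purely for illustration, so the numbers it prints are not the project's results. The wrapper fit_price_model and the synthetic coefficients are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def fit_price_model(df: pd.DataFrame):
    """Fit the OLS specification from the report on a prepared frame."""
    formula = "log_price ~ bedrooms + rating + amenities_count"
    return smf.ols(formula, data=df).fit()


# Synthetic demo data (NOT the Kaggle dataset), only to show the workflow.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "bedrooms": rng.integers(1, 5, n),
    "rating": rng.uniform(3.0, 5.0, n),
    "amenities_count": rng.integers(5, 40, n),
})
demo["log_price"] = (
    3.5
    + 0.3 * demo["bedrooms"]
    + 0.1 * demo["rating"]
    + 0.01 * demo["amenities_count"]
    + rng.normal(0.0, 0.2, n)
)

model = fit_price_model(demo)
print(model.params)    # coefficient for each predictor
print(model.pvalues)   # two-sided p-values
print(model.rsquared)  # overall model fit
```

On the real data, the same call would take the cleaned frame with the engineered columns in place of demo.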
Results
(Run python scripts/run_analysis.py to populate these results.)
Summary Statistics
After cleaning, the dataset contains several thousand listings. Key descriptive statistics (mean, median, standard deviation) for price, bedrooms, rating, and amenities count are reported in the analysis output.
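The descriptive statistics named above can be produced in one call. This is a sketch that assumes the cleaned frame carries the columns price, bedrooms, rating, and amenities_count; the function name summarize is hypothetical.

```python
import pandas as pd


def summarize(
    df: pd.DataFrame,
    columns=("price", "bedrooms", "rating", "amenities_count"),
) -> pd.DataFrame:
    """Return mean, median, and sample standard deviation, one row per variable."""
    return df[list(columns)].agg(["mean", "median", "std"]).T
```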
Regression
The OLS model output includes:
- Coefficients for each predictor
- p-values indicating statistical significance
- R² indicating overall model fit
Interpretation
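- Because the response is log_price, coefficients are read on the log scale: a coefficient β corresponds to a (exp(β) − 1) × 100% change in nightly price per one-unit increase in the predictor, holding the other predictors constant.
- A positive coefficient for bedrooms means that each additional bedroom is associated with a higher nightly price.
- A positive coefficient for rating means higher-rated listings tend to command higher prices.
- The amenities_count coefficient indicates the proportional change in price associated with each additional amenity.
- If every coefficient is positive and its p-value is below 0.05, we reject H₀ in favour of H₁ and conclude that these features significantly predict price.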
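Since the model's response is log_price, a coefficient is easier to read once converted to a percentage change in price. The helpers below are a minimal sketch: pct_change_per_unit and supports_h1 are hypothetical names, and the coefficient 0.25 in the usage line is an arbitrary example value, not a result from this analysis.

```python
import math


def pct_change_per_unit(coef: float) -> float:
    """Percent change in price per one-unit increase in a predictor,
    for a model whose response is log(price)."""
    return (math.exp(coef) - 1.0) * 100.0


def supports_h1(coef: float, p_value: float, alpha: float = 0.05) -> bool:
    """True when a coefficient is positive and significant at level alpha."""
    return coef > 0 and p_value < alpha


# Arbitrary example: a coefficient of 0.25 implies roughly a 28.4% higher price.
print(round(pct_change_per_unit(0.25), 1))
```

For small coefficients the percentage change is close to 100 × β, but the exact conversion matters as coefficients grow.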
Conclusion
The regression analysis indicates that bedrooms, ratings, and amenities count are statistically significant predictors of Airbnb nightly prices. This supports the original hypothesis and provides quantitative estimates of each factor’s association with price.
Limitations
- The model is a simple linear specification and may miss non-linear effects.
- Omitted variables (e.g., exact location, seasonality) likely matter.
- The Kaggle dataset is a snapshot and may not generalise to all markets or time periods.
References
- Inside Airbnb / Kaggle listings datasets
- AirROI API documentation
- statsmodels OLS regression
Explore Further
| Resource | URL |
|---|---|
| GitHub Repository | github.com/mbgardin/BnBInsight |
| Live Streamlit Dashboard | bnbinsight.streamlit.app |
| PyPI Package | pypi.org/project/bnbinsight |