Predicting Short-Term Rent Prices in London UK

Summary

One of the major challenges for letting agencies and private citizens alike is determining the right market price for their short-term property listings. Prices that are too low mean lost revenue, while prices that are too high mean too few bookings. Moreover, there are no free services that provide an accurate price estimate for property listings. Using data from Inside Airbnb, a data and advocacy website about Airbnb’s impact on residential communities, and from other public data sources, I’ve built a Streamlit web app that estimates short-term rental prices for property listings in London, UK, based on their characteristics and location, using data from December 2024. The model’s residual standard error is 32.4% of the mean nightly price, so predicted prices typically differ from actual prices by roughly that margin. If you’re curious, take a look at the web app at the link below.

Short-Term Rental Price Estimator • Streamlit

Data sources

Because the terms of service of the major UK property websites don’t permit web scraping, I decided to use data from the Inside Airbnb website, filtered to short-term rentals of entire flats or buildings in London, UK. The data set is anonymously scraped from Airbnb host profiles in a number of major international cities; the data for London are available on the site’s London page. The specific data set used in this analysis was scraped on 11 December 2024. Because of planning regulations in the Greater London area designed to protect communities and keep homes available, short-term rentals are limited to 90 nights per year.

For the analysis I used several features of the Inside Airbnb data set, notably the borough of the property, the property and room types, the number of people the property can accommodate, the number of bedrooms and bathrooms, the price per night, the availability of the property over the last year, the number of days since the last review (if any), and the latitude and longitude of the property. To preserve anonymity, these geographic coordinates are randomly offset by up to 150 meters.

Beyond the main Inside Airbnb data, I enriched the data set with the crime rate per borough, the distance from each property to the nearest London Underground (Tube) station, and the local amenities in the vicinity of the property. A more detailed list of these data sources, together with a section on data preparation, can be found in the GitHub repository of this work.
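As a rough illustration of the distance enrichment, the nearest station can be found with the haversine (great-circle) formula. This is only a sketch under stated assumptions, not the code from the repository: the function names are hypothetical, and the station coordinates are approximate values included purely for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_station(lat, lon, stations):
    """Return (name, distance_km) of the station closest to the given point."""
    distances = (
        (name, haversine_km(lat, lon, s_lat, s_lon))
        for name, (s_lat, s_lon) in stations.items()
    )
    return min(distances, key=lambda pair: pair[1])


# Approximate, illustrative coordinates (assumption, not taken from the project data).
STATIONS = {
    "Westminster": (51.5010, -0.1254),
    "King's Cross St. Pancras": (51.5308, -0.1238),
    "Canary Wharf": (51.5036, -0.0195),
}

# Hypothetical listing located close to the Houses of Parliament.
station_name, station_km = nearest_station(51.5007, -0.1246, STATIONS)
print(f"{station_name}: {station_km:.2f} km")
```

Because the listing coordinates are themselves randomly offset, a distance computed this way is only an approximation of the true walk to the station.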

Exploratory data analysis

A few initial observations can already be gleaned from the series of histograms in Figure 1. The top two histograms, of latitude and longitude, both show a bimodal distribution. In the latitude plot this can be ascribed primarily to the River Thames, but the cause is harder to pin down in the longitude plot. It could be due to a larger number of property listings around the major London parks, which lie mostly in the east and west of the city.

The price histogram shows that the number of listings drops off sharply as the price per night rises, giving a heavily right-skewed distribution. I limited the x axis to £1000, but several extreme outliers lie even further up in price. These outliers are also removed for the model analysis. A similar distribution is visible in the plot of the number of days since the last review.

Histogram plot

Figure 1: Histogram of the main numerical features of the data set.

Borough plots

Which boroughs have the most short-term rental properties?

Figure 2 shows that most short-term rentals are in the borough of Westminster, with Kensington & Chelsea, Camden, and Tower Hamlets in second, third, and fourth place respectively. The borough with the fewest rentals is Sutton.

Geographic plot of the number of rentals per borough

Figure 2: Geographic plot of the number of rentals per borough.

Which boroughs have the highest median price?

Unsurprisingly, the highest median prices are found in Kensington & Chelsea, the borough with the most exclusive and expensive properties in the city, followed closely by Westminster, Camden and Lambeth.

Geographic plot of the median price of rentals per borough

Figure 3: Geographic plot of the median price of rentals per borough.
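The per-borough listing counts and median prices behind the two borough maps could be computed along the following lines. This is a minimal sketch using only the standard library; the rows below are made-up sample values, not figures from the Inside Airbnb data set.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample rows: (borough, price per night in GBP).
listings = [
    ("Westminster", 220.0),
    ("Westminster", 180.0),
    ("Westminster", 310.0),
    ("Kensington and Chelsea", 260.0),
    ("Kensington and Chelsea", 240.0),
    ("Sutton", 85.0),
]

# Group the nightly prices by borough.
prices_by_borough = defaultdict(list)
for borough, price in listings:
    prices_by_borough[borough].append(price)

# One summary record per borough: number of listings and median price.
summary = {
    borough: {"count": len(prices), "median_price": median(prices)}
    for borough, prices in prices_by_borough.items()
}

# Rank boroughs by each measure, highest first.
by_count = sorted(summary, key=lambda b: summary[b]["count"], reverse=True)
by_median = sorted(summary, key=lambda b: summary[b]["median_price"], reverse=True)

print("Most listings:", by_count[0])
print("Highest median price:", by_median[0])
```

The median is a sensible choice here because the price distribution is heavily skewed, so a handful of very expensive listings would pull a mean upwards.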

Property features plot

What property features are correlated with average rent prices?

From the bar plot of the property features, we can see that many features have a positive correlation with price. However, adding the parking option to properties surprisingly corresponds to a decrease in average rent prices. Also, there is very little average price difference between properties that offer storage and those that don’t.

Property amenity features vs average rent price

Figure 4: Property amenity features vs average rent price.

I also calculated the correlation and the variance inflation factor (VIF) for the numerical features; both can be viewed in the GitHub repository of this work under the correlation plot section. In summary, the high correlation values present for some numerical features aren’t due to collinearity, so all of the numerical features can be used for the model generation.
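For readers unfamiliar with the variance inflation factor, here is a small sketch of one way to compute it with NumPy, using the identity that the VIFs are the diagonal entries of the inverse correlation matrix. The data are synthetic and the column names are hypothetical; the example deliberately builds one strongly collinear pair to show what a high VIF looks like, which is not what was found in the project data.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF of each column of X: the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


rng = np.random.default_rng(0)
n = 500

# Synthetic features (assumption): 'accommodates' is built from 'bedrooms',
# while 'days_since_review' is generated independently of both.
bedrooms = rng.integers(1, 5, size=n).astype(float)
accommodates = 2 * bedrooms + rng.normal(0.0, 0.5, size=n)
days_since_review = rng.exponential(100.0, size=n)

X = np.column_stack([bedrooms, accommodates, days_since_review])
vif = variance_inflation_factors(X)

for name, value in zip(["bedrooms", "accommodates", "days_since_review"], vif):
    print(f"{name}: {value:.2f}")
```

A VIF close to 1 means a feature is barely explained by the others; values above roughly 5 to 10 are commonly treated as a sign of problematic collinearity, although that threshold is a rule of thumb rather than a hard limit.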

Model generation

Several regression algorithms were used to model the data: linear regression, random forest regressor, stochastic gradient descent regressor and support vector regressor from Scikit-Learn, plus the XGBoost regressor. The best algorithm was taken to be the one with the highest \(R^2\). Of the algorithms used, the support vector regressor and the random forest regressor achieved the best \(R^2\) score, 0.636, on the validation data set.

| Model | RMSE | \(R^2\) |
| --- | --- | --- |
| Linear regression | 81.82 | 0.587 |
| Random forest regressor | 76.74 | 0.636 |
| Stochastic gradient descent regressor | 82.16 | 0.583 |
| Support vector regressor | 76.77 | 0.636 |
| XGBoost regressor | 77.56 | 0.629 |

Table 1 – RMSE and \(R^2\) results on the validation data set.
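As a reminder of what the two metrics measure, here is a minimal standard-library sketch of RMSE and \(R^2\). The prices are made-up values chosen so the arithmetic is easy to follow; they are not taken from the project.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error, in the units of y."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: share of the variance in y explained by the model."""
    mean = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - rss / tss


# Hypothetical nightly prices in GBP; every prediction is off by exactly 10.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

print(f"RMSE: {rmse(actual, predicted):.2f}")
print(f"R^2:  {r2(actual, predicted):.3f}")
```

RMSE is expressed in pounds per night, so the values in the table can be read directly as a typical error size, while \(R^2\) is unitless and easier to compare across data sets.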

RMSE and \(R^2\) were also calculated with 5-fold cross-validation. Here the support vector regressor and the random forest regressor again achieved the highest \(R^2\) score, 0.622. In the end, I selected the support vector regressor solely because of its faster compute time. More information can be found in the cross-validation section of the GitHub repository.
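A comparison of this kind can be sketched with Scikit-Learn's `cross_validate`, which also reports fit times. This is an illustrative setup only: the data are synthetic (`make_regression`), only three of the models are shown, and the scaling pipeline around the support vector regressor is an assumption rather than a description of the project's preprocessing.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the listings data (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=20.0, random_state=0)

models = {
    "Linear regression": LinearRegression(),
    "Random forest regressor": RandomForestRegressor(n_estimators=50, random_state=0),
    # SVR is sensitive to feature scale, so it is wrapped in a scaling pipeline here.
    "Support vector regressor": make_pipeline(StandardScaler(), SVR()),
}

results = {}
for name, model in models.items():
    scores = cross_validate(
        model, X, y, cv=5, scoring=("r2", "neg_root_mean_squared_error")
    )
    results[name] = {
        # Scikit-Learn reports RMSE as a negative score, so flip the sign.
        "rmse": -scores["test_neg_root_mean_squared_error"].mean(),
        "r2": scores["test_r2"].mean(),
        "fit_seconds": scores["fit_time"].mean(),
    }

for name, metrics in results.items():
    print(
        f"{name}: RMSE={metrics['rmse']:.2f}, "
        f"R^2={metrics['r2']:.3f}, fit={metrics['fit_seconds']:.3f}s"
    )
```

On synthetic linear data the ranking will not match the one reported above; the point is only the mechanics of averaging each metric over the five folds and recording fit time alongside it.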

On the test data set, the \(R^2\) for the support vector regressor was slightly better at 0.649. A grid search on the support vector regressor produced the best RMSE value with C=1.0 and epsilon=0.1, which are the default values for the support vector regressor in Scikit-Learn. More information on the grid search can be found in the corresponding section of the GitHub repository.

| Metric | Value |
| --- | --- |
| RMSE | 72.03 |
| \(R^2\) | 0.649 |

Table 2 – Support vector regressor RMSE and \(R^2\) results with test data set.
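The grid search over `C` and `epsilon` mentioned above can be sketched with Scikit-Learn's `GridSearchCV`. The parameter grid, the synthetic data and the scaling pipeline are all assumptions made for the example; the actual grid used in the project is documented in the repository.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the listings data (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# make_pipeline names the SVR step "svr", hence the "svr__" parameter prefix.
pipeline = make_pipeline(StandardScaler(), SVR())
param_grid = {
    "svr__C": [0.1, 1.0, 10.0],        # hypothetical grid values
    "svr__epsilon": [0.01, 0.1, 1.0],  # hypothetical grid values
}

search = GridSearchCV(
    pipeline, param_grid, scoring="neg_root_mean_squared_error", cv=5
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_  # flip the sign of the negated RMSE score

# Scikit-Learn's defaults for comparison: C=1.0 and epsilon=0.1.
defaults = SVR().get_params()

print("Best parameters:", best_params)
print(f"Best cross-validated RMSE: {best_rmse:.2f}")
print("SVR defaults:", defaults["C"], defaults["epsilon"])
```

When a search lands on the library defaults, as it did in the project, the tuned model is simply the untuned one, which keeps the final pipeline easy to reproduce.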

Ordinary least squares (OLS) regression from Statsmodels lets us calculate the F-statistic, which tests whether there is an association between the predictors and the outcome. In the regression results the F-statistic is 902.5, which is much greater than 1 and provides strong evidence that at least one predictor is associated with the outcome. More can be found in the GitHub repository.
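For reference, the F-statistic of a linear regression is usually defined as below. The symbols are not from the original analysis: \(n\) is the number of observations, \(p\) the number of predictors, \(y_i\) the observed prices, \(\hat{y}_i\) the fitted prices and \(\bar{y}\) the mean price.

```latex
F = \frac{(\mathrm{TSS} - \mathrm{RSS}) / p}{\mathrm{RSS} / (n - p - 1)},
\qquad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2,
\quad
\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```

If none of the predictors were related to the price, the numerator and denominator would be of similar size and \(F\) would be close to 1, which is why a value in the hundreds is read as strong evidence of an association.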

The mean price and the residual standard error (in GBP) are 149.66 ± 48.54 (lower end 101.12, upper end 198.20). The ratio of the residual standard error to the mean is 32.4%. This is the expected average deviation of the price from the regression line.
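The figures in this paragraph follow from simple arithmetic on the mean price and the residual standard error, as this short check shows. The variable names are chosen for the example; the two input values are the ones quoted in the text.

```python
mean_price = 149.66  # mean price per night in GBP, as quoted in the text
rse = 48.54          # residual standard error in GBP, as quoted in the text

lower = round(mean_price - rse, 2)            # lower end of the band
upper = round(mean_price + rse, 2)            # upper end of the band
error_pct = round(100 * rse / mean_price, 1)  # RSE as a percentage of the mean

print(f"Band: {lower:.2f} to {upper:.2f} GBP")
print(f"Relative error: {error_pct}%")
```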

Conclusions

The project was really enjoyable, and the part I liked the most was creating a new data set by enriching the original data with other data sources. Once the model was generated, I set up an interactive web app with Streamlit that lets users estimate the rent price of a rental property from the features described above. Check it out at:

Short-Term Rental Price Estimator • Streamlit

Enjoy!

Author: Angelo Varlotta
If you can’t explain it simply, you don’t understand it well enough – Albert Einstein