Oslo Airbnb Data

Simple insights about Airbnb accommodations in Oslo

Awadelrahman M. A. Ahmed
Jan 17, 2021
photo credit: www.visitoslo.com

This post explores Airbnb data for the city of Oslo, Norway. The data set can be downloaded from Inside Airbnb, an independent, non-commercial set of tools and data that allows you to explore how Airbnb is being used in cities around the world. The code repository is here.

In particular, this post aims to answer the following questions:

  • How are advertised accommodations distributed across neighborhoods/districts in Oslo?
  • What accommodation types are most often advertised on Airbnb in Oslo?
  • What do accommodation prices look like?
  • What price can we suggest to a new accommodation owner who wants to advertise on Airbnb?

Data set Description

The data contains around 3000 listings of advertised accommodations, compiled on 30 November 2020. It divides Oslo into 17 neighborhoods: ‘Alna’, ‘Bjerke’, ‘Frogner’, ‘Gamle Oslo’, ‘Grorud’, ‘Grünerløkka’, ‘Marka’, ‘Nordre Aker’, ‘Nordstrand’, ‘Østensjø’, ‘Sagene’, ‘Sentrum’, ‘Søndre Nordstrand’, ‘St. Hanshaugen’, ‘Stovner’, ‘Ullern’, ‘Vestre Aker’. The original data set contains the following columns:

Listing column types:

    id                                int64
    name                              object
    host_id                           int64
    host_name                         object
    neighbourhood_group               float64
    neighbourhood                     object
    latitude                          float64
    longitude                         float64
    room_type                         object
    price                             int64
    minimum_nights                    int64
    number_of_reviews                 int64
    last_review                       object
    reviews_per_month                 float64
    calculated_host_listings_count    int64
    availability_365                  int64

We ignore the columns that have a significant number of missing values and focus only on the following 6 columns, after encoding the categorical ones (neighbourhood and room_type) as integer codes.

Encoded dataframe types:

    id                int64
    host_id           int64
    neighbourhood     int8
    room_type         int8
    price             int64
    minimum_nights    int64
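The post does not show its preprocessing code. The following is a minimal pandas sketch that assumes the listings are already loaded into a DataFrame with the columns listed above. The helper name `encode_listings` and the three example rows are hypothetical. `astype("category").cat.codes` numbers the labels in sorted order and uses `int8` when there are fewer than 128 categories, which matches the types shown. The exact code assigned to each neighbourhood may still differ from the one used in the original analysis.

```python
import pandas as pd

# The six columns kept for the analysis (as listed in the post).
KEPT_COLUMNS = ["id", "host_id", "neighbourhood", "room_type", "price", "minimum_nights"]

def encode_listings(listings: pd.DataFrame) -> pd.DataFrame:
    """Keep the analysed columns and replace the two text columns with integer codes."""
    encoded = listings[KEPT_COLUMNS].copy()
    for col in ["neighbourhood", "room_type"]:
        # Each distinct label gets a small integer code (int8 for fewer than 128 labels).
        encoded[col] = encoded[col].astype("category").cat.codes
    return encoded

# Hypothetical rows for illustration only; not taken from the real data set.
listings = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Flat near the park", "Cosy room", "Shared bunk"],
    "host_id": [10, 11, 12],
    "neighbourhood": ["Frogner", "Sagene", "Frogner"],
    "room_type": ["Entire home/apt", "Private room", "Shared room"],
    "price": [1200, 600, 350],
    "minimum_nights": [2, 1, 3],
})
encoded = encode_listings(listings)
print(encoded.dtypes)
```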

Q1: How are advertised accommodations distributed across neighborhoods/districts in Oslo?

The figure below shows how the listings are distributed across Oslo's neighborhoods, expressed as the percentage share of each neighborhood. The top 5 neighborhoods are ‘Grünerløkka’, ‘Frogner’, ‘Gamle Oslo’, ‘St. Hanshaugen’ and ‘Sagene’, and together they account for more than 70% of all listings.

Fig. 1 Accommodations shares by neighborhood
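The shares in Fig. 1 are a normalised count of listings per neighbourhood. Below is a minimal pandas sketch. It assumes a DataFrame with a `neighbourhood` column of names. The helper names and the sample rows are hypothetical. On the real data, `top_k_share(shares)` with the default `k=5` would give the combined share of the top 5 neighbourhoods quoted above.

```python
import pandas as pd

def neighbourhood_shares(listings: pd.DataFrame) -> pd.Series:
    """Percentage of listings in each neighbourhood, largest share first."""
    # value_counts sorts by count (descending) by default; normalize=True returns fractions.
    return listings["neighbourhood"].value_counts(normalize=True) * 100

def top_k_share(shares: pd.Series, k: int = 5) -> float:
    """Combined percentage of the k neighbourhoods with the most listings."""
    return float(shares.head(k).sum())

# Hypothetical sample of 10 listings, for illustration only.
sample = pd.DataFrame({
    "neighbourhood": ["Grünerløkka"] * 4 + ["Frogner"] * 3 + ["Sagene"] * 2 + ["Alna"]
})
shares = neighbourhood_shares(sample)
print(shares)
print(top_k_share(shares, k=2))
```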

Interestingly, this plot is consistent with the map below, published on the website of Statistics Norway (SSB), which shows the percentage of occupied dwellings in blocks of flats across Oslo. The map dates back to 2011, yet it shows a very similar distribution, with the same 5 top neighborhoods, all located around Oslo city center.

Fig. 2 Source: https://www.ssb.no/en/befolkning/statistikker/fobbolig/hvert-10-aar/2013-02-26

Q2: What accommodation types are most often advertised on Airbnb in Oslo?

The data set contains 4 room types: ‘Shared room’, ‘Private room’, ‘Hotel room’ and ‘Entire home/apt’. The plot below shows the distribution of listings per type. About 75% of the owners rent out their entire homes or apartments and about 20% advertise private rooms. Shared rooms and hotel rooms together account for less than 5%.

Fig. 3 Distribution of listings by room type

Q3: What do accommodation prices look like?

Now we look at the prices. The scatter plot below shows the price values from the original data set. Some outliers make it difficult to get a sense of how the prices are distributed.

Fig. 4 Scatter plot of raw prices

To clean the data set, we treat any listing whose price is more than 3 standard deviations away from the mean as an outlier and omit it. After removing these listings, we get the following scatter plot.

Fig. 5 Prices data scatter plot after omitting outliers
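Here is a sketch of the 3-standard-deviation rule. It assumes that "more than 3 standard deviations" means an absolute z-score above 3, measured from the mean of all prices. The function name and the sample prices are hypothetical. The mean and standard deviation are computed with the outliers still included. A very extreme price therefore inflates the standard deviation and makes the rule more lenient.

```python
import pandas as pd

def drop_price_outliers(listings: pd.DataFrame, k: float = 3.0) -> pd.DataFrame:
    """Drop listings whose price is more than k standard deviations from the mean."""
    price = listings["price"]
    z = (price - price.mean()) / price.std()  # pandas std uses ddof=1 (sample std)
    return listings[z.abs() <= k]

# Hypothetical prices: 19 ordinary listings plus one extreme value.
prices = pd.DataFrame({"price": [900 + 10 * i for i in range(19)] + [50000]})
cleaned = drop_price_outliers(prices)
print(len(prices), "->", len(cleaned))
```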

The histogram below shows that the prices are roughly bell-shaped but skewed to the right, which reflects a tail of high-priced listings.

Fig. 6 Prices distribution
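Right skew can be quantified with the sample skewness, which is positive when the right tail is longer. A right-skewed sample also tends to have a mean above its median. Below is a minimal sketch using pandas' `Series.skew()`. The prices are hypothetical. On the real data, the same call would be applied to the cleaned `price` column.

```python
import pandas as pd

# Hypothetical prices with a few expensive listings in the right tail.
prices = pd.Series([500, 600, 650, 700, 750, 800, 1500, 2500])

skewness = prices.skew()  # adjusted Fisher-Pearson sample skewness
print(f"skewness = {skewness:.2f}")  # positive means a longer right tail
print("mean > median:", prices.mean() > prices.median())
```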

Furthermore, we can look at the average prices by room type. The figure below reflects the intuitive fact that entire homes/apartments have the highest average price, followed by private rooms and then shared rooms. Hotel rooms have the lowest average price. A likely explanation is that hotels normally rent rooms through their own websites, so only a few hotel rooms are listed on Airbnb. The specific reasons would need to be investigated listing by listing, because a hotel's star rating is an important factor in its price.

Fig. 7 Average prices by room_type

The figure below shows the average prices by neighborhood. The 5 highest average prices are in ‘Vestre Aker’, ‘Sentrum’, ‘Nordstrand’, ‘Frogner’ and ‘Ullern’. One possible reason is the scarcity of advertised accommodations in these neighborhoods, given their low shares in Fig. 1. With few listings available, supply and demand may push prices up. ‘Frogner’ is the exception, with both a high share of listings and high prices. We can speculate that this is because it is close to many places that attract visitors and tourists, e.g. Frogner Park.

Fig. 8 Average prices by neighborhood
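Figs. 7 and 8 are group averages of price. The sketch below also counts the listings in each group. That lets you read the average price next to the supply of listings when weighing the scarcity explanation above. The function name and the rows are hypothetical, and the sketch assumes outliers were already removed. The same function works with `by="room_type"` for Fig. 7.

```python
import pandas as pd

def price_summary(listings: pd.DataFrame, by: str) -> pd.DataFrame:
    """Average price and number of listings per group, most expensive group first."""
    return (
        listings.groupby(by)["price"]
        .agg(avg_price="mean", n_listings="count")  # named aggregation
        .sort_values("avg_price", ascending=False)
    )

# Hypothetical listings, for illustration only.
sample = pd.DataFrame({
    "neighbourhood": ["Frogner", "Frogner", "Frogner", "Vestre Aker", "Sagene", "Sagene"],
    "price": [1500, 1300, 1400, 1800, 700, 900],
})
summary = price_summary(sample, by="neighbourhood")
print(summary)
```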

Q4: What price can we suggest to a new accommodation owner who wants to advertise on Airbnb?

Next, we look at the correlations between our selected variables. Our aim is a simple model that suggests a price to an accommodation owner who has no idea what to charge. The matrix below shows the correlation values between our variables. Based on it, we choose room_type and neighbourhood as our predictors. For example, price has a correlation coefficient of 0.32 with room_type.

Fig. 9 Correlation matrix
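The matrix in Fig. 9 is a pairwise correlation matrix of the encoded columns. The sketch below computes it with pandas, using Pearson correlation (the `DataFrame.corr()` default). It then ranks the other columns by the strength of their correlation with price. The sample rows are hypothetical. `neighbourhood` and `room_type` are arbitrary integer codes, so their correlation with price depends on how the codes happen to be ordered. These values are a rough screening signal.

```python
import pandas as pd

def price_correlations(encoded: pd.DataFrame) -> pd.Series:
    """Pearson correlation of each column with price, strongest (in absolute value) first."""
    corr = encoded.corr()  # pairwise Pearson correlation matrix
    return corr["price"].drop("price").sort_values(key=abs, ascending=False)

# Hypothetical encoded listings, for illustration only.
encoded = pd.DataFrame({
    "neighbourhood": [0, 1, 0, 1],
    "room_type": [0, 1, 2, 3],
    "price": [1200, 900, 600, 300],
    "minimum_nights": [1, 2, 3, 1],
})
ranking = price_correlations(encoded)
print(ranking)
```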

For simplicity, we use a linear regression model that takes the numerically encoded neighbourhood and room_type variables and predicts a price. Such a model is easy to fit with any machine learning library, and we used scikit-learn (sklearn) in Python. As an example, suppose the new listings are as follows:

   neighbourhood   room_type
0  Frogner         Entire home/apt
1  Østensjø        Entire home/apt
2  Grünerløkka     Private room
3  Sagene          Shared room
4  Frogner         Shared room

The model then predicts the prices below. These predictions can guide owners toward reasonable and profitable prices.

   neighbourhood   room_type         price_pred
0  Frogner         Entire home/apt   1184
1  Østensjø        Entire home/apt   1174
2  Grünerløkka     Private room       630
3  Sagene          Shared room        348
4  Frogner         Shared room        360
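The post fits a linear regression with scikit-learn but does not show the code. The following is a minimal sketch of that idea and should not be read as the author's implementation. The shortened category lists, the training rows and their prices are hypothetical. The prices were generated from an exactly linear rule so that the output is predictable. In practice the model would be fitted on the cleaned, encoded listings. New listings must be encoded with the same category-to-code mapping that was used for training, and fixing the category order explicitly guarantees this.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Fixed category order gives stable integer codes (hypothetical, shortened lists).
CATEGORIES = {
    "neighbourhood": ["Frogner", "Grünerløkka", "Sagene", "Østensjø"],
    "room_type": ["Entire home/apt", "Hotel room", "Private room", "Shared room"],
}

def encode(df: pd.DataFrame) -> pd.DataFrame:
    """Replace each categorical column with its integer code under CATEGORIES."""
    out = pd.DataFrame(index=df.index)
    for col, cats in CATEGORIES.items():
        codes = pd.Categorical(df[col], categories=cats).codes
        if (codes < 0).any():  # -1 marks a label missing from CATEGORIES
            raise ValueError(f"unknown {col} value")
        out[col] = codes
    return out

# Hypothetical training rows built from price = 1200 - 10*neighbourhood - 270*room_type.
train = pd.DataFrame({
    "neighbourhood": ["Frogner", "Frogner", "Grünerløkka", "Grünerløkka",
                      "Sagene", "Sagene", "Østensjø", "Østensjø"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt", "Shared room",
                  "Private room", "Hotel room", "Entire home/apt", "Shared room"],
    "price": [1200, 660, 1190, 380, 640, 910, 1170, 360],
})
model = LinearRegression().fit(encode(train), train["price"])

new_listings = pd.DataFrame({
    "neighbourhood": ["Frogner", "Østensjø"],
    "room_type": ["Shared room", "Private room"],
})
suggestions = new_listings.assign(
    price_pred=model.predict(encode(new_listings)).round().astype(int)
)
print(suggestions)
```

The model has one coefficient per encoded column. It therefore assumes that price changes by a fixed amount for each step in the integer code, and this is the trade-off that comes with the simplicity mentioned above. One-hot encoding the two columns, for example with `pd.get_dummies`, is the usual way to avoid imposing that ordering.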

Conclusion

In this post we explored the Oslo Airbnb data set. For the code, refer to the repository here.
