Introduction

pyotodom supplies two methods that can be used to scrape data from OtoDom. They are designed to work in tandem, but they can also be used separately.

Scraping category data

Use the following method to scrape all offers that match the supplied search parameters:

otodom.category.get_category(main_category, detail_category, region, **filters)

Scrape OtoDom search results based on supplied parameters.

Parameters:
  • main_category – “wynajem” or “sprzedaz”, should not be empty
  • detail_category – “mieszkanie”, “dom”, “pokoj”, “dzialka”, “lokal”, “haleimagazyny”, “garaz”, or empty string for any
  • region – a string that contains the region name. Districts, cities and voivodeships are supported. The exact location is established using OtoDom’s API, just as it would happen when typing something into the search bar. Empty string returns results for the whole country. Will be ignored if either ‘city’, ‘region’, ‘[district_id]’ or ‘[street_id]’ is present in the filters.
  • filters – the following dict contains every possible filter with examples of its values, but can be empty:
input_dict = {
    '[dist]': 0,  # distance from region
    '[filter_float_price:from]': 0,  # minimal price
    '[filter_float_price:to]': 0,  # maximal price
    '[filter_float_price_per_m:from]': 0,  # minimal price per square meter, only used for apartments for sale
    '[filter_float_price_per_m:to]': 0,  # maximal price per square meter, only used for apartments for sale
    '[filter_enum_market][]': ['primary', 'secondary'],  # enum: primary, secondary
    # enum: brick, wood, breezeblock, hydroton, concrete_plate, concrete, silikat, cellular_concrete,
    # reinforced_concrete, other; only used for apartments for sale
    '[filter_enum_building_material][]': [],
    '[filter_float_m:from]': 0,  # minimal surface
    '[filter_float_m:to]': 0,  # maximal surface
    '[filter_enum_rooms_num][]': '1',  # number of rooms, enum: from "1" to "10", or "more"
    '[private_business]': 'private',  # poster type, enum: private, business
    '[open_day]': 0,  # whether or not the poster organises an open day
    '[exclusive_offer]': 0,  # whether or not the offer is otodom exclusive
    # whether or not the offer is aimed at students, only used for apartments for rent
    '[filter_enum_rent_to_students][]': 0,
    # enum: cellar, ground_floor, floor_1-floor_10, floor_higher_10, garret
    '[filter_enum_floor_no][]': 'floor_1',
    '[filter_float_building_floors_num:from]': 1,  # minimal number of floors in the building
    '[filter_float_building_floors_num:to]': 1,  # maximal number of floors in the building
    # enum: blok, w-kamienicy, dom-wolnostojacy, plomba, szeregowiec, apartamentowiec, loft
    'building_type': 'blok',
    '[filter_enum_heating][]': 'urban',  # enum: urban, gas, tiled_stove, electrical, boiler_room, other
    '[filter_float_build_year:from]': 1980,  # minimal year the building was built in
    '[filter_float_build_year:to]': 2016,  # maximal year the building was built in
    # enum: balcony, usable_room, garage, basement, garden, terrace, lift, two_storey, separate_kitchen,
    # air_conditioning, non_smokers_only
    '[filter_enum_extras_types][]': ['balcony', 'basement'],
    '[filter_enum_media_types][]': ['internet', 'phone'],  # enum: internet, cable-television, phone
    '[free_from]': 'from_now',  # when will it be possible to move in, enum: from_now, 30, 90
    '[created_since]': 1,  # when was the offer posted on otodom in days, enum: 1, 3, 7, 14
    '[id]': 48326376,  # otodom offer ID, found at the very bottom of each offer
    'description_fragment': 'wygodne',  # the resulting offers' descriptions must contain this string
    '[photos]': 0,  # whether or not the offer contains photos
    '[movie]': 0,  # whether or not the offer contains video
    '[walkaround_3dview]': 0,  # whether or not the offer contains a walkaround 3D view
    # location keys, listed without example values:
    # 'city': lowercase, no diacritics, '-' instead of spaces, _city_id at the end
    # 'voivodeship': lowercase, no diacritics, '-' instead of spaces
    # '[district_id]': from otodom API
    # '[street_id]': from otodom API
}
Return type: list of dict(string, string)
Returns: Each of the dictionaries contains the following fields:
'detail_url' - a link to the offer
'offer_id' - OtoDom's internal offer ID, not to be confused with the '[id]' field from the input_dict
'poster' - information about the poster: either the name of the agency or "Oferta prywatna"
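
The 'city' and 'voivodeship' filters expect lowercase names without diacritics and with '-' instead of spaces. The following is a minimal sketch of one way to produce such values; slugify_location and city_filter_value are hypothetical helpers, not part of pyotodom, and the "name, underscore, ID" layout of the city value is an assumption based on the description above. The city ID used below is a placeholder, not a real OtoDom ID.

```python
import unicodedata


def slugify_location(name: str) -> str:
    """Lowercase a place name, strip diacritics, and join words with '-'."""
    # 'ł' has no Unicode decomposition, so it is mapped by hand before normalising.
    lowered = name.lower().replace("ł", "l")
    decomposed = unicodedata.normalize("NFKD", lowered)
    # Drop the combining marks (accents, ogoneks) left behind by the decomposition.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "-".join(without_marks.split())


def city_filter_value(name: str, city_id: int) -> str:
    """Assumed format: slugified name, an underscore, then the city ID."""
    return f"{slugify_location(name)}_{city_id}"


# Hypothetical usage: the ID 1 is a placeholder for a value taken from OtoDom.
filters = {
    "voivodeship": slugify_location("Śląskie"),
    "city": city_filter_value("Zielona Góra", 1),
}
```

The resulting dictionary can be merged with any other filters before being passed to get_category.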

It can be used like this:

import otodom.category

input_dict = {'[filter_float_price:to]': 1100}
parsed_category = otodom.category.get_category("wynajem", "mieszkanie", "gda", **input_dict)

The above code stores a list of dictionaries (string to string) in the parsed_category variable, one for each offer found in the given category: apartments for rent, in a region starting with “gda”, priced at no more than 1100 PLN.
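
Filter names such as '[filter_float_price:to]' are not valid Python identifiers, so they cannot be written as ordinary keyword arguments and have to be passed by unpacking a dictionary with **. The sketch below illustrates this with get_category_stub, a hypothetical stand-in that only mirrors the documented signature and performs no scraping.

```python
def get_category_stub(main_category, detail_category, region, **filters):
    """Hypothetical stand-in for get_category: returns the filters it received."""
    return filters


filters = {
    "[filter_float_price:to]": 1100,
    "[filter_enum_rooms_num][]": "2",
}

# get_category_stub(..., [filter_float_price:to]=1100) would be a syntax error,
# but unpacking a dict works for any string key.
received = get_category_stub("wynajem", "mieszkanie", "gda", **filters)
```

Here received holds the same key/value pairs as filters, which is what a function declared with **filters sees internally.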

Scraping offer data

Use the following method to scrape all the information about the offer at the given URL. The context is used for phone number scraping; if it is not provided, the corresponding field will be empty.

otodom.offer.get_offer_information(url, context=None)

Scrape detailed information about an OtoDom offer.

Parameters:
  • url – a string containing a link to the offer
  • context – a dictionary(string, string) taken straight from the list returned by otodom.category.get_category()
Returns:

A dictionary containing the scraped offer details

It can be used like this:

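from otodom.offer import get_offer_information

# Fetch the details of every offer found by get_category
offer_details = []
for offer in parsed_category:
    offer_details.append(get_offer_information(offer['detail_url'], context=offer))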

The above code will populate the offer_details list with all the information about the apartments found in parsed_category.
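
When many offers are scraped in one run, it may be worth pausing between requests and keeping track of offers that could not be fetched. The following is a minimal sketch of such a loop; collect_offer_details is a hypothetical helper, not part of pyotodom. The fetch function is passed in as a parameter (in real use it would presumably be get_offer_information), and whether that function raises an exception on failure is an assumption.

```python
import time


def collect_offer_details(offers, fetch, delay_seconds=1.0):
    """Call fetch for every offer, pausing between calls and recording failures."""
    details, failed = [], []
    for index, offer in enumerate(offers):
        if index > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        try:
            details.append(fetch(offer["detail_url"], context=offer))
        except Exception as error:  # broad on purpose: the failure modes are unknown
            failed.append((offer["detail_url"], error))
    return details, failed


# Demonstration with a hypothetical fetch function and made-up offers.
def fake_fetch(url, context=None):
    if "broken" in url:
        raise ValueError("could not parse offer")
    return {"url": url, "poster": context["poster"]}


sample_offers = [
    {"detail_url": "https://example.invalid/offer-1", "offer_id": "a", "poster": "Oferta prywatna"},
    {"detail_url": "https://example.invalid/broken", "offer_id": "b", "poster": "Oferta prywatna"},
]
details, failed = collect_offer_details(sample_offers, fake_fetch, delay_seconds=0)
```

With the real library, the call would look like collect_offer_details(parsed_category, get_offer_information), assuming the signature documented above.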