The holidays are over, and it’s time to get back to work!

We start the year with sales: we have already sold two ready-made databases, one of grocery stores in Kazakhstan and one of construction companies in Kazakhstan. We also agreed on the scope of work and issued an invoice to a returning client who has bought databases from us before. This time we will build a custom database, “DIY/Household stores Belarus. 2025”.

Plans for January 2025
– Update the databases for Kazakhstan and the UAE collected in 2024
– Expand geography to include Central Asia and the Caucasus: Uzbekistan, Kyrgyzstan, Tajikistan, Armenia, Azerbaijan and Georgia.

I would like to answer the question: how do you compile a register of companies?

It is not magic 🪄

I have developed a methodology for collecting the most useful information in a short time.

1. Identifying tools for parsing
In Central Asian countries, the main sources of data are map services, directories and parsed search engine results.

2. Analyzing population density
Densely populated areas are where businesses concentrate, so the analysis has to take them into account. Select the largest cities and districts by population, ideally enough of them to cover 90% of the population.
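
A minimal sketch of how such a selection could work, assuming you already have a population figure for each city. The function name, the city names and the numbers are placeholders for illustration, not real data:

```python
def select_cities(populations, total_population, target=0.9):
    """Pick the largest cities until their combined share reaches the target.

    populations: dict of city name -> residents (hypothetical input).
    total_population: residents of the whole country or region.
    """
    selected = []
    covered = 0
    # Largest cities first, so the list stays as short as possible.
    ranked = sorted(populations.items(), key=lambda item: item[1], reverse=True)
    for city, residents in ranked:
        if covered / total_population >= target:
            break
        selected.append(city)
        covered += residents
    return selected, covered / total_population


# Placeholder figures, for illustration only.
sample = {
    "City A": 500_000,
    "City B": 250_000,
    "City C": 160_000,
    "City D": 50_000,
    "City E": 40_000,
}
cities, share = select_cities(sample, total_population=1_000_000)
```

If the listed cities cannot reach the target on their own, the function simply returns all of them together with the share they do cover.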

3. Using map data
I use Nominatim, the open-source geocoder built on OpenStreetMap data, to define the boundaries of cities and regions.
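
As an illustration, a boundary lookup against the public Nominatim search endpoint might look roughly like this. It is a sketch rather than the exact pipeline: the helper names are made up, and the public server expects an identifying User-Agent and a low request rate, so check its usage policy before running anything in bulk.

```python
import json
import urllib.parse
import urllib.request

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(place):
    """Build a search URL that asks for one result with its boundary polygon."""
    params = {
        "q": place,
        "format": "jsonv2",
        "polygon_geojson": 1,  # include the boundary geometry as GeoJSON
        "limit": 1,
    }
    return NOMINATIM_SEARCH + "?" + urllib.parse.urlencode(params)


def parse_bounding_box(result):
    """Nominatim returns the box as strings: [south, north, west, east]."""
    south, north, west, east = (float(value) for value in result["boundingbox"])
    return {"south": south, "north": north, "west": west, "east": east}


def fetch_boundary(place, user_agent):
    """Return the bounding box and GeoJSON boundary of a place, or None."""
    request = urllib.request.Request(
        build_search_url(place), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        results = json.load(response)
    if not results:
        return None
    top = results[0]
    return {"bbox": parse_bounding_box(top), "geojson": top.get("geojson")}


# Example call (needs network access):
# fetch_boundary("Tashkent, Uzbekistan", user_agent="my-register-builder/0.1")
```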

4. Preparing a list of queries
Examples of queries for the HoReCa database:
Restaurant
Cafe
FastFood
Dining room
Sushi
Pizza
Hookah bar
Hotel
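
One way such a list might be turned into concrete search tasks is to combine every query with every selected city. The sketch below assumes a simple "query city" search string; that format and the function name are illustrative choices:

```python
from itertools import product

HORECA_QUERIES = [
    "Restaurant", "Cafe", "FastFood", "Dining room",
    "Sushi", "Pizza", "Hookah bar", "Hotel",
]


def build_search_tasks(queries, cities):
    """Return one search string per (city, query) pair, grouped by city."""
    return [f"{query} {city}" for city, query in product(cities, queries)]


tasks = build_search_tasks(HORECA_QUERIES, ["Tashkent", "Samarkand"])
```

Grouping by city keeps all requests for one area together, which makes it easier to resume an interrupted run.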

5. Verifying the data
After parsing, all information is verified manually to keep it as current as possible. I check the boundaries where neighboring areas meet and compare the number of companies in each city with its population.
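
The comparison of company counts with population could be partly automated before the manual review. The sketch below flags cities whose number of companies per 10,000 residents is far from the median across cities. The 50% tolerance, the function name and the sample figures are assumptions for illustration:

```python
from statistics import median


def flag_unusual_cities(company_counts, populations, tolerance=0.5):
    """Return cities whose companies per 10,000 residents deviate from the
    median density by more than `tolerance` (as a share of the median)."""
    density = {
        city: company_counts.get(city, 0) / populations[city] * 10_000
        for city in populations
    }
    typical = median(density.values())
    flagged = {}
    for city, value in density.items():
        if abs(value - typical) > tolerance * typical:
            flagged[city] = round(value, 1)
    return flagged


# Placeholder figures, for illustration only.
populations = {"City A": 500_000, "City B": 250_000, "City C": 160_000}
company_counts = {"City A": 1_000, "City B": 500, "City C": 80}
flagged = flag_unusual_cities(company_counts, populations)
```

A flagged city is not necessarily an error. It is only a hint about where to look first during manual verification, for example for a missed district or a query that returned nothing.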

I have divided the process into several steps to make data collection transparent. I will add a full description of the methodology to the site so that clients can see what information is collected. No one wants to buy a pig in a poke!

Transparent, standardized processes are very important for a business: for the employees, the customer, and the manager.

In the next post, I will explain how we will load such massive data sets into the database. Special thanks to my former classmate Dima! With his help, we found a great DataOps specialist who designed the database and will handle the data migration.

https://www.linkedin.com/feed/update/urn:li:activity:7284960480886988801/