Restaurant Suggestor

I created this project to help users find restaurants based on their favorite dishes and the area in which they want to dine in.

This project is divided into the following sections.

Data Extraction and Collection
Data Preprocessing
Exploratory Data Analysis
Data Transformation
Building Machine learning models
Developing a working model site

⇛ Data Extraction and Collection

      To begin with, I collected Zomato restaurant data from kaggle.com and extracted additional data from Swiggy. Zomato and Swiggy are popular apps where people in Bengaluru place most of their food orders on a daily basis.

      The project covers restaurants in Bengaluru. The data was obtained by scraping two popular online food delivery platforms, Swiggy and Zomato. The two datasets were obtained separately and later combined into one file. The individual files are 0.5 GB (Zomato) and 100 MB (Swiggy).

      The main aim of this project is to prepare a model that recommends restaurants based on dish or restaurant type. The final output is a web app that shows the user the nearest, most affordable, high-quality restaurants, with directions to the place once a particular restaurant is selected.

**** Please click on Data Preprocessing tab to view the data extraction code or click here

⇛ Data Preprocessing

      The data came in two separate files. The data extracted from Swiggy contained about 10,300 rows and the Zomato data contained 54,000 rows. The two files did not have the same columns. The files also contained missing values, long text values, location details, et cetera. The common columns included the restaurant name, rating, cost of dining, location area, and URL.
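
The step of reducing both files to their common columns can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the notebook code: the source column names in the two mappings are hypothetical, and the real Zomato and Swiggy files may name their columns differently.

```python
# Hypothetical source column names; only the common target names
# (name, rating, cost, area, url) come from the description above.
ZOMATO_TO_COMMON = {
    "name": "name",
    "rate": "rating",
    "approx_cost": "cost",
    "location": "area",
    "url": "url",
}
SWIGGY_TO_COMMON = {
    "restaurant_name": "name",
    "avg_rating": "rating",
    "cost_for_two": "cost",
    "area": "area",
    "link": "url",
}


def to_common_schema(rows, mapping, source):
    """Keep only the mapped columns, rename them, and tag each row with its source."""
    result = []
    for row in rows:
        # Columns absent from a row become None so every row has the same keys.
        common = {new: row.get(old) for old, new in mapping.items()}
        common["source"] = source
        result.append(common)
    return result


def combine(zomato_rows, swiggy_rows):
    """Return one list of rows that all share the common schema."""
    return (to_common_schema(zomato_rows, ZOMATO_TO_COMMON, "zomato")
            + to_common_schema(swiggy_rows, SWIGGY_TO_COMMON, "swiggy"))
```

Tagging each row with its source keeps it possible to trace a record back to the file it came from after the merge.
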
      The text data had to be cleaned of emojis, special characters, etc. I used regular expressions and a module called clean-text to clean the text columns.
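
A minimal sketch of regex-based cleaning, using only the standard library, might look like the following. The function name `clean_review` and the exact rules (lowercasing, dropping all non-ASCII characters) are assumptions for illustration; the clean-text module mentioned above offers its own options and is not used here.

```python
import re
import unicodedata

# Anything that is neither a word character nor whitespace.
_NON_WORD = re.compile(r"[^\w\s]")
_WHITESPACE = re.compile(r"\s+")


def clean_review(text):
    """Lowercase the text, strip emojis and punctuation, and collapse whitespace."""
    # NFKD splits accented letters into a base letter plus a combining mark.
    text = unicodedata.normalize("NFKD", text)
    # Dropping non-ASCII characters removes emojis and the combining marks.
    # Note: this also removes any text written in non-Latin scripts.
    text = text.encode("ascii", "ignore").decode("ascii")
    # Replace punctuation and other symbols with spaces.
    text = _NON_WORD.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip().lower()
```
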
      The numerical columns contained missing values as well as duplicated values, and both had to be addressed. Not all numerical columns could be filled with zeros: missing values had to be filled according to the nature of each column and the values it represents. Duplicates, on the other hand, can be removed easily.
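
To illustrate why the fill value depends on the column, here is a small sketch under stated assumptions: the column names `rating` and `votes` are hypothetical, and the choice of median for ratings and zero for vote counts is one plausible policy, not necessarily the one used in the notebooks.

```python
from statistics import median


def fill_missing(rows, column, strategy="median"):
    """Return new rows with None values in `column` replaced.

    strategy="median" uses the median of the known values;
    strategy="zero" uses 0.
    """
    known = [r[column] for r in rows if r.get(column) is not None]
    if strategy == "median":
        fill = median(known) if known else None
    elif strategy == "zero":
        fill = 0
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [{**r, column: fill if r.get(column) is None else r[column]} for r in rows]


def drop_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of key_columns."""
    seen = set()
    unique = []
    for r in rows:
        key = tuple(r.get(c) for c in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```

A missing rating filled with zero would look like a very bad restaurant, so a central value such as the median is the safer choice there, while a missing vote count can reasonably mean that nobody has voted yet.
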
      Once the datasets were combined and cleaned, the data in the pandas DataFrame was saved to a .csv file for later use. Running the preprocessing every time before modeling in the Jupyter notebook is not advisable because it takes a long time. Saving the cleaned data to a separate file is good practice.

**** Please click on Data Preprocessing tab to view the data Preprocessing code or click here

⇛ Exploratory Data Analysis

      I used statistical methods to build the graphs and draw meaningful insights from the data. The explanation for each graph and for the dashboard is given in the Analysis page and the Dashboard page. I also used Tableau to build the dashboard.
      The overall insights and conclusions drawn from the data are explained in the Analysis page.
      I also performed sentiment analysis on the reviews and ratings that customers gave restaurants when ordering food. This was necessary to understand how reviews and ratings affect the sales of particular restaurants. The restaurants received both positive and negative ratings from various customers. Despite the ratings, the restaurants have not failed to provide great service and customer satisfaction.

Please click on the Data Analysis tab to view the Exploratory Data Analysis and visualization code, or click here. For the dashboard, click here: Dashboard of the Analyzed Data.
Please check out the sentiment analysis tab and dashboard for the explanation and code.

⇛ Data Transformation

For the machine learning model, the data has to be scaled and encoded so that the model can work with numerical inputs.
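
As a rough illustration of what scaling and encoding mean here, the sketch below implements min-max scaling and one-hot encoding in plain Python. The helper names are hypothetical, and the project may well have used library transformers instead; this only shows the idea.

```python
def min_max_scale(values):
    """Rescale numbers to the range [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(values):
    """Return (categories, rows); each row has a single 1 at its category's index."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows
```

In practice the minimum, maximum, and category list are usually computed on the training split only and then reused for the other splits, so that no information from held-out data leaks into training.
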
The dataset I created did not come with separate train, validation, and test sets. Splitting the data is necessary for machine learning so that the model is evaluated on data it has not seen, which helps detect overfitting (high variance) and underfitting (high bias).
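
A minimal sketch of such a split, assuming a simple random shuffle and hypothetical default fractions of 15% each for validation and test:

```python
import random


def train_val_test_split(rows, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle a copy of rows and cut it into train, validation, and test lists."""
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("validation and test fractions must sum to less than 1")
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```

Fixing the seed matters when results are compared across experiments, because otherwise each run would be evaluated on a different test set.
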

**** Please click on ML Model tab to view the Data Transformation explanation and code or click here

⇛ Building Machine learning models

There are many machine learning model architectures. When I saw the dataset, I wanted to build a machine learning model based on users' reviews, cuisines, restaurant type, and location.
Cuisines, restaurant type, and location were categorical variables. The restaurants can be grouped by location using clustering algorithms.
The best-suited algorithm for this is K-means clustering.
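
For readers unfamiliar with K-means, here is a compact sketch of the standard procedure (Lloyd's algorithm) in plain Python. It assumes each restaurant has already been turned into a numeric point, for example encoded location features or coordinates; that representation is an assumption, and the notebook most likely relies on a library implementation rather than code like this.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return centroids, labels
```

The result depends on the random starting centroids, which is why library implementations typically run several initializations and keep the best one.
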
I also wanted to build a model based on user ratings. Since rating was a numerical column, I could readily apply an ensemble machine learning model such as XGBoost. I also tried the SVM algorithm.
One well-known filtering method is collaborative filtering using TF-IDF. I applied this technique to user reviews and the cuisines the restaurants serve, filtering restaurants by the sentiment scores obtained using TF-IDF vectorization and NLP techniques.
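
The core idea of ranking restaurants by TF-IDF similarity to a dish name can be sketched as below. This is a simplified, standard-library illustration under several assumptions: whitespace tokenization, a smoothed IDF, the query being vectorized together with the descriptions, and hypothetical function names. It does not include the sentiment scoring mentioned above. Ranking items by the similarity of their own text is often described as content-based filtering.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # term frequency times a smoothed inverse document frequency
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query, names, descriptions, top_n=3):
    """Return up to top_n restaurant names whose text best matches the query."""
    vectors = tfidf_vectors(list(descriptions) + [query])
    query_vec = vectors[-1]
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for vec, name in zip(vectors[:-1], names)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Restaurants that share no term with the query are left out.
    return [name for score, name in scored[:top_n] if score > 0]
```

Here `descriptions` stands for whatever text represents each restaurant, such as its cuisines joined with its cleaned reviews.
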

After this, I tested the model on the test data as well. To validate the trained model, I used the K-fold cross-validation technique.
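
K-fold cross-validation can be sketched in a few lines. The helper names and the `fit_and_score` callback are hypothetical; the callback stands in for whatever code trains a model on the training indices and returns a score on the validation indices.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def cross_validate(n_samples, fit_and_score, k=5, seed=0):
    """Average the score returned by fit_and_score(train, validation) over k folds."""
    scores = [
        fit_and_score(train, validation)
        for train, validation in k_fold_indices(n_samples, k, seed)
    ]
    return sum(scores) / len(scores)
```

Every sample is used for validation exactly once, so the averaged score is less sensitive to one lucky or unlucky split than a single hold-out evaluation.
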
The final results were plotted, and for collaborative filtering I tested the function using the name of a dish.

**** Please click on ML Model tab to view the Building Machine learning models explanation and code or click here

⇛ Working Model

I prepared a small page that suggests restaurants based on user input. Due to some unavoidable reasons, I could not provide the deployed link here. However, the Django project is available on GitHub. The link is here.

Supporting Files

The raw dataset files and the Jupyter notebooks I used to clean, analyze, and model the data can be found in my GitHub repository. Please click on Supporting files.