This Jupyter notebook includes some code to get you started with web scraping. We will use a package called `BeautifulSoup` to collect the data from the web. Once you've collected your data and saved it into a local `.csv` file, you should start with your analysis.
If you visit [https://www.airlinequality.com] you can see that there is a lot of data there. For this task, we are only interested in reviews related to British Airways and the airline itself. If you navigate to this link: [https://www.airlinequality.com/airline-reviews/british-airways] you will see this data. Now, we can use Python and `BeautifulSoup` to collect all the links to the reviews and then to collect the text data from each individual review page.
Then we scrape the `Route` data, the `Seat Type` data, some `Numerical` data, and the `Score` data from the website. Finally, we save it as `BA_DataSet.csv`.
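Below is a minimal scraping sketch. The pagination pattern (`/page/<n>/?sortby=post_date%3ADesc&pagesize=<size>`) and the `text_content` CSS class for the review body are assumptions about the site's markup and may need adjusting; the `Route`, `Seat Type`, and rating columns would be pulled from each review's ratings table in the same way.

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

BASE_URL = "https://www.airlinequality.com/airline-reviews/british-airways"
PAGES = 10        # number of listing pages to fetch
PAGE_SIZE = 100   # reviews per listing page

reviews = []
for page in range(1, PAGES + 1):
    # Assumed pagination pattern; check the site's "next page" links if it differs.
    url = f"{BASE_URL}/page/{page}/?sortby=post_date%3ADesc&pagesize={PAGE_SIZE}"
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "html.parser")
    # Assumed selector: each review body sits in a div with the "text_content" class.
    for item in soup.find_all("div", class_="text_content"):
        reviews.append(item.get_text(strip=True))

pd.DataFrame({"Review": reviews}).to_csv("BA_DataSet.csv", index=False)
```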
- First of all, we separate `Approval_Status` and `Review`.
- We remove the unwanted parts from `Approval_Status` (see the sketch below).
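A sketch of this separation, assuming each scraped review begins with a verification tag followed by a `|` separator (e.g. `"Trip Verified | ..."`); the exact format is an assumption about the scraped text.

```python
import pandas as pd

df = pd.read_csv("BA_DataSet.csv")

# Assumed format: "Trip Verified | the actual review text ..."
split = df["Review"].str.split("|", n=1, expand=True)
df["Approval_Status"] = split[0].str.strip()
df["Review"] = split[1].str.strip()

# Drop leftover symbols (e.g. the check-mark emoji) from Approval_Status.
df["Approval_Status"] = (
    df["Approval_Status"].str.replace(r"[^A-Za-z ]", "", regex=True).str.strip()
)
```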
We analyze the sentiment of each review using the `TextBlob` library and look at its distribution in a graph.
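A minimal sentiment sketch, continuing with the dataframe `df` from the previous step; `TextBlob` polarity ranges from -1 (negative) to +1 (positive).

```python
from textblob import TextBlob
import matplotlib.pyplot as plt

# Polarity score per review: -1 is very negative, +1 is very positive.
df["Sentiment"] = df["Review"].astype(str).apply(
    lambda text: TextBlob(text).sentiment.polarity
)

df["Sentiment"].hist(bins=30)
plt.xlabel("Sentiment polarity")
plt.ylabel("Number of reviews")
plt.title("Distribution of review sentiment")
plt.show()
```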
- Based on the word `to`, we divide each route into `From` and `To`.
- Based on the word `via`, we separate out the `Transfer` data.
- To avoid complexity in the `From` and `To` columns, we merge values that may represent the same place into a single value (see the sketch below).
- We look at which data is redundant and support it with a graph.
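A sketch of the route handling, assuming routes look like `"London to Sydney via Singapore"`; the alias mapping at the end is purely illustrative.

```python
route = df["Route"].astype(str)

# Peel off the transfer airport after "via" first, then split the remainder on "to".
df["Transfer"] = route.str.extract(r" via (.+)$")[0].str.strip()
base = route.str.replace(r" via .+$", "", regex=True)
to_split = base.str.split(" to ", n=1, expand=True)
df["From"] = to_split[0].str.strip()
df["To"] = to_split[1].str.strip()

# Hypothetical alias map: collapse spellings that refer to the same place.
aliases = {"London Heathrow": "London", "Heathrow": "London", "LHR": "London"}
df["From"] = df["From"].replace(aliases)
df["To"] = df["To"].replace(aliases)
```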
We graph how many reviews fall into each `Seat_Type` category.
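One way to plot the `Seat_Type` counts (the column name follows the text above and is assumed):

```python
import matplotlib.pyplot as plt

# Number of reviews per seat type.
seat_counts = df["Seat_Type"].value_counts()
seat_counts.plot(kind="bar")
plt.xlabel("Seat type")
plt.ylabel("Number of reviews")
plt.title("Reviews per seat type")
plt.tight_layout()
plt.show()
```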
We examine the correlation of the `Numerical` data with `Score`, and also the correlations among the `Numerical` columns themselves.
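A correlation sketch; the list of `Numerical` column names is hypothetical and should be replaced with the actual rating columns in the dataset.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical rating columns plus the overall Score; replace with the real names.
numerical_cols = [
    "Seat_Comfort", "Cabin_Staff_Service", "Food_And_Beverages",
    "Ground_Service", "Value_For_Money", "Score",
]

corr = df[numerical_cols].corr()
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation between numerical columns and Score")
plt.tight_layout()
plt.show()
```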
- We visualize the average `Score` by `Seat_Type` and `Review_Type` with a graph.
- We examine how many reviews fall under each `Review_Type` and `Seat_Type` (see the sketch below).
- We then examine their relationship with the other `Numerical` data.
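A sketch of the grouped visualization and counts, assuming the column names `Score`, `Seat_Type`, and `Review_Type` used above:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Average Score for every Seat_Type / Review_Type combination.
avg_score = df.groupby(["Seat_Type", "Review_Type"])["Score"].mean().unstack()
avg_score.plot(kind="bar")
plt.ylabel("Average score")
plt.title("Average score by seat type and review type")
plt.legend(title="Review type")
plt.tight_layout()
plt.show()

# How many reviews fall into each combination.
print(pd.crosstab(df["Seat_Type"], df["Review_Type"]))
```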
Finally, we save it as `BA_Clean_DataSet.csv`.
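Writing the cleaned dataframe back out is a one-liner:

```python
# Persist the cleaned dataframe for later analysis.
df.to_csv("BA_Clean_DataSet.csv", index=False)
```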