Taste Before You Buy

Alina V
5 min read · Oct 19, 2021

I had plenty of options.

But it was a weekend with mild sunshine, and I didn’t want to analyse COVID trends.

…so I chose wine.

Long story short: as part of a learning process with a very steep curve (one that never stops, not in the evenings and not on weekends), one is entitled to a little bit of joy, don’t you agree?

I am enrolled in the Udacity Data Science Nanodegree program (a surprise to all of you who read my previous posts on cooking!) and one of my exciting tasks is to (surprise one more time!) write an article on Medium about my findings.

I was absolutely free to choose a dataset to work with, and however tempted I was to predict COVID mortality rates, I chose something much more pleasurable, i.e. predicting the quality of wine based on a few variables.

Wouldn’t it be great if you could walk into a shop, browse the wine, enter the country, province, variety and price into an app, and get a quality score without even tasting the wine? Not sure about you, but I’d be delighted to download such an app straight away!

So I found a nice wine review dataset on Kaggle here: https://www.kaggle.com/zynicide/wine-reviews

The raw data contained a few columns that were either unique to each wine or largely missing. I had to exclude them, because they say little about the large-scale trends I was after.

Where do higher quality wines come from?

In the end, I was left with the following variables:

  • country
  • points (from 80 to 100, user-generated quality score for wines)
  • price (from 4 to 3300 USD)
  • province
  • region
  • variety
  • winery (optionally)

Care to guess the top 5 countries? Of course, my dataset is largely based on US data and is therefore a bit biased, but surprisingly the US didn’t top the list!

Canada did…

Then came the famous France and Italy, then the US, then Australia.

I have never tried a Canadian wine, have you? Are they as good as my data tells me?
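
For the curious, a ranking like this can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the code behind the results above: the rows are toy values invented for illustration, and `rank_countries` and `min_reviews` are hypothetical names. The `min_reviews` threshold is there because a country with only a handful of reviews can top a ranking by mean score quite easily.

```python
from collections import defaultdict
from statistics import mean

# Toy rows invented for illustration only; NOT taken from the Kaggle dataset.
reviews = [
    {"country": "Country A", "points": 90},
    {"country": "Country A", "points": 92},
    {"country": "Country B", "points": 88},
    {"country": "Country B", "points": 86},
    {"country": "Country B", "points": 87},
    {"country": "Country C", "points": 95},  # a single review
]


def rank_countries(rows, min_reviews=1, top=5):
    """Return (country, mean points, review count) tuples, best mean first."""
    by_country = defaultdict(list)
    for row in rows:
        # Skip rows with a missing country or a missing score.
        if row.get("country") and row.get("points") is not None:
            by_country[row["country"]].append(row["points"])
    ranked = [
        (country, mean(points), len(points))
        for country, points in by_country.items()
        if len(points) >= min_reviews
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top]


all_countries = rank_countries(reviews)
well_covered = rank_countries(reviews, min_reviews=2)
```

With these toy rows, "Country C" leads the first ranking on the strength of one review and drops out of the second one entirely, which is worth keeping in mind before booking a wine tour.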

Higher price = higher quality?

Another interesting observation about countries:

High quality means high price, but high price does not necessarily mean high quality. In fact, most of the highly priced wines are of average quality, and quite a few of them are not good at all…

This means you definitely need help selecting a bottle from the shelf, because it seems you can’t rely on the price. Unless, of course, you are a wine expert.
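
One way to make this asymmetry concrete is to compare two shares: how many top-rated wines are pricey, and how many pricey wines are top-rated. The sketch below is only an illustration. The cutoffs (`PRICE_CUTOFF`, `POINTS_CUTOFF`) and the rows are assumptions made up for the example, not values from the dataset.

```python
PRICE_CUTOFF = 50    # USD; hypothetical threshold for "highly priced"
POINTS_CUTOFF = 93   # hypothetical threshold for "high quality"

# Toy rows invented for illustration only; NOT taken from the Kaggle dataset.
wines = [
    {"price": 120, "points": 95},
    {"price": 80, "points": 94},
    {"price": 150, "points": 88},
    {"price": 60, "points": 86},
    {"price": 200, "points": 82},
    {"price": 15, "points": 87},
    {"price": 12, "points": 85},
]


def is_pricey(row):
    return row["price"] >= PRICE_CUTOFF


def is_top(row):
    return row["points"] >= POINTS_CUTOFF


def share(rows, condition, outcome):
    """Fraction of the rows meeting `condition` that also meet `outcome`."""
    selected = [row for row in rows if condition(row)]
    if not selected:
        return float("nan")
    return sum(1 for row in selected if outcome(row)) / len(selected)


pricey_given_top = share(wines, is_top, is_pricey)   # 2 of 2 top wines are pricey
top_given_pricey = share(wines, is_pricey, is_top)   # 2 of 5 pricey wines are top
```

If the first share is close to 1 while the second is much lower, you get exactly the pattern described above: quality comes at a price, but a price tag alone guarantees nothing.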

We observe the same trend by province (with Champagne, of course, rated higher), but by region there is an unexpected positive correlation between price and quality points.

High price = high quality, high quality = high price when we talk about region.

All fair.

For variety, it’s something in between. Highly priced wines score anywhere from average to high quality, while high quality wines still sit at the more expensive end of our price range.

So the answer to my second question is: it depends on your perspective. If you walk into a supermarket looking for a quality French wine and simply aim for the top shelf with the highest prices, you can easily end up with a pretty bad wine.

Can we accurately predict wine quality if we know some of its formal features, including its price?

After exploring all aspects of the data, I made an honest attempt to predict wine quality with what I was given.

After many attempts with various models, the highest accuracy I was able to achieve was around 59% on the test set. I am sure that there is a way to predict wine quality better than 59% on a bright day, but perhaps it requires a few more specific parameters in the dataset.

59% means you often shouldn’t believe your app if it tells you “go buy this wine”.
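
To put a number like 59% in context, it helps to compare it with a naive baseline that always predicts the most common quality class. The sketch below shows that comparison. It assumes quality was treated as a set of classes (accuracy suggests so, but how the points were binned is not stated here), and the labels are toy values invented for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of predictions that exactly match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, y_test):
    """Accuracy of always predicting the most frequent training label."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [most_common_label] * len(y_test))


# Toy labels invented for illustration only; the class names are hypothetical.
y_train = ["average"] * 6 + ["good"] * 3 + ["great"]
y_test = ["average", "average", "good", "great", "average"]
y_pred = ["average", "average", "good", "average", "average"]

model_score = accuracy(y_test, y_pred)                # 4 of 5 correct
baseline_score = majority_baseline(y_train, y_test)   # 3 of 5 correct
```

A model is only earning its keep by the margin between the two scores. If one class dominates the data, a fairly high accuracy can still be close to what blind guessing achieves.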

This made me wonder: is my model not good enough, or is it in fact impossible to predict the quality accurately based on these few parameters?

[Figure: feature importance, descending]

You can see that everything after “price” and perhaps “winery” is not very important in predicting the quality. “Description” doesn’t really count, because it’s unique to each wine, so it’s obvious that it makes a difference.
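
How the importances were computed is not described here. One common, model-agnostic way to get such a ranking is permutation importance: shuffle one column, and see how much the accuracy drops. The sketch below illustrates that idea only. `toy_predict` is a hypothetical stand-in for a trained model, and the rows are invented for the example.

```python
import random


def accuracy(rows, labels, predict):
    """Share of rows for which the prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(rows, labels)) / len(rows)


def permutation_importance(rows, labels, predict, feature, repeats=20, seed=0):
    """Mean drop in accuracy when the values of `feature` are shuffled across rows."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    baseline = accuracy(rows, labels, predict)
    drops = []
    for _ in range(repeats):
        values = [row[feature] for row in rows]
        rng.shuffle(values)
        # Copy each row, replacing only the shuffled feature.
        shuffled = [{**row, feature: value} for row, value in zip(rows, values)]
        drops.append(baseline - accuracy(shuffled, labels, predict))
    return sum(drops) / repeats


def toy_predict(row):
    """Hypothetical stand-in for a trained model: it only looks at price."""
    return "high" if row["price"] >= 40 else "average"


# Toy rows invented for illustration only; NOT taken from the Kaggle dataset.
toy_rows = [
    {"price": 10, "province": "P1"},
    {"price": 15, "province": "P2"},
    {"price": 55, "province": "P1"},
    {"price": 90, "province": "P2"},
    {"price": 25, "province": "P1"},
    {"price": 70, "province": "P2"},
]
toy_labels = [toy_predict(row) for row in toy_rows]  # labels follow the rule exactly

price_importance = permutation_importance(toy_rows, toy_labels, toy_predict, "price")
province_importance = permutation_importance(toy_rows, toy_labels, toy_predict, "province")
```

Because the toy model ignores the province, shuffling that column costs nothing and its importance is exactly zero, while shuffling the price can only hurt.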

But what is important, yet generic?

For example, do we know if the wine is dry, semi-dry, semi-sweet or sweet? You can’t always tell from the variety, so no.

Or what year is the wine from?

Is it red, rosé or white?

Or maybe orange?

I am sure that these variables could’ve helped the predictions.
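
The year, at least, might be recoverable in some cases. If a review comes with a free-text title that mentions the vintage (an assumption: the variables listed above do not include such a column), a regular expression could pull it out. The sketch below shows the idea. `extract_vintage` is a hypothetical helper and the titles are made up.

```python
import re

# A four-digit year starting with 19 or 20, not embedded in a longer number.
_YEAR = re.compile(r"(?<!\d)(19\d{2}|20\d{2})(?!\d)")


def extract_vintage(title):
    """Return the first plausible vintage year found in a title, or None."""
    if not title:
        return None
    match = _YEAR.search(title)
    return int(match.group(1)) if match else None


# Made-up titles for illustration only.
vintage = extract_vintage("Example Winery 2013 Reserve Red (Some Region)")  # 2013
no_vintage = extract_vintage("Example Winery NV Brut")                      # None
```

This is deliberately naive: a winery with a year in its name would fool it, so the extracted values would need a sanity check before going anywhere near a model.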

So the answer to my third question is: probably not.

Wines are a much more delicate subject than you may have thought, so knowing the grape variety, price and winery location isn’t enough to guess the quality for you.

The magic app will have to wait just a little bit :)
