This dataset comes from Kaggle.
Vegetable and Fruit Prices
Vegetable and Fruit Price Analysis
This section only cleans and prepares the dataset, so you can skip it.
import pandas as pd
import matplotlib.pyplot as plt
import re
import numpy as np

data = pd.read_csv("ProductPriceIndex.csv")

# Price columns are stored as text such as "$1.23"
price_cols = ["farmprice", "atlantaretail", "chicagoretail", "losangelesretail", "newyorkretail"]

def deldolar(x):
    # Remove the "$" sign from text values; leave missing values (NaN) as they are
    if isinstance(x, str):
        return re.sub(r"\$", "", x)
    return x

for col in price_cols:
    data[col] = data[col].apply(deldolar)
    # Empty strings left after stripping become NaN, then the column is cast to float
    data[col] = data[col].replace('', np.nan).astype(float)

data.info()
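A more compact alternative to the apply-based cleaning is sketched below, assuming the price columns hold text such as "$1.23". It relies on pandas' vectorized `.str.replace` and `pd.to_numeric`; `clean_price_columns` is a hypothetical helper name, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def clean_price_columns(df, cols):
    """Return a copy of df with '$' stripped from cols and the values cast to float."""
    out = df.copy()
    for col in cols:
        # astype(str) turns NaN into "nan"; regex=False treats "$" literally
        stripped = out[col].astype(str).str.replace("$", "", regex=False).str.strip()
        # errors="coerce" maps blanks, "nan" and malformed text to NaN
        out[col] = pd.to_numeric(stripped, errors="coerce")
    return out
```

For example, `data = clean_price_columns(data, price_cols)` would replace the loop above. The design choice to keep in mind: `errors="coerce"` never raises, so an unexpected format (for instance a thousands separator) silently becomes NaN; checking `data[price_cols].isna().sum()` afterwards is a cheap safeguard.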
ASK
The purpose of this analysis is to find out whether vegetable prices move over time.
For that purpose, it is better to use the five products with the highest prices as the benchmark.
The reasoning is that a high price suggests the product is sought after by many people.
Therefore, the highest-priced products can serve as a reference for people's interest in buying vegetables, and as a benchmark for price movements.
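The ranking that follows sums each product's price over every date in the dataset. One caveat worth checking: a sum also rewards products that are recorded on more dates, so a seasonal product can rank lower simply because it appears less often. The minimal sketch below can rank by either total or average price; `rank_products` is a hypothetical helper name, and `productname` is the column used in this notebook.

```python
import pandas as pd

def rank_products(df, price_col, how="sum"):
    """Rank products by total ("sum") or average ("mean") price, highest first."""
    grouped = df.groupby("productname")[price_col]
    # "sum" favours products recorded on more dates; "mean" does not
    totals = grouped.sum() if how == "sum" else grouped.mean()
    return totals.sort_values(ascending=False)
```

Comparing `rank_products(data, "newyorkretail").head(5)` with `rank_products(data, "newyorkretail", how="mean").head(5)` shows whether the top five depend on how often each product was recorded.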
# Total price per product over all dates, for each market
topv = data.groupby("productname").sum(numeric_only=True)
NYTOP = topv["newyorkretail"].sort_values(ascending=False)
CCGTOP = topv["chicagoretail"].sort_values(ascending=False)
ATLTOP = topv["atlantaretail"].sort_values(ascending=False)
LATOP = topv["losangelesretail"].sort_values(ascending=False)

# Print each market's ranking, separated by a blank line
for ranking in (NYTOP, CCGTOP, ATLTOP, LATOP):
    print(ranking)
    print("\n")
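To compare the markets at a glance, the top products of each ranking can be placed side by side in one table. This is a sketch; `top_n_by_market` is a hypothetical helper, and it expects a table of per-product totals such as `topv` above.

```python
import pandas as pd

def top_n_by_market(totals, markets, n=5):
    """Return a DataFrame whose column m lists the n products with the
    highest total price in market m (row 1 = most expensive)."""
    table = {}
    for m in markets:
        # nlargest sorts descending and keeps the first n products
        table[m] = totals[m].nlargest(n).index.tolist()
    return pd.DataFrame(table, index=range(1, n + 1))
```

For example, `top_n_by_market(topv, ["newyorkretail", "chicagoretail", "atlantaretail", "losangelesretail"])` prints one column per market, which makes it easy to see where the rankings agree and where they diverge.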
We already know that the top two products on every list are Strawberries and Potatoes.
However, we don't yet know which three products should follow them.
This cannot be read off directly: Cauliflower is third in every market, but the fourth and fifth places differ from one market to another.
That's why we need one more step to solve this problem:
CONCATENATE ALL THE SERIES AND SUM THE PRICES
productname
Strawberries 3222.97
Potatoes 3107.18
Cauliflower 2265.48
Red Leaf Lettuce 1975.42
Green Leaf Lettuce 1874.30
Celery 1830.00
Broccoli Crowns 1784.39
Romaine Lettuce 1722.12
Iceberg Lettuce 1717.82
Broccoli Bunches 1690.03
Honeydews 1221.26
Avocados 1139.02
Flame Grapes 1120.14
Carrots 1060.61
Cantaloupe 883.27
Oranges 860.65
Tomatoes 841.40
Asparagus 716.57
Thompson Grapes 625.70
Peaches 518.04
Nectarines 507.07
Plums 413.93
Name: newyorkretail, dtype: float64
productname
Potatoes 3070.18
Strawberries 2683.94
Cauliflower 2070.29
Romaine Lettuce 1748.52
Broccoli Crowns 1599.15
Red Leaf Lettuce 1580.74
Green Leaf Lettuce 1541.52
Celery 1419.11
Broccoli Bunches 1384.40
Iceberg Lettuce 1287.09
Avocados 982.71
Honeydews 974.79
Flame Grapes 942.44
Cantaloupe 820.42
Carrots 819.79
Tomatoes 814.56
Asparagus 678.85
Oranges 675.40
Thompson Grapes 536.15
Peaches 518.03
Nectarines 508.23
Plums 404.96
Name: chicagoretail, dtype: float64
productname
Potatoes 3658.61
Strawberries 2767.27
Cauliflower 1934.17
Broccoli Crowns 1786.28
Broccoli Bunches 1746.55
Romaine Lettuce 1552.16
Red Leaf Lettuce 1498.97
Green Leaf Lettuce 1486.06
Celery 1461.58
Iceberg Lettuce 1271.04
Flame Grapes 1001.30
Honeydews 978.26
Carrots 856.38
Avocados 841.20
Tomatoes 839.09
Cantaloupe 809.25
Asparagus 749.23
Oranges 678.46
Thompson Grapes 561.58
Nectarines 427.04
Peaches 412.68
Plums 366.45
Name: atlantaretail, dtype: float64
productname
Strawberries 2982.98
Potatoes 2926.31
Cauliflower 2303.08
Broccoli Crowns 1706.04
Celery 1523.23
Iceberg Lettuce 1353.69
Romaine Lettuce 1326.30
Broccoli Bunches 1301.84
Red Leaf Lettuce 1294.63
Green Leaf Lettuce 1281.86
Flame Grapes 1075.45
Avocados 1027.67
Honeydews 964.81
Tomatoes 952.09
Cantaloupe 797.48
Carrots 791.15
Asparagus 778.79
Oranges 656.77
Thompson Grapes 598.16
Peaches 545.91
Nectarines 508.77
Plums 439.47
Name: losangelesretail, dtype: float64
sumAll = pd.concat([NYTOP, CCGTOP, ATLTOP, LATOP])
sumAll = sumAll.groupby("productname").sum()
sumAll = sumAll.sort_values(ascending=False)
sumAll
productname
Potatoes 12762.28
Strawberries 11657.16
Cauliflower 8573.02
Broccoli Crowns 6875.86
Red Leaf Lettuce 6349.76
Romaine Lettuce 6349.10
Celery 6233.92
Green Leaf Lettuce 6183.74
Broccoli Bunches 6122.82
Iceberg Lettuce 5629.64
Flame Grapes 4139.33
Honeydews 4139.12
Avocados 3990.60
Carrots 3527.93
Tomatoes 3447.14
Cantaloupe 3310.42
Asparagus 2923.44
Oranges 2871.28
Thompson Grapes 2321.59
Peaches 1994.66
Nectarines 1951.11
Plums 1624.81
dtype: float64
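Because the four Series share the same `productname` index, concatenating them and grouping by product gives the same totals as adding the four market columns of `topv` row by row. A minimal sketch of that shortcut, where `combined_ranking` is a hypothetical helper name:

```python
import pandas as pd

def combined_ranking(totals, markets):
    """Sum the per-market totals of each product and sort highest first."""
    # sum(axis=1) adds across the market columns; NaN is skipped, like groupby().sum()
    return totals[markets].sum(axis=1).sort_values(ascending=False)
```

For example, `combined_ranking(topv, ["newyorkretail", "chicagoretail", "atlantaretail", "losangelesretail"]).head(5)` should list the same five products as `sumAll.head(5)`.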
OK, now we know that the five highest-priced fruits/vegetables across the four markets combined are Potatoes, Strawberries, Cauliflower, Broccoli Crowns, and Red Leaf Lettuce. Note that Red Leaf Lettuce (6349.76) only narrowly beats Romaine Lettuce (6349.10) for fifth place.
With this information, our next task is to track how the prices of these five products change over time.
Strawberries = data.loc[data["productname"] == "Strawberries"]
Potatoes = data.loc[data["productname"] == "Potatoes"]
Cauliflower = data.loc[data["productname"] == "Cauliflower"]
Broccoli = data.loc[data["productname"] == "Broccoli Crowns"]
RedLeaf = data.loc[data["productname"] == "Red Leaf Lettuce"]
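Instead of five separate `.loc` filters, the five subsets can also be built in one pass with `isin` and `groupby`. A sketch, where `split_by_product` is a hypothetical helper:

```python
import pandas as pd

def split_by_product(df, products):
    """Return a dict mapping each requested product name to its rows."""
    # Keep only the requested products, then split the rows by product name
    subset = df[df["productname"].isin(products)]
    return {name: rows for name, rows in subset.groupby("productname")}
```

With `top5 = split_by_product(data, ["Strawberries", "Potatoes", "Cauliflower", "Broccoli Crowns", "Red Leaf Lettuce"])`, `top5["Potatoes"]` holds the same rows as `Potatoes` above. A product name that does not occur in the data simply gets no key.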
NOW IT'S TIME FOR THE GRAPHS
Rather than showing the graphs here, it is better to build the visualizations in Power BI.
Therefore, our last task in this notebook is simply to save each of the five DataFrames to its own CSV file.
Strawberries.to_csv("Strawberries.csv")
print("Strawberries done.")
Potatoes.to_csv("Potatoes.csv")
print("Potatoes done.")
Cauliflower.to_csv("Cauliflower.csv")
print("Cauliflower done.")
Broccoli.to_csv("Broccoli.csv")
print("Broccoli done.")
RedLeaf.to_csv("RedLeaf.csv")
print("RedLeaf done.")
print("See you in the next project bye...")
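The five `to_csv` calls can likewise be written as a loop. In this sketch the helper name `export_products` and the naming rule (spaces replaced by underscores, so "Broccoli Crowns" becomes `Broccoli_Crowns.csv`) are assumptions, so the file names differ from the ones above; `index=False` also leaves out the pandas row index, which `to_csv` writes by default.

```python
from pathlib import Path

import pandas as pd

def export_products(df, products, out_dir="."):
    """Write one CSV per product and return the list of written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name in products:
        rows = df.loc[df["productname"] == name]
        # Hypothetical naming rule: "Broccoli Crowns" -> "Broccoli_Crowns.csv"
        path = out / (name.replace(" ", "_") + ".csv")
        rows.to_csv(path, index=False)
        written.append(path)
    return written
```

For example, `export_products(data, ["Strawberries", "Potatoes", "Cauliflower", "Broccoli Crowns", "Red Leaf Lettuce"])` writes all five files into the current directory.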
Thanks for reading this little project.
Now we have five different datasets that we can use later to build visualizations and analyze the price movements of the top five vegetables/fruits.
Well, well, well...
It seems there is no regular or significant price movement. This finding may be somewhat disappointing and not in line with our expectations, but that is what the data shows. What data conveys may not always meet our expectations, but it always reflects reality, which can occasionally be mundane.
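The conclusion above is stated as an impression ("It seems"). If you want to put a number on "price movement", one option (not the method used in this project) is to fit a straight-line trend to each product's prices and report its slope alongside the coefficient of variation. The sketch below assumes the dataset has a date column named `date` that `pd.to_datetime` can parse; `price_trend` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def price_trend(df, product, price_col="newyorkretail", date_col="date"):
    """Fit price = a + b * t for one product, with t in years since its first date.

    Returns (b, cv): the trend slope in price units per year and the
    coefficient of variation (std / mean) of the prices.
    date_col="date" is an assumed column name.
    """
    rows = df.loc[df["productname"] == product, [date_col, price_col]].dropna()
    dates = pd.to_datetime(rows[date_col])
    # Express dates as fractional years so the slope is "price change per year"
    t = ((dates - dates.min()).dt.days / 365.25).to_numpy()
    prices = rows[price_col].to_numpy(dtype=float)
    # A degree-1 polyfit returns [slope, intercept]
    slope, _intercept = np.polyfit(t, prices, 1)
    cv = prices.std() / prices.mean()
    return slope, cv
```

For example, `price_trend(data, "Potatoes")` gives the yearly trend of Potatoes in the New York column; a slope close to zero relative to the average price would support the "no significant movement" reading. A straight line ignores seasonality, so a product can swing strongly within each year and still show a flat slope.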
See you in the next project. LOVE YOU....
Location
Kalimantan Timur, Samarinda.
Hours
Any time, every day.
Contacts
+6281348005809
mfajar7777777@gmail.com