This dataset comes from Kaggle.

Data science salary analysis

The purpose of this analysis is to investigate the salaries of people who work in the data science field.
The main approach is to make comparisons that answer several questions defined in advance.
For most of these questions, the data is grouped by one or more attributes, and the groups are then compared by their average salary in USD (salary_in_usd).
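The group-then-average approach can be sketched as a small reusable helper. This is a minimal sketch, assuming a DataFrame with the same salary_in_usd column as ds_salaries.csv; the helper name mean_salary_by and the toy rows are hypothetical and are not taken from the Kaggle dataset.

```python
import pandas as pd


def mean_salary_by(df: pd.DataFrame, keys) -> pd.Series:
    """Hypothetical helper: average salary_in_usd per group, highest first."""
    return (
        df.groupby(keys)["salary_in_usd"]
        .mean()
        .sort_values(ascending=False)
    )


# Hypothetical toy rows, NOT from the real dataset.
toy = pd.DataFrame({
    "experience_level": ["EN", "EN", "SE", "SE", "EX"],
    "salary_in_usd": [50000, 70000, 150000, 170000, 200000],
})

result = mean_salary_by(toy, "experience_level")
print(result)
```

The same helper accepts a list of columns (for example ["experience_level", "company_location"]) for the two-parameter comparisons.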

import pandas as pd
data = pd.read_csv("ds_salaries.csv")
data

ASK

  • Does higher work experience mean higher salaries?

  • How much does company location influence the salary given?

  • Does company size also affect the salary received even for workers who have the same level of work experience?

  • Does the employment type contribute significantly to the salary received even for workers who have the same level of work experience?

  • Remote ratio percentage.

Does higher work experience mean higher salaries?

To answer this question, it is best to compare the average salary for each work experience level.

exp = data.groupby("experience_level")
avgexp = exp["salary_in_usd"].mean()
avgexp = avgexp.sort_values(ascending=False)
avgexp

experience_level
EX 194930.929825
SE 153051.071542
MI 104525.939130
EN 78546.284375
Name: salary_in_usd, dtype: float64

The answer to the first question is yes.
On average, higher work experience brings a higher salary: the averages drop steadily from EX to SE to MI to EN.
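Averages can be pulled up or down by a few extreme salaries, so it may be worth checking the median and the number of rows behind each average as well. A minimal sketch using pandas agg, on hypothetical toy rows (not the real dataset), where one very high EN salary inflates that group's mean:

```python
import pandas as pd

# Hypothetical toy rows; the 300000 EN salary is a deliberate outlier.
toy = pd.DataFrame({
    "experience_level": ["EN", "EN", "EN", "MI", "MI", "MI"],
    "salary_in_usd": [40000, 50000, 300000, 90000, 100000, 110000],
})

# Row count, mean and median per experience level, sorted by mean.
summary = (
    toy.groupby("experience_level")["salary_in_usd"]
    .agg(["count", "mean", "median"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

Here EN ranks above MI by mean (130000 vs 100000) even though its median is only 50000, which is the kind of gap that would suggest the mean alone overstates a group's typical salary.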

How much does company location influence the salary given?

This task is a little more complicated than the previous one.
To answer it, we will use 2 parameters as the benchmark:

1. work experience
2. region (company location)

Comparing by region alone would be somewhat unfair: some regions may have many workers with lower levels of work experience, and, as the answer to the first question showed, work experience also affects the salary given.
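Splitting by both experience level and location creates many small groups, and a group with only one row turns a single salary into its "average". A minimal sketch that keeps the group size next to the mean; the MIN_ROWS threshold is a hypothetical choice and the toy rows are made up for illustration.

```python
import pandas as pd

MIN_ROWS = 2  # hypothetical minimum group size, chosen for illustration

# Hypothetical toy rows, NOT from the real dataset.
toy = pd.DataFrame({
    "experience_level": ["SE", "SE", "SE", "EX", "EX"],
    "company_location": ["US", "US", "IL", "US", "US"],
    "salary_in_usd": [150000, 170000, 400000, 200000, 220000],
})

# Mean salary plus the number of rows behind it, per (level, location).
stats = (
    toy.groupby(["experience_level", "company_location"])["salary_in_usd"]
    .agg(["count", "mean"])
    .sort_values("mean", ascending=False)
)

# Keep only groups backed by at least MIN_ROWS rows.
reliable = stats[stats["count"] >= MIN_ROWS]
print(stats)
print(reliable)
```

In this toy data the (SE, IL) group tops the list with a single row and is dropped by the filter.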

expreg = data.groupby(["experience_level","company_location"])
expregmean = expreg["salary_in_usd"].mean()
expregmean = expregmean.sort_values(ascending=False)
expregmean

experience_level  company_location
SE                IL                  423834.000000
                  JP                  214000.000000
EX                US                  207445.520408
SE                NG                  200000.000000
                  PR                  167500.000000
...
EN                KE                    9272.000000
SE                SG                    8000.000000
MI                BO                    7500.000000
EN                GH                    7000.000000
                  MK                    6304.000000
Name: salary_in_usd, Length: 138, dtype: float64

The answer to question 2 is yes.
The top rows of the grouped data already make this clear: SE (senior) workers in IL receive an average salary roughly twice as high as EX (executive) workers in the US (about 423,834 vs 207,446 USD).

If you ask how someone with less work experience can earn more just because they work in a different region, well... unfortunately, that is what the data says.

Does company size also affect the salary received even for workers who have the same level of work experience?

Answering this question is similar to answering the previous one: group the data by 2 parameters, then compare the results.

sizeexp = data.groupby(["company_size", "experience_level"])
sizeexpmean = sizeexp["salary_in_usd"].mean()
sizeexpmean = sizeexpmean.sort_values(ascending=False)
sizeexpmean

company_size  experience_level
M             EX                  198857.284211
S             EX                  196827.166667
L             EX                  165363.153846
              SE                  156159.690821
M             SE                  153643.334069
              MI                  111586.421900
S             SE                  106875.465116
L             MI                   89135.731343
M             EN                   87416.456140
L             EN                   72896.810000
S             EN                   59120.734694
              MI                   58080.500000
Name: salary_in_usd, dtype: float64

The answer to this question is clearly yes.
Below the executive level, however, the pattern is more mixed: the top 3 rows of the list all belong to EX workers, while the SE, MI, and EN groups from different company sizes are interleaved further down.

Although the three EX groups still differ from one another, all of them sit at the top of the list, well above every group with a middle or lower level of experience.
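A long sorted list makes it harder to compare company sizes within one experience level. One option, sketched here on hypothetical toy rows (not the real dataset), is to unstack the grouped means so each experience level becomes a row and each company size a column:

```python
import pandas as pd

# Hypothetical toy rows, NOT from the real dataset.
toy = pd.DataFrame({
    "company_size": ["S", "M", "L", "S", "M", "L"],
    "experience_level": ["EN", "EN", "EN", "SE", "SE", "SE"],
    "salary_in_usd": [50000, 80000, 70000, 100000, 150000, 160000],
})

table = (
    toy.groupby(["experience_level", "company_size"])["salary_in_usd"]
    .mean()
    .unstack("company_size")            # one column per company size
    .reindex(columns=["S", "M", "L"])   # order columns small -> large
)
print(table)
```

Each row can then be read left to right to see how the average salary changes with company size at a fixed experience level.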

Does the employment type contribute significantly to the salary received even for workers who have the same level of work experience?

A simple question gets a simple answer: group the data by employment type, then compare the average salaries.

emptype = data.groupby("employment_type")
emptypemean = emptype["salary_in_usd"].mean()
emptypemean = emptypemean.sort_values(ascending=False)
emptypemean

employment_type
FT 138314.199570
CT 113446.900000
FL 51807.800000
PT 39533.705882
Name: salary_in_usd, dtype: float64

Full-time (FT) and contract (CT) workers have relatively close average salaries (about 138,314 and 113,447 USD).
Freelance (FL) and part-time (PT) workers, on the other hand, earn far less on average than both.
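The question asks about workers with the same level of experience, while the grouping above uses employment type alone. A minimal sketch of adding experience_level as a second key, so that employment types are compared within each level; the toy rows are hypothetical and not from the dataset.

```python
import pandas as pd

# Hypothetical toy rows, NOT from the real dataset.
toy = pd.DataFrame({
    "experience_level": ["EN", "EN", "SE", "SE", "SE"],
    "employment_type": ["FT", "PT", "FT", "FT", "CT"],
    "salary_in_usd": [70000, 30000, 150000, 160000, 140000],
})

# Average salary per employment type within each experience level.
by_level_and_type = (
    toy.groupby(["experience_level", "employment_type"])["salary_in_usd"]
    .mean()
)
print(by_level_and_type)
```

Comparing types inside the same experience level keeps the effect of employment type from being mixed up with the effect of experience.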

Remote ratio percentage?

I'm not a fan of matplotlib, so for this task I will export the averages to CSV and chart them in Power BI instead.

rerate = data.groupby("remote_ratio")
reratemean = rerate["salary_in_usd"].mean()
reratemean.to_csv("reratemean.csv")
reratemean

remote_ratio
0 144316.202288
50 78400.687831
100 136481.452830
Name: salary_in_usd, dtype: float64

Well, that seems to answer the last question of this analysis.
From the pie chart, the average salaries of fully on-site (0) and fully remote (100) employees are close (about 144,316 vs 136,481 USD), so working remotely or not does not seem to make a large difference. The hybrid group (50) averages noticeably less, at about 78,401 USD.
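The section title asks for a percentage. If the goal is the share of employees in each remote_ratio group (rather than their average salary), pandas value_counts(normalize=True) gives it directly. A sketch on hypothetical toy rows, not the real dataset:

```python
import pandas as pd

# Hypothetical toy rows, NOT from the real dataset.
toy = pd.DataFrame({"remote_ratio": [0, 0, 50, 100, 100, 100, 100, 0]})

# Share of rows in each remote_ratio group, expressed as a percentage.
share_pct = toy["remote_ratio"].value_counts(normalize=True).sort_index() * 100
print(share_pct)
```

The resulting Series could be exported with to_csv in the same way as reratemean if the chart is built in Power BI.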

Thank you for reading this simple analysis.

See you in the next project.
LOVE YOU ALL...