
A contribution to video game sales analysis

Documentation for the plotting libraries (below):

EDA options:

Predict Sales

PUBG Finish Predict

Predict Price

Netflix Dataset

Predict IMDb Rating

Chosen Challenge

Video Game Sales

Possible Questions

  • Which game sold the most by region/genre/platform? OK
  • Do children's games sell more than adult games? Does the country's culture also play a role? (Need to find a dataset with age ratings to join.)
  • Do exclusive games sell more? OK
  • Competition between exclusives (main console manufacturers)?
  • Which publisher sells the most, and which has the most games? Sales per game? OK
  • Do NA (North American) sales reflect the behavior of the rest of the world? OK
  • Are there years with more game sales? OK
  • Genre by region? OK
!pip install -U seaborn
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline
from scipy import stats
df = pd.read_csv('vgsales.csv')

df.shape
(16598, 11)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          16598 non-null  int64  
 1   Name          16598 non-null  object 
 2   Platform      16598 non-null  object 
 3   Year          16327 non-null  float64
 4   Genre         16598 non-null  object 
 5   Publisher     16540 non-null  object 
 6   NA_Sales      16598 non-null  float64
 7   EU_Sales      16598 non-null  float64
 8   JP_Sales      16598 non-null  float64
 9   Other_Sales   16598 non-null  float64
 10  Global_Sales  16598 non-null  float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB
df.head()
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
0 1 Wii Sports Wii 2006.0 Sports Nintendo 41.49 29.02 3.77 8.46 82.74
1 2 Super Mario Bros. NES 1985.0 Platform Nintendo 29.08 3.58 6.81 0.77 40.24
2 3 Mario Kart Wii Wii 2008.0 Racing Nintendo 15.85 12.88 3.79 3.31 35.82
3 4 Wii Sports Resort Wii 2009.0 Sports Nintendo 15.75 11.01 3.28 2.96 33.00
4 5 Pokemon Red/Pokemon Blue GB 1996.0 Role-Playing Nintendo 11.27 8.89 10.22 1.00 31.37
df.isna().sum()/df.shape[0]
Rank            0.000000
Name            0.000000
Platform        0.000000
Year            0.016327
Genre           0.000000
Publisher       0.003494
NA_Sales        0.000000
EU_Sales        0.000000
JP_Sales        0.000000
Other_Sales     0.000000
Global_Sales    0.000000
dtype: float64
df.columns.str.lower()
Index(['rank', 'name', 'platform', 'year', 'genre', 'publisher', 'na_sales',
       'eu_sales', 'jp_sales', 'other_sales', 'global_sales'],
      dtype='object')
columns_renamed = {
    'Rank': 'rank', 
    'Name': 'name', 
    'Platform': 'platform', 
    'Year': 'year', 
    'Genre': 'genre',
    'Publisher': 'publisher', 
    'NA_Sales': 'na_sales',
    'EU_Sales': 'eu_sales', 
    'JP_Sales': 'jp_sales', 
    'Other_Sales': 'other_sales', 
    'Global_Sales': 'global_sales'
}
df.rename(columns=columns_renamed, inplace=True)
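Since every new name is just the lower-cased original (as the `df.columns.str.lower()` output above shows), the rename dictionary can be replaced by a single assignment. A minimal sketch on a toy frame (illustrative values, not the real file):

```python
import pandas as pd

# Toy frame using a few of the original vgsales column names (illustrative data only)
df = pd.DataFrame({'Rank': [1, 2], 'Name': ['Wii Sports', 'Super Mario Bros.'],
                   'NA_Sales': [41.49, 29.08]})

# Lower-casing every column name gives the same result as the explicit rename dict
df.columns = df.columns.str.lower()
print(list(df.columns))
```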
df[df.year.isna()]
rank name platform year genre publisher na_sales eu_sales jp_sales other_sales global_sales
179 180 Madden NFL 2004 PS2 NaN Sports Electronic Arts 4.26 0.26 0.01 0.71 5.23
377 378 FIFA Soccer 2004 PS2 NaN Sports Electronic Arts 0.59 2.36 0.04 0.51 3.49
431 432 LEGO Batman: The Videogame Wii NaN Action Warner Bros. Interactive Entertainment 1.86 1.02 0.00 0.29 3.17
470 471 wwe Smackdown vs. Raw 2006 PS2 NaN Fighting NaN 1.57 1.02 0.00 0.41 3.00
607 608 Space Invaders 2600 NaN Shooter Atari 2.36 0.14 0.00 0.03 2.53
... ... ... ... ... ... ... ... ... ... ... ...
16307 16310 Freaky Flyers GC NaN Racing Unknown 0.01 0.00 0.00 0.00 0.01
16327 16330 Inversion PC NaN Shooter Namco Bandai Games 0.01 0.00 0.00 0.00 0.01
16366 16369 Hakuouki: Shinsengumi Kitan PS3 NaN Adventure Unknown 0.01 0.00 0.00 0.00 0.01
16427 16430 Virtua Quest GC NaN Role-Playing Unknown 0.01 0.00 0.00 0.00 0.01
16493 16496 The Smurfs 3DS NaN Action Unknown 0.00 0.01 0.00 0.00 0.01

271 rows × 11 columns

df[df.publisher.isna()]
rank name platform year genre publisher na_sales eu_sales jp_sales other_sales global_sales
470 471 wwe Smackdown vs. Raw 2006 PS2 NaN Fighting NaN 1.57 1.02 0.00 0.41 3.00
1303 1305 Triple Play 99 PS NaN Sports NaN 0.81 0.55 0.00 0.10 1.46
1662 1664 Shrek / Shrek 2 2-in-1 Gameboy Advance Video GBA 2007.0 Misc NaN 0.87 0.32 0.00 0.02 1.21
2222 2224 Bentley's Hackpack GBA 2005.0 Misc NaN 0.67 0.25 0.00 0.02 0.93
3159 3161 Nicktoons Collection: Game Boy Advance Video V... GBA 2004.0 Misc NaN 0.46 0.17 0.00 0.01 0.64
3166 3168 SpongeBob SquarePants: Game Boy Advance Video ... GBA 2004.0 Misc NaN 0.46 0.17 0.00 0.01 0.64
3766 3768 SpongeBob SquarePants: Game Boy Advance Video ... GBA 2004.0 Misc NaN 0.38 0.14 0.00 0.01 0.53
4145 4147 Sonic the Hedgehog PS3 NaN Platform NaN 0.00 0.48 0.00 0.00 0.48
4526 4528 The Fairly Odd Parents: Game Boy Advance Video... GBA 2004.0 Misc NaN 0.31 0.11 0.00 0.01 0.43
4635 4637 The Fairly Odd Parents: Game Boy Advance Video... GBA 2004.0 Misc NaN 0.30 0.11 0.00 0.01 0.42
5302 5304 Dragon Ball Z: Budokai Tenkaichi 2 (JP sales) Wii NaN Action NaN 0.15 0.05 0.14 0.01 0.35
5647 5649 Cartoon Network Collection: Game Boy Advance V... GBA 2005.0 Misc NaN 0.23 0.08 0.00 0.01 0.32
6272 6274 The Legend of Zelda: The Minish Cap(weekly JP ... GBA NaN Action NaN 0.00 0.00 0.27 0.01 0.27
6437 6439 Sonic X: Game Boy Advance Video Volume 1 GBA 2004.0 Misc NaN 0.19 0.07 0.00 0.00 0.27
6562 6564 Dora the Explorer: Game Boy Advance Video Volu... GBA 2004.0 Misc NaN 0.18 0.07 0.00 0.00 0.26
6648 6650 Cartoon Network Collection: Game Boy Advance V... GBA 2004.0 Misc NaN 0.18 0.07 0.00 0.00 0.25
6849 6851 All Grown Up!: Game Boy Advance Video Volume 1 GBA 2004.0 Misc NaN 0.17 0.06 0.00 0.00 0.24
7208 7210 Nicktoons Collection: Game Boy Advance Video V... GBA 2004.0 Misc NaN 0.16 0.06 0.00 0.00 0.22
7351 7353 Yu Yu Hakusho: Dark Tournament PS2 NaN Fighting NaN 0.10 0.08 0.00 0.03 0.21
7470 7472 SpongeBob SquarePants: Game Boy Advance Video ... GBA 2004.0 Misc NaN 0.15 0.05 0.00 0.00 0.21
7953 7955 Thomas the Tank Engine & Friends GBA 2004.0 Adventure NaN 0.13 0.05 0.00 0.00 0.19
8330 8332 Dragon Ball GT: Game Boy Advance Video Volume 1 GBA 2004.0 Misc NaN 0.12 0.05 0.00 0.00 0.17
8341 8343 Codename: Kids Next Door: Game Boy Advance Vid... GBA 2004.0 Misc NaN 0.12 0.05 0.00 0.00 0.17
8368 8370 Teenage Mutant Ninja Turtles: Game Boy Advance... GBA 2004.0 Misc NaN 0.12 0.04 0.00 0.00 0.17
8503 8505 Stronghold 3 PC 2011.0 Strategy NaN 0.06 0.10 0.00 0.00 0.16
8770 8772 Cartoon Network Collection: Game Boy Advance V... GBA 2005.0 Misc NaN 0.11 0.04 0.00 0.00 0.15
8848 8850 Pokémon: Johto Photo Finish: Game Boy Advance ... GBA 2004.0 Misc NaN 0.11 0.04 0.00 0.00 0.15
8896 8898 Strawberry Shortcake: Game Boy Advance Video V... GBA 2004.0 Misc NaN 0.11 0.04 0.00 0.00 0.15
9517 9519 Farming Simulator 2011 PC 2010.0 Simulation NaN 0.00 0.13 0.00 0.00 0.13
9749 9751 Super Robot Wars OG Saga: Masou Kishin II - Re... PSP NaN Strategy NaN 0.00 0.00 0.12 0.00 0.12
10382 10384 Disney Channel Collection Vol. 1 GBA 2004.0 Misc NaN 0.08 0.03 0.00 0.00 0.11
10494 10496 Atsumare! Power Pro Kun no DS Koushien DS NaN Sports NaN 0.00 0.00 0.10 0.00 0.10
11076 11078 Action Man-Operation Extreme PS NaN Action NaN 0.05 0.03 0.00 0.01 0.09
11526 11528 Cartoon Network Collection: Game Boy Advance V... GBA 2004.0 Misc NaN 0.06 0.02 0.00 0.00 0.08
12487 12489 Chou Soujuu Mecha MG DS NaN Simulation NaN 0.00 0.00 0.06 0.00 0.06
12517 12519 Prinny: Can I Really Be The Hero? (US sales) PSP NaN Action NaN 0.06 0.00 0.00 0.00 0.06
13278 13280 Monster Hunter Frontier Online PS3 NaN Role-Playing NaN 0.00 0.00 0.05 0.00 0.05
13672 13674 B.L.U.E.: Legend of Water PS NaN Adventure NaN 0.00 0.00 0.04 0.00 0.04
13962 13964 World of Tanks X360 NaN Shooter NaN 0.00 0.03 0.00 0.00 0.04
14087 14089 Housekeeping DS NaN Action NaN 0.00 0.00 0.04 0.00 0.04
14296 14299 Bikkuriman Daijiten DS NaN Misc NaN 0.00 0.00 0.03 0.00 0.03
14311 14314 Silverlicious DS 2012.0 Action NaN 0.03 0.00 0.00 0.00 0.03
14698 14701 UK Truck Simulator PC 2010.0 Simulation NaN 0.00 0.03 0.00 0.00 0.03
14942 14945 Umineko no Naku Koro ni San: Shinjitsu to Gens... PS3 NaN Adventure NaN 0.00 0.00 0.02 0.00 0.02
15056 15059 Xia-Xia DS 2012.0 Platform NaN 0.00 0.02 0.00 0.00 0.02
15261 15264 Mario Tennis 3DS NaN Sports NaN 0.00 0.00 0.02 0.00 0.02
15325 15328 Nicktoons Collection: Game Boy Advance Video V... GBA 2005.0 Misc NaN 0.01 0.01 0.00 0.00 0.02
15353 15356 Demolition Company: Gold Edition PC 2011.0 Simulation NaN 0.00 0.02 0.00 0.00 0.02
15788 15791 Moshi, Kono Sekai ni Kami-sama ga Iru to suru ... PSV 2016.0 Adventure NaN 0.00 0.00 0.02 0.00 0.02
15915 15918 Dream Dancer DS NaN Misc NaN 0.01 0.00 0.00 0.00 0.02
16191 16194 Homeworld Remastered Collection PC NaN Strategy NaN 0.00 0.01 0.00 0.00 0.01
16198 16201 AKB1/48: Idol to Guam de Koishitara... X360 NaN Misc NaN 0.00 0.00 0.01 0.00 0.01
16208 16211 Super Robot Monkey Team: Game Boy Advance Vide... GBA 2005.0 Misc NaN 0.01 0.00 0.00 0.00 0.01
16229 16232 Brothers in Arms: Furious 4 X360 NaN Shooter NaN 0.01 0.00 0.00 0.00 0.01
16367 16370 Dance with Devils PSV 2016.0 Action NaN 0.00 0.00 0.01 0.00 0.01
16494 16497 Legends of Oz: Dorothy's Return 3DS 2014.0 Puzzle NaN 0.00 0.01 0.00 0.00 0.01
16543 16546 Driving Simulator 2011 PC 2011.0 Racing NaN 0.00 0.01 0.00 0.00 0.01
16553 16556 Bound By Flame X360 2014.0 Role-Playing NaN 0.00 0.01 0.00 0.00 0.01
# Drop rows with a missing year or publisher (16598 -> 16291 rows, i.e. 307 removed)
df.dropna(inplace=True)
df.year = df.year.astype(int)
df.shape
(16291, 11)
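`dropna` only removes real NaN values, but the listings above show many rows whose publisher is the string "Unknown", which survives the cleanup. If one chooses to treat that placeholder as missing too (a judgment call, not something this notebook does), a sketch on toy rows could look like this; the game and publisher names are hypothetical:

```python
import numpy as np
import pandas as pd

# Toy rows (illustrative only): one 'Unknown' publisher and one NaN year
df = pd.DataFrame({
    'name': ['game_a', 'game_b', 'game_c'],
    'year': [2003.0, np.nan, 2008.0],
    'publisher': ['Unknown', 'Publisher X', 'Publisher Y'],
})

rows_before = len(df)
# Turn the placeholder into a real missing value, then drop and cast as in the notebook
df['publisher'] = df['publisher'].replace('Unknown', np.nan)
df = df.dropna(subset=['year', 'publisher']).astype({'year': int})
print(f'dropped {rows_before - len(df)} of {rows_before} rows')
```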
df[df.duplicated()].count()
rank            0
name            0
platform        0
year            0
genre           0
publisher       0
na_sales        0
eu_sales        0
jp_sales        0
other_sales     0
global_sales    0
dtype: int64
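Because `rank` acts as a per-row identifier, two rows describing the same release would still differ in `rank`, so a full-row `duplicated()` check is expected to find nothing. A more informative check uses a subset of columns; the sketch below assumes `(name, platform, year)` identifies a release, which is an assumption rather than a documented key of the dataset:

```python
import pandas as pd

# Toy data (illustrative only): the last two rows describe the same release
# but carry different ranks, so a full-row check misses them
df = pd.DataFrame({
    'rank': [1, 2, 3],
    'name': ['game_a', 'game_b', 'game_b'],
    'platform': ['PS3', 'PS3', 'PS3'],
    'year': [2006, 2006, 2006],
})

full_row_dups = df.duplicated().sum()
# keep=False marks every member of a duplicated group, not only the repeats
key_dups = df[df.duplicated(subset=['name', 'platform', 'year'], keep=False)]
print(full_row_dups)
print(key_dups)
```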
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 16291 entries, 0 to 16597
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   rank          16291 non-null  int64  
 1   name          16291 non-null  object 
 2   platform      16291 non-null  object 
 3   year          16291 non-null  int64  
 4   genre         16291 non-null  object 
 5   publisher     16291 non-null  object 
 6   na_sales      16291 non-null  float64
 7   eu_sales      16291 non-null  float64
 8   jp_sales      16291 non-null  float64
 9   other_sales   16291 non-null  float64
 10  global_sales  16291 non-null  float64
dtypes: float64(5), int64(2), object(4)
memory usage: 1.5+ MB
df.head()
rank name platform year genre publisher na_sales eu_sales jp_sales other_sales global_sales
0 1 Wii Sports Wii 2006 Sports Nintendo 41.49 29.02 3.77 8.46 82.74
1 2 Super Mario Bros. NES 1985 Platform Nintendo 29.08 3.58 6.81 0.77 40.24
2 3 Mario Kart Wii Wii 2008 Racing Nintendo 15.85 12.88 3.79 3.31 35.82
3 4 Wii Sports Resort Wii 2009 Sports Nintendo 15.75 11.01 3.28 2.96 33.00
4 5 Pokemon Red/Pokemon Blue GB 1996 Role-Playing Nintendo 11.27 8.89 10.22 1.00 31.37

df.describe()
rank year na_sales eu_sales jp_sales other_sales global_sales
count 16291.000000 16291.000000 16291.000000 16291.000000 16291.000000 16291.000000 16291.000000
mean 8290.190228 2006.405561 0.265647 0.147731 0.078833 0.048426 0.540910
std 4792.654450 5.832412 0.822432 0.509303 0.311879 0.190083 1.567345
min 1.000000 1980.000000 0.000000 0.000000 0.000000 0.000000 0.010000
25% 4132.500000 2003.000000 0.000000 0.000000 0.000000 0.000000 0.060000
50% 8292.000000 2007.000000 0.080000 0.020000 0.000000 0.010000 0.170000
75% 12439.500000 2010.000000 0.240000 0.110000 0.040000 0.040000 0.480000
max 16600.000000 2020.000000 41.490000 29.020000 10.220000 10.570000 82.740000
df_top_games = df[['name', 'na_sales', 'eu_sales', 'jp_sales', 'other_sales']]
for col in ['na_sales', 'eu_sales', 'jp_sales', 'other_sales']:
  print(col)
  print(df_top_games[df_top_games[col]==df_top_games[col].max()])
na_sales
         name  na_sales  eu_sales  jp_sales  other_sales
0  Wii Sports     41.49     29.02      3.77         8.46
eu_sales
         name  na_sales  eu_sales  jp_sales  other_sales
0  Wii Sports     41.49     29.02      3.77         8.46
jp_sales
                       name  na_sales  eu_sales  jp_sales  other_sales
4  Pokemon Red/Pokemon Blue     11.27      8.89     10.22          1.0
other_sales
                             name  na_sales  eu_sales  jp_sales  other_sales
17  Grand Theft Auto: San Andreas      9.43       0.4      0.41        10.57

The best-selling games by region were Wii Sports (NA and EU), Pokemon Red/Pokemon Blue (JP) and Grand Theft Auto: San Andreas (Other).
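The same per-region answer can be obtained in one expression with `idxmax`, which returns the index label of each column's maximum. A minimal sketch on toy numbers (not the real sales):

```python
import pandas as pd

# Toy sales table in millions of units (illustrative values only)
df = pd.DataFrame({
    'name': ['game_a', 'game_b', 'game_c'],
    'na_sales': [5.0, 1.0, 2.0],
    'eu_sales': [3.0, 4.0, 0.5],
    'jp_sales': [0.1, 0.2, 6.0],
    'other_sales': [1.0, 0.3, 0.2],
})

regions = ['na_sales', 'eu_sales', 'jp_sales', 'other_sales']
# With 'name' as the index, idxmax maps each region column to its top-selling game
best_by_region = df.set_index('name')[regions].idxmax()
print(best_by_region)
```

Unlike the `== max()` filter used above, `idxmax` reports only the first game when several tie for the maximum.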

for col in ['na_sales', 'eu_sales', 'jp_sales', 'other_sales']:
  df_plot = df_top_games.sort_values(by=col, ascending=False).head(5)
  df_plot[['name',col]].set_index('name').plot.bar(rot=90)
  plt.title(f'Sales in {col}')
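df_genre = df[['name', 'genre', 'global_sales']]
# idxmax picks the row with the highest sales in each genre; 'first' only matched it because the file is sorted by rank
df_genre.loc[df_genre.groupby('genre')['global_sales'].idxmax()].set_index('genre')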
name global_sales
genre
Action Grand Theft Auto V 21.40
Adventure Super Mario Land 2: 6 Golden Coins 11.18
Fighting Super Smash Bros. Brawl 13.04
Misc Wii Play 29.02
Platform Super Mario Bros. 40.24
Puzzle Tetris 30.26
Racing Mario Kart Wii 35.82
Role-Playing Pokemon Red/Pokemon Blue 31.37
Shooter Duck Hunt 28.31
Simulation Nintendogs 24.76
Sports Wii Sports 82.74
Strategy Pokemon Stadium 5.45
df_genre.genre.unique()
array(['Sports', 'Platform', 'Racing', 'Role-Playing', 'Puzzle', 'Misc',
       'Shooter', 'Simulation', 'Action', 'Fighting', 'Adventure',
       'Strategy'], dtype=object)

df_plat = df[['name', 'platform', 'global_sales']]
# Best-selling game per platform, independent of the row order of the file
df_plat.loc[df_plat.groupby('platform')['global_sales'].idxmax()].set_index('platform').sort_values(by='global_sales', ascending=False)
name global_sales
platform
Wii Wii Sports 82.74
NES Super Mario Bros. 40.24
GB Pokemon Red/Pokemon Blue 31.37
DS New Super Mario Bros. 30.01
X360 Kinect Adventures! 21.82
PS3 Grand Theft Auto V 21.40
PS2 Grand Theft Auto: San Andreas 20.81
SNES Super Mario World 20.61
GBA Pokemon Ruby/Pokemon Sapphire 15.85
3DS Pokemon X/Pokemon Y 14.35
PS4 Call of Duty: Black Ops 3 14.24
N64 Super Mario 64 11.89
PS Gran Turismo 10.95
XB Halo 2 8.49
PC The Sims 3 8.11
2600 Pac-Man 7.81
PSP Grand Theft Auto: Liberty City Stories 7.72
XOne Call of Duty: Black Ops 3 7.30
GC Super Smash Bros. Melee 7.07
WiiU Mario Kart 8 6.96
GEN Sonic the Hedgehog 2 6.03
DC Sonic Adventure 2.42
PSV Minecraft 2.25
SAT Virtua Fighter 2 1.93
SCD Sonic CD 1.50
WS Final Fantasy 0.51
NG Samurai Shodown II 0.25
TG16 Doukyuusei 0.14
3DO Policenauts 0.06
GG Sonic the Hedgehog 2 (8-bit) 0.04
PCFX Blue Breaker: Ken Yorimo Hohoemi o 0.03

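# Number of distinct platforms per title; keep only single-platform titles
df_unique_game_by_plat = df.groupby('name').agg({'platform':'nunique'})
df_unique_game_by_plat = df_unique_game_by_plat[df_unique_game_by_plat.platform==1].reset_index()
# After the merge, platform_y holds 1 for exclusives and NaN for everything else
df_exclusive = df.merge(df_unique_game_by_plat, on='name', how='left')
df_exclusive.rename(columns={'platform_y':'is_exclusive'}, inplace=True)
df_exclusive.is_exclusive = df_exclusive.is_exclusive.fillna(0)
# Select the sales column before summing so that text columns are not aggregated
df_exclusive.groupby(['year','is_exclusive'])['global_sales'].sum().reset_index().head()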
year is_exclusive global_sales
0 1980 0.0 9.38
1 1980 1.0 2.00
2 1981 0.0 5.95
3 1981 1.0 29.82
4 1982 0.0 12.78
# pivot takes keyword arguments (positional arguments were removed in pandas 2.0)
(df_exclusive.groupby(['year', 'is_exclusive'])['global_sales'].sum()
    .reset_index()
    .pivot(index='year', columns='is_exclusive', values='global_sales')
    .plot(figsize=(15, 10)))
plt.title('Exclusive vs. Non-exclusive Sales per Year')
plt.legend(['Non-exclusive', 'Exclusive'])
plt.show()
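The exclusive flag can also be computed per row with `transform`, which avoids the merge and the `platform_x`/`platform_y` renaming; plotting each year's exclusive share instead of absolute totals separates the exclusive effect from overall market growth. The definition is the notebook's own (a title name found on a single platform in this dataset), which is only a proxy: name variants such as the "(JP sales)" suffix count as distinct titles. A sketch on toy data:

```python
import pandas as pd

# Toy data (illustrative only): game_a is multi-platform, game_b and game_c are single-platform
df = pd.DataFrame({
    'name': ['game_a', 'game_a', 'game_b', 'game_c'],
    'platform': ['PS2', 'XB', 'PS2', 'GC'],
    'year': [2004, 2004, 2004, 2005],
    'global_sales': [2.0, 1.0, 3.0, 0.5],
})

# Number of distinct platforms per title, broadcast back to every row of that title
n_platforms = df.groupby('name')['platform'].transform('nunique')
df['is_exclusive'] = (n_platforms == 1).astype(int)

# Share of each year's sales that came from single-platform titles
yearly = df.groupby(['year', 'is_exclusive'])['global_sales'].sum().unstack(fill_value=0)
exclusive_share = yearly[1] / yearly.sum(axis=1)
print(exclusive_share)
```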
df_publisher = df.groupby('publisher').agg({'global_sales':'sum', 'name':'nunique'})
df_publisher.sort_values('global_sales', ascending=False)['global_sales'].head().plot.bar()
plt.title('Top 5 Publishers by Global Sales')
plt.show()
# 'name' holds the number of distinct titles per publisher
df_publisher.sort_values('name', ascending=False)['name'].head().plot.bar()
plt.title('Top 5 Publishers by Number of Games')
plt.show()
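The question list also asks about sales per game, which the two rankings above do not answer directly. One way is to divide each publisher's total by its number of distinct titles; a sketch on toy data (publisher and game names are hypothetical):

```python
import pandas as pd

# Toy data (illustrative only): game_a was released on two platforms
df = pd.DataFrame({
    'publisher': ['Pub A', 'Pub A', 'Pub A', 'Pub B'],
    'name': ['game_a', 'game_a', 'game_b', 'game_c'],
    'global_sales': [4.0, 2.0, 3.0, 5.0],
})

df_publisher = df.groupby('publisher').agg(
    global_sales=('global_sales', 'sum'),
    n_games=('name', 'nunique'),
)
# Average global sales per distinct title (a multi-platform release counts once)
df_publisher['sales_per_game'] = df_publisher['global_sales'] / df_publisher['n_games']
print(df_publisher.sort_values('sales_per_game', ascending=False))
```

Publishers with a single hit title can top this ranking, so filtering on a minimum `n_games` before sorting may give a fairer comparison.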
df['sales_without_na'] = df['jp_sales'] + df['eu_sales'] + df['other_sales']
df.groupby('year').agg({'sales_without_na':'sum', 'na_sales':'sum', 'jp_sales':'sum'}).plot(figsize=(15,10))
plt.title('Sales per Year: NA vs. Non-NA vs. JP')
plt.show()
corr = df[['na_sales','sales_without_na', 'eu_sales','jp_sales','global_sales']].corr()
corr
na_sales sales_without_na eu_sales jp_sales global_sales
na_sales 1.000000 0.776859 0.768923 0.451283 0.941269
sales_without_na 0.776859 1.000000 0.932092 0.701177 0.943837
eu_sales 0.768923 0.932092 1.000000 0.436379 0.903264
jp_sales 0.451283 0.701177 0.436379 1.000000 0.612774
global_sales 0.941269 0.943837 0.903264 0.612774 1.000000
sns.heatmap(corr)
plt.title('Correlation')
plt.show()
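Sales are heavily right-skewed (`describe()` shows a median `global_sales` of 0.17 against a maximum of 82.74), so the Pearson coefficients above can be driven by a handful of blockbusters. Spearman rank correlation is a common robustness check. The toy data below is constructed so that one outlier makes Pearson close to 1 while the ranks of the remaining games disagree:

```python
import pandas as pd

# Toy, heavily skewed sales (illustrative only): one blockbuster dominates
df = pd.DataFrame({
    'na_sales': [40.0, 0.5, 0.4, 0.3, 0.2, 0.1],
    'sales_without_na': [10.0, 0.1, 0.2, 0.3, 0.4, 0.5],
})

# Pearson measures linear association and is sensitive to outliers
pearson = df.corr(method='pearson').loc['na_sales', 'sales_without_na']
# Spearman correlates the ranks, so the blockbuster counts like any other game
spearman = df.corr(method='spearman').loc['na_sales', 'sales_without_na']
print(f'pearson={pearson:.2f} spearman={spearman:.2f}')
```

Note also that `sales_without_na` includes `eu_sales`, so part of their 0.93 correlation holds by construction.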
df.head()
rank name platform year genre publisher na_sales eu_sales jp_sales other_sales global_sales sales_without_na
0 1 Wii Sports Wii 2006 Sports Nintendo 41.49 29.02 3.77 8.46 82.74 41.25
1 2 Super Mario Bros. NES 1985 Platform Nintendo 29.08 3.58 6.81 0.77 40.24 11.16
2 3 Mario Kart Wii Wii 2008 Racing Nintendo 15.85 12.88 3.79 3.31 35.82 19.98
3 4 Wii Sports Resort Wii 2009 Sports Nintendo 15.75 11.01 3.28 2.96 33.00 17.25
4 5 Pokemon Red/Pokemon Blue GB 1996 Role-Playing Nintendo 11.27 8.89 10.22 1.00 31.37 20.11
sales_cols = ['na_sales', 'eu_sales', 'jp_sales', 'other_sales', 'global_sales', 'sales_without_na']
# Sum only the sales columns; summing rank and year per genre has no meaning
df_genre_by_region = df.groupby('genre')[sales_cols].sum()
df_genre_by_region.head()
na_sales eu_sales jp_sales other_sales global_sales sales_without_na
genre
Action 861.77 516.48 158.65 184.92 1722.84 860.05
Adventure 101.93 63.74 51.99 16.70 234.59 132.43
Fighting 220.74 100.00 87.15 36.19 444.05 223.34
Misc 396.92 211.77 106.67 73.92 789.87 392.36
Platform 445.99 200.65 130.65 51.51 829.13 382.81
df_genre_by_region[['na_sales','eu_sales', 'jp_sales', 'other_sales']].plot.barh(figsize=(15,10))
plt.show()
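Absolute totals largely reflect the size of each market, so the NA bar tends to be the longest in every genre. Comparing each region's genre mix (every region's column divided by its own total) makes statements such as "strategy does not sell well outside Japan" easier to judge. A sketch on toy totals:

```python
import pandas as pd

# Toy per-genre sales in millions (illustrative only)
df = pd.DataFrame({
    'genre': ['Action', 'Strategy', 'Action', 'Strategy'],
    'na_sales': [8.0, 1.0, 2.0, 1.0],
    'jp_sales': [1.0, 1.0, 1.0, 1.0],
})

by_genre = df.groupby('genre')[['na_sales', 'jp_sales']].sum()
# Dividing by the column totals turns each region into a genre mix that sums to 1
share = by_genre / by_genre.sum()
print(share.round(2))
```

Here NA sells more strategy units in absolute terms than some smaller markets might, yet strategy is a much larger slice of the JP mix.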

Note

The commented-out snippets were not run on Colab because of their computational cost; run them locally to compare the distance between game titles.

df.name = df.name.str.lower()
df.name = df.name.str.replace(' ', '_')
!pip install unidecode
import unidecode
df.name = df.name.apply(unidecode.unidecode)
# .copy() creates independent frames, which avoids the SettingWithCopyWarning
df_1 = df[['name']].copy()
df_2 = df[['name']].copy()
df_1['key'] = 0
df_2['key'] = 0
# Cross join through the constant 'key'; merge appends the suffixes, giving name_x and name_y
# df_matrix = df_2.merge(df_1, how='outer', on='key', validate='many_to_many', suffixes=('_x', '_y'))
# Exact Jaccard similarity between character trigrams (not an approximate MinHash)
# def jaccard_score(title_a, title_b):
#     score = 0.0
#     shingles = lambda s: set(s[i:i+3] for i in range(len(s)-2))
#     jaccard_similarity = lambda seta, setb: len(seta & setb)/float(len(seta | setb))
#     try:
#         score = jaccard_similarity(shingles(title_a), shingles(title_b))
#     except ZeroDivisionError:  # both titles shorter than 3 characters
#         print('ZeroDivisionError')
#
#     return score
#
# df_matrix['score'] = df_matrix.apply(lambda x: jaccard_score(x['name_x'], x['name_y']), axis=1)
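The cost mentioned in the note follows from the cross join: 16,291 × 16,291 = 265,396,681 rows. Comparing each unordered pair of titles only once still means 16,291 × 16,290 / 2 = 132,690,195 comparisons, but it halves the work and avoids materializing the joined frame. A sketch of the same trigram Jaccard comparison with `itertools.combinations`; the function names and the 0.5 threshold are illustrative choices, not part of the notebook:

```python
from itertools import combinations


def trigrams(s):
    """Set of overlapping 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def jaccard_similarity(a, b):
    """|A & B| / |A | B| over character trigrams; 0.0 when both sets are empty."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def similar_pairs(names, threshold=0.5):
    """Score each unordered pair once: n*(n-1)/2 comparisons instead of n*n."""
    pairs = []
    for a, b in combinations(names, 2):
        score = jaccard_similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Toy titles, already lower-cased with underscores like df.name above
print(similar_pairs(['mario_kart_wii', 'mario_kart_ds', 'wii_sports', 'wii_sports_resort']))
```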
for col in ['na_sales', 'eu_sales', 'jp_sales', 'other_sales']:
  print(col)
  print(df[df[col]==df[col].min()].head(1))
na_sales
     rank                      name  ... global_sales  sales_without_na
214   215  monster_hunter_freedom_3  ...         4.87              4.87

[1 rows x 12 columns]
eu_sales
     rank               name  ... global_sales  sales_without_na
147   148  final_fantasy_xii  ...         5.95              4.07

[1 rows x 12 columns]
jp_sales
    rank          name platform  ...  other_sales global_sales sales_without_na
60    61  just_dance_3      Wii  ...         1.07        10.26             4.22

[1 rows x 12 columns]
other_sales
     rank               name  ... global_sales  sales_without_na
137   138  world_of_warcraft  ...         6.28              6.21

[1 rows x 12 columns]

Conclusions

  • Some games sold more on more widespread platforms.
  • Platforms that sold consoles bundled with a game distort the counts (e.g., Wii Sports, which was bundled with the Wii).
  • Strategy games do not sell well outside Japan.
  • The North American market is always the largest buyer in every genre, especially Action.