Prerequisites:

The result is quite faithful. To follow along, you need:

An open Colab notebook
Basic knowledge of Python
An internet connection
The URL of the API you want to use

1 - Importing the libraries:

Two libraries are needed here: pandas and NumPy.

import pandas as pd
import numpy as np

Here I open the CSV obtained from Kaggle and rank the rows by song popularity.

df = pd.read_csv('spotify.csv', index_col=0)
df.sort_values('song_popularity', ascending=False, inplace=True)
df.head(5)

2 - Inspecting the dataset:

print(df.shape)

(18835, 15)

df.head(5)

## Doc - https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.info.html
## Checking data types and non-null counts
## Initially we have almost no null data: only audio_valence has one missing value
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18835 entries, 1757 to 9956
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   song_name         18835 non-null  object 
 1   song_popularity   18835 non-null  object 
 2   song_duration_ms  18835 non-null  object 
 3   acousticness      18835 non-null  object 
 4   danceability      18835 non-null  object 
 5   energy            18835 non-null  object 
 6   instrumentalness  18835 non-null  object 
 7   key               18835 non-null  float64
 8   liveness          18835 non-null  object 
 9   loudness          18835 non-null  object 
 10  audio_mode        18835 non-null  object 
 11  speechiness       18835 non-null  object 
 12  tempo             18835 non-null  object 
 13  time_signature    18835 non-null  object 
 14  audio_valence     18834 non-null  float64
dtypes: float64(2), object(13)
memory usage: 2.3+ MB

## Doc - https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html?highlight=describe#pandas.DataFrame.describe
## Only two columns appear here because the others are stored as object, so pandas cannot compute the numeric aggregations for them.
df.describe()
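
As a minimal sketch of this behavior, the toy frame below (hypothetical values, not the Kaggle data) has one float column and one numeric-looking column stored as strings. By default describe() summarizes only the numeric column; passing include='all' also reports the string column.

```python
import pandas as pd

# Toy data (hypothetical): 'energy' is stored as strings, 'key' as floats.
toy = pd.DataFrame({
    "energy": ["0.52", "0.80", "nao_sei"],
    "key": [2.0, 1.0, 11.0],
})

# By default, describe() summarizes only the numeric columns.
numeric_summary = toy.describe()
print(list(numeric_summary.columns))

# include='all' adds count/unique/top/freq for the non-numeric columns.
full_summary = toy.describe(include="all")
print(list(full_summary.columns))
```

This is why the summary only becomes useful for the remaining columns after the type conversions in section 5.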

3 - Removing duplicates:

duplicados = df[df.duplicated()]
print(duplicados)

                               song_name  ... audio_valence
11777             I Love It (& Lil Pump)  ...         0.329
4301              I Love It (& Lil Pump)  ...         0.329
14444             I Love It (& Lil Pump)  ...         0.329
1229              I Love It (& Lil Pump)  ...         0.329
3443              I Love It (& Lil Pump)  ...         0.329
...                                  ...  ...           ...
14292  Get Dripped (feat. Playboi Carti)  ...         0.904
7273                       John Madden 2  ...         0.409
6514                        THIS OLE BOY  ...         0.764
14312    Transformer (feat. Nicki Minaj)  ...         0.287
7275                     Prince Charming  ...         0.605

[3903 rows x 15 columns]

## Example for a scenario where many values may repeat, but the combination of two specific keys must be unique.
print(df[df.duplicated(subset=['song_name','audio_valence'])])

                             song_name  ... audio_valence
11777           I Love It (& Lil Pump)  ...        0.3290
4301            I Love It (& Lil Pump)  ...        0.3290
14444           I Love It (& Lil Pump)  ...        0.3290
1229            I Love It (& Lil Pump)  ...        0.3290
3443            I Love It (& Lil Pump)  ...        0.3290
...                                ...  ...           ...
7273                     John Madden 2  ...        0.4090
6514                      THIS OLE BOY  ...        0.7640
14312  Transformer (feat. Nicki Minaj)  ...        0.2870
7275                   Prince Charming  ...        0.6050
7939                           99 Pace  ...        0.0689

[4161 rows x 15 columns]
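
The subset check returns more rows (4161) than the full-row check (3903) because rows only need to match on the two key columns. A small sketch on toy data (hypothetical values) shows the difference, and also the keep parameter of duplicated, which controls whether the first occurrence is flagged:

```python
import pandas as pd

toy = pd.DataFrame({
    "song_name": ["A", "A", "A", "B"],
    "audio_valence": [0.3, 0.3, 0.3, 0.9],
    "tempo": [100.0, 100.0, 120.0, 90.0],
})

# Full-row duplicates: only row 1 repeats an earlier row exactly.
full_dupes = toy.duplicated()

# Key-based duplicates: rows 1 and 2 repeat the (song_name, audio_valence) pair of row 0.
key_dupes = toy.duplicated(subset=["song_name", "audio_valence"])

# keep=False marks every member of a duplicated group, including the first one.
all_members = toy.duplicated(subset=["song_name", "audio_valence"], keep=False)

print(full_dupes.sum(), key_dupes.sum(), all_members.sum())
```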

## Doc - https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html
df.drop_duplicates(inplace=True) 
print(df.shape)
df.head(5)

(14932, 15)

4 - Validating consistency:

As we saw earlier, some fields that should be numeric contain text, and that text does not match the column name: here we find the units kg and mol/L.

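## Note: str.strip(text) removes any of the characters in `text` from both ends
## of each value (not the literal substring), which is enough for these unit suffixes.
def remove_text(df, columns, text):
    for col in columns:
        df[col] = df[col].str.strip(text)

remove_text(df, ['acousticness', 'danceability'], 'mol/L')
remove_text(df, ['song_duration_ms', 'acousticness'], 'kg')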

df.head(5)
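
One caveat with str.strip is that its argument is treated as a set of characters, not as a literal suffix. The sketch below, on hypothetical values, contrasts it with an anchored regular expression, which is one possible alternative (not what the notebook uses) when a value could legitimately start or end with one of those characters:

```python
import pandas as pd

values = pd.Series(["0.519mol/L", "0.0114kg", "kg_total", "0.36"])

# str.strip('kg') removes any leading/trailing 'k' or 'g' characters,
# so it also eats the prefix of 'kg_total'.
stripped = values.str.strip("kg")

# An anchored regex removes the unit only when it is a suffix.
suffix_only = values.str.replace(r"(?:mol/L|kg)$", "", regex=True)

print(stripped.tolist())
print(suffix_only.tolist())
```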

5 - Data type conversions:

## Doc - https://pandas.pydata.org/docs/reference/api/pandas.Series.astype.html?highlight=astype#pandas.Series.astype
def to_type(df, columns, type):
    for col in columns:
        print(col)
        df[col] = df[col].astype(type)

numerical_cols = ['song_duration_ms', 'acousticness', 'danceability',
                  'energy', 'instrumentalness', 'liveness', 'loudness',
                  'speechiness', 'tempo', 'audio_valence']
 
categorical_cols = ['song_popularity', 'key', 'audio_mode', 'time_signature']

to_type(df, numerical_cols, 'float')
to_type(df, categorical_cols, 'category')

song_duration_ms
acousticness
danceability
energy

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-8fa9f2911c24> in <module>()
     12 categorical_cols = ['song_popularity', 'key', 'audio_mode', 'time_signature']
     13 
---> 14 to_type(df, numerical_cols, 'float')
     15 to_type(df, categorical_cols, 'category')

<ipython-input-15-8fa9f2911c24> in to_type(df, columns, type)
      4     for col in columns:
      5         print(col)
----> 6         df[col] = df[col].astype(type)
      7 
      8 numerical_cols = ['song_duration_ms', 'acousticness', 'danceability',

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5546         else:
   5547             # else, only a single dtype is given
-> 5548             new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
   5549             return self._constructor(new_data).__finalize__(self, method="astype")
   5550 

/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
    602         self, dtype, copy: bool = False, errors: str = "raise"
    603     ) -> "BlockManager":
--> 604         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    605 
    606     def convert(

/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, **kwargs)
    407                 applied = b.apply(f, **kwargs)
    408             else:
--> 409                 applied = getattr(b, f)(**kwargs)
    410             result_blocks = _extend_blocks(applied, result_blocks)
    411 

/usr/local/lib/python3.7/dist-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
    593             vals1d = values.ravel()
    594             try:
--> 595                 values = astype_nansafe(vals1d, dtype, copy=True)
    596             except (ValueError, TypeError):
    597                 # e.g. astype_nansafe can fail on object-dtype of strings

/usr/local/lib/python3.7/dist-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
    995     if copy or is_object_dtype(arr) or is_object_dtype(dtype):
    996         # Explicit copy, or required since NumPy can't view from / to object.
--> 997         return arr.astype(dtype, copy=True)
    998 
    999     return arr.view(dtype)

ValueError: could not convert string to float: 'nao_sei'

df = df.replace(['nao_sei'], np.nan)

to_type(df, numerical_cols, 'float')
to_type(df, categorical_cols, 'category')

song_duration_ms
acousticness
danceability
energy
instrumentalness
liveness
loudness
speechiness

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-60fe30a932ff> in <module>()
----> 1 to_type(df, numerical_cols, 'float')
      2 to_type(df, categorical_cols, 'category')

<ipython-input-15-8fa9f2911c24> in to_type(df, columns, type)
      4     for col in columns:
      5         print(col)
----> 6         df[col] = df[col].astype(type)
      7 
      8 numerical_cols = ['song_duration_ms', 'acousticness', 'danceability',

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5546         else:
   5547             # else, only a single dtype is given
-> 5548             new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
   5549             return self._constructor(new_data).__finalize__(self, method="astype")
   5550 

/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
    602         self, dtype, copy: bool = False, errors: str = "raise"
    603     ) -> "BlockManager":
--> 604         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    605 
    606     def convert(

/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, **kwargs)
    407                 applied = b.apply(f, **kwargs)
    408             else:
--> 409                 applied = getattr(b, f)(**kwargs)
    410             result_blocks = _extend_blocks(applied, result_blocks)
    411 

/usr/local/lib/python3.7/dist-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
    593             vals1d = values.ravel()
    594             try:
--> 595                 values = astype_nansafe(vals1d, dtype, copy=True)
    596             except (ValueError, TypeError):
    597                 # e.g. astype_nansafe can fail on object-dtype of strings

/usr/local/lib/python3.7/dist-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
    995     if copy or is_object_dtype(arr) or is_object_dtype(dtype):
    996         # Explicit copy, or required since NumPy can't view from / to object.
--> 997         return arr.astype(dtype, copy=True)
    998 
    999     return arr.view(dtype)

ValueError: could not convert string to float: '0.nao_sei'

df['speechiness'] = df['speechiness'].replace(['0.nao_sei'], np.nan)

to_type(df, numerical_cols, 'float')
to_type(df, categorical_cols, 'category')

song_duration_ms
acousticness
danceability
energy
instrumentalness
liveness
loudness
speechiness
tempo
audio_valence
song_popularity
key
audio_mode
time_signature
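
The conversion above needed two rounds of trial and error, one per bad token. A possible alternative, shown here only as a sketch on hypothetical values, is pd.to_numeric with errors='coerce', which converts every unparseable entry to NaN in a single pass. Listing the rejected values afterwards keeps the cleanup auditable.

```python
import pandas as pd

raw = pd.Series(["0.33", "nao_sei", "0.nao_sei", "104.053"])

# errors='coerce' turns every unparseable entry into NaN instead of raising.
converted = pd.to_numeric(raw, errors="coerce")

# Values that were present in the raw data but could not be parsed.
rejected = raw[converted.isna() & raw.notna()].tolist()

print(converted.tolist())
print(rejected)
```

The trade-off is that coercion silently hides unexpected tokens, so inspecting the rejected list is worth the extra line.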

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14932 entries, 1757 to 9956
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype   
---  ------            --------------  -----   
 0   song_name         14932 non-null  object  
 1   song_popularity   14931 non-null  category
 2   song_duration_ms  14932 non-null  float64 
 3   acousticness      14932 non-null  float64 
 4   danceability      14932 non-null  float64 
 5   energy            14931 non-null  float64 
 6   instrumentalness  14930 non-null  float64 
 7   key               14932 non-null  category
 8   liveness          14928 non-null  float64 
 9   loudness          14931 non-null  float64 
 10  audio_mode        14931 non-null  category
 11  speechiness       14931 non-null  float64 
 12  tempo             14931 non-null  float64 
 13  time_signature    14931 non-null  category
 14  audio_valence     14931 non-null  float64 
dtypes: category(4), float64(10), object(1)
memory usage: 1.4+ MB

## One way to validate is to check the number of elements in each category.
for col in categorical_cols:
  print(f'{col}')
  print(df[col].value_counts().sort_values())

song_popularity
99       1
100      1
98       4
97       4
96       5
      ... 
54     324
53     325
55     345
58     347
52     355
Name: song_popularity, Length: 101, dtype: int64
key
0.177       1
3.0       433
10.0     1045
8.0      1047
6.0      1048
4.0      1084
11.0     1223
5.0      1257
2.0      1399
9.0      1410
1.0      1596
7.0      1654
0.0      1735
Name: key, dtype: int64
audio_mode
0.105       1
0        5496
1        9434
Name: audio_mode, dtype: int64
time_signature
2800000000        1
0.7               1
0                 3
1                67
5               195
3               684
4             13980
Name: time_signature, dtype: int64

df['key'] = df['key'].replace([0.177], np.nan)
df['audio_mode'] = df['audio_mode'].replace(['0.105'], np.nan)
df['time_signature'] = df['time_signature'].replace(['0.7', '2800000000'], np.nan)
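
Instead of listing each bad value, the same cleanup can be expressed as a whitelist. The sketch below assumes the valid time signatures are exactly the plausible ones seen in the counts above (0, 1, 3, 4, 5); that set is an assumption for illustration, not something the dataset documents.

```python
import numpy as np
import pandas as pd

# Assumed valid domain, taken from the plausible values in the counts above.
valid_time_signatures = {"0", "1", "3", "4", "5"}

toy = pd.Series(["4", "3", "0.7", "2800000000", "4"])

# where() keeps values that pass the check and replaces the rest with NaN.
cleaned = toy.where(toy.isin(valid_time_signatures), np.nan)

print(cleaned.tolist())
```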

From this point on, we have a dataset with a minimum of consistency and no duplicate rows.

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14932 entries, 1757 to 9956
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype   
---  ------            --------------  -----   
 0   song_name         14932 non-null  object  
 1   song_popularity   14931 non-null  category
 2   song_duration_ms  14932 non-null  float64 
 3   acousticness      14932 non-null  float64 
 4   danceability      14932 non-null  float64 
 5   energy            14931 non-null  float64 
 6   instrumentalness  14930 non-null  float64 
 7   key               14931 non-null  category
 8   liveness          14928 non-null  float64 
 9   loudness          14931 non-null  float64 
 10  audio_mode        14930 non-null  category
 11  speechiness       14931 non-null  float64 
 12  tempo             14931 non-null  float64 
 13  time_signature    14929 non-null  category
 14  audio_valence     14931 non-null  float64 
dtypes: category(4), float64(10), object(1)
memory usage: 1.4+ MB

df.isna().sum()

song_name           0
song_popularity     1
song_duration_ms    0
acousticness        0
danceability        0
energy              1
instrumentalness    2
key                 1
liveness            4
loudness            1
audio_mode          2
speechiness         1
tempo               1
time_signature      3
audio_valence       1
dtype: int64

df[df[numerical_cols]<0].count()

song_name               0
song_popularity         0
song_duration_ms        1
acousticness            0
danceability            0
energy                  0
instrumentalness        0
key                     0
liveness                1
loudness            14923
audio_mode              0
speechiness             0
tempo                   0
time_signature          0
audio_valence           0
dtype: int64
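
A slightly more direct way to get the same counts is to sum the boolean mask. The sketch below uses hypothetical values; it also pulls out the rows that are negative in columns where that is suspicious. Loudness is left out of that check on the assumption that it is measured in decibels, where negative values are normal.

```python
import pandas as pd

toy = pd.DataFrame({
    "song_duration_ms": [127946.0, -1.0, 212500.0],
    "liveness": [0.259, 0.0618, -0.3],
    "loudness": [-8.304, -4.206, -5.991],
})

# Comparing the frame to 0 yields booleans; summing counts the True values per column.
negatives = (toy < 0).sum()

# Rows with a negative value in a column that should never be negative.
suspicious = toy[(toy[["song_duration_ms", "liveness"]] < 0).any(axis=1)]

print(negatives.to_dict())
print(suspicious.index.tolist())
```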

6 - Removing columns:

Some columns may be considered unnecessary for our analysis, either because they carry no relevant information about what we want to find out, or because they have so much missing data that they hinder more than they help. In those cases, a quick and easy solution is to drop them.

Here we drop just one, as an experiment.

## Doc - https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html
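## Returns a new DataFrame without the column; df itself is not modified
df.drop(['liveness'], axis=1)

## Or we can delete directly by passing the columns parameter.
## Running this cell a second time raises the KeyError below, because the column is already gone.
df.drop(columns=['liveness'], inplace=True)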

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-28-9052eefc4426> in <module>()
      1 ## Or we can delete directly by passing the columns parameter
----> 2 df.drop(columns=['liveness'], inplace=True)

/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   4172             level=level,
   4173             inplace=inplace,
-> 4174             errors=errors,
   4175         )
   4176 

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3887         for axis, labels in axes.items():
   3888             if labels is not None:
-> 3889                 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   3890 
   3891         if inplace:

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in _drop_axis(self, labels, axis, level, errors)
   3921                 new_axis = axis.drop(labels, level=level, errors=errors)
   3922             else:
-> 3923                 new_axis = axis.drop(labels, errors=errors)
   3924             result = self.reindex(**{axis_name: new_axis})
   3925 

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in drop(self, labels, errors)
   5285         if mask.any():
   5286             if errors != "ignore":
-> 5287                 raise KeyError(f"{labels[mask]} not found in axis")
   5288             indexer = indexer[~mask]
   5289         return self.delete(indexer)

KeyError: "['liveness'] not found in axis"
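
This KeyError is what pandas raises when the column has already been removed, for example when the cell is executed twice. A minimal sketch on a toy frame (hypothetical values) shows how the errors='ignore' option of drop makes the cell safe to re-run:

```python
import pandas as pd

toy = pd.DataFrame({"liveness": [0.1, 0.2], "tempo": [100.0, 120.0]})

# The first call removes the column.
toy = toy.drop(columns=["liveness"])

# The second call would raise KeyError; errors='ignore' skips missing labels instead.
toy = toy.drop(columns=["liveness"], errors="ignore")

print(list(toy.columns))
```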

7 - Missing values:

In some situations, a lot of the information in our df may be incomplete. These missing values can harm our analysis and the later steps that depend on it and on preprocessing, so we need to remove them or replace them with other values. The flowchart below can help with the decision and suggests how to treat each case.

[Figure: decision flowchart for handling missing values]

For data that is not a time series, our first option is to replace missing values with the column mean. However, the mean can be skewed by outliers in the column, so we can also replace them with the mode or the median.

We can do this with the .fillna function, which fills every field that has missing data. Let's write a few loops as an example. The first goes through some columns and replaces the missing values with the mode; the second does the same with the median:

df.isna().sum()

song_name           0
song_popularity     1
song_duration_ms    0
acousticness        0
danceability        0
energy              1
instrumentalness    2
key                 1
loudness            1
audio_mode          2
speechiness         1
tempo               1
time_signature      3
audio_valence       1
dtype: int64

## Doc .fillna - https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html

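## Replace missing values with the mode (most frequent value) of each column
for column in ['acousticness', 'speechiness']:
    df[column] = df[column].fillna(df[column].mode()[0])

## Replace missing values with the median of each column
for column in ['song_duration_ms', 'danceability', 'energy',
               'loudness', 'audio_valence']:
    df[column] = df[column].fillna(df[column].median())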

df.isna().sum()

song_name           0
song_popularity     1
song_duration_ms    0
acousticness        0
danceability        0
energy              0
instrumentalness    2
key                 1
loudness            0
audio_mode          2
speechiness         0
tempo               1
time_signature      3
audio_valence       0
dtype: int64
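
The earlier point about the mean being distorted by outliers can be checked on a toy column. The values below are hypothetical: one entry is missing and one is an extreme outlier, so the mean-based fill lands far from the typical values while the median-based fill does not.

```python
import numpy as np
import pandas as pd

# Toy column (hypothetical values) with one missing entry and one extreme outlier.
tempo = pd.Series([100.0, 102.0, 98.0, np.nan, 1000.0])

# Both statistics ignore NaN by default when computed.
mean_fill = tempo.fillna(tempo.mean())
median_fill = tempo.fillna(tempo.median())

print(mean_fill[3], median_fill[3])  # 325.0 101.0
```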

## Doc - https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html?highlight=dropna#pandas.DataFrame.dropna
df.dropna(inplace=True)

df.isna().sum()

song_name           0
song_popularity     0
song_duration_ms    0
acousticness        0
danceability        0
energy              0
instrumentalness    0
key                 0
loudness            0
audio_mode          0
speechiness         0
tempo               0
time_signature      0
audio_valence       0
dtype: int64

Conclusion

In the end we have a dataset ready for exploratory analysis. We have not treated outliers yet because, depending on the scenario, we may want to make use of them.

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14925 entries, 7574 to 9956
Data columns (total 14 columns):
 #   Column            Non-Null Count  Dtype   
---  ------            --------------  -----   
 0   song_name         14925 non-null  object  
 1   song_popularity   14925 non-null  category
 2   song_duration_ms  14925 non-null  float64 
 3   acousticness      14925 non-null  float64 
 4   danceability      14925 non-null  float64 
 5   energy            14925 non-null  float64 
 6   instrumentalness  14925 non-null  float64 
 7   key               14925 non-null  category
 8   loudness          14925 non-null  float64 
 9   audio_mode        14925 non-null  category
 10  speechiness       14925 non-null  float64 
 11  tempo             14925 non-null  float64 
 12  time_signature    14925 non-null  category
 13  audio_valence     14925 non-null  float64 
dtypes: category(4), float64(9), object(1)
memory usage: 1.3+ MB

df.head()

	song_name	song_popularity	song_duration_ms	acousticness	danceability	energy	instrumentalness	key	liveness	loudness	audio_mode	speechiness	tempo	time_signature	audio_valence
1757	Party In The U.S.A.	nao_sei	0.8220000000000001kg	0.519mol/L	0.36	0.0	10	0.177	-8.575	0	0.105	97.42	4	0.7	NaN
7574	I Love It (& Lil Pump)	99	127946	0.0114kg	0.901mol/L	0.522	0.0	2.000	0.259	-8.304	1	0.33	104.053	4	0.329
11777	I Love It (& Lil Pump)	99	127946	0.0114kg	0.901mol/L	0.522	0.0	2.000	0.259	-8.304	1	0.33	104.053	4	0.329
4301	I Love It (& Lil Pump)	99	127946	0.0114kg	0.901mol/L	0.522	0.0	2.000	0.259	-8.304	1	0.33	104.053	4	0.329
14444	I Love It (& Lil Pump)	99	127946	0.0114kg	0.901mol/L	0.522	0.0	2.000	0.259	-8.304	1	0.33	104.053	4	0.329

	key	audio_valence
count	18835.000000	18834.000000
mean	5.288674	0.527958
std	3.614624	0.244635
min	0.000000	0.000000
25%	2.000000	0.335000
50%	5.000000	0.526500
75%	8.000000	0.725000
max	11.000000	0.984000

	song_name	song_popularity	song_duration_ms	acousticness	danceability	energy	instrumentalness	key	liveness	loudness	audio_mode	speechiness	tempo	time_signature	audio_valence
1757	Party In The U.S.A.	nao_sei	0.8220000000000001kg	0.519mol/L	0.36	0.0	10	0.177	-8.575	0	0.105	97.42	4	0.7	NaN
7574	I Love It (& Lil Pump)	99	127946	0.0114kg	0.901mol/L	0.522	0.0	2.000	0.259	-8.304	1	0.33	104.053	4	0.329
17588	Taki Taki (with Selena Gomez, Ozuna & Cardi B)	98	212500	0.153kg	0.841mol/L	0.7979999999999999	3.33e-06	1.000	0.0618	-4.206	0	0.229	95.948	4	0.591
17394	Promises (with Sam Smith)	98	213309	0.0119kg	0.7809999999999999mol/L	0.768	4.91e-06	11.000	0.325	-5.9910000000000005	1	0.0394	123.07	4	0.486
12665	Eastside (with Halsey & Khalid)	98	173799	0.555kg	0.56mol/L	0.68	0.0	6.000	0.116	-7.648	0	0.321	89.391	4	0.319

	song_name	song_popularity	song_duration_ms	acousticness	danceability	energy	instrumentalness	key	liveness	loudness	audio_mode	speechiness	tempo	time_signature	audio_valence
1757	Party In The U.S.A.	nao_sei	0.8220000000000001	0.519	0.36	0.0	10	0.177	-8.575	0	0.105	97.42	4	0.7	NaN
7574	I Love It (& Lil Pump)	99	127946	0.0114	0.901	0.522	0.0	2.000	0.259	-8.304	1	0.33	104.053	4	0.329
17588	Taki Taki (with Selena Gomez, Ozuna & Cardi B)	98	212500	0.153	0.841	0.7979999999999999	3.33e-06	1.000	0.0618	-4.206	0	0.229	95.948	4	0.591
17394	Promises (with Sam Smith)	98	213309	0.0119	0.7809999999999999	0.768	4.91e-06	11.000	0.325	-5.9910000000000005	1	0.0394	123.07	4	0.486
12665	Eastside (with Halsey & Khalid)	98	173799	0.555	0.56	0.68	0.0	6.000	0.116	-7.648	0	0.321	89.391	4	0.319

	song_name	song_popularity	song_duration_ms	acousticness	danceability	energy	instrumentalness	key	loudness	audio_mode	speechiness	tempo	time_signature	audio_valence
1757	Party In The U.S.A.	NaN	0.822	0.51900	0.360	0.000	10.000000	NaN	0.000	NaN	97.4200	4.000	NaN	NaN
7574	I Love It (& Lil Pump)	99	127946.000	0.01140	0.901	0.522	0.000000	2.0	-8.304	1	0.3300	104.053	4	0.329
17588	Taki Taki (with Selena Gomez, Ozuna & Cardi B)	98	212500.000	0.15300	0.841	0.798	0.000003	1.0	-4.206	0	0.2290	95.948	4	0.591
17394	Promises (with Sam Smith)	98	213309.000	0.01190	0.781	0.768	0.000005	11.0	-5.991	1	0.0394	123.070	4	0.486
12665	Eastside (with Halsey & Khalid)	98	173799.000	0.55500	0.560	0.680	0.000000	6.0	-7.648	0	0.3210	89.391	4	0.319
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
11278	María	0	161986.000	0.90600	0.843	0.483	0.005230	3.0	-14.776	1	0.0638	141.295	4	0.964
12923	Unfuck The World	0	250213.000	0.00142	0.574	0.831	0.010800	7.0	-5.576	0	0.0325	101.988	4	0.518
11282	Kimbya (feat. Manny Roman)	0	261590.000	0.49600	0.418	0.958	0.058300	7.0	-5.678	1	0.0728	123.639	4	0.676
12905	Mad World	0	174253.000	0.00002	0.298	0.931	0.404000	2.0	-6.185	1	0.1300	135.970	4	0.404
9956	All in My Feelings	0	187123.000	0.51100	0.459	0.476	0.000000	2.0	-5.277	1	0.0467	139.624	4	0.247

	song_name	song_popularity	song_duration_ms	acousticness	danceability	energy	instrumentalness	key	loudness	audio_mode	speechiness	tempo	time_signature	audio_valence
7574	I Love It (& Lil Pump)	99	127946.0	0.0114	0.901	0.522	0.000000	2.0	-8.304	1	0.3300	104.053	4	0.329
17588	Taki Taki (with Selena Gomez, Ozuna & Cardi B)	98	212500.0	0.1530	0.841	0.798	0.000003	1.0	-4.206	0	0.2290	95.948	4	0.591
17394	Promises (with Sam Smith)	98	213309.0	0.0119	0.781	0.768	0.000005	11.0	-5.991	1	0.0394	123.070	4	0.486
12665	Eastside (with Halsey & Khalid)	98	173799.0	0.5550	0.560	0.680	0.000000	6.0	-7.648	0	0.3210	89.391	4	0.319
17618	In My Feelings	98	217925.0	0.0589	0.835	0.626	0.000060	1.0	-5.833	1	0.1250	91.030	4	0.350