From f2cb638be6a711efa605dd578285a95ca09a2ee5 Mon Sep 17 00:00:00 2001 From: Edgard Decena Date: Mon, 23 Sep 2019 06:32:11 -0400 Subject: [PATCH] =?UTF-8?q?Actualizaci=C3=B3n=20de=20estructura.?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- 06_pandas.ipynb | 4105 ++++++++++++++++++ 07_pandas.ipynb | 2660 ------------ 06_statsmodels.ipynb => 07_statsmodels.ipynb | 0 3 files changed, 4105 insertions(+), 2660 deletions(-) create mode 100644 06_pandas.ipynb delete mode 100644 07_pandas.ipynb rename 06_statsmodels.ipynb => 07_statsmodels.ipynb (100%) diff --git a/06_pandas.ipynb b/06_pandas.ipynb new file mode 100644 index 0000000..fa02f37 --- /dev/null +++ b/06_pandas.ipynb @@ -0,0 +1,4105 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](./imagenes/python_logo.jpeg)\n", + "# The Pandas Library.\n", + "***" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[*Pandas*](https://pandas.pydata.org/pandas-docs/stable/) is arguably the most important *Python* package used in *Data Science*. Not only does it offer many methods and functions that make working with data easier, it has also been optimized for speed, which gives it a significant advantage over working with numeric data in plain *Python*.\n", + "\n", + "*Pandas* is a library that provides fast, flexible, and expressive data structures designed for working with *labeled* and/or *relational* data. Conceptually, they can be thought of as *NumPy* *arrays* whose rows and columns are *labeled*. *Pandas* data structures resemble spreadsheets in *Python*.\n", + "\n", + "Just as *NumPy* is a very good tool for working with numbers, vectors, linear algebra, etc., *Pandas* is well suited for working with:\n", + "\n", + "* Tabular, heterogeneous data (floats, strings, integers, etc.).\n", + "* Time series.\n", + "* The same data that can be handled with *NumPy* arrays.\n", + "\n", + "*Pandas* is not part of the standard *Python* installation, so it must be installed separately. To install *Pandas*, run the following command from a console or terminal:\n", + "```bash\n", + "pip install pandas\n", + "```\n", + "To work with Excel files, an additional package must also be installed. This notebook uses `xlrd`; note that recent versions of *Pandas* read `.xlsx` files with `openpyxl` instead:\n", + "```bash\n", + "pip install xlrd\n", + "```\n", + "*Pandas* has very thorough documentation and a variety of [tutorials](http://pandas.pydata.org/pandas-docs/stable/tutorials.html).\n", + "\n", + "*Pandas* has three basic object types, all of them built on *NumPy*:\n", + "\n", + "* *Series* (lists, 1D),\n", + "* *DataFrame* (tables, 2D), and\n", + "* *Panels* (3D tables; deprecated and removed as of *Pandas* 0.25).\n", + "\n", + "We will cover the basic use of the first two object types; for more detail, see the [manual](http://pandas.pydata.org/pandas-docs/stable/dsintro.html).\n", + "\n", + "This notebook uses the following abbreviations:\n", + "\n", + "* **df**: any *Pandas* `DataFrame` object.\n", + "* **s**: any *Pandas* `Series` object.\n", + "\n", + "To begin, we import *Pandas* following the usual convention:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Importing data.\n", + "***" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "csv_data = \"datos/capacitaciones.csv\"\n", + "json_data = \"datos/capacitaciones.json\"\n", + "xlsx_data = \"datos/capacitaciones.xlsx\"\n", + "\n", + "df = pd.read_csv(csv_data, encoding = \"ISO-8859-1\") #
From a CSV file.\n", + "df = pd.read_json(json_data) # From a JSON file.\n", + "df = pd.read_excel(xlsx_data) # From an XLSX file." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are other data *importers*, such as:\n", + "\n", + "* `pd.read_table(filename)`: imports from a delimited text file (such as TSV).\n", + "* `pd.read_sql(query, connection_object)`: imports from a SQL database or table.\n", + "* `pd.read_html(url)`: parses a URL, a string, or an HTML file and extracts its tables into a list.\n", + "* `pd.read_clipboard()`: reads data from the clipboard contents.\n", + "* `pd.DataFrame(dict)`: builds a `DataFrame` from a *Python* dictionary." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exporting data.\n", + "***" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Data *exporters* write the contents of *df* to an external destination:\n", + "\n", + "* `df.to_csv(filename)`: writes the data to a CSV file.\n", + "* `df.to_excel(filename)`: writes the data to an Excel file.\n", + "* `df.to_sql(table_name, connection_object)`: writes the data to a SQL table.\n", + "* `df.to_json(filename)`: writes the data to a JSON file." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Creating test objects.\n", + "***" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Creating test data is useful for trying out snippets of code:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 1 2 3 4\n", + "0 0.582785 0.388263 0.426230 0.658371 0.751270\n", + "1 0.713651 0.695313 0.079295 0.030452 0.586475\n", + "2 0.109922 0.537157 0.637753 0.879046 0.471591\n", + "3 0.355367 0.784931 0.957191 0.455044 0.487740\n", + "4 0.942278 0.856266 0.039143 0.776321 0.147819\n", + "5 0.505111 0.253876 0.991086 0.382961 0.846470\n", + "6 0.114346 0.581086 0.754694 0.054246 0.226384\n", + "7 0.560394 0.034081 0.412505 0.796324 0.218537\n", + "8 0.080404 0.265832 0.296440 0.903831 0.344192\n", + "9 0.213173 0.049282 0.106273 0.606679 0.282871" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Generate a DataFrame with 5 columns and 10 rows.\n", + "# NumPy is imported directly because the pd.np alias was removed in Pandas 2.0.\n", + "import numpy as np\n", + "\n", + "df = pd.DataFrame(np.random.rand(10, 5))\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2\n", + "1 7\n", + "2 3\n", + "3 9\n", + "4 5\n", + "dtype: int64" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Create a Series from a list.\n", + "my_list = [2, 7, 3, 9, 5]\n", + "s = pd.Series(my_list)\n", + "s" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 1 2 3 4\n", + "1900-01-30 0.582785 0.388263 0.426230 0.658371 0.751270\n", + "1900-01-31 0.713651 0.695313 0.079295 0.030452 0.586475\n", + "1900-02-01 0.109922 0.537157 0.637753 0.879046 0.471591\n", + "1900-02-02 0.355367 0.784931 0.957191 0.455044 0.487740\n", + "1900-02-03 0.942278 0.856266 0.039143 0.776321 0.147819\n", + "1900-02-04 0.505111 0.253876 0.991086 0.382961 0.846470\n", + "1900-02-05 0.114346 0.581086 0.754694 0.054246 0.226384\n", + "1900-02-06 0.560394 0.034081 0.412505 0.796324 0.218537\n", + "1900-02-07 0.080404 0.265832 0.296440 0.903831 0.344192\n", + "1900-02-08 0.213173 0.049282 0.106273 0.606679 0.282871" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Add a date index to the DataFrame.\n", + "df.index = pd.date_range('1900/1/30', periods = df.shape[0])\n", + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Viewing / inspecting data.\n", + "***" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 1 2 3 4\n", + "1900-01-30 0.582785 0.388263 0.426230 0.658371 0.751270\n", + "1900-01-31 0.713651 0.695313 0.079295 0.030452 0.586475\n", + "1900-02-01 0.109922 0.537157 0.637753 0.879046 0.471591\n", + "1900-02-02 0.355367 0.784931 0.957191 0.455044 0.487740\n", + "1900-02-03 0.942278 0.856266 0.039143 0.776321 0.147819" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(5) # Show the first 5 rows of the DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 1 2 3 4\n", + "1900-02-04 0.505111 0.253876 0.991086 0.382961 0.846470\n", + "1900-02-05 0.114346 0.581086 0.754694 0.054246 0.226384\n", + "1900-02-06 0.560394 0.034081 0.412505 0.796324 0.218537\n", + "1900-02-07 0.080404 0.265832 0.296440 0.903831 0.344192\n", + "1900-02-08 0.213173 0.049282 0.106273 0.606679 0.282871" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.tail(5) # Show the last 5 rows of the DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(10, 5)" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.shape # Show the number of rows and columns in the DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<class 'pandas.core.frame.DataFrame'>\n", + "DatetimeIndex: 10 entries, 1900-01-30 to 1900-02-08\n", + "Freq: D\n", + "Data columns (total 5 columns):\n", + "0 10 non-null float64\n", + "1 10 non-null float64\n", + "2 10 non-null float64\n", + "3 10 non-null float64\n", + "4 10 non-null float64\n", + "dtypes: float64(5)\n", + "memory usage: 480.0 bytes\n" + ] + } + ], + "source": [ + "df.info() # Show the index, data types, and memory usage." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 1 2 3 4\n", + "count 10.000000 10.000000 10.000000 10.000000 10.000000\n", + "mean 0.417743 0.444609 0.470061 0.554328 0.436335\n", + "std 0.291407 0.292813 0.353397 0.318461 0.235848\n", + "min 0.080404 0.034081 0.039143 0.030452 0.147819\n", + "25% 0.139053 0.256865 0.153815 0.400981 0.240506\n", + "50% 0.430239 0.462710 0.419368 0.632525 0.407892\n", + "75% 0.577187 0.666756 0.725459 0.791323 0.561791\n", + "max 0.942278 0.856266 0.991086 0.903831 0.846470" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.describe() # Show summary statistics for the numeric columns." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "7 1\n", + "5 1\n", + "3 1\n", + "2 1\n", + "9 1\n", + "dtype: int64" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s.value_counts(dropna = False) # Show the unique values in the Series and their counts." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Selection.\n", + "***" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1900-01-30 0.426230\n", + "1900-01-31 0.079295\n", + "1900-02-01 0.637753\n", + "1900-02-02 0.957191\n", + "1900-02-03 0.039143\n", + "1900-02-04 0.991086\n", + "1900-02-05 0.754694\n", + "1900-02-06 0.412505\n", + "1900-02-07 0.296440\n", + "1900-02-08 0.106273\n", + "Freq: D, Name: 2, dtype: float64" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[2] # Return the column labeled 2 as a Series." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 2 4\n", + "1900-01-30 0.426230 0.751270\n", + "1900-01-31 0.079295 0.586475\n", + "1900-02-01 0.637753 0.471591\n", + "1900-02-02 0.957191 0.487740\n", + "1900-02-03 0.039143 0.147819\n", + "1900-02-04 0.991086 0.846470\n", + "1900-02-05 0.754694 0.226384\n", + "1900-02-06 0.412505 0.218537\n", + "1900-02-07 0.296440 0.344192\n", + "1900-02-08 0.106273 0.282871" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[[2, 4]] # Return columns 2 and 4 as a new DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s.iloc[0] # Selection by position: select element 0 of the Series." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "9" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s.loc[3] # Selection by label: select the element whose index label is 3." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 0.582785\n", + "1 0.388263\n", + "2 0.426230\n", + "3 0.658371\n", + "4 0.751270\n", + "Name: 1900-01-30 00:00:00, dtype: float64" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.iloc[0, :] # First row of the DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.5827851483624562" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.iloc[0, 0] # First element of the first column."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Data cleaning.\n", + "***" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " a b c x y\n", + "1900-01-30 0.582785 0.388263 0.426230 0.658371 0.751270\n", + "1900-01-31 0.713651 0.695313 0.079295 0.030452 0.586475\n", + "1900-02-01 0.109922 0.537157 0.637753 0.879046 0.471591\n", + "1900-02-02 0.355367 0.784931 0.957191 0.455044 0.487740\n", + "1900-02-03 0.942278 0.856266 0.039143 0.776321 0.147819\n", + "1900-02-04 0.505111 0.253876 0.991086 0.382961 0.846470\n", + "1900-02-05 0.114346 0.581086 0.754694 0.054246 0.226384\n", + "1900-02-06 0.560394 0.034081 0.412505 0.796324 0.218537\n", + "1900-02-07 0.080404 0.265832 0.296440 0.903831 0.344192\n", + "1900-02-08 0.213173 0.049282 0.106273 0.606679 0.282871" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.columns = [\"a\", \"b\", \"c\", \"x\", \"y\"] # Rename the DataFrame's columns.\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " a b c x y\n", + "1900-01-30 False False False False False\n", + "1900-01-31 False False False False False\n", + "1900-02-01 False False False False False\n", + "1900-02-02 False False False False False\n", + "1900-02-03 False False False False False\n", + "1900-02-04 False False False False False\n", + "1900-02-05 False False False False False\n", + "1900-02-06 False False False False False\n", + "1900-02-07 False False False False False\n", + "1900-02-08 False False False False False" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.isnull() # Check for null values; returns a boolean DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " a b c x y\n", + "1900-01-30 True True True True True\n", + "1900-01-31 True True True True True\n", + "1900-02-01 True True True True True\n", + "1900-02-02 True True True True True\n", + "1900-02-03 True True True True True\n", + "1900-02-04 True True True True True\n", + "1900-02-05 True True True True True\n", + "1900-02-06 True True True True True\n", + "1900-02-07 True True True True True\n", + "1900-02-08 True True True True True" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.notnull() # The opposite of df.isnull()." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " a b c x y\n", + "1900-01-30 0.582785 0.388263 0.426230 0.658371 0.751270\n", + "1900-01-31 0.713651 0.695313 0.079295 0.030452 0.586475\n", + "1900-02-01 0.109922 0.537157 0.637753 0.879046 0.471591\n", + "1900-02-02 0.355367 0.784931 0.957191 0.455044 0.487740\n", + "1900-02-03 0.942278 0.856266 0.039143 0.776321 0.147819\n", + "1900-02-04 0.505111 0.253876 0.991086 0.382961 0.846470\n", + "1900-02-05 0.114346 0.581086 0.754694 0.054246 0.226384\n", + "1900-02-06 0.560394 0.034081 0.412505 0.796324 0.218537\n", + "1900-02-07 0.080404 0.265832 0.296440 0.903831 0.344192\n", + "1900-02-08 0.213173 0.049282 0.106273 0.606679 0.282871" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dropna(axis = 0) # Drop every row that contains null values." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " a b c x y\n", + "1900-01-30 0.582785 0.388263 0.426230 0.658371 0.751270\n", + "1900-01-31 0.713651 0.695313 0.079295 0.030452 0.586475\n", + "1900-02-01 0.109922 0.537157 0.637753 0.879046 0.471591\n", + "1900-02-02 0.355367 0.784931 0.957191 0.455044 0.487740\n", + "1900-02-03 0.942278 0.856266 0.039143 0.776321 0.147819\n", + "1900-02-04 0.505111 0.253876 0.991086 0.382961 0.846470\n", + "1900-02-05 0.114346 0.581086 0.754694 0.054246 0.226384\n", + "1900-02-06 0.560394 0.034081 0.412505 0.796324 0.218537\n", + "1900-02-07 0.080404 0.265832 0.296440 0.903831 0.344192\n", + "1900-02-08 0.213173 0.049282 0.106273 0.606679 0.282871" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dropna(axis = 1) # Drop every column that contains null values." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
abcxy
1900-01-300.5827850.3882630.4262300.6583710.751270
1900-01-310.7136510.6953130.0792950.0304520.586475
1900-02-010.1099220.5371570.6377530.8790460.471591
1900-02-020.3553670.7849310.9571910.4550440.487740
1900-02-030.9422780.8562660.0391430.7763210.147819
1900-02-040.5051110.2538760.9910860.3829610.846470
1900-02-050.1143460.5810860.7546940.0542460.226384
1900-02-060.5603940.0340810.4125050.7963240.218537
1900-02-070.0804040.2658320.2964400.9038310.344192
1900-02-080.2131730.0492820.1062730.6066790.282871
\n", + "
" + ], + "text/plain": [ + " a b c x y\n", + "1900-01-30 0.582785 0.388263 0.426230 0.658371 0.751270\n", + "1900-01-31 0.713651 0.695313 0.079295 0.030452 0.586475\n", + "1900-02-01 0.109922 0.537157 0.637753 0.879046 0.471591\n", + "1900-02-02 0.355367 0.784931 0.957191 0.455044 0.487740\n", + "1900-02-03 0.942278 0.856266 0.039143 0.776321 0.147819\n", + "1900-02-04 0.505111 0.253876 0.991086 0.382961 0.846470\n", + "1900-02-05 0.114346 0.581086 0.754694 0.054246 0.226384\n", + "1900-02-06 0.560394 0.034081 0.412505 0.796324 0.218537\n", + "1900-02-07 0.080404 0.265832 0.296440 0.903831 0.344192\n", + "1900-02-08 0.213173 0.049282 0.106273 0.606679 0.282871" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Drops every column (axis = 1) that has fewer than 3 non-null values.\n", + "df.dropna(axis = 1, thresh = 3)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
abcxy
1900-01-300.5827850.3882630.4262300.6583710.751270
1900-01-310.7136510.6953130.0792950.0304520.586475
1900-02-010.1099220.5371570.6377530.8790460.471591
1900-02-020.3553670.7849310.9571910.4550440.487740
1900-02-030.9422780.8562660.0391430.7763210.147819
1900-02-040.5051110.2538760.9910860.3829610.846470
1900-02-050.1143460.5810860.7546940.0542460.226384
1900-02-060.5603940.0340810.4125050.7963240.218537
1900-02-070.0804040.2658320.2964400.9038310.344192
1900-02-080.2131730.0492820.1062730.6066790.282871
\n", + "
" + ], + "text/plain": [ + " a b c x y\n", + "1900-01-30 0.582785 0.388263 0.426230 0.658371 0.751270\n", + "1900-01-31 0.713651 0.695313 0.079295 0.030452 0.586475\n", + "1900-02-01 0.109922 0.537157 0.637753 0.879046 0.471591\n", + "1900-02-02 0.355367 0.784931 0.957191 0.455044 0.487740\n", + "1900-02-03 0.942278 0.856266 0.039143 0.776321 0.147819\n", + "1900-02-04 0.505111 0.253876 0.991086 0.382961 0.846470\n", + "1900-02-05 0.114346 0.581086 0.754694 0.054246 0.226384\n", + "1900-02-06 0.560394 0.034081 0.412505 0.796324 0.218537\n", + "1900-02-07 0.080404 0.265832 0.296440 0.903831 0.344192\n", + "1900-02-08 0.213173 0.049282 0.106273 0.606679 0.282871" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.fillna(0) # Replaces every null value with 0." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2\n", + "1 7\n", + "2 3\n", + "3 9\n", + "4 5\n", + "dtype: int64" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s.fillna(s.mean()) # Replaces every null value with the mean of the Series." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2.0\n", + "1 7.0\n", + "2 3.0\n", + "3 9.0\n", + "4 5.0\n", + "dtype: float64" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s.astype(float) # Converts the data type to float." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2\n", + "1 7\n", + "2 100\n", + "3 9\n", + "4 5\n", + "dtype: object" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s.replace(3, '100') # Replaces every value equal to 3 with the string '100'."
+ ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2\n", + "1 700\n", + "2 3\n", + "3 900\n", + "4 5\n", + "dtype: object" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s.replace([7, 9], ['700', '900']) # Replaces every 7 with '700' and every 9 with '900'." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
13579
1900-01-300.5827850.3882630.4262300.6583710.751270
1900-01-310.7136510.6953130.0792950.0304520.586475
1900-02-010.1099220.5371570.6377530.8790460.471591
1900-02-020.3553670.7849310.9571910.4550440.487740
1900-02-030.9422780.8562660.0391430.7763210.147819
1900-02-040.5051110.2538760.9910860.3829610.846470
1900-02-050.1143460.5810860.7546940.0542460.226384
1900-02-060.5603940.0340810.4125050.7963240.218537
1900-02-070.0804040.2658320.2964400.9038310.344192
1900-02-080.2131730.0492820.1062730.6066790.282871
\n", + "
" + ], + "text/plain": [ + " 1 3 5 7 9\n", + "1900-01-30 0.582785 0.388263 0.426230 0.658371 0.751270\n", + "1900-01-31 0.713651 0.695313 0.079295 0.030452 0.586475\n", + "1900-02-01 0.109922 0.537157 0.637753 0.879046 0.471591\n", + "1900-02-02 0.355367 0.784931 0.957191 0.455044 0.487740\n", + "1900-02-03 0.942278 0.856266 0.039143 0.776321 0.147819\n", + "1900-02-04 0.505111 0.253876 0.991086 0.382961 0.846470\n", + "1900-02-05 0.114346 0.581086 0.754694 0.054246 0.226384\n", + "1900-02-06 0.560394 0.034081 0.412505 0.796324 0.218537\n", + "1900-02-07 0.080404 0.265832 0.296440 0.903831 0.344192\n", + "1900-02-08 0.213173 0.049282 0.106273 0.606679 0.282871" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.columns = (0, 1, 2, 3, 4)\n", + "df.rename(columns = lambda x: 2*x + 1) # Renames all columns at once by applying a function to each label." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
0one234
1900-01-300.5827850.3882630.4262300.6583710.751270
1900-01-310.7136510.6953130.0792950.0304520.586475
1900-02-010.1099220.5371570.6377530.8790460.471591
1900-02-020.3553670.7849310.9571910.4550440.487740
1900-02-030.9422780.8562660.0391430.7763210.147819
1900-02-040.5051110.2538760.9910860.3829610.846470
1900-02-050.1143460.5810860.7546940.0542460.226384
1900-02-060.5603940.0340810.4125050.7963240.218537
1900-02-070.0804040.2658320.2964400.9038310.344192
1900-02-080.2131730.0492820.1062730.6066790.282871
\n", + "
" + ], + "text/plain": [ + " 0 one 2 3 4\n", + "1900-01-30 0.582785 0.388263 0.426230 0.658371 0.751270\n", + "1900-01-31 0.713651 0.695313 0.079295 0.030452 0.586475\n", + "1900-02-01 0.109922 0.537157 0.637753 0.879046 0.471591\n", + "1900-02-02 0.355367 0.784931 0.957191 0.455044 0.487740\n", + "1900-02-03 0.942278 0.856266 0.039143 0.776321 0.147819\n", + "1900-02-04 0.505111 0.253876 0.991086 0.382961 0.846470\n", + "1900-02-05 0.114346 0.581086 0.754694 0.054246 0.226384\n", + "1900-02-06 0.560394 0.034081 0.412505 0.796324 0.218537\n", + "1900-02-07 0.080404 0.265832 0.296440 0.903831 0.344192\n", + "1900-02-08 0.213173 0.049282 0.106273 0.606679 0.282871" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.rename(columns={1:'one'}) # Renames a single, selected column." + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
0134
2
0.4262300.5827850.3882630.6583710.751270
0.0792950.7136510.6953130.0304520.586475
0.6377530.1099220.5371570.8790460.471591
0.9571910.3553670.7849310.4550440.487740
0.0391430.9422780.8562660.7763210.147819
0.9910860.5051110.2538760.3829610.846470
0.7546940.1143460.5810860.0542460.226384
0.4125050.5603940.0340810.7963240.218537
0.2964400.0804040.2658320.9038310.344192
0.1062730.2131730.0492820.6066790.282871
\n", + "
" + ], + "text/plain": [ + " 0 1 3 4\n", + "2 \n", + "0.426230 0.582785 0.388263 0.658371 0.751270\n", + "0.079295 0.713651 0.695313 0.030452 0.586475\n", + "0.637753 0.109922 0.537157 0.879046 0.471591\n", + "0.957191 0.355367 0.784931 0.455044 0.487740\n", + "0.039143 0.942278 0.856266 0.776321 0.147819\n", + "0.991086 0.505111 0.253876 0.382961 0.846470\n", + "0.754694 0.114346 0.581086 0.054246 0.226384\n", + "0.412505 0.560394 0.034081 0.796324 0.218537\n", + "0.296440 0.080404 0.265832 0.903831 0.344192\n", + "0.106273 0.213173 0.049282 0.606679 0.282871" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.set_index(2) # Uses column 2 of the DataFrame as the new index." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
0134
2
1.4262300.5827850.3882630.6583710.751270
1.0792950.7136510.6953130.0304520.586475
1.6377530.1099220.5371570.8790460.471591
1.9571910.3553670.7849310.4550440.487740
1.0391430.9422780.8562660.7763210.147819
1.9910860.5051110.2538760.3829610.846470
1.7546940.1143460.5810860.0542460.226384
1.4125050.5603940.0340810.7963240.218537
1.2964400.0804040.2658320.9038310.344192
1.1062730.2131730.0492820.6066790.282871
\n", + "
" + ], + "text/plain": [ + " 0 1 3 4\n", + "2 \n", + "1.426230 0.582785 0.388263 0.658371 0.751270\n", + "1.079295 0.713651 0.695313 0.030452 0.586475\n", + "1.637753 0.109922 0.537157 0.879046 0.471591\n", + "1.957191 0.355367 0.784931 0.455044 0.487740\n", + "1.039143 0.942278 0.856266 0.776321 0.147819\n", + "1.991086 0.505111 0.253876 0.382961 0.846470\n", + "1.754694 0.114346 0.581086 0.054246 0.226384\n", + "1.412505 0.560394 0.034081 0.796324 0.218537\n", + "1.296440 0.080404 0.265832 0.903831 0.344192\n", + "1.106273 0.213173 0.049282 0.606679 0.282871" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.set_index(2).rename(index = lambda x: x + 1) # Relabels every index entry at once." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Filtering, sorting, and grouping.\n", + "***" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
01234
1900-02-010.1099220.5371570.6377530.8790460.471591
1900-02-020.3553670.7849310.9571910.4550440.487740
1900-02-040.5051110.2538760.9910860.3829610.846470
1900-02-050.1143460.5810860.7546940.0542460.226384
\n", + "
" + ], + "text/plain": [ + " 0 1 2 3 4\n", + "1900-02-01 0.109922 0.537157 0.637753 0.879046 0.471591\n", + "1900-02-02 0.355367 0.784931 0.957191 0.455044 0.487740\n", + "1900-02-04 0.505111 0.253876 0.991086 0.382961 0.846470\n", + "1900-02-05 0.114346 0.581086 0.754694 0.054246 0.226384" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df[2] > 0.5] # Keeps the rows where column 2 is greater than 0.5." + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
01234
1900-02-010.1099220.5371570.6377530.8790460.471591
\n", + "
" + ], + "text/plain": [ + " 0 1 2 3 4\n", + "1900-02-01 0.109922 0.537157 0.637753 0.879046 0.471591" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[(df[2] > 0.5) & (df[2] < 0.7)] # Keeps the rows where 0.5 < column 2 < 0.7." + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
01234
1900-01-310.7136510.6953130.0792950.0304520.586475
1900-02-050.1143460.5810860.7546940.0542460.226384
1900-02-040.5051110.2538760.9910860.3829610.846470
1900-02-020.3553670.7849310.9571910.4550440.487740
1900-02-080.2131730.0492820.1062730.6066790.282871
1900-01-300.5827850.3882630.4262300.6583710.751270
1900-02-030.9422780.8562660.0391430.7763210.147819
1900-02-060.5603940.0340810.4125050.7963240.218537
1900-02-010.1099220.5371570.6377530.8790460.471591
1900-02-070.0804040.2658320.2964400.9038310.344192
\n", + "
" + ], + "text/plain": [ + " 0 1 2 3 4\n", + "1900-01-31 0.713651 0.695313 0.079295 0.030452 0.586475\n", + "1900-02-05 0.114346 0.581086 0.754694 0.054246 0.226384\n", + "1900-02-04 0.505111 0.253876 0.991086 0.382961 0.846470\n", + "1900-02-02 0.355367 0.784931 0.957191 0.455044 0.487740\n", + "1900-02-08 0.213173 0.049282 0.106273 0.606679 0.282871\n", + "1900-01-30 0.582785 0.388263 0.426230 0.658371 0.751270\n", + "1900-02-03 0.942278 0.856266 0.039143 0.776321 0.147819\n", + "1900-02-06 0.560394 0.034081 0.412505 0.796324 0.218537\n", + "1900-02-01 0.109922 0.537157 0.637753 0.879046 0.471591\n", + "1900-02-07 0.080404 0.265832 0.296440 0.903831 0.344192" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.sort_values(3) # Sorts by the values of column 3 in ascending order." + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
01234
1900-02-070.0804040.2658320.2964400.9038310.344192
1900-02-010.1099220.5371570.6377530.8790460.471591
1900-02-060.5603940.0340810.4125050.7963240.218537
1900-02-030.9422780.8562660.0391430.7763210.147819
1900-01-300.5827850.3882630.4262300.6583710.751270
1900-02-080.2131730.0492820.1062730.6066790.282871
1900-02-020.3553670.7849310.9571910.4550440.487740
1900-02-040.5051110.2538760.9910860.3829610.846470
1900-02-050.1143460.5810860.7546940.0542460.226384
1900-01-310.7136510.6953130.0792950.0304520.586475
\n", + "
" + ], + "text/plain": [ + " 0 1 2 3 4\n", + "1900-02-07 0.080404 0.265832 0.296440 0.903831 0.344192\n", + "1900-02-01 0.109922 0.537157 0.637753 0.879046 0.471591\n", + "1900-02-06 0.560394 0.034081 0.412505 0.796324 0.218537\n", + "1900-02-03 0.942278 0.856266 0.039143 0.776321 0.147819\n", + "1900-01-30 0.582785 0.388263 0.426230 0.658371 0.751270\n", + "1900-02-08 0.213173 0.049282 0.106273 0.606679 0.282871\n", + "1900-02-02 0.355367 0.784931 0.957191 0.455044 0.487740\n", + "1900-02-04 0.505111 0.253876 0.991086 0.382961 0.846470\n", + "1900-02-05 0.114346 0.581086 0.754694 0.054246 0.226384\n", + "1900-01-31 0.713651 0.695313 0.079295 0.030452 0.586475" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.sort_values(3, ascending = False) # Sorts by column 3 in descending order." + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
01234
1900-02-030.9422780.8562660.0391430.7763210.147819
1900-01-310.7136510.6953130.0792950.0304520.586475
1900-02-080.2131730.0492820.1062730.6066790.282871
1900-02-070.0804040.2658320.2964400.9038310.344192
1900-02-060.5603940.0340810.4125050.7963240.218537
1900-01-300.5827850.3882630.4262300.6583710.751270
1900-02-010.1099220.5371570.6377530.8790460.471591
1900-02-050.1143460.5810860.7546940.0542460.226384
1900-02-020.3553670.7849310.9571910.4550440.487740
1900-02-040.5051110.2538760.9910860.3829610.846470
\n", + "
" + ], + "text/plain": [ + " 0 1 2 3 4\n", + "1900-02-03 0.942278 0.856266 0.039143 0.776321 0.147819\n", + "1900-01-31 0.713651 0.695313 0.079295 0.030452 0.586475\n", + "1900-02-08 0.213173 0.049282 0.106273 0.606679 0.282871\n", + "1900-02-07 0.080404 0.265832 0.296440 0.903831 0.344192\n", + "1900-02-06 0.560394 0.034081 0.412505 0.796324 0.218537\n", + "1900-01-30 0.582785 0.388263 0.426230 0.658371 0.751270\n", + "1900-02-01 0.109922 0.537157 0.637753 0.879046 0.471591\n", + "1900-02-05 0.114346 0.581086 0.754694 0.054246 0.226384\n", + "1900-02-02 0.355367 0.784931 0.957191 0.455044 0.487740\n", + "1900-02-04 0.505111 0.253876 0.991086 0.382961 0.846470" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Sorts by column 2 in ascending order,\n", + "# breaking ties by column 3 in descending order.\n", + "df.sort_values([2, 3], ascending = [True, False])" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby(3) # Returns a GroupBy object keyed on the values of column 3." + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Returns a GroupBy object keyed on the values of several columns,\n", + "# in this case columns 3 and 4.\n", + "df.groupby([3, 4])" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
34
2
0.0391430.7763210.147819
0.0792950.0304520.586475
0.1062730.6066790.282871
0.2964400.9038310.344192
0.4125050.7963240.218537
0.4262300.6583710.751270
0.6377530.8790460.471591
0.7546940.0542460.226384
0.9571910.4550440.487740
0.9910860.3829610.846470
\n", + "
" + ], + "text/plain": [ + " 3 4\n", + "2 \n", + "0.039143 0.776321 0.147819\n", + "0.079295 0.030452 0.586475\n", + "0.106273 0.606679 0.282871\n", + "0.296440 0.903831 0.344192\n", + "0.412505 0.796324 0.218537\n", + "0.426230 0.658371 0.751270\n", + "0.637753 0.879046 0.471591\n", + "0.754694 0.054246 0.226384\n", + "0.957191 0.455044 0.487740\n", + "0.991086 0.382961 0.846470" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Builds a pivot table that groups by column 2\n", + "# and computes the mean of columns 3 and 4.\n", + "# The aggregation is named by string because pd.np was removed in later pandas releases.\n", + "df.pivot_table(index = 2, values = [3, 4], aggfunc = 'mean')" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
0124
3
0.0304520.7136510.6953130.0792950.586475
0.0542460.1143460.5810860.7546940.226384
0.3829610.5051110.2538760.9910860.846470
0.4550440.3553670.7849310.9571910.487740
0.6066790.2131730.0492820.1062730.282871
0.6583710.5827850.3882630.4262300.751270
0.7763210.9422780.8562660.0391430.147819
0.7963240.5603940.0340810.4125050.218537
0.8790460.1099220.5371570.6377530.471591
0.9038310.0804040.2658320.2964400.344192
\n", + "
" + ], + "text/plain": [ + " 0 1 2 4\n", + "3 \n", + "0.030452 0.713651 0.695313 0.079295 0.586475\n", + "0.054246 0.114346 0.581086 0.754694 0.226384\n", + "0.382961 0.505111 0.253876 0.991086 0.846470\n", + "0.455044 0.355367 0.784931 0.957191 0.487740\n", + "0.606679 0.213173 0.049282 0.106273 0.282871\n", + "0.658371 0.582785 0.388263 0.426230 0.751270\n", + "0.776321 0.942278 0.856266 0.039143 0.147819\n", + "0.796324 0.560394 0.034081 0.412505 0.218537\n", + "0.879046 0.109922 0.537157 0.637753 0.471591\n", + "0.903831 0.080404 0.265832 0.296440 0.344192" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Computes the mean of every other column\n", + "# for each unique value of column 3.\n", + "df.groupby(3).agg('mean')" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 0.417743\n", + "1 0.444609\n", + "2 0.470061\n", + "3 0.554328\n", + "4 0.436335\n", + "dtype: float64" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.apply('mean') # Applies the mean to each column." + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1900-01-30 0.751270\n", + "1900-01-31 0.713651\n", + "1900-02-01 0.879046\n", + "1900-02-02 0.957191\n", + "1900-02-03 0.942278\n", + "1900-02-04 0.991086\n", + "1900-02-05 0.754694\n", + "1900-02-06 0.796324\n", + "1900-02-07 0.903831\n", + "1900-02-08 0.606679\n", + "Freq: D, dtype: float64" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.apply('max', axis = 1) # Applies the maximum to each row."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Joining / Combining.\n", + "***" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Statistics.\n", + "***" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
01234
count10.00000010.00000010.00000010.00000010.000000
mean0.4177430.4446090.4700610.5543280.436335
std0.2914070.2928130.3533970.3184610.235848
min0.0804040.0340810.0391430.0304520.147819
25%0.1390530.2568650.1538150.4009810.240506
50%0.4302390.4627100.4193680.6325250.407892
75%0.5771870.6667560.7254590.7913230.561791
max0.9422780.8562660.9910860.9038310.846470
\n", + "
" + ], + "text/plain": [ + " 0 1 2 3 4\n", + "count 10.000000 10.000000 10.000000 10.000000 10.000000\n", + "mean 0.417743 0.444609 0.470061 0.554328 0.436335\n", + "std 0.291407 0.292813 0.353397 0.318461 0.235848\n", + "min 0.080404 0.034081 0.039143 0.030452 0.147819\n", + "25% 0.139053 0.256865 0.153815 0.400981 0.240506\n", + "50% 0.430239 0.462710 0.419368 0.632525 0.407892\n", + "75% 0.577187 0.666756 0.725459 0.791323 0.561791\n", + "max 0.942278 0.856266 0.991086 0.903831 0.846470" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.describe() # Summary statistics for the numeric columns." + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 0.417743\n", + "1 0.444609\n", + "2 0.470061\n", + "3 0.554328\n", + "4 0.436335\n", + "dtype: float64" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.mean(axis = 0) # Returns the mean of each column." + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1900-01-30 0.561384\n", + "1900-01-31 0.421037\n", + "1900-02-01 0.527094\n", + "1900-02-02 0.608055\n", + "1900-02-03 0.552365\n", + "1900-02-04 0.595901\n", + "1900-02-05 0.346151\n", + "1900-02-06 0.404368\n", + "1900-02-07 0.378140\n", + "1900-02-08 0.251656\n", + "Freq: D, dtype: float64" + ] + }, + "execution_count": 49, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.mean(axis = 1) # Returns the mean of each row." + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
01234
01.0000000.332894-0.374344-0.0868100.100152
10.3328941.0000000.049785-0.318906-0.029674
2-0.3743440.0497851.000000-0.2119690.428057
3-0.086810-0.318906-0.2119691.000000-0.239399
40.100152-0.0296740.428057-0.2393991.000000
\n", + "
" + ], + "text/plain": [ + " 0 1 2 3 4\n", + "0 1.000000 0.332894 -0.374344 -0.086810 0.100152\n", + "1 0.332894 1.000000 0.049785 -0.318906 -0.029674\n", + "2 -0.374344 0.049785 1.000000 -0.211969 0.428057\n", + "3 -0.086810 -0.318906 -0.211969 1.000000 -0.239399\n", + "4 0.100152 -0.029674 0.428057 -0.239399 1.000000" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.corr() # Devuelve la correlación entre las columnas en un DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 10\n", + "1 10\n", + "2 10\n", + "3 10\n", + "4 10\n", + "dtype: int64" + ] + }, + "execution_count": 57, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.count(axis = 0) # Devuelve el número de valores no nulos en cada columna." + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 0.942278\n", + "1 0.856266\n", + "2 0.991086\n", + "3 0.903831\n", + "4 0.846470\n", + "dtype: float64" + ] + }, + "execution_count": 56, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.max(axis = 0) # Devuelve el valor más alto en cada columna." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 0.080404\n", + "1 0.034081\n", + "2 0.039143\n", + "3 0.030452\n", + "4 0.147819\n", + "dtype: float64" + ] + }, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.min(axis = 0) # Returns the lowest value in each column." + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 0.430239\n", + "1 0.462710\n", + "2 0.419368\n", + "3 0.632525\n", + "4 0.407892\n", + "dtype: float64" + ] + }, + "execution_count": 59, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.median(axis = 0) # Returns the median of each column." + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 0.291407\n", + "1 0.292813\n", + "2 0.353397\n", + "3 0.318461\n", + "4 0.235848\n", + "dtype: float64" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.std(axis = 0) # Returns the standard deviation of each column." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/07_pandas.ipynb b/07_pandas.ipynb deleted file mode 100644 index d6d75a6..0000000 --- a/07_pandas.ipynb +++ /dev/null @@ -1,2660 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![](./imagenes/python_logo.jpeg)\n", - "# The Pandas library.\n", - "***" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[*Pandas*](https://pandas.pydata.org/pandas-docs/stable/) is arguably the most important *Python* package used in *Data Science*. It not only offers many methods and functions that make working with data easier, but it has also been optimized for speed, which gives it a significant advantage over working with numerical data in plain *Python*.\n", - "\n", - "*Pandas* is a library that provides fast, flexible, and expressive data structures designed for working with *labeled* and/or *relational* data. Conceptually, they can be thought of as *NumPy* *arrays* whose rows and columns are *labeled*. *Pandas* data structures resemble spreadsheets in *Python*.\n", - "\n", - "Like *NumPy*, *Pandas* is a very good tool for working with numbers, vectors, linear algebra, etc. 
*Pandas* is well suited for working with:\n", - "\n", - "* Tabular, heterogeneous data (floats, strings, integers, etc.).\n", - "* Time series.\n", - "* The same data that can be manipulated with *NumPy* arrays.\n", - "\n", - "*Pandas* is not part of the standard *Python* installation, so it must be installed separately. To install *Pandas*, run the following command from a console or terminal:\n", - "```bash\n", - "pip install pandas\n", - "```\n", - "In addition, to work with Excel files, also run:\n", - "```bash\n", - "pip install xlrd\n", - "```\n", - "*Pandas* has very thorough documentation and a variety of [tutorials](http://pandas.pydata.org/pandas-docs/stable/tutorials.html).\n", - "\n", - "*Pandas* has three basic object types, all of them built on top of *NumPy*:\n", - "\n", - "* *Series* (lists, 1D),\n", - "* *DataFrame* (tables, 2D), and\n", - "* *Panels* (3D tables).\n", - "\n", - "We will cover the basic use of the first two object types; for more detail, see the [manual](http://pandas.pydata.org/pandas-docs/stable/dsintro.html).\n", - "\n", - "This notebook uses the following abbreviations:\n", - "\n", - "* **df**: any *Pandas* `DataFrame` object.\n", - "* **s**: any *Pandas* `Series` object.\n", - "\n", - "To get started, we import *Pandas* following the usual convention:" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Data importers." 
- ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "csv_data = \"datos/capacitaciones.csv\"\n", - "json_data = \"datos/capacitaciones.json\"\n", - "xlsx_data = \"datos/capacitaciones.xlsx\"\n", - "\n", - "df = pd.read_csv(csv_data, encoding = \"ISO-8859-1\") # From a CSV file.\n", - "df = pd.read_json(json_data) # From a JSON file.\n", - "df = pd.read_excel(xlsx_data) # From an XLSX file." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Other data *importers* are available, such as:\n", - "\n", - "* `pd.read_table(filename)`: imports from a delimited text file (such as TSV).\n", - "* `pd.read_sql(query, connection_object)`: imports from a SQL database/table.\n", - "* `pd.read_html(url)`: parses a URL, a string, or an HTML file and extracts its tables into a list.\n", - "* `pd.read_clipboard()`: reads data from the contents of the clipboard.\n", - "* `pd.DataFrame(dict)`: builds a DataFrame from a *Python* dictionary." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Data exporters." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Data *exporters* write the data in *df* to an external destination:\n", - "\n", - "* `df.to_csv(filename)`: writes the data to a CSV file.\n", - "* `df.to_excel(filename)`: writes the data to an Excel file.\n", - "* `df.to_sql(table_name, connection_object)`: writes the data to a SQL table.\n", - "* `df.to_json(filename)`: writes the data to a JSON file." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Creating test objects." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Creating test data is useful for trying out code snippets:" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
01234
00.4713040.3466600.6258570.1997570.326658
10.6397310.8346990.1558160.4242620.465353
20.9868180.4698960.5343620.7480240.064895
30.6733670.5240890.9174700.7547540.999615
40.6604630.1271520.9512550.7940990.021641
50.0833330.1994990.7068080.4934640.679596
60.5188920.8556670.0284150.5118200.851935
70.4573810.3198300.3391670.4631010.417858
80.1044480.1535280.4849920.7948210.702888
90.6638230.0820650.3829270.4527810.586594
\n", - "
" - ], - "text/plain": [ - " 0 1 2 3 4\n", - "0 0.471304 0.346660 0.625857 0.199757 0.326658\n", - "1 0.639731 0.834699 0.155816 0.424262 0.465353\n", - "2 0.986818 0.469896 0.534362 0.748024 0.064895\n", - "3 0.673367 0.524089 0.917470 0.754754 0.999615\n", - "4 0.660463 0.127152 0.951255 0.794099 0.021641\n", - "5 0.083333 0.199499 0.706808 0.493464 0.679596\n", - "6 0.518892 0.855667 0.028415 0.511820 0.851935\n", - "7 0.457381 0.319830 0.339167 0.463101 0.417858\n", - "8 0.104448 0.153528 0.484992 0.794821 0.702888\n", - "9 0.663823 0.082065 0.382927 0.452781 0.586594" - ] - }, - "execution_count": 21, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Genera un dataframe con 5 columnas y 10 filas.\n", - "df = pd.DataFrame(pd.np.random.rand(10, 5))\n", - "df" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 2\n", - "1 7\n", - "2 3\n", - "3 9\n", - "4 5\n", - "dtype: int64" - ] - }, - "execution_count": 30, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Crea una series a partir de una lista.\n", - "my_list = [2, 7, 3, 9, 5]\n", - "s = pd.Series(my_list)\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
01234
1900-01-300.4713040.3466600.6258570.1997570.326658
1900-01-310.6397310.8346990.1558160.4242620.465353
1900-02-010.9868180.4698960.5343620.7480240.064895
1900-02-020.6733670.5240890.9174700.7547540.999615
1900-02-030.6604630.1271520.9512550.7940990.021641
1900-02-040.0833330.1994990.7068080.4934640.679596
1900-02-050.5188920.8556670.0284150.5118200.851935
1900-02-060.4573810.3198300.3391670.4631010.417858
1900-02-070.1044480.1535280.4849920.7948210.702888
1900-02-080.6638230.0820650.3829270.4527810.586594
\n", - "
" - ], - "text/plain": [ - " 0 1 2 3 4\n", - "1900-01-30 0.471304 0.346660 0.625857 0.199757 0.326658\n", - "1900-01-31 0.639731 0.834699 0.155816 0.424262 0.465353\n", - "1900-02-01 0.986818 0.469896 0.534362 0.748024 0.064895\n", - "1900-02-02 0.673367 0.524089 0.917470 0.754754 0.999615\n", - "1900-02-03 0.660463 0.127152 0.951255 0.794099 0.021641\n", - "1900-02-04 0.083333 0.199499 0.706808 0.493464 0.679596\n", - "1900-02-05 0.518892 0.855667 0.028415 0.511820 0.851935\n", - "1900-02-06 0.457381 0.319830 0.339167 0.463101 0.417858\n", - "1900-02-07 0.104448 0.153528 0.484992 0.794821 0.702888\n", - "1900-02-08 0.663823 0.082065 0.382927 0.452781 0.586594" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Añade un índice de fecha al dataframe.\n", - "df.index = pd.date_range('1900/1/30', periods = df.shape[0])\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Visualizar / inspeccionar datos." - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
01234
1900-01-300.4713040.3466600.6258570.1997570.326658
1900-01-310.6397310.8346990.1558160.4242620.465353
1900-02-010.9868180.4698960.5343620.7480240.064895
1900-02-020.6733670.5240890.9174700.7547540.999615
1900-02-030.6604630.1271520.9512550.7940990.021641
\n", - "
" - ], - "text/plain": [ - " 0 1 2 3 4\n", - "1900-01-30 0.471304 0.346660 0.625857 0.199757 0.326658\n", - "1900-01-31 0.639731 0.834699 0.155816 0.424262 0.465353\n", - "1900-02-01 0.986818 0.469896 0.534362 0.748024 0.064895\n", - "1900-02-02 0.673367 0.524089 0.917470 0.754754 0.999615\n", - "1900-02-03 0.660463 0.127152 0.951255 0.794099 0.021641" - ] - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.head(5) # Muestra las primeras 5 filas del DataFrame." - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
01234
1900-02-040.0833330.1994990.7068080.4934640.679596
1900-02-050.5188920.8556670.0284150.5118200.851935
1900-02-060.4573810.3198300.3391670.4631010.417858
1900-02-070.1044480.1535280.4849920.7948210.702888
1900-02-080.6638230.0820650.3829270.4527810.586594
\n", - "
" - ], - "text/plain": [ - " 0 1 2 3 4\n", - "1900-02-04 0.083333 0.199499 0.706808 0.493464 0.679596\n", - "1900-02-05 0.518892 0.855667 0.028415 0.511820 0.851935\n", - "1900-02-06 0.457381 0.319830 0.339167 0.463101 0.417858\n", - "1900-02-07 0.104448 0.153528 0.484992 0.794821 0.702888\n", - "1900-02-08 0.663823 0.082065 0.382927 0.452781 0.586594" - ] - }, - "execution_count": 25, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.tail(5) # Muestra las últimas 5 filas del DataFrame." - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(10, 5)" - ] - }, - "execution_count": 26, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.shape # Muestra el Nnúmero de filas y columnas del DataFrame." - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "DatetimeIndex: 10 entries, 1900-01-30 to 1900-02-08\n", - "Freq: D\n", - "Data columns (total 5 columns):\n", - "0 10 non-null float64\n", - "1 10 non-null float64\n", - "2 10 non-null float64\n", - "3 10 non-null float64\n", - "4 10 non-null float64\n", - "dtypes: float64(5)\n", - "memory usage: 480.0 bytes\n" - ] - } - ], - "source": [ - "df.info() # Muestra el índice, tipo de datos y memoria." - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
01234
count10.00000010.00000010.00000010.00000010.000000
mean0.5259560.3913080.5127070.5636880.511703
std0.2719260.2792600.3012460.1997010.317833
min0.0833330.0820650.0284150.1997570.021641
25%0.4608620.1650210.3501070.4553610.349458
50%0.5793110.3332450.5096770.5026420.525974
75%0.6629830.5105410.6865700.7530720.697065
max0.9868180.8556670.9512550.7948210.999615
\n", - "
" - ], - "text/plain": [ - " 0 1 2 3 4\n", - "count 10.000000 10.000000 10.000000 10.000000 10.000000\n", - "mean 0.525956 0.391308 0.512707 0.563688 0.511703\n", - "std 0.271926 0.279260 0.301246 0.199701 0.317833\n", - "min 0.083333 0.082065 0.028415 0.199757 0.021641\n", - "25% 0.460862 0.165021 0.350107 0.455361 0.349458\n", - "50% 0.579311 0.333245 0.509677 0.502642 0.525974\n", - "75% 0.662983 0.510541 0.686570 0.753072 0.697065\n", - "max 0.986818 0.855667 0.951255 0.794821 0.999615" - ] - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.describe() # Muestra estadísticas resumidas de columnas numéricas." - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7 1\n", - "5 1\n", - "3 1\n", - "2 1\n", - "9 1\n", - "dtype: int64" - ] - }, - "execution_count": 31, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.value_counts(dropna = False) # Muestra valores y recuentos únicos en la serie." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Selección." - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "1900-01-30 0.625857\n", - "1900-01-31 0.155816\n", - "1900-02-01 0.534362\n", - "1900-02-02 0.917470\n", - "1900-02-03 0.951255\n", - "1900-02-04 0.706808\n", - "1900-02-05 0.028415\n", - "1900-02-06 0.339167\n", - "1900-02-07 0.484992\n", - "1900-02-08 0.382927\n", - "Freq: D, Name: 2, dtype: float64" - ] - }, - "execution_count": 34, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df[2] # Devuelve la columna con la etiqueta 2 como una Series." - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
24
1900-01-300.6258570.326658
1900-01-310.1558160.465353
1900-02-010.5343620.064895
1900-02-020.9174700.999615
1900-02-030.9512550.021641
1900-02-040.7068080.679596
1900-02-050.0284150.851935
1900-02-060.3391670.417858
1900-02-070.4849920.702888
1900-02-080.3829270.586594
\n", - "
" - ], - "text/plain": [ - " 2 4\n", - "1900-01-30 0.625857 0.326658\n", - "1900-01-31 0.155816 0.465353\n", - "1900-02-01 0.534362 0.064895\n", - "1900-02-02 0.917470 0.999615\n", - "1900-02-03 0.951255 0.021641\n", - "1900-02-04 0.706808 0.679596\n", - "1900-02-05 0.028415 0.851935\n", - "1900-02-06 0.339167 0.417858\n", - "1900-02-07 0.484992 0.702888\n", - "1900-02-08 0.382927 0.586594" - ] - }, - "execution_count": 33, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df[[2, 4]] # Devuelve columnas 2 y 4 como un nuevo DataFrame." - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "2" - ] - }, - "execution_count": 35, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.iloc[0] # Selección por posición: selecciona el elemento 0 de la Series." - ] - }, - { - "cell_type": "code", - "execution_count": 36, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "9" - ] - }, - "execution_count": 36, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.loc[3] # Selección por índice de la Series." - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 0.471304\n", - "1 0.346660\n", - "2 0.625857\n", - "3 0.199757\n", - "4 0.326658\n", - "Name: 1900-01-30 00:00:00, dtype: float64" - ] - }, - "execution_count": 37, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.iloc[0, :] # Primera fila del DataFrame." - ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.4713038819925456" - ] - }, - "execution_count": 38, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.iloc[0, 0] # Primer elemento de la primera columna." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Data cleaning." - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
abcxy
1900-01-300.4713040.3466600.6258570.1997570.326658
1900-01-310.6397310.8346990.1558160.4242620.465353
1900-02-010.9868180.4698960.5343620.7480240.064895
1900-02-020.6733670.5240890.9174700.7547540.999615
1900-02-030.6604630.1271520.9512550.7940990.021641
1900-02-040.0833330.1994990.7068080.4934640.679596
1900-02-050.5188920.8556670.0284150.5118200.851935
1900-02-060.4573810.3198300.3391670.4631010.417858
1900-02-070.1044480.1535280.4849920.7948210.702888
1900-02-080.6638230.0820650.3829270.4527810.586594
\n", - "
" - ], - "text/plain": [ - " a b c x y\n", - "1900-01-30 0.471304 0.346660 0.625857 0.199757 0.326658\n", - "1900-01-31 0.639731 0.834699 0.155816 0.424262 0.465353\n", - "1900-02-01 0.986818 0.469896 0.534362 0.748024 0.064895\n", - "1900-02-02 0.673367 0.524089 0.917470 0.754754 0.999615\n", - "1900-02-03 0.660463 0.127152 0.951255 0.794099 0.021641\n", - "1900-02-04 0.083333 0.199499 0.706808 0.493464 0.679596\n", - "1900-02-05 0.518892 0.855667 0.028415 0.511820 0.851935\n", - "1900-02-06 0.457381 0.319830 0.339167 0.463101 0.417858\n", - "1900-02-07 0.104448 0.153528 0.484992 0.794821 0.702888\n", - "1900-02-08 0.663823 0.082065 0.382927 0.452781 0.586594" - ] - }, - "execution_count": 41, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.columns = [\"a\", \"b\", \"c\", \"x\", \"y\"] # Renombrar columnas del DataFrame.\n", - "df" - ] - }, - { - "cell_type": "code", - "execution_count": 44, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
abcxy
1900-01-30FalseFalseFalseFalseFalse
1900-01-31FalseFalseFalseFalseFalse
1900-02-01FalseFalseFalseFalseFalse
1900-02-02FalseFalseFalseFalseFalse
1900-02-03FalseFalseFalseFalseFalse
1900-02-04FalseFalseFalseFalseFalse
1900-02-05FalseFalseFalseFalseFalse
1900-02-06FalseFalseFalseFalseFalse
1900-02-07FalseFalseFalseFalseFalse
1900-02-08FalseFalseFalseFalseFalse
\n", - "
" - ], - "text/plain": [ - " a b c x y\n", - "1900-01-30 False False False False False\n", - "1900-01-31 False False False False False\n", - "1900-02-01 False False False False False\n", - "1900-02-02 False False False False False\n", - "1900-02-03 False False False False False\n", - "1900-02-04 False False False False False\n", - "1900-02-05 False False False False False\n", - "1900-02-06 False False False False False\n", - "1900-02-07 False False False False False\n", - "1900-02-08 False False False False False" - ] - }, - "execution_count": 44, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.isnull() # Comprueba valores nulos, devuelve un Boolean Arrays." - ] - }, - { - "cell_type": "code", - "execution_count": 45, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
abcxy
1900-01-30TrueTrueTrueTrueTrue
1900-01-31TrueTrueTrueTrueTrue
1900-02-01TrueTrueTrueTrueTrue
1900-02-02TrueTrueTrueTrueTrue
1900-02-03TrueTrueTrueTrueTrue
1900-02-04TrueTrueTrueTrueTrue
1900-02-05TrueTrueTrueTrueTrue
1900-02-06TrueTrueTrueTrueTrue
1900-02-07TrueTrueTrueTrueTrue
1900-02-08TrueTrueTrueTrueTrue
\n", - "
" - ], - "text/plain": [ - " a b c x y\n", - "1900-01-30 True True True True True\n", - "1900-01-31 True True True True True\n", - "1900-02-01 True True True True True\n", - "1900-02-02 True True True True True\n", - "1900-02-03 True True True True True\n", - "1900-02-04 True True True True True\n", - "1900-02-05 True True True True True\n", - "1900-02-06 True True True True True\n", - "1900-02-07 True True True True True\n", - "1900-02-08 True True True True True" - ] - }, - "execution_count": 45, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.notnull() # El opuesto a df.isnull()." - ] - }, - { - "cell_type": "code", - "execution_count": 46, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
abcxy
1900-01-300.4713040.3466600.6258570.1997570.326658
1900-01-310.6397310.8346990.1558160.4242620.465353
1900-02-010.9868180.4698960.5343620.7480240.064895
1900-02-020.6733670.5240890.9174700.7547540.999615
1900-02-030.6604630.1271520.9512550.7940990.021641
1900-02-040.0833330.1994990.7068080.4934640.679596
1900-02-050.5188920.8556670.0284150.5118200.851935
1900-02-060.4573810.3198300.3391670.4631010.417858
1900-02-070.1044480.1535280.4849920.7948210.702888
1900-02-080.6638230.0820650.3829270.4527810.586594
\n", - "
" - ], - "text/plain": [ - " a b c x y\n", - "1900-01-30 0.471304 0.346660 0.625857 0.199757 0.326658\n", - "1900-01-31 0.639731 0.834699 0.155816 0.424262 0.465353\n", - "1900-02-01 0.986818 0.469896 0.534362 0.748024 0.064895\n", - "1900-02-02 0.673367 0.524089 0.917470 0.754754 0.999615\n", - "1900-02-03 0.660463 0.127152 0.951255 0.794099 0.021641\n", - "1900-02-04 0.083333 0.199499 0.706808 0.493464 0.679596\n", - "1900-02-05 0.518892 0.855667 0.028415 0.511820 0.851935\n", - "1900-02-06 0.457381 0.319830 0.339167 0.463101 0.417858\n", - "1900-02-07 0.104448 0.153528 0.484992 0.794821 0.702888\n", - "1900-02-08 0.663823 0.082065 0.382927 0.452781 0.586594" - ] - }, - "execution_count": 46, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.dropna() # Elimina todas las filas que contienen valores nulos." - ] - }, - { - "cell_type": "code", - "execution_count": 47, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
abcxy
1900-01-300.4713040.3466600.6258570.1997570.326658
1900-01-310.6397310.8346990.1558160.4242620.465353
1900-02-010.9868180.4698960.5343620.7480240.064895
1900-02-020.6733670.5240890.9174700.7547540.999615
1900-02-030.6604630.1271520.9512550.7940990.021641
1900-02-040.0833330.1994990.7068080.4934640.679596
1900-02-050.5188920.8556670.0284150.5118200.851935
1900-02-060.4573810.3198300.3391670.4631010.417858
1900-02-070.1044480.1535280.4849920.7948210.702888
1900-02-080.6638230.0820650.3829270.4527810.586594
\n", - "
" - ], - "text/plain": [ - " a b c x y\n", - "1900-01-30 0.471304 0.346660 0.625857 0.199757 0.326658\n", - "1900-01-31 0.639731 0.834699 0.155816 0.424262 0.465353\n", - "1900-02-01 0.986818 0.469896 0.534362 0.748024 0.064895\n", - "1900-02-02 0.673367 0.524089 0.917470 0.754754 0.999615\n", - "1900-02-03 0.660463 0.127152 0.951255 0.794099 0.021641\n", - "1900-02-04 0.083333 0.199499 0.706808 0.493464 0.679596\n", - "1900-02-05 0.518892 0.855667 0.028415 0.511820 0.851935\n", - "1900-02-06 0.457381 0.319830 0.339167 0.463101 0.417858\n", - "1900-02-07 0.104448 0.153528 0.484992 0.794821 0.702888\n", - "1900-02-08 0.663823 0.082065 0.382927 0.452781 0.586594" - ] - }, - "execution_count": 47, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.dropna(axis = 1) # Drops every column that contains null values." - ] - }, - { - "cell_type": "code", - "execution_count": 48, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
abcxy
1900-01-300.4713040.3466600.6258570.1997570.326658
1900-01-310.6397310.8346990.1558160.4242620.465353
1900-02-010.9868180.4698960.5343620.7480240.064895
1900-02-020.6733670.5240890.9174700.7547540.999615
1900-02-030.6604630.1271520.9512550.7940990.021641
1900-02-040.0833330.1994990.7068080.4934640.679596
1900-02-050.5188920.8556670.0284150.5118200.851935
1900-02-060.4573810.3198300.3391670.4631010.417858
1900-02-070.1044480.1535280.4849920.7948210.702888
1900-02-080.6638230.0820650.3829270.4527810.586594
\n", - "
" - ], - "text/plain": [ - " a b c x y\n", - "1900-01-30 0.471304 0.346660 0.625857 0.199757 0.326658\n", - "1900-01-31 0.639731 0.834699 0.155816 0.424262 0.465353\n", - "1900-02-01 0.986818 0.469896 0.534362 0.748024 0.064895\n", - "1900-02-02 0.673367 0.524089 0.917470 0.754754 0.999615\n", - "1900-02-03 0.660463 0.127152 0.951255 0.794099 0.021641\n", - "1900-02-04 0.083333 0.199499 0.706808 0.493464 0.679596\n", - "1900-02-05 0.518892 0.855667 0.028415 0.511820 0.851935\n", - "1900-02-06 0.457381 0.319830 0.339167 0.463101 0.417858\n", - "1900-02-07 0.104448 0.153528 0.484992 0.794821 0.702888\n", - "1900-02-08 0.663823 0.082065 0.382927 0.452781 0.586594" - ] - }, - "execution_count": 48, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.dropna(axis = 1) # Drops every column that contains null values." - ] - }, - { - "cell_type": "code", - "execution_count": 51, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
abcxy
1900-01-300.4713040.3466600.6258570.1997570.326658
1900-01-310.6397310.8346990.1558160.4242620.465353
1900-02-010.9868180.4698960.5343620.7480240.064895
1900-02-020.6733670.5240890.9174700.7547540.999615
1900-02-030.6604630.1271520.9512550.7940990.021641
1900-02-040.0833330.1994990.7068080.4934640.679596
1900-02-050.5188920.8556670.0284150.5118200.851935
1900-02-060.4573810.3198300.3391670.4631010.417858
1900-02-070.1044480.1535280.4849920.7948210.702888
1900-02-080.6638230.0820650.3829270.4527810.586594
\n", - "
" - ], - "text/plain": [ - " a b c x y\n", - "1900-01-30 0.471304 0.346660 0.625857 0.199757 0.326658\n", - "1900-01-31 0.639731 0.834699 0.155816 0.424262 0.465353\n", - "1900-02-01 0.986818 0.469896 0.534362 0.748024 0.064895\n", - "1900-02-02 0.673367 0.524089 0.917470 0.754754 0.999615\n", - "1900-02-03 0.660463 0.127152 0.951255 0.794099 0.021641\n", - "1900-02-04 0.083333 0.199499 0.706808 0.493464 0.679596\n", - "1900-02-05 0.518892 0.855667 0.028415 0.511820 0.851935\n", - "1900-02-06 0.457381 0.319830 0.339167 0.463101 0.417858\n", - "1900-02-07 0.104448 0.153528 0.484992 0.794821 0.702888\n", - "1900-02-08 0.663823 0.082065 0.382927 0.452781 0.586594" - ] - }, - "execution_count": 51, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Drops every column that has fewer than 3 non-null values (axis = 1 selects columns).\n", - "df.dropna(axis = 1, thresh = 3)" - ] - }, - { - "cell_type": "code", - "execution_count": 52, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
abcxy
1900-01-300.4713040.3466600.6258570.1997570.326658
1900-01-310.6397310.8346990.1558160.4242620.465353
1900-02-010.9868180.4698960.5343620.7480240.064895
1900-02-020.6733670.5240890.9174700.7547540.999615
1900-02-030.6604630.1271520.9512550.7940990.021641
1900-02-040.0833330.1994990.7068080.4934640.679596
1900-02-050.5188920.8556670.0284150.5118200.851935
1900-02-060.4573810.3198300.3391670.4631010.417858
1900-02-070.1044480.1535280.4849920.7948210.702888
1900-02-080.6638230.0820650.3829270.4527810.586594
\n", - "
" - ], - "text/plain": [ - " a b c x y\n", - "1900-01-30 0.471304 0.346660 0.625857 0.199757 0.326658\n", - "1900-01-31 0.639731 0.834699 0.155816 0.424262 0.465353\n", - "1900-02-01 0.986818 0.469896 0.534362 0.748024 0.064895\n", - "1900-02-02 0.673367 0.524089 0.917470 0.754754 0.999615\n", - "1900-02-03 0.660463 0.127152 0.951255 0.794099 0.021641\n", - "1900-02-04 0.083333 0.199499 0.706808 0.493464 0.679596\n", - "1900-02-05 0.518892 0.855667 0.028415 0.511820 0.851935\n", - "1900-02-06 0.457381 0.319830 0.339167 0.463101 0.417858\n", - "1900-02-07 0.104448 0.153528 0.484992 0.794821 0.702888\n", - "1900-02-08 0.663823 0.082065 0.382927 0.452781 0.586594" - ] - }, - "execution_count": 52, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.fillna(0) # Replaces all null values with 0." - ] - }, - { - "cell_type": "code", - "execution_count": 53, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 2\n", - "1 7\n", - "2 3\n", - "3 9\n", - "4 5\n", - "dtype: int64" - ] - }, - "execution_count": 53, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.fillna(s.mean()) # Replaces all null values with the mean." - ] - }, - { - "cell_type": "code", - "execution_count": 54, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 2.0\n", - "1 7.0\n", - "2 3.0\n", - "3 9.0\n", - "4 5.0\n", - "dtype: float64" - ] - }, - "execution_count": 54, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.astype(float) # Converts the data type to float." - ] - }, - { - "cell_type": "code", - "execution_count": 55, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 2\n", - "1 7\n", - "2 100\n", - "3 9\n", - "4 5\n", - "dtype: object" - ] - }, - "execution_count": 55, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.replace(3, '100') # Replaces every value equal to 3 with the string '100'."
- ] - }, - { - "cell_type": "code", - "execution_count": 56, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 2\n", - "1 700\n", - "2 3\n", - "3 900\n", - "4 5\n", - "dtype: object" - ] - }, - "execution_count": 56, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.replace([7, 9], ['700', '900']) # Replaces every 7 with '700' and every 9 with '900'." - ] - }, - { - "cell_type": "code", - "execution_count": 65, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
13579
1900-01-300.4713040.3466600.6258570.1997570.326658
1900-01-310.6397310.8346990.1558160.4242620.465353
1900-02-010.9868180.4698960.5343620.7480240.064895
1900-02-020.6733670.5240890.9174700.7547540.999615
1900-02-030.6604630.1271520.9512550.7940990.021641
1900-02-040.0833330.1994990.7068080.4934640.679596
1900-02-050.5188920.8556670.0284150.5118200.851935
1900-02-060.4573810.3198300.3391670.4631010.417858
1900-02-070.1044480.1535280.4849920.7948210.702888
1900-02-080.6638230.0820650.3829270.4527810.586594
\n", - "
" - ], - "text/plain": [ - " 1 3 5 7 9\n", - "1900-01-30 0.471304 0.346660 0.625857 0.199757 0.326658\n", - "1900-01-31 0.639731 0.834699 0.155816 0.424262 0.465353\n", - "1900-02-01 0.986818 0.469896 0.534362 0.748024 0.064895\n", - "1900-02-02 0.673367 0.524089 0.917470 0.754754 0.999615\n", - "1900-02-03 0.660463 0.127152 0.951255 0.794099 0.021641\n", - "1900-02-04 0.083333 0.199499 0.706808 0.493464 0.679596\n", - "1900-02-05 0.518892 0.855667 0.028415 0.511820 0.851935\n", - "1900-02-06 0.457381 0.319830 0.339167 0.463101 0.417858\n", - "1900-02-07 0.104448 0.153528 0.484992 0.794821 0.702888\n", - "1900-02-08 0.663823 0.082065 0.382927 0.452781 0.586594" - ] - }, - "execution_count": 65, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.columns = (0, 1, 2, 3, 4)\n", - "df.rename(columns = lambda x: 2*x + 1) # Renames all columns in bulk by applying a function to each label." - ] - }, - { - "cell_type": "code", - "execution_count": 60, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
0one234
1900-01-300.4713040.3466600.6258570.1997570.326658
1900-01-310.6397310.8346990.1558160.4242620.465353
1900-02-010.9868180.4698960.5343620.7480240.064895
1900-02-020.6733670.5240890.9174700.7547540.999615
1900-02-030.6604630.1271520.9512550.7940990.021641
1900-02-040.0833330.1994990.7068080.4934640.679596
1900-02-050.5188920.8556670.0284150.5118200.851935
1900-02-060.4573810.3198300.3391670.4631010.417858
1900-02-070.1044480.1535280.4849920.7948210.702888
1900-02-080.6638230.0820650.3829270.4527810.586594
\n", - "
" - ], - "text/plain": [ - " 0 one 2 3 4\n", - "1900-01-30 0.471304 0.346660 0.625857 0.199757 0.326658\n", - "1900-01-31 0.639731 0.834699 0.155816 0.424262 0.465353\n", - "1900-02-01 0.986818 0.469896 0.534362 0.748024 0.064895\n", - "1900-02-02 0.673367 0.524089 0.917470 0.754754 0.999615\n", - "1900-02-03 0.660463 0.127152 0.951255 0.794099 0.021641\n", - "1900-02-04 0.083333 0.199499 0.706808 0.493464 0.679596\n", - "1900-02-05 0.518892 0.855667 0.028415 0.511820 0.851935\n", - "1900-02-06 0.457381 0.319830 0.339167 0.463101 0.417858\n", - "1900-02-07 0.104448 0.153528 0.484992 0.794821 0.702888\n", - "1900-02-08 0.663823 0.082065 0.382927 0.452781 0.586594" - ] - }, - "execution_count": 60, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.rename(columns={1:'one'}) # Renames a single selected column." - ] - }, - { - "cell_type": "code", - "execution_count": 64, - "metadata": {}, - "outputs": [], - "source": [ - "#df.set_index(5) # Changes the index." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Filter, sort, and group." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Join / Combine." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Statistics."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.5.3" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/06_statsmodels.ipynb b/07_statsmodels.ipynb similarity index 100% rename from 06_statsmodels.ipynb rename to 07_statsmodels.ipynb