Semana 7 lista
joanby committed Dec 17, 2024
1 parent e1aa8ce commit 0b6edaa
Showing 25 changed files with 7,663 additions and 1,034 deletions.
430 changes: 331 additions & 99 deletions week6/day1.ipynb

Large diffs are not rendered by default.

868 changes: 738 additions & 130 deletions week6/day2.ipynb

Large diffs are not rendered by default.

4,029 changes: 3,891 additions & 138 deletions week6/day3.ipynb

Large diffs are not rendered by default.

60 changes: 30 additions & 30 deletions week6/day4-results.ipynb
@@ -5,21 +5,21 @@
"id": "db8736a7-ed94-441c-9556-831fa57b5a10",
"metadata": {},
"source": [
"# The Product Pricer Continued\n",
"# El evaluador de precios de productos (continuación)\n",
"\n",
"A model that can estimate how much something costs, from its description.\n",
"Un modelo que puede estimar cuánto cuesta algo a partir de su descripción.\n",
"\n",
"## Enter The Frontier!\n",
"## ¡Nos marchamos a la Frontera!\n",
"\n",
"And now - we put Frontier Models to the test.\n",
"Y ahora, ponemos a prueba los modelos de Frontier.\n",
"\n",
"### 2 important points:\n",
"### Dos puntos importantes:\n",
"\n",
"It's important to appreciate that we aren't Training the frontier models. We're only providing them with the Test dataset to see how they perform. They don't gain the benefit of the 400,000 training examples that we provided to the Traditional ML models.\n",
"Es importante tener en cuenta que no estamos entrenando los modelos de Frontier. Solo les proporcionamos el conjunto de datos de prueba para ver cómo funcionan. No obtienen el beneficio de los 400 000 ejemplos de entrenamiento que proporcionamos a los modelos de ML tradicionales.\n",
"\n",
"HAVING SAID THAT...\n",
"DICHO ESTO...\n",
"\n",
"It's entirely possible that in their monstrously large training data, they've already been exposed to all the products in the training AND the test set. So there could be test \"contamination\" here which gives them an unfair advantage. We should keep that in mind."
"Es totalmente posible que en sus monstruosos datos de entrenamiento, ya hayan estado expuestos a todos los productos en el conjunto de entrenamiento Y de prueba. Por lo tanto, podría haber una \"contaminación\" de prueba aquí que les dé una ventaja injusta. Debemos tener eso en cuenta."
]
},
{
@@ -54,8 +54,8 @@
"metadata": {},
"outputs": [],
"source": [
"# moved our Tester into a separate package\n",
"# call it with Tester.test(function_name, test_dataset)\n",
"# movimos nuestro Tester a un paquete separado\n",
"# lo llamamos mediante Tester.test(function_name, test_dataset)\n",
"\n",
"from testing import Tester"
]
@@ -67,7 +67,7 @@
"metadata": {},
"outputs": [],
"source": [
"# environment\n",
"# entorno\n",
"\n",
"load_dotenv()\n",
"os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')\n",
@@ -93,7 +93,7 @@
}
],
"source": [
"# Log in to HuggingFace\n",
"# Log in en HuggingFace\n",
"\n",
"hf_token = os.environ['HF_TOKEN']\n",
"login(hf_token, add_to_git_credential=True)"
@@ -127,7 +127,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Let's avoid curating all our data again! Load in the pickle files:\n",
"# ¡Evitemos tener que volver a curar todos nuestros datos! Carguemos los archivos pickle:\n",
"\n",
"with open('train.pkl', 'rb') as file:\n",
" train = pickle.load(file)\n",
@@ -141,9 +141,9 @@
"id": "e5856173-e68c-4975-a769-5f1736e227a5",
"metadata": {},
"source": [
"# Before we look at the Frontier\n",
"# Antes de analizar los modelos Frontera\n",
"\n",
"## There is one more model we could consider"
"## Hay un modelo más que podríamos considerar"
]
},
{
@@ -153,7 +153,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Write the test set to a CSV\n",
"# Escribe el conjunto de pruebas en un CSV\n",
"\n",
"import csv\n",
"with open('human_input.csv', 'w') as csvfile:\n",
@@ -169,7 +169,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Read it back in\n",
"# Lo leemos de vuelta\n",
"\n",
"human_predictions = []\n",
"with open('human_output.csv', 'r') as csvfile:\n",
@@ -472,9 +472,9 @@
"id": "066fef03-8338-4526-9df3-89b649ad4f0a",
"metadata": {},
"source": [
"## First, the humble but mighty GPT-4o-mini\n",
"## Primero, el humilde pero poderoso GPT-4o-mini\n",
"\n",
"It's called mini, but it packs a punch."
"Se llama mini, pero es muy potente."
]
},
{
@@ -484,14 +484,14 @@
"metadata": {},
"outputs": [],
"source": [
"# First let's work on a good prompt for a Frontier model\n",
"# Notice that I'm removing the \" to the nearest dollar\"\n",
"# When we train our own models, we'll need to make the problem as easy as possible, \n",
"# but a Frontier model needs no such simplification.\n",
"# Primero, trabajemos en un buen mensaje para un modelo Frontier\n",
"# Observe que estoy eliminando el \"al dólar más cercano\"\n",
"# Cuando entrenemos nuestros propios modelos, necesitaremos hacer que el problema sea lo más fácil posible,\n",
"# pero un modelo Frontier no necesita tal simplificación.\n",
"\n",
"def messages_for(item):\n",
" system_message = \"You estimate prices of items. Reply only with the price, no explanation\"\n",
" user_prompt = item.test_prompt().replace(\" to the nearest dollar\",\"\").replace(\"\\n\\nPrice is $\",\"\")\n",
" system_message = \"Estimas los precios de los artículos. Respondes solo con el precio, sin explicaciones.\"\n",
" user_prompt = item.test_prompt().replace(\" al dólar más cercano\",\"\").replace(\"\\n\\nPrice is $\",\"\")\n",
" return [\n",
" {\"role\": \"system\", \"content\": system_message},\n",
" {\"role\": \"user\", \"content\": user_prompt},\n",
@@ -529,7 +529,7 @@
}
],
"source": [
"# Try this out\n",
"# Vamos a probarlo\n",
"\n",
"messages_for(test[0])"
]
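A note on the `messages_for` hunk: the commit changes the first `.replace()` argument from " to the nearest dollar" to " al dólar más cercano", while the second argument ("\n\nPrice is $") stays in English. If `item.test_prompt()` still returns English text, the translated phrase would never match and the call would quietly do nothing. The sketch below illustrates this under stated assumptions: the prompt string and the `strip_prompt` helper are hypothetical, since the real `test_prompt()` output is not shown in this diff.

```python
# Hypothetical prompt text; the real Item.test_prompt() is not shown in the diff.
prompt = (
    "How much does this cost to the nearest dollar?\n\n"
    "Some product description\n\n"
    "Price is $"
)


def strip_prompt(text, phrase):
    """Hypothetical helper mirroring the two chained .replace() calls in messages_for."""
    return text.replace(phrase, "").replace("\n\nPrice is $", "")


# Behaviour before the commit: the English phrase is removed.
original = strip_prompt(prompt, " to the nearest dollar")

# Behaviour after the commit: the Spanish phrase does not occur, so nothing is removed.
translated = strip_prompt(prompt, " al dólar más cercano")

print(" to the nearest dollar" in original)    # False
print(" to the nearest dollar" in translated)  # True
```

If the prompt text is also translated elsewhere in the repository, the new argument may be correct; this only shows what happens when the prompt remains in English.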
@@ -541,7 +541,7 @@
"metadata": {},
"outputs": [],
"source": [
"# A utility function to extract the price from a string\n",
"# Una función de utilidad para extraer el precio de un string\n",
"\n",
"def get_price(s):\n",
" s = s.replace('$','').replace(',','')\n",
@@ -567,7 +567,7 @@
}
],
"source": [
"get_price(\"The price is roughly $99.99 because blah blah\")"
"get_price(\"El precio es de aproximadamente $99,99 porque bla, bla, bla.\")"
]
},
{
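A note on the `get_price` example above: the translated test string writes the amount with a decimal comma ("$99,99"), but the first line of `get_price` strips every comma before parsing. The sketch below shows the likely effect. Only the first line of the function body appears in this diff; the regular-expression step is an assumed continuation, included so the example runs.

```python
import re


def get_price(s):
    # This line matches the diff: drop the dollar sign and all commas.
    s = s.replace('$', '').replace(',', '')
    # Assumed continuation (not shown in the diff): take the first number found.
    match = re.search(r"[-+]?\d*\.\d+|\d+", s)
    return float(match.group()) if match else 0


english = get_price("The price is roughly $99.99 because blah blah")
spanish = get_price("El precio es de aproximadamente $99,99 porque bla, bla, bla.")

print(english)  # 99.99
print(spanish)  # 9999.0
```

Under that assumption, keeping a decimal point in the translated example ("$99.99") would preserve the original result of the cell.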
@@ -577,7 +577,7 @@
"metadata": {},
"outputs": [],
"source": [
"# The function for gpt-4o-mini\n",
"# La función para gpt-4o-mini\n",
"\n",
"def gpt_4o_mini(item):\n",
" response = openai.chat.completions.create(\n",
@@ -895,7 +895,7 @@
"metadata": {},
"outputs": [],
"source": [
"# The function for gpt-4o - the August model\n",
"# La función para gpt-4o - para el modelo de Agosto\n",
"\n",
"def gpt_4o_frontier(item):\n",
" response = openai.chat.completions.create(\n",