Add original Spanish Dish Title corpus. #43
base: main
Changes from 6 commits
ea8bd1f
65a82df
28ca071
9372ba7
1d6ebe3
8639aba
836ae71
411e4bb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
{ | ||
"cells": [], | ||
"metadata": {}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,42 @@ | ||||||
# Food dishes | ||||||
Review comment: Proposal for the bias study: where do the recipes come from? Do they include recipes from different countries/continents? |
||||||
## Description | ||||||
This dataset consists of images of food dishes and their titles. It was created by scraping the website <a href="https://www.recetasgratis.net/">Recetas gratis</a> with scrapy, using the following methodology: | ||||||
||||||
1. Get the link to the main page of the food category. | ||||||
2. Get the link to each recipe's page. | ||||||
3. Get the link to the recipe's image. | ||||||
4. Get the recipe's title. | ||||||
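
The steps above can be sketched as follows. This is only an illustration using the Python standard library, not the project's actual scrapy script: the HTML snippets, the assumption that the title sits in an `<h1>` tag, and the names `PageParser`, `recipe_links`, and `recipe_record` are all hypothetical. Step 1 (locating the category page) is assumed to have already produced the category HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects <a href>, <img src> and the text inside <h1> (assumed to be the title)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []
        self._title_parts = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])
        elif tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self._title_parts.append(data)

    @property
    def title(self):
        # Collapse any whitespace runs inside the heading.
        return " ".join(" ".join(self._title_parts).split())


def parse_page(html):
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser


def recipe_links(category_html, base_url):
    """Step 2: absolute links to each recipe page found on a category page."""
    return [urljoin(base_url, href) for href in parse_page(category_html).links]


def recipe_record(recipe_html, base_url):
    """Steps 3 and 4: first image URL and title (raises IndexError if no image)."""
    page = parse_page(recipe_html)
    return {"prompt": page.title, "image_url": urljoin(base_url, page.images[0])}


# Hypothetical HTML; the real site's markup may differ.
BASE_URL = "https://www.recetasgratis.net/"
CATEGORY_HTML = """
<ul>
  <li><a href="/receta-de-ejemplo-1.html">Ejemplo 1</a></li>
  <li><a href="/receta-de-ejemplo-2.html">Ejemplo 2</a></li>
</ul>
"""
RECIPE_HTML = """
<h1>Receta de ejemplo 1</h1>
<img src="/img/ejemplo-1.jpg" alt="Receta de ejemplo 1">
"""

links = recipe_links(CATEGORY_HTML, BASE_URL)
record = recipe_record(RECIPE_HTML, BASE_URL)
print(links)
print(record)
```

A real crawler would also need to download the pages, handle pagination, and respect the site's terms; none of that is shown here.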
||||||
## Images | ||||||
||||||
The images are 300x300 pixels in size and are in JPG format. | ||||||
||||||
## Metadata | ||||||
The dataset contains the following metadata: | ||||||
+ **prompt**: Title of the recipe. | ||||||
||||||
+ **source**: Path to the image. | ||||||
||||||
+ **uuid**: Unique identifier of the image. | ||||||
|
||||||
Note 1: The dataset is in CSV format. | ||||||
Note 2: The image file names also contain the title. | ||||||
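
As an illustration of the metadata layout described above, here is a minimal sketch that loads such a CSV and checks that the three columns are present. The sample row, the file path in it, and the helper name `load_metadata` are hypothetical; the real `dataset.csv` may use a different delimiter or column order.

```python
import csv
import io

EXPECTED_COLUMNS = {"prompt", "source", "uuid"}


def load_metadata(csv_file):
    """Read the metadata rows and fail early if an expected column is missing."""
    reader = csv.DictReader(csv_file)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


# Hypothetical sample standing in for dataset.csv.
SAMPLE_CSV = io.StringIO(
    "prompt,source,uuid\n"
    "Receta de ejemplo 1,images/receta_de_ejemplo_1.jpg,"
    "00000000-0000-0000-0000-000000000001\n"
)

rows = load_metadata(SAMPLE_CSV)
# Every image should have its own identifier.
uuids_are_unique = len({row["uuid"] for row in rows}) == len(rows)
print(rows[0]["prompt"], uuids_are_unique)
```

With the real file, `load_metadata(open("dataset.csv", newline="", encoding="utf-8"))` would be the equivalent call, assuming UTF-8 encoding.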
||||||
|
||||||
## Directory | ||||||
Review comment: Please include all the files and their explanations.
Review comment: Please specify the notebook's function in the name of |
||||||
```bash | ||||||
|-- README.md - This file | ||||||
|-- dataset.csv - Dataset | ||||||
|-- images - Images | ||||||
|-- src - Source code, in particular the scrapy script | ||||||
``` | ||||||
## Exploratory data analysis | ||||||
|
||||||
The exploratory analysis focuses on the text; for the images, computer vision tools such as CLIP would need to be applied to create certain classifications. | ||||||
Review comment: Also include a sentence saying that the notebook is available, with a link to the EDA.ipynb notebook. |
||||||
|
||||||
### Text analysis | ||||||
|
||||||
<img src="nube_de_palabras.png"> | ||||||
The image shows the most frequent words in the text; a box plot of the text is also shown. | ||||||
<img src="box_plot.png"> | ||||||
Here we can see that there are very short and very long texts, so we recommend that users check the text to see whether its length suits their needs. | ||||||
<img src="distribution.png"> | ||||||
Review comment: Same in this case. |
||||||
The histogram shows the distribution of text lengths: most texts are shorter than 78 characters, and 75% of the dataset has a length of up to 31 characters. | ||||||
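
Statistics like those above (frequent words, length quartiles) can be computed along these lines. This is a minimal sketch and not the code from the EDA notebook; the sample prompts and the helper names `length_summary` and `top_words` are hypothetical, and whitespace tokenization is an assumption.

```python
from collections import Counter
from statistics import quantiles


def length_summary(prompts):
    """Five-number summary of prompt lengths in characters (what a box plot draws)."""
    lengths = [len(prompt) for prompt in prompts]
    q1, median, q3 = quantiles(lengths, n=4, method="inclusive")
    return {"min": min(lengths), "q1": q1, "median": median, "q3": q3, "max": max(lengths)}


def top_words(prompts, k=3):
    """Most frequent lowercase, whitespace-separated words (input for a word cloud)."""
    counts = Counter(word for prompt in prompts for word in prompt.lower().split())
    return counts.most_common(k)


# Hypothetical prompts standing in for the `prompt` column.
SAMPLE_PROMPTS = [
    "Tarta de queso",
    "Sopa de ajo",
    "Arroz con pollo",
    "Ensalada de pasta",
    "Pan",
]

summary = length_summary(SAMPLE_PROMPTS)
most_common = top_words(SAMPLE_PROMPTS, k=1)
print(summary)
print(most_common)
```

On the real data, the `q3` value is the figure to compare with the 75% mark quoted above.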
|
||||||
### Image analysis | ||||||
We recommend analyzing the images with neural networks to get more out of them and to verify the correspondence between the prompt and the image. (One idea is to do this with CLIP.) | ||||||
|
||||||
<img src="dishes_prompt.png"> |
Review comment: You don't need to include this file; add `.ipynb_checkpoints` to the `.gitignore` :)