Trivial fixes for the "Fetch surrounding chunks" notebook #268

Open — wants to merge 1 commit into base: main
@@ -204,9 +204,9 @@
"## Import model\n",
"Using the eland_import_hub_model script, download and install all-MiniLM-L6-v2 transformer model. Setting the NLP --task-type as text_embedding.\n",
"\n",
-"To get the cloud id, go to Elastic cloud and On the deployment overview page, copy down the Cloud ID.\n",
+"To get the Cloud ID, go to Elastic cloud and on the deployment overview page, copy down the Cloud ID.\n",
"\n",
-"To authenticate your request, You could use API key. Alternatively, you can use your cloud deployment username and password."
+"To authenticate your request, you could use API key. Alternatively, you can use your cloud deployment username and password."
],
"metadata": {
"id": "rOWheQ-uJE2C"
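The cell above describes running `eland_import_hub_model` with `--task-type text_embedding` and either API-key or username/password auth. A minimal sketch of assembling that invocation, with placeholder credentials (the `--es-api-key` flag and the exact argument values here are assumptions, not taken from the notebook):

```python
# Hypothetical placeholders — copy the real Cloud ID from the deployment
# overview page and generate an API key (or use username/password instead).
CLOUD_ID = "<your-cloud-id>"
API_KEY = "<your-api-key>"

import_cmd = [
    "eland_import_hub_model",
    "--cloud-id", CLOUD_ID,
    "--es-api-key", API_KEY,
    "--hub-model-id", "sentence-transformers/all-MiniLM-L6-v2",
    "--task-type", "text_embedding",
    "--start",
]

# In a notebook this could be run with, e.g., subprocess.run(import_cmd, check=True)
print(" ".join(import_cmd))
```

Building the command as a list (rather than one shell string) avoids quoting issues if the API key contains special characters.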
@@ -277,9 +277,9 @@
"# delete model if already downloaded and deployed\n",
"try:\n",
" esclient.ml.delete_trained_model(model_id=elser_model_id, force=True)\n",
-" print(\"Model deleted successfully, We will proceed with creating one\")\n",
+" print(\"Model deleted successfully, we will proceed with creating one\")\n",
"except exceptions.NotFoundError:\n",
-" print(\"Model doesn't exist, but We will proceed with creating one\")\n",
+" print(\"Model doesn't exist, but we will proceed with creating one\")\n",
"\n",
"# Creates the ELSER model configuration. Automatically downloads the model if it doesn't exist.\n",
"esclient.ml.put_trained_model(\n",
@@ -310,10 +310,10 @@
" )\n",
"\n",
" if status[\"trained_model_configs\"][0][\"fully_defined\"]:\n",
-" print(\"ELSER Model is downloaded and ready to be deployed.\")\n",
+" print(\"ELSER model is downloaded and ready to be deployed.\")\n",
" break\n",
" else:\n",
-" print(\"ELSER Model is downloaded but not ready to be deployed.\")\n",
+" print(\"ELSER model is downloaded but not ready to be deployed.\")\n",
" time.sleep(5)"
],
"metadata": {
@@ -346,10 +346,10 @@
" model_id=elser_model_id,\n",
" )\n",
" if status[\"trained_model_stats\"][0][\"deployment_stats\"][\"state\"] == \"started\":\n",
-" print(\"ELSER Model has been successfully deployed.\")\n",
+" print(\"ELSER model has been successfully deployed.\")\n",
" break\n",
" else:\n",
-" print(\"ELSER Model is currently being deployed.\")\n",
+" print(\"ELSER model is currently being deployed.\")\n",
" time.sleep(5)"
],
"metadata": {
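Both cells above follow the same pattern: poll a trained-model status endpoint every five seconds until the model is downloaded (or deployed). That pattern can be factored into a small helper — a hypothetical refactoring, not part of the notebook, demonstrated here with a fake status source instead of a live Elasticsearch cluster:

```python
import time

def wait_for(check, interval=5, timeout=300):
    """Poll check() until it returns True (mirroring the notebook's
    download/deploy loops) or until timeout seconds have elapsed."""
    waited = 0
    while True:
        if check():
            return True
        if waited >= timeout:
            return False
        time.sleep(interval)
        waited += interval

# Demo: a fake status callable that reports "ready" on the third poll.
calls = {"n": 0}
def fake_ready():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_for(fake_ready, interval=0))  # → True
```

In the notebook, `check` would wrap `esclient.ml.get_trained_models(...)` for the download loop and `esclient.ml.get_trained_models_stats(...)` for the deployment loop; adding a timeout keeps a failed deployment from polling forever.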
@@ -697,9 +697,9 @@
" for nested_hit in nested_hits:\n",
" chunk_number = nested_hit[\"_source\"][\"chunk_number\"]\n",
" text = nested_hit[\"_source\"][\"text\"]\n",
-" # print(f\"Text from Chunk {chunk_number}: {text}\")\n",
+" # print(f\"Text from chunk {chunk_number}: {text}\")\n",
" print(\n",
-" f\"\\n\\nText from Chunk {chunk_number}: {textwrap.fill(first_passage_text, width=200)}\"\n",
+" f\"\\n\\nText from chunk {chunk_number}: {textwrap.fill(first_passage_text, width=200)}\"\n",
" )\n",
" else:\n",
" print(\"No hits found.\")\n",
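The loop above walks Elasticsearch `inner_hits` and prints each chunk. Note that the notebook wraps `first_passage_text` on every iteration rather than the current chunk's `text`, which may be an oversight; the sketch below wraps each chunk's own text. The fixture dict is a made-up stand-in for the real response, though the field names (`chunk_number`, `text`) match the notebook:

```python
import textwrap

# Minimal stand-in for the nested inner_hits the notebook iterates over;
# the values are invented for illustration.
nested_hits = [
    {"_source": {"chunk_number": 3, "text": "Mr. and Mrs. Dursley, of number four, Privet Drive..."}},
    {"_source": {"chunk_number": 4, "text": "...were proud to say that they were perfectly normal."}},
]

for nested_hit in nested_hits:
    chunk_number = nested_hit["_source"]["chunk_number"]
    text = nested_hit["_source"]["text"]
    # Wrap this chunk's text, not the first passage's, before printing.
    print(f"Text from chunk {chunk_number}: {textwrap.fill(text, width=200)}")
```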
@@ -966,7 +966,7 @@
"source": [
"## Fetch and Process the Book Text\n",
"\n",
-"This section downloads the full text of \"Harry Potter and the Sorcerer's Stone\" from a specified URL and processes it to extract chapters and their titles. The text is then structured into a pandas DataFrame for further analysis and indexing.\n",
+"This section downloads the full text of \"Harry Potter and the Sorcerer's Stone\" from a specified URL and processes it to extract chapters and their titles. The text is then structured into a Pandas DataFrame for further analysis and indexing.\n",
"\n",
"### Key Steps:\n",
"1. **Download Text**: The book is fetched using `urllib.request` from the provided URL.\n",
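The cell summary describes splitting the downloaded text into chapters with titles. A rough sketch of that extraction step, under the assumption that chapters are introduced by a "CHAPTER <number-word>" heading followed by the title on its own line (the regex, sample text, and field names here are illustrative, not the notebook's actual code):

```python
import re

# Tiny made-up excerpt standing in for the full downloaded book text.
book_text = (
    "CHAPTER ONE\nTHE BOY WHO LIVED\nMr. and Mrs. Dursley...\n"
    "CHAPTER TWO\nTHE VANISHING GLASS\nNearly ten years had passed...\n"
)

# Split on chapter headings; drop anything before the first chapter.
parts = re.split(r"CHAPTER [A-Z]+\n", book_text)[1:]

chapters = []
for number, chunk in enumerate(parts, start=1):
    title, _, body = chunk.partition("\n")  # first line is the chapter title
    chapters.append(
        {"chapter": number, "chapter_title": title, "chapter_full_text": body.strip()}
    )

print(chapters[0]["chapter_title"])  # → THE BOY WHO LIVED
```

A list of dicts like `chapters` is straightforward to load into the Pandas DataFrame the section mentions, e.g. `pd.DataFrame(chapters)`.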
@@ -1052,7 +1052,7 @@
"source": [
"## Indexing DataFrame into Elasticsearch\n",
"\n",
-"This section uploads the structured data from a pandas DataFrame into a specified Elasticsearch index. The DataFrame contains chapter information from \"Harry Potter and the Sorcerer's Stone\", including chapter titles, full texts, and additional metadata.\n",
+"This section uploads the structured data from a Pandas DataFrame into a specified Elasticsearch index. The DataFrame contains chapter information from \"Harry Potter and the Sorcerer's Stone\", including chapter titles, full texts, and additional metadata.\n",
"\n",
"### Key Operation:\n",
"- **Index Data**: The `index_dataframe` function is called with the Elasticsearch client, the raw source index name, and the DataFrame as arguments. This operation effectively uploads the data into Elasticsearch, making it searchable and ready for further processing.\n"
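The notebook's `index_dataframe` helper is not shown in this diff, but an uploader like it typically turns each DataFrame row into a bulk action. A minimal sketch of that shape (the index name `book_raw` and the row fields are assumptions; with a real DataFrame the rows would come from `df.to_dict(orient="records")`):

```python
# Rows standing in for DataFrame records; field names are illustrative.
rows = [
    {"chapter": 1, "chapter_title": "THE BOY WHO LIVED", "chapter_full_text": "..."},
    {"chapter": 2, "chapter_title": "THE VANISHING GLASS", "chapter_full_text": "..."},
]

def bulk_actions(index_name, rows):
    """Yield one action per row in the shape consumed by
    elasticsearch.helpers.bulk(es_client, actions)."""
    for row in rows:
        yield {"_index": index_name, "_source": row}

actions = list(bulk_actions("book_raw", rows))
print(len(actions))  # → 2
```

Using a generator keeps memory flat for large DataFrames, since `helpers.bulk` consumes actions lazily.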
@@ -1318,4 +1318,4 @@
]
}
]
-}
+}