Using nested objects #51

Open
kermitt2 opened this issue Nov 26, 2016 · 0 comments

When indexing around 1,000 TEI documents as a test, Elasticsearch was managing ~100,000 documents in the index. I guess this is due to the new nested objects: the bibliographical references are now managed as nested objects, and Elasticsearch indexes each nested object as a separate Lucene document.

This could be an issue for scalability: at the same ratio, only 1 million TEI documents would mean 100 million documents in the index. From my experience this will seriously impact search-time performance (one reason is that Lucene has to manage longer lists of documents in the inverted indexes).
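The extrapolation above can be written out as a small calculation. This is only a back-of-the-envelope sketch: it assumes one Lucene document per TEI document plus one per nested object, and that the ratio observed on the test set stays constant as the corpus grows. The function name `estimate_index_docs` is made up for illustration.

```python
def estimate_index_docs(n_tei: int, nested_per_tei: float) -> int:
    """Estimate the Lucene document count for n_tei TEI documents.

    Assumes each TEI document contributes one root document plus one
    document per nested object (here, per bibliographical reference).
    """
    return round(n_tei * (1 + nested_per_tei))


# Figures reported in this issue: ~1,000 TEI documents -> ~100,000 index documents.
observed_tei = 1_000
observed_docs = 100_000

# Average number of nested objects per TEI document implied by the test run.
nested_per_tei = observed_docs / observed_tei - 1

# Projection for 1 million TEI documents at the same ratio.
projected_docs = estimate_index_docs(1_000_000, nested_per_tei)
print(projected_docs)
```
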

I think there is no need for nested objects for the bibliographical references:

  • for the search applications, there is no "live" search on the bibliographical references in documents,
  • for resolving bibliographical references, which is an offline process, (i) we need much more sophisticated matching techniques (cf. entity cooking) than what a search with ES can provide, and (ii) we do not need to mix the indexes used for offline processing with the online user search index.
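One possible way to act on the points above, sketched here as an assumption rather than as the project's actual mapping: turn every `nested` field into a plain `object` with `"enabled": false`, so the references stay in `_source` but are no longer indexed as separate Lucene documents. The field names (`refBibs`, `title`, `authors`) and the helper `flatten_nested` are hypothetical; only `type: nested`, `type: object` and `enabled` are standard Elasticsearch mapping settings.

```python
def flatten_nested(properties: dict) -> dict:
    """Return a copy of a mapping 'properties' dict in which every nested
    field is replaced by a non-indexed plain object."""
    result = {}
    for name, spec in properties.items():
        if spec.get("type") == "nested":
            # Kept in _source, but not parsed or indexed.
            result[name] = {"type": "object", "enabled": False}
        elif "properties" in spec:
            # Recurse into ordinary object fields.
            new_spec = dict(spec)
            new_spec["properties"] = flatten_nested(spec["properties"])
            result[name] = new_spec
        else:
            result[name] = dict(spec)
    return result


# Hypothetical mapping fragment with references stored as nested objects.
current = {
    "title": {"type": "text"},
    "refBibs": {
        "type": "nested",
        "properties": {
            "title": {"type": "text"},
            "authors": {"type": "text"},
        },
    },
}

proposed = flatten_nested(current)
print(proposed)
```

With this variant the references cannot be queried at all in the user-facing index, which matches the first point in the list; the offline resolution process would read them from another source.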