When indexing around 1,000 TEI documents for a test, Elasticsearch was managing ~100,000 documents in the index. I guess this is due to the new nested objects: the bibliographical references are now managed as nested objects.
This could be an issue for scalability. For only 1 million TEI documents, it means 100 million documents in the index, and from my experience this will seriously impact search-time performance (one reason is that Lucene has to manage longer lists of documents in the inverted indexes).
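The scaling estimate above can be made explicit with a small sketch. Elasticsearch indexes each nested object as a separate hidden Lucene document next to its root document. This is likely why the index reports ~100,000 documents: Lucene-level counts such as the `docs.count` column of `_cat/indices` include these nested documents. The figure of ~99 references per TEI is an assumption derived only from the observed 100:1 ratio, and `lucene_doc_count` is a hypothetical helper.

```python
# Elasticsearch stores every nested object as its own hidden Lucene document,
# in addition to the root document that contains it.

def lucene_doc_count(n_tei: int, refs_per_tei: int) -> int:
    """Estimate Lucene documents for n_tei root docs with refs_per_tei nested refs each."""
    return n_tei * (1 + refs_per_tei)

# Ratio observed in the test: ~100,000 Lucene docs for ~1,000 TEI files.
observed_ratio = 100_000 // 1_000   # ~100 Lucene docs per TEI file
refs_per_tei = observed_ratio - 1   # assumption: 1 root doc + ~99 nested references

print(lucene_doc_count(1_000, refs_per_tei))      # 100000
print(lucene_doc_count(1_000_000, refs_per_tei))  # 100000000
```

The index grows linearly with the average number of references per paper, not with the number of papers alone.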
I think there is no need for nested objects for the bibliographical references:
for the search applications, there is no "live" search for bibliographical references in documents;
for resolving bibliographical references, which is an offline process, (i) we need much more sophisticated matching techniques (cf. entity cooking) than what a search with ES can provide, and (ii) we should not mix offline search-processing indexes with the online user search index.
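A minimal sketch of what dropping the nested type could look like, assuming a typeless (Elasticsearch 7+) mapping body. The field names `title`, `references` and `authors` are hypothetical, and `tei_mapping` is an illustrative helper, not existing project code. References are mapped either as a plain `object`, which flattens them into the root document, or as an `object` with `"enabled": false`, which keeps them in `_source` without indexing them at all.

```python
import json

# Hypothetical field names: "title" for the paper, "references" for the
# bibliographical references extracted from each TEI file.

def tei_mapping(index_references: bool) -> dict:
    """Build an index mapping body in which references are NOT of nested type."""
    if index_references:
        # "object" flattens the reference fields into the root document:
        # no extra hidden Lucene document is created per reference.
        references = {
            "type": "object",
            "properties": {
                "title": {"type": "text"},
                "authors": {"type": "text"},
            },
        }
    else:
        # enabled=false keeps references in _source only; they are neither
        # indexed nor searchable, which fits an offline resolution process.
        references = {"type": "object", "enabled": False}
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "references": references,
            }
        }
    }

print(json.dumps(tei_mapping(index_references=False), indent=2))
```

The trade-off of a flattened `object` is that field values from different references are merged into one array per field, so a query can no longer require that a title and an author belong to the same reference. Given that there is no live search on references, that limitation seems acceptable here.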