Error in normalizing long content into the dork collection #32

Open
sidra-asa opened this issue Mar 1, 2016 · 2 comments

Comments

@sidra-asa

I found some errors in mnemosyne.err, as shown below.

OperationFailure: Btree::insert: key too large to index, failing mnemosyne.dork.$content_1 1233 { : "/999999.9+/%2A%2A/uNiOn/%2A%2A/aLl+/%2A%2A/sElEcT+0x393133353134353632312e39,0x393133353134353632322e39,0x393133353134353632332e39,0x39313335313435363..." }

It could be that the content is too long to be indexed.
I'm now using the hashed content as the index key instead of the raw text:

https://github.com/johnnykv/mnemosyne/blob/master/persistance/mnemodb.py#L48

from pymongo import MongoClient, HASHED


# Index the hash of 'content' instead of the raw text, so long dorks
# stay under the btree key-size limit.
self.db.dork.ensure_index([('content', HASHED)], unique=False, background=True)
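As a quick sanity check (a standalone sketch, not code from the repo: the host, port, and sample field values are assumptions), inserting a value well past MongoDB's 1024-byte btree key limit should now succeed:

from pymongo import MongoClient, HASHED

client = MongoClient('localhost', 27017)
db = client.mnemosyne

# With a hashed index, only the hash of 'content' is stored in the
# index, so oversized values no longer trip the key-size limit.
db.dork.ensure_index([('content', HASHED)], unique=False, background=True)
db.dork.insert({'content': 'x' * 4096, 'type': 'inurl', 'count': 1})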

Now it seems to work fine.
If you have any suggestions, please let me know.

@sh4t

sh4t commented Aug 13, 2016

@sidra-asa

Were you still seeing errors afterwards, related to the upsert?

Traceback (most recent call last):
  File "/opt/mnemosyne/env/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/opt/mnemosyne/normalizer/normalizer.py", line 125, in inserter
    self.database.insert_normalized(norm, id, identifier)
  File "/opt/mnemosyne/persistance/mnemodb.py", line 97, in insert_normalized
    upsert=True)
  File "/opt/mnemosyne/env/local/lib/python2.7/site-packages/pymongo/collection.py", line 552, in update
    _check_write_command_response(results)
  File "/opt/mnemosyne/env/local/lib/python2.7/site-packages/pymongo/helpers.py", line 205, in _check_write_command_response
    raise OperationFailure(error.get("errmsg"), error.get("code"), error)
OperationFailure: insertDocument :: caused by :: 17280 Btree::insert: key too large to index, failing mnemosyne.dork.$content_1 1127 { : "/suse/include/components/com_artlinks/support/mailling/maillist/inc/include/control/999999.9+%0BuNiOn%0BaLl+%0BsElEcT+0x393133353134353632312e39,0x393..." }
<Greenlet at 0x7f9f7d9db7d0: <bound method Normalizer.inserter of <normalizer.normalizer.Normalizer object at 0x7f9f7d9c5f90>>([([{'session': {'_id': ObjectId('57aee159e5645d38e)> failed with OperationFailure

I'm attempting the hashed index as well, though without recreating the entire collection; it's still failing, though I believe that's because of the upsert on the update method.

mnemosyne/persistance/mnemodb.py, around line 97:

                # '==' rather than 'is': comparing strings by identity only
                # works by accident of CPython interning.
                elif collection == 'dork':
                    self.db[collection].update({'content': document['content'], 'type': document['type']},
                                               {'$set': {'lasttime': document['timestamp']},
                                                '$inc': {'count': document['count']}},
                                               upsert=True)
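The error message names mnemosyne.dork.$content_1, i.e. the original btree index on content, so a leftover index from before the hashed rebuild could also explain this. A small sketch to list the collection's indexes (connection details are assumptions):

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.mnemosyne

# index_information() maps each index name to its spec; if 'content_1'
# still appears here, inserts of long dorks will keep failing even
# though the hashed index exists.
for name, spec in db.dork.index_information().items():
    print(name, spec.get('key'))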

@sidra-asa
Author

@sh4t

I dropped the index on the dork content and created a hashed one.
I just checked the log, and there's no error like yours.
Could you give it a try and see if the error occurs?
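For reference, a sketch of the drop-and-recreate steps (assuming the old index kept MongoDB's default name content_1; adjust if yours differs, and connection details are assumptions):

from pymongo import MongoClient, HASHED

client = MongoClient('localhost', 27017)
db = client.mnemosyne

# Drop the original ascending index on 'content', then rebuild it as a
# hashed index so long dork strings stay under the btree key limit.
db.dork.drop_index('content_1')
db.dork.ensure_index([('content', HASHED)], unique=False, background=True)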

If you have any suggestions, please let me know.
