JSONDecodeError #26

Open
pdb159 opened this issue Jun 9, 2023 · 4 comments
Comments

@pdb159

pdb159 commented Jun 9, 2023

For certain dates I receive a JSONDecodeError as well as an AttributeError: 'ValueError' object has no attribute 'pos'. Does this mean there are no news articles available for the selected day? If so, is there a way to access GDELT directly to get the data for that date?

Thanks for the help!
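On reaching GDELT directly: gdeltdoc is a client for the GDELT DOC 2.0 API, which can also be queried over plain HTTP. The following is a minimal sketch that only builds the request URL for one day. It assumes the public endpoint `https://api.gdeltproject.org/api/v2/doc/doc` and its `query`, `mode`, `format`, `maxrecords`, `startdatetime` and `enddatetime` parameters. The helper name `doc_api_url` and the example query string (meant to mirror the `near(20, "COVID", "vaccine")` and `country = "US"` filters used in this thread) are illustrative assumptions, not gdeltdoc code.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumed public endpoint of the GDELT DOC 2.0 API (not taken from gdeltdoc itself)
DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"


def doc_api_url(query: str, day: date, max_records: int = 250) -> str:
    """Hypothetical helper: build an artlist request covering one calendar day."""
    start = day.strftime("%Y%m%d") + "000000"  # YYYYMMDDHHMMSS
    end = (day + timedelta(days=1)).strftime("%Y%m%d") + "000000"
    params = {
        "query": query,
        "mode": "artlist",  # same mode gdeltdoc passes in the traceback below
        "format": "json",
        "maxrecords": max_records,  # assumed cap of 250 per request
        "startdatetime": start,
        "enddatetime": end,
    }
    return f"{DOC_API}?{urlencode(params)}"


# Assumed query syntax mirroring near(20, "COVID", "vaccine") and country = "US"
url = doc_api_url('near20:"COVID vaccine" sourcecountry:US', date(2020, 9, 1))
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) returns the raw response text, which lets you inspect the characters around the offset the decoder complains about instead of only seeing the exception.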

@networks1

I just had the same thing happen:

```
Traceback (most recent call last):
  File "C:\Python311\Lib\site-packages\gdeltdoc\helpers.py", line 15, in load_json
    result = json.loads(json_message)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Invalid \escape: line 1 column 99103 (char 99102)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python311\Lib\site-packages\gdeltdoc\helpers.py", line 15, in load_json
    result = json.loads(json_message)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^
ValueError: Exceeds the limit (4300) for integer string conversion: value has 248854 digits; use sys.set_int_max_str_digits() to increase the limit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\boss\Dropbox (ASU)\merck grant\gdelt-search.py", line 74, in <module>
    new_articles = gd.article_search(f)
                   ^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\gdeltdoc\api_client.py", line 79, in article_search
    articles = self._query("artlist", filters.query_string)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\gdeltdoc\api_client.py", line 168, in _query
    return load_json(response.content, self.max_depth_json_parsing)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\gdeltdoc\helpers.py", line 27, in load_json
    return load_json(json_message=new_message, max_recursion_depth=max_recursion_depth,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\gdeltdoc\helpers.py", line 20, in load_json
    idx_to_replace = int(e.pos)
                         ^^^^^
AttributeError: 'ValueError' object has no attribute 'pos'
```
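The chained traceback shows two different failures inside `load_json`. The first parse fails with a `json.JSONDecodeError` (an invalid backslash escape), which carries the failing offset in `.pos`. The retry at `helpers.py` line 27 then fails with a plain `ValueError` from Python's integer-string conversion limit (4300 digits by default on Python 3.11+ and some patched earlier releases). That `ValueError` has no `.pos`, so `int(e.pos)` on line 20 raises the `AttributeError`. The following minimal, standard-library-only sketch reproduces both exception types. The helper `classify_parse_error` and the two sample payloads are made up for illustration and are not gdeltdoc code.

```python
import json
import sys

BAD_ESCAPE_PAYLOAD = '{"title": "bad \\q escape"}'  # JSON text containing the invalid escape \q
LONG_INT_PAYLOAD = "1" * 5000  # more digits than the default 4300-digit limit


def classify_parse_error(payload: str):
    """Return (exception_name, pos) raised by json.loads(payload), or ("ok", None)."""
    try:
        json.loads(payload)
    except json.JSONDecodeError as e:
        # JSONDecodeError (a ValueError subclass) records the failing offset in .pos
        return ("JSONDecodeError", e.pos)
    except ValueError as e:
        # A plain ValueError, such as the int-digit limit, has no .pos attribute;
        # reading e.pos directly here would raise AttributeError
        return ("ValueError", getattr(e, "pos", None))
    return ("ok", None)


bad_escape = classify_parse_error(BAD_ESCAPE_PAYLOAD)
too_long = classify_parse_error(LONG_INT_PAYLOAD)

# The error message suggests sys.set_int_max_str_digits(); 0 disables the limit.
# The setting is process-wide, so restore the previous value afterwards.
too_long_unlimited = None
if hasattr(sys, "set_int_max_str_digits"):
    previous = sys.get_int_max_str_digits()
    sys.set_int_max_str_digits(0)
    try:
        too_long_unlimited = classify_parse_error(LONG_INT_PAYLOAD)
    finally:
        sys.set_int_max_str_digits(previous)

print(bad_escape, too_long, too_long_unlimited)
```

Because `load_json` reaches `int(e.pos)` with the plain `ValueError`, its handler evidently catches more than `JSONDecodeError`. Reading the offset with `getattr(e, "pos", None)`, as the sketch does, is one way to tell the two cases apart.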

@alex9smith
Owner

Thanks for the reports! I'll do some digging and figure this out.

@alex9smith
Owner

@networks1 @pdb159 could you give me an example query that gives this error?

@networks1

Running this should reproduce it. I can't remember the exact days: the first was in late September, I think, and there were a couple in December too.

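```python
import time
from datetime import datetime, timedelta

import pandas as pd
from gdeltdoc import Filters, GdeltDoc, near

gd = GdeltDoc()

# One query per day from 2020-09-01 through 2020-12-31
date_generated = pd.date_range("2020-09-01", "2020-12-31", freq="D").strftime("%Y-%m-%d").tolist()
api_timeout = 5  # seconds to pause between requests
for dt in date_generated:
    start_date = dt
    end_date = (datetime.strptime(dt, "%Y-%m-%d") + timedelta(days=1)).strftime("%Y-%m-%d")
    f = Filters(
        # keyword = kw,
        near=near(20, "COVID", "vaccine"),
        start_date=start_date,
        end_date=end_date,
        num_records=250,
        country="US",
    )
    new_articles = gd.article_search(f)
    time.sleep(api_timeout)
```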
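Until the parsing issue is resolved, one way to keep a long date loop like the one above from stopping at the first bad day is to catch the two exception types seen in the traceback and record the failing dates for a later retry. This is a minimal sketch, assuming the daily query can be wrapped in a callable that takes a date string. `search_each_day` and `fake_search` are hypothetical names, and the stand-in search function lets the example run without network access.

```python
import time
from typing import Any, Callable, Dict, Iterable, List, Tuple


def search_each_day(
    search: Callable[[str], Any],
    days: Iterable[str],
    delay_seconds: float = 0.0,
) -> Tuple[Dict[str, Any], List[str]]:
    """Hypothetical helper: run search(day) for every day and collect failures."""
    results: Dict[str, Any] = {}
    failed: List[str] = []
    for day in days:
        try:
            results[day] = search(day)
        except (ValueError, AttributeError) as exc:
            # ValueError also covers json.JSONDecodeError; AttributeError is the
            # exception the traceback above ends with. Record the day, keep going.
            print(f"{day}: skipped after {type(exc).__name__}: {exc}")
            failed.append(day)
        time.sleep(delay_seconds)  # pause between requests, like api_timeout
    return results, failed


def fake_search(day: str) -> int:
    """Stand-in for a real API call; the failing day is arbitrary, chosen for the demo."""
    if day == "2020-09-28":
        raise AttributeError("'ValueError' object has no attribute 'pos'")
    return len(day)


ok, bad = search_each_day(fake_search, ["2020-09-27", "2020-09-28", "2020-09-29"])
print(sorted(ok), bad)
```

With the real client, `search` could be something like `lambda day: gd.article_search(build_filters(day))`, where `build_filters` is a hypothetical function wrapping the `Filters(...)` call from the loop. Catching `AttributeError` this broadly can also hide unrelated bugs, so it is worth narrowing once the parsing issue is fixed.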
