ZeroDivisionError when training with zero-length data #49

haywhisksoftware · 2014-02-04T22:15:53Z

(Minor bug.)
I installed scrapely from pip this morning.

This is a wacky edge case, but I think you could raise a more constructive error.

(Who wants to extract a zero-length string from a document? It's a bit like a magician pulling some atmosphere out of a hat: it's always going to be there...)

Check it out:

In [97]: from scrapely import Scraper

In [98]: s = Scraper()

In [99]: s.train('http://www.google.com', {'image': u''})
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
/home/username/myfolder/<ipython-input-99-233d0ac90e7f> in <module>()
----> 1 s.train('http://www.google.com', {'image': u''})

/usr/local/lib/python2.7/dist-packages/scrapely/__init__.pyc in train(self, url, data, encoding)
     44     def train(self, url, data, encoding=None):
     45         page = url_to_page(url, encoding)
---> 46         self.train_from_htmlpage(page, data)
     47 
     48     def scrape(self, url, encoding=None):

/usr/local/lib/python2.7/dist-packages/scrapely/__init__.pyc in train_from_htmlpage(self, htmlpage, data)
     39                 if isinstance(value, str):
     40                     value = value.decode(htmlpage.encoding or 'utf-8')
---> 41                 tm.annotate(field, best_match(value))
     42         self.add_template(tm.get_template())
     43 

/usr/local/lib/python2.7/dist-packages/scrapely/template.pyc in annotate(self, field, score_func, best_match)
     31 
     32         """
---> 33         indexes = self.select(score_func)
     34         if not indexes:
     35             raise FragmentNotFound("Fragment not found annotating %r using: %s" % 

/usr/local/lib/python2.7/dist-packages/scrapely/template.pyc in select(self, score_func)
     46         matches = []
     47         for i, fragment in enumerate(htmlpage.parsed_body):
---> 48             score = score_func(fragment, htmlpage)
     49             if score:
     50                 matches.append((score, i))

/usr/local/lib/python2.7/dist-packages/scrapely/template.pyc in func(fragment, page)
     95         fdata = page.fragment_data(fragment).strip()
     96         if text in fdata:
---> 97             return float(len(text)) / len(fdata) - (1e-6 * fragment.start)
     98         else:
     99             return 0.0

ZeroDivisionError: float division by zero
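
One way to get a more constructive error, sketched under assumptions: reject empty values before training starts. `check_training_data` is a hypothetical helper, not part of scrapely's API, and it is written for Python 3 rather than the Python 2.7 shown in the traceback. It only looks at the `data` dict passed to `train`.

```python
def check_training_data(data):
    """Raise ValueError if any field has an empty (or whitespace-only) value.

    Hypothetical helper for illustration; not part of scrapely.
    """
    empty = sorted(
        field for field, value in data.items()
        if isinstance(value, str) and not value.strip()
    )
    if empty:
        raise ValueError(
            "cannot train on empty value(s) for field(s): %s" % ", ".join(empty)
        )
    return data


# Usage: validate before handing the dict to Scraper.train.
try:
    check_training_data({'image': ''})
except ValueError as exc:
    print(exc)
```

Calling something like this at the top of `train` would turn the ZeroDivisionError into a message that names the offending field.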


ironmaniiith · 2016-02-28T21:49:10Z

This line is the reason for the error:

return float(len(text)) / len(fdata) - (1e-6 * fragment.start)
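
To see why the denominator can be zero, here is a minimal sketch of the same arithmetic without scrapely or a network fetch. `score` is a stand-in for the `func` in the traceback, with plain `fdata` and `start` arguments replacing `page.fragment_data(fragment)` and `fragment.start`; those parameter names are assumptions for illustration. Note that for non-empty `text` the `in` test already guarantees `len(fdata) > 0`, so this particular expression can only divide by zero when `text` is empty.

```python
def score(text, fdata, start=0):
    # Same expression as the quoted line from template.py.
    fdata = fdata.strip()
    if text in fdata:
        return float(len(text)) / len(fdata) - (1e-6 * start)
    return 0.0


# The empty string is a substring of every string, including itself,
# so the `in` test passes even when fdata is empty after strip().
assert "" in ""

try:
    score("", "   ")  # whitespace-only fragment, empty search text
except ZeroDivisionError as exc:
    print("reproduced:", exc)
```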

If the returned score is inversely proportional to the length of fdata, can we just write this?

fdata = page.fragment_data(fragment).strip()
if text in fdata:
    if not len(fdata):
        return float("inf")
    return float(len(text)) / len(fdata) - (1e-6 * fragment.start)
else:
    return 0.0
return func
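
For comparison, a self-contained sketch of a guarded scorer that can be run on its own. It assumes the enclosing function is `best_match(text)` returning `func`, as the traceback suggests, and it uses `SimpleNamespace` stand-ins for the page and fragment objects (the stand-ins and their `text` attribute are hypothetical; only `fragment.start` and `page.fragment_data` come from the traceback). Unlike the snippet above, it returns 0.0 for empty text or an empty fragment instead of `float("inf")`, on the assumption that an empty fragment should never win as the best match. Which of the two is preferable is a design choice, and this is not necessarily what scrapely itself does.

```python
from types import SimpleNamespace


def best_match(text):
    """Return a scoring function for fragments that contain `text`."""
    def func(fragment, page):
        fdata = page.fragment_data(fragment).strip()
        # Guard: empty text or an empty fragment is treated as "no match",
        # which also keeps the division below away from zero.
        if not text or not fdata or text not in fdata:
            return 0.0
        return float(len(text)) / len(fdata) - (1e-6 * fragment.start)
    return func


# Stand-in page: fragment_data simply returns the fragment's text.
page = SimpleNamespace(fragment_data=lambda fragment: fragment.text)
empty = SimpleNamespace(text="  ", start=0)
full = SimpleNamespace(text="logo.png", start=3)

print(best_match("")(empty, page))      # no exception for empty input
print(best_match("logo")(full, page))   # ordinary non-empty match
```

If every fragment scores 0.0, the `select` and `annotate` code shown in the traceback would, as far as those lines indicate, collect no matches and raise `FragmentNotFound`, which is arguably the more constructive error the original report asks for.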

moneypython · 2016-08-27T17:33:35Z

This isn't a wacky edge-case at all.

I got the same error using actual data and had to patch it.

marekyggdrasil · 2019-11-27T14:54:07Z

Same here, I reproduced this error using regular, non-empty data.

patch for issue #49 and fixed Travis tests

marekyggdrasil · 2019-11-29T01:43:29Z

The patch has been merged. I believe this issue can be closed?

pablohoffman changed the title ~~Train with zero-length expected text -> ZeroDivisionError~~ ZeroDivisionError when training with zero-length data Apr 25, 2014

marekyggdrasil added a commit to marekyggdrasil/scrapely that referenced this issue Nov 27, 2019

issue scrapy#49

f7eb748

marekyggdrasil mentioned this issue Nov 27, 2019

patch for issue #49 and fixed Travis tests #119

Merged

ruairif added a commit that referenced this issue Nov 28, 2019

Merge pull request #119 from marekyggdrasil/master

31b5881
