Fix Static Crawling Issue Due to Newly Implemented Anti-Scraping Mechanism #109
+44,250
−19,437
Hello,
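First, thank you for developing and sharing such a useful project. While using it, I noticed that since the Lunar New Year, the existing approach of statically crawling http://isin.twse.com.tw/isin/C_public.jsp?strMode=2 with requests to fetch all stock code data no longer works. I suspect the site has strengthened its anti-scraping mechanism.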
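To work around this, I modified the fetch_data function in fetch.py to use Selenium for dynamic crawling. Because some users may run this project in an environment without a GUI, I enabled headless mode. However, with headless mode enabled, I frequently ran into connection failures. After some experimentation, I found a workable solution: first visit the home page https://isin.twse.com.tw, pause for a few seconds, and then visit the target URL. With this sequence, the required data is retrieved successfully.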
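For reviewers, here is a minimal sketch of the warm-up sequence described above. It is an illustration, not the code committed in this PR. It assumes Selenium 4 and a local Chrome install. The helper names `make_headless_chrome` and `fetch_html_with_warmup` are hypothetical, and the 3-second default pause is an assumed stand-in for "a few seconds". The driver factory and sleep function are injected so the visit order can be checked without launching a browser.

```python
import time
from typing import Callable, Protocol

# Hypothetical helpers: a sketch of the warm-up idea, not the PR's actual fetch_data change.

HOME_URL = "https://isin.twse.com.tw"
TARGET_URL = "http://isin.twse.com.tw/isin/C_public.jsp?strMode=2"


class PageDriver(Protocol):
    """Minimal subset of the Selenium WebDriver interface used below."""
    page_source: str

    def get(self, url: str) -> None: ...

    def quit(self) -> None: ...


def make_headless_chrome() -> PageDriver:
    """Create a headless Chrome driver (assumes Selenium 4 and Chrome are installed)."""
    from selenium import webdriver  # imported lazily so the function below is testable without Selenium

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run without a GUI
    return webdriver.Chrome(options=options)


def fetch_html_with_warmup(
    target_url: str = TARGET_URL,
    home_url: str = HOME_URL,
    wait_seconds: float = 3.0,  # assumed value for "a few seconds"
    driver_factory: Callable[[], PageDriver] = make_headless_chrome,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Visit the home page, pause, then load the target page and return its HTML."""
    driver = driver_factory()
    try:
        driver.get(home_url)    # warm-up visit to the site's home page
        sleep(wait_seconds)     # pause before requesting the target page
        driver.get(target_url)  # the page holding the stock code table
        return driver.page_source
    finally:
        driver.quit()           # always release the browser process, even on failure
```

Calling `fetch_html_with_warmup()` with the defaults launches headless Chrome. The returned HTML could then be passed to whatever parsing step `fetch_data` already performs.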
If there are any problems with my changes, or if there is a better solution, please feel free to contact me.