Problem Description
Long-running Cloud Task on GAE flexible terminates early without error. How to debug? What am I missing?
I'm running an application on the GAE flexible environment using Python and Flask. I periodically schedule Cloud Tasks via a cron job. These tasks basically loop over all users and perform some clustering analysis. The task terminates without raising any kind of error, but does not finish all the work (meaning not all users are looped through). It does not seem to happen at a consistent time (somewhere between 276.5 s and 323.3 s), nor does it stop at the same user. Has anyone experienced something similar?
My guess is that I'm hitting some kind of resource limit or timeout somewhere. Things I've considered or tried:
Cloud Tasks should be allowed to run for up to an hour (according to this: https://cloud.google.com/tasks/docs/creating-appengine-handlers).
I increased the gunicorn worker timeout to 3600 to reflect this.
I have several workers running.
I tried to find out whether there were memory spikes or CPU overload, but didn't notice anything suspicious.
Sorry if I'm being too vague or missing the point entirely; I'm quite confused by this problem. Thanks for any pointers.
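For reference, the gunicorn timeout bump mentioned above is typically set on the GAE flexible entrypoint in `app.yaml`. A minimal sketch (the module path `main:app` and worker count are assumptions about the asker's project):

```yaml
runtime: python
env: flex
# --timeout 3600 matches the one-hour Cloud Tasks deadline, so gunicorn
# does not kill a worker mid-task before the task deadline is reached.
entrypoint: gunicorn -b :$PORT --workers 2 --timeout 3600 main:app
```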
Reference Solutions
Method 1:
Thank you for all the suggestions. I played around with them and found the root cause, although by accident, while reading the Firestore documentation. I had no indication that this had anything to do with Firestore.
From here: https://googleapis.dev/python/firestore/latest/collection.html I found out that Query.stream() (or Query.get()) has a timeout on the individual documents, like so:
Note: The underlying stream of responses will time out after the max_rpc_timeout_millis value set in the GAPIC client configuration for the RunQuery API. Snapshots not consumed from the iterator before that point will be lost.
So what eventually timed out was the query over all users. I came across this by chance; none of the errors I caught pointed me back towards the query. Hope this helps someone in the future!
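One common way around this stream timeout is to page through the collection with cursors, so each page is a fresh RunQuery call and no single response stream stays open long enough to hit max_rpc_timeout_millis. A minimal sketch (the `users` collection, `fetch_users` adapter, and `cluster` call are assumptions about the asker's code):

```python
def paginate(fetch_page, page_size=500):
    """Iterate over all documents by issuing a fresh query per page.

    Each page is a separate query, so no single response stream stays
    open for the whole (long) job -- avoiding the stream timeout
    described above. fetch_page(cursor, limit) must return a list and
    order results deterministically so the cursor is stable.
    """
    cursor = None
    while True:
        page = fetch_page(cursor, page_size)
        yield from page
        if len(page) < page_size:
            return  # short page means we reached the end
        cursor = page[-1]

# Hypothetical Firestore adapter (collection and client names are
# assumptions, requires google-cloud-firestore):
#
# from google.cloud import firestore
# db = firestore.Client()
#
# def fetch_users(cursor, limit):
#     q = db.collection("users").order_by("__name__").limit(limit)
#     if cursor is not None:
#         q = q.start_after(cursor)
#     return list(q.stream())  # small, bounded stream per call
#
# for user in paginate(fetch_users, page_size=200):
#     cluster(user)
```

Materializing each page with `list(...)` also matters on its own: snapshots are consumed from the iterator immediately instead of being lost when the underlying stream times out mid-analysis.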
Method 2:
Besides using Cloud Scheduler, you can inspect the logs to make sure the tasks ran properly and that there are no deadline issues. Application logs are grouped and sent to Stackdriver only after the task itself has finished, so when a task is forcibly terminated, no log may be output at all. Try catching the deadline exception so that some log is emitted; you may then see helpful information to start troubleshooting.
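The catch-and-log advice can be sketched as a small wrapper around the task body. This is a generic pattern, not the asker's code: in a real GAE handler you would likely also catch `google.api_core.exceptions.DeadlineExceeded` (an assumption about the client libraries in use), and it only helps when the runtime raises an exception rather than killing the process outright:

```python
import logging

logger = logging.getLogger("tasks")

def run_logged(task, exc_types=(Exception,)):
    """Run a task body and emit a log line even when it dies early.

    Returns True on success, False when one of exc_types was raised,
    so a Flask handler can map the result to a 200 or 500 response.
    """
    try:
        task()
    except exc_types as exc:
        # This log line is the breadcrumb you otherwise never see
        # when the task is cut short.
        logger.error("task terminated early: %r", exc)
        return False
    logger.info("task finished")
    return True
```

Note that Cloud Tasks retries handlers that return a non-2xx status, so deciding what to return after catching the exception is part of the design, not just the logging.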
(by Lennart Paar, Michael T)