Problem description
BeautifulSoup: How to get the publish datetime of a YouTube video in datetime format?
In part of my crawler, I need to scrape the publish time and date of a YouTube video in datetime format. I am using bs4, and so far I can get the publish date the way the YT GUI shows it, i.e. "Published on 6th May 2017". But I am unable to retrieve the actual datetime. How can I do that?
My code:
video_obj["date_published"] = video_soup.find("strong", attrs={"class": "watch-time-text"}).text
return video_obj["date_published"]
Output:
Published on Feb 8, 2020
What I want:
YYYY-MM-DD HH:MM:SS
Reference solutions
Method 1:
Once you have:
Published on Feb 8, 2020
you can do the following to remove "Published on":
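# str.strip() takes a set of characters, not a prefix, so remove the literal text instead
date_string = soup_string.replace("Published on", "", 1).strip()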
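A caveat on prefix removal: `str.strip` treats its argument as a set of characters to trim from both ends, not as a literal prefix, so it only gives the right result when the date happens to start and end with characters outside that set. A minimal sketch of a stricter approach follows; the helper name `drop_prefix` is hypothetical and not part of the original answer.

```python
def drop_prefix(text, prefix="Published on"):
    """Remove a literal leading prefix, then trim surrounding whitespace."""
    text = text.strip()
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text.strip()


print(drop_prefix("Published on Feb 8, 2020"))  # Feb 8, 2020

# str.strip removes any of the listed characters from both ends,
# which can eat far more than the intended prefix:
print("handled".strip("Published on"))  # a
```

On Python 3.9 and later, `str.removeprefix` does the same job as the `startswith` check above.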
To get this in the format YYYY-MM-DD HH:MM:SS, you can use the python-dateutil library. Install it with:
pip install python-dateutil
Code:
from dateutil import parser
formatted_date = parser.parse("Published on Feb 8, 2020", fuzzy=True)
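This returns a datetime object; printing it shows the date as YYYY-MM-DD HH:MM:SS, with the time set to 00:00:00 because the source string carries no time of day.
You can read more about the parser in the python-dateutil documentation.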
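Putting Method 1 together, here is a minimal sketch that parses the scraped label and formats it explicitly. It assumes python-dateutil is installed and that the label is in English; the variable names are illustrative only.

```python
from dateutil import parser

raw = "Published on Feb 8, 2020"

# fuzzy=True tells the parser to skip tokens it cannot interpret as part of a date
published = parser.parse(raw, fuzzy=True)

# Format explicitly rather than relying on the default str() of datetime
formatted = published.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)  # 2020-02-08 00:00:00
```

The time component is midnight because the label contains only a date, so this approach cannot recover the actual upload time.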
Method 2:
You could use Python's datetime module to parse the string and format the output.
import datetime

pubstring = video_obj["date_published"]  # "Published on Feb 8, 2020"
# pubstring[13:] drops the first 13 characters ("Published on ")
dt = datetime.datetime.strptime(pubstring[13:], "%b %d, %Y")
return dt.strftime("%Y-%m-%d %H:%M:%S")  # format as needed
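Slicing at a fixed offset of 13 depends on the label being exactly "Published on ". As a hedged alternative using only the standard library, the sketch below pulls the date out with a regular expression before parsing it. The names `DATE_RE` and `published_to_timestamp` are hypothetical, and the pattern assumes an abbreviated English month name, as in the output shown in the question.

```python
import datetime
import re

# Matches dates such as "Feb 8, 2020" (abbreviated English month name assumed)
DATE_RE = re.compile(r"[A-Z][a-z]{2} \d{1,2}, \d{4}")


def published_to_timestamp(text):
    """Return 'YYYY-MM-DD HH:MM:SS' for the first date found in text, else None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    dt = datetime.datetime.strptime(match.group(0), "%b %d, %Y")
    return dt.strftime("%Y-%m-%d %H:%M:%S")


print(published_to_timestamp("Published on Feb 8, 2020"))  # 2020-02-08 00:00:00
```

Note that `%b` is locale-dependent, so this relies on the process running with an English (or C) locale.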
(by Proteeti Prova, Chinmay Atrawalkar, Ben)