問題描述
mongoDB aggregate() 在電子郵件對象集合中查找電子郵件時間 (mongoDB aggregate() finding email times in a collection of email objects)
我是 mongoDB 的新手,我正在使用的集合是由一個 python 文件構建的,該文件使用 gmail API 訪問我的 gmail。我正在嘗試查詢以查找一周中每一天的時間收到的電子郵件數量最多。這是我的嘗試:
db.gmail_data2.aggregate(
[{$unwind: '$payload.headers' },
{$match:{'payload.headers.name':'Date'}},
{$project:{email_hour:{$hour:"payload.headers.value"}}},
{$group:{_id:"$email_hour",total:{$max:{$sum:1}}}}])
這是電子郵件對象的樣子:
{
"id": string,
"threadId": string,
"labelIds": [
string
],
"snippet": string,
"historyId": unsigned long,
"internalDate": long,
"payload": {
"partId": string,
"mimeType": string,
"filename": string,
"headers": [
{
"name": string,
"value": string
}
],
"body": users.messages.attachments Resource,
"parts": [
(MessagePart)
]
},
"sizeEstimate": integer,
"raw": bytes
}
電子郵件的日期在對象的 payload.headers 部分,其中“名稱”是日期和“值”是 ISODate 格式的日期。該查詢在沒有 $max 命令的情況下工作,並給出了所有電子郵件中每小時的電子郵件數量。一旦我輸入 $max 命令,它就會輸出: { "_id" : hour, "total" : 1 }
每小時。
參考解法
方法 1:
you'll need to do something like this. First $group stage to group the emails by date and hour while counting emails for each hour and the final group to group by date and pick the max email count for each day while pushing the email count by hour into an array. $project stage to $filter the max email hour row by comparing the previous email max count with the array's count and return the matching value. You can add another project stage at the end if you would like to only get the hour.
aggregate(
[{
$unwind: '$payload.headers'
}, {
$match: {
'payload.headers.name': 'Date'
}
}, {
$group: {
_id: {
email_date: {
$dateToString: {
format: "%Y‑%m‑%d",
date: '$payload.headers.value'
}
},
email_hour: {
$hour: '$payload.headers.value'
}
},
count: {
$sum: 1
}
}
}, {
$group: {
_id: '$_id.email_date',
email_by_hour: {
$push: {
email_hour: '$_id.email_hour',
count: '$count'
}
},
max_count: {
$max: '$count'
}
}
}, {
$project: {
_id: 0,
email_date: '$_id',
max_email_hour: {
$filter: {
input: '$email_by_hour',
as: 'item',
cond: {
$eq: ['$$item.count', '$max_count']
}
}
}
}
}])
Sample Output:
{ "email_date" : "2016‑11‑21", "max_email_hour" : [ { "email_hour" : 1, "count" : 3 } ] }
{ "email_date" : "2016‑11‑20", "max_email_hour" : [ { "email_hour" : 12, "count" : 2 } ] }