Fixing a MongoDB high-CPU incident caused by regular-expression matching ($regex)

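One day, monitoring showed that CPU usage on our MongoDB database had risen sharply. After some digging, I found that queries like the following were responsible: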

									db.example_collection.find({
									    "idField" : { "$regex" : "123456789012345678" },
									    "dateField" : { "$regex" : "2019/10/10" }
									})

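Normally, my first reaction in a case like this is that an index on the relevant fields is missing, so every execution of the query triggers a full collection scan.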

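However, when I analyzed the query with explain(), I found that the two fields involved, idField and dateField, are indexed, and that the query does use the index. The explain() output is as follows: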

									mgset-11111111:PRIMARY> db.example_collection.find({ "idField" : { "$regex" : "123456789012345678"} , "dateField" : { "$regex" : "2019/10/10"}}).explain("queryPlanner")

									{

									    "queryPlanner" : {

									        "plannerVersion" : 1,

									        "namespace" : "example_db.example_collection",

									        "indexFilterSet" : false,

									        "parsedQuery" : {

									            "$and" : [

									                {

									                    "idField" : {

									                        "$regex" : "123456789012345678"

									                    }

									                },

									                {

									                    "dateField" : {

									                        "$regex" : "2019/10/10"

									                    }

									                }

									            ]

									        },

									        "winningPlan" : {

									            "stage" : "FETCH",

									            "inputStage" : {

									                "stage" : "IXSCAN",

									                "filter" : {

									                    "$and" : [

									                        {

									                            "idField" : {

									                                "$regex" : "123456789012345678"

									                            }

									                        },

									                        {

									                            "dateField" : {

									                                "$regex" : "2019/10/10"

									                            }

									                        }

									                    ]

									                },

									                "keyPattern" : {

									                    "idField" : 1,

									                    "dateField" : 1

									                },

									                "indexName" : "idField_1_dateField_1",

									                "isMultiKey" : false,

									                "multiKeyPaths" : {

									                    "idField" : [ ],

									                    "dateField" : [ ]

									                },

									                "isUnique" : false,

									                "isSparse" : false,

									                "isPartial" : false,

									                "indexVersion" : 2,

									                "direction" : "forward",

									                "indexBounds" : {

									                    "idField" : [

									                        "[\"\", {})",

									                        "[/123456789012345678/, /123456789012345678/]"

									                    ],

									                    "dateField" : [

									                        "[\"\", {})",

									                        "[/2019/10/10/, /2019/10/10/]"

									                    ]

									                }

									            }

									        },

									        "rejectedPlans" : [ ]

									    },

									    "ok" : 1

									}
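As a reading aid for output like the above, the sketch below walks a winning plan and collects the index name and bounds of every IXSCAN stage. The helper name `collectIndexScans` is hypothetical, and the code only assumes the `inputStage` / `inputStages` nesting seen in explain output of this shape; the sample object is a trimmed copy of the plan shown above.

```javascript
// Hypothetical helper: gather every IXSCAN stage from an explain() winning plan.
function collectIndexScans(stage, found = []) {
  if (!stage || typeof stage !== "object") return found;
  if (stage.stage === "IXSCAN") {
    found.push({ indexName: stage.indexName, indexBounds: stage.indexBounds });
  }
  // Plans nest through a single inputStage or an array of inputStages.
  if (stage.inputStage) collectIndexScans(stage.inputStage, found);
  if (Array.isArray(stage.inputStages)) {
    for (const child of stage.inputStages) collectIndexScans(child, found);
  }
  return found;
}

// Trimmed copy of the winning plan from the explain() output above.
const sampleWinningPlan = {
  stage: "FETCH",
  inputStage: {
    stage: "IXSCAN",
    indexName: "idField_1_dateField_1",
    indexBounds: {
      idField: ['["", {})', "[/123456789012345678/, /123456789012345678/]"],
      dateField: ['["", {})', "[/2019/10/10/, /2019/10/10/]"],
    },
  },
};

const scans = collectIndexScans(sampleWinningPlan);
console.log(JSON.stringify(scans, null, 2));
```

In the shell, the same function could be given `db.example_collection.find(...).explain("queryPlanner").queryPlanner.winningPlan` instead of the sample object.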

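The MongoDB log showed that a single execution of this query takes 800~900 ms, which is indeed slow. Unless the database server has many CPU cores, even a modest number of these queries per second is enough to saturate the CPU quickly.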
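A rough back-of-the-envelope calculation makes this concrete. The sketch assumes each query is CPU-bound and keeps one core busy for its whole duration (taken here as 850 ms, the midpoint of the logged range); the core counts are hypothetical and the function name `saturationQps` is made up for illustration.

```javascript
// Queries per second at which all cores are fully busy,
// assuming each query occupies one core for cpuMsPerQuery milliseconds.
function saturationQps(cores, cpuMsPerQuery) {
  return (cores * 1000) / cpuMsPerQuery;
}

// Hypothetical server sizes; 850 ms is the midpoint of the 800~900 ms range.
for (const cores of [4, 8, 16]) {
  const qps = saturationQps(cores, 850).toFixed(1);
  console.log(`${cores} cores saturate at about ${qps} queries/s`);
}
```

Under these assumptions, even a 16-core server is saturated by fewer than 20 such queries per second.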

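After some searching, I found that the regular expressions were the likely culprit. Although the query does use the index, the explain() output contains another field, "indexBounds", which describes the range of the index that must be scanned to execute the query. To be honest, I never fully understood the index bounds in the output above. The query applies an ordinary, unanchored regular-expression match to both idField and dateField, so my guess is that it scans the entire index tree, which is why the index does not actually speed up the query. The interval ["", {}) reported for both fields is consistent with that guess: it covers every string value in the index, and the regular expressions appear as a "filter" on the IXSCAN stage.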
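The difference between the two access patterns can be illustrated with a toy model. This is only a sketch over a sorted JavaScript array, not MongoDB's actual B-tree implementation, and the key values and function names are made up: an unanchored regex has to be tested against every key, while an equality match can locate its key by binary search.

```javascript
// Toy sorted "index": 100,000 zero-padded 18-digit string keys.
// Zero-padded strings of equal length sort in numeric order.
const keys = [];
for (let i = 0; i < 100000; i++) keys.push(String(i).padStart(18, "0"));

// Unanchored regex: every key must be examined.
function regexScan(sortedKeys, pattern) {
  let examined = 0;
  const matches = [];
  for (const key of sortedKeys) {
    examined++;
    if (pattern.test(key)) matches.push(key);
  }
  return { examined, matches };
}

// Equality: binary search touches only about log2(n) keys.
function equalityLookup(sortedKeys, target) {
  let lo = 0;
  let hi = sortedKeys.length - 1;
  let examined = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    examined++;
    if (sortedKeys[mid] === target) return { examined, matches: [target] };
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { examined, matches: [] };
}

const target = "000000000000042424";
const viaRegex = regexScan(keys, new RegExp(target));
const viaEquality = equalityLookup(keys, target);
console.log("regex examined:", viaRegex.examined);
console.log("equality examined:", viaEquality.examined);
```

Both calls find the same single key; only the number of keys examined differs.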

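Looking at the data in the database, I found that there was no need for regular-expression matching on idField and dateField at all; plain string equality is sufficient. After removing the $regex operator, I ran the analysis again and got this: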

									mgset-11111111:PRIMARY> db.example_collection.find({ "idField" : "123456789012345678", "dateField" : "2019/10/10"}).explain("queryPlanner")

									{

									    "queryPlanner" : {

									        "plannerVersion" : 1,

									        "namespace" : "example_db.example_collection",

									        "indexFilterSet" : false,

									        "parsedQuery" : {

									            "$and" : [

									                {

									                    "idField" : {

									                        "$eq" : "123456789012345678"

									                    }

									                },

									                {

									                    "dateField" : {

									                        "$eq" : "2019/10/10"

									                    }

									                }

									            ]

									        },

									        "winningPlan" : {

									            "stage" : "FETCH",

									            "inputStage" : {

									                "stage" : "IXSCAN",

									                "keyPattern" : {

									                    "idField" : 1,

									                    "dateField" : 1

									                },

									                "indexName" : "idField_1_dateField_1",

									                "isMultiKey" : false,

									                "multiKeyPaths" : {

									                    "idField" : [ ],

									                    "dateField" : [ ]

									                },

									                "isUnique" : false,

									                "isSparse" : false,

									                "isPartial" : false,

									                "indexVersion" : 2,

									                "direction" : "forward",

									                "indexBounds" : {

									                    "idField" : [

									                        "[\"123456789012345678\", \"123456789012345678\"]"

									                    ],

									                    "dateField" : [

									                        "[\"2019/10/10\", \"2019/10/10\"]"

									                    ]

									                }

									            }

									        },

									        "rejectedPlans" : [ ]

									    },

									    "ok" : 1

									}
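The rewrite can be captured in a small filter builder. This is a sketch with hypothetical names (`escapeRegex`, `buildFilter`), not the application's actual code. By default it produces the plain equality filter used above. The optional `prefix` mode is an assumption beyond what this incident required: for callers that genuinely need "starts with" semantics, it emits a case-sensitive regex anchored with `^`, which the MongoDB documentation describes as a prefix expression that can be matched against a range of the index rather than all of it.

```javascript
// Escape regex metacharacters so the input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical builder: equality by default, anchored prefix regex on request.
function buildFilter(id, date, { prefix = false } = {}) {
  if (!prefix) {
    return { idField: id, dateField: date };
  }
  return {
    idField: { $regex: "^" + escapeRegex(id) },
    dateField: { $regex: "^" + escapeRegex(date) },
  };
}

console.log(buildFilter("123456789012345678", "2019/10/10"));
console.log(buildFilter("1234", "2019/10", { prefix: true }));
```

The resulting object can be passed directly to `db.example_collection.find(...)`; running explain() on it is the way to confirm which bounds the planner actually chooses.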