Hello guys,
What's the best way to handle a use case where a query should return only documents that contain at least N instances of each specified word, where N is the number of times the word appears in the query?
For example, suppose my database contains the following phrases:
"to be or not to be"
"to be human"
When I search for "to be", Elasticsearch correctly returns both phrases. However, if I search for "to be to be", I want only the first phrase to match, since it contains "to be" twice, as requested in the query. By default, ES returns both phrases because it treats the query "to be to be" the same as just "to be".
In other words, how can I make word frequency in the query affect the search results?
Hello @marc21
Welcome to the community.
POST test-index/_doc
{
  "id": 1,
  "message": "to be or not to be"
}

POST test-index/_doc
{
  "id": 2,
  "message": "to be human"
}

GET test-index/_search
{
  "size": 1,
  "track_total_hits": true,
  "query": {
    "match": {
      "message": "to be to be"
    }
  }
}
Output:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.9168171,
    "hits": [
      {
        "_index": "test-index",
        "_id": "QgqWQ5cBpIOblOt4qNbY",
        "_score": 0.9168171,
        "_source": {
          "id": 1,
          "message": "to be or not to be"
        }
      }
    ]
  }
}
As the example above shows, results are sorted by _score in descending order by default; the top hit here has "_score": 0.9168171.
If you are only interested in one record, setting "size": 1 returns just the highest-scoring match for your query. Because "track_total_hits": true is set, the response also reports the total number of matching documents (2 here), even though only the top hit is displayed.
Thanks!!
Thank you for the quick response. Unfortunately, I cannot use _score sorting, since we are using alphabetical sorting.
Also, when I search for "to be to be to be" against the previous dataset, I want no matching results at all: hits total should be 0. The expected total hits for each query are:
"be" - total hits 2
"to be" - total hits 2
"be be" - total hits 1
"to be to be" - total hits 1
"to be to be to be" - total hits 0
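To make the expected behavior concrete, here is a minimal Python sketch of the matching rule implied by the list above: a document matches only if it contains each query word at least as many times as the query does. This is not an Elasticsearch query, only a model of the desired semantics. The names `tokenize`, `matches`, and `total_hits` are hypothetical, and the lowercase whitespace tokenizer is an assumption that only approximates what an Elasticsearch analyzer would produce.

```python
from collections import Counter

# The two example phrases from this thread.
DOCUMENTS = ["to be or not to be", "to be human"]


def tokenize(text):
    # Assumption: lowercase + whitespace split, a rough stand-in for an analyzer.
    return text.lower().split()


def matches(query, document):
    # A document matches when every query term occurs in it at least
    # as many times as it occurs in the query (multiset containment).
    needed = Counter(tokenize(query))
    available = Counter(tokenize(document))
    return all(available[term] >= count for term, count in needed.items())


def total_hits(query, documents):
    # Number of documents satisfying the frequency-aware rule.
    return sum(1 for doc in documents if matches(query, doc))


if __name__ == "__main__":
    for query in ["be", "to be", "be be", "to be to be", "to be to be to be"]:
        print(f'"{query}" - total hits {total_hits(query, DOCUMENTS)}')
```

Run against the two example phrases, this prints the same totals as the list above (2, 2, 1, 1, 0), so it can serve as a reference when checking whether a candidate Elasticsearch query behaves as intended.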