I was asking for help creating a grok pattern for this unusual date format: [20/May/2025:13:02:52 +0000]
I tried many options with timestamp, with no effect. Is it possible to create a grok pattern for the year, month, etc.?
I implemented the grok as you sent it and basically it works fine,
but I have servers with different logs - I mean date first: [20/May/2025:13:02:52 +0000] 10.10.0.1 TLSv1.2 ECDHE-RSA "POST /service/modell/ExternalMode HTTP/1.1" 3188
So I mean: date, IP, greedydata
and the grok pattern with:
grok { match => { "message" => "%{HTTPD_COMMONLOG}" } }
date { match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ] }
does not work with such logs.
I think that I will have to implement several grok patterns for them.
grok can match against a list of patterns. If HTTPD_COMMONLOG does not fit then you can rearrange the parts of it, and include new patterns to match your other log format.
You may want to model the second pattern on the ECS-compatible HTTPD_COMMONLOG rather than the legacy version, so that you get field names like [source][address] instead of [clientip].
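As a rough sketch of the list-of-patterns approach, assuming your second format always starts with the bracketed date followed by an IP address: grok tries the patterns in order and stops at the first one that matches. The field name rest is just a placeholder I made up for everything after the IP, and the date filter is the one from your post.

```
filter {
  grok {
    match => {
      "message" => [
        # Standard common log format, tried first
        "%{HTTPD_COMMONLOG}",
        # Date-first format: [date] ip, then everything else.
        # [source][address] follows the ECS-style naming; use clientip
        # instead if your pipeline runs the legacy patterns.
        # "rest" is a hypothetical field name.
        "^\[%{HTTPDATE:timestamp}\] %{IP:[source][address]} %{GREEDYDATA:rest}"
      ]
    }
  }
  date { match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ] }
}
```

Both patterns store the date in timestamp, so a single date filter can handle either format.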
If you want the individual parts of the date you could override the bundled HTTPDATE pattern with a version that names each component.
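One way to do that, as a sketch rather than the only option, is grok's pattern_definitions setting, where a definition that reuses an existing pattern name replaces the bundled one for that filter. The field names day, month, year, time and tz_offset are ones I picked for illustration.

```
filter {
  grok {
    # Redefine HTTPDATE for this filter only, naming each component.
    # The field names here are assumptions, rename them as you like.
    pattern_definitions => {
      "HTTPDATE" => "%{MONTHDAY:day}/%{MONTH:month}/%{YEAR:year}:%{TIME:time} %{INT:tz_offset}"
    }
    match => { "message" => "%{HTTPD_COMMONLOG}" }
  }
}
```

Since HTTPD_COMMONLOG refers to HTTPDATE internally, you should still get the whole date in timestamp, plus the individual parts as separate fields.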
This is not a site to provide explanations of regular expressions, but I can explain some terms that might help you get started elsewhere
| is used for alternation. So (?:-|%{NUMBER:bytes}) matches either - or a number, which is what is used for fields like byte counts in HTTP logs.
() are used to create a capture group. It captures part of the regular expression so that it can be referenced when doing a substitution. See this post for an example.
Sometimes you need to use () to surround part of a pattern (as in that alternation above), but you do not want to capture it to reference later. In that case you would use (?: and ) around the pattern to create a non-capture group.
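To make the alternation and non-capture group concrete, here is a minimal sketch that only looks at the last field of a line, assuming that field is either a dash or a number (as with a byte count).

```
filter {
  grok {
    # (?: ... ) groups the two alternatives without capturing anything itself.
    # Only %{NUMBER:bytes} creates a field, and only when the value is numeric;
    # a line ending in " -" still matches but gets no bytes field.
    match => { "message" => " (?:-|%{NUMBER:bytes})$" }
  }
}
```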
Standard web logs will record the user request surrounded by double quotes, like this: "GET /foo/ HTTP/1.0"
The first part of the pattern you mentioned is trying to match that. Within the double quotes it first tries to match the verb and then the URI (which cannot contain a space). The HTTP/1.0 after that is actually optional, so the HTTP/%{NUMBER:httpversion} is made into a non-capture group using (?: and ) and then followed by ? which means it occurs 0 or 1 times (i.e. it is optional).
If matching the verb, URI and optional version fails then it says (using alternation) to just capture everything within the double quotes in the field [rawrequest].
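Pulled out on its own, the request part described above looks roughly like this. This is a sketch using the legacy field names; the pattern string is wrapped in single quotes so the literal double quotes inside it do not need escaping.

```
filter {
  grok {
    match => {
      # First alternative: verb, URI, and an optional " HTTP/<version>".
      # Second alternative (after the |): anything between the quotes.
      "message" => '"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})"'
    }
  }
}
```

For a request like "POST /service/modell/ExternalMode HTTP/1.1" this should give verb, request and httpversion. Something like "-" does not start with a verb, so it falls through to rawrequest.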
Lastly, there are many slightly different flavours of regexps: Ruby, Java, Perl, POSIX, shell, csh, ed, and many more. Logstash uses Ruby regexps for grok and mutate+gsub filters.
There are places in Logstash where Java regexps are used (some file paths) but the differences between Java and Ruby are unlikely to matter in those places.