Hi,
I'm investigating a situation where Logstash receives a certain amount of data, but the traffic sent from Logstash to Elasticsearch is significantly larger, even though I perform minimal transformations.
Setup
- Inputs from multiple Winlogbeat agents
- Output to Elasticsearch
- The pipeline does only basic operations: it sets the data_stream fields, lowercases a few fields, and removes a few fields
- Compression is enabled for the Elasticsearch output.
Logstash pipeline:
input {
  beats {
    port => 15044
    ssl_enabled => true
    ssl_client_authentication => "optional"
    ssl_certificate_authorities => "/appl/logstashdata/certs/http_ca.crt"
    ssl_certificate => "/appl/logstashdata/certs/testlogstash1/testlogstash1.crt"
    ssl_key => "/appl/logstashdata/certs/testlogstash1/testlogstash1.key"
  }
}
filter {
  if [agent][version] =~ /^8\./ {
    mutate {
      replace => {
        "[data_stream][type]" => "logs"
        "[data_stream][dataset]" => "winlogbeat_8"
        "[data_stream][namespace]" => "default"
      }
    }
  } else if [agent][version] =~ /^9\./ {
    mutate {
      replace => {
        "[data_stream][type]" => "logs"
        "[data_stream][dataset]" => "winlogbeat_9"
        "[data_stream][namespace]" => "default"
      }
    }
  } else if [agent][version] =~ /^10\./ {
    mutate {
      replace => {
        "[data_stream][type]" => "logs"
        "[data_stream][dataset]" => "winlogbeat_10"
        "[data_stream][namespace]" => "default"
      }
    }
  } else {
    mutate {
      replace => {
        "[data_stream][type]" => "logs"
        "[data_stream][dataset]" => "winlogbeat_fallback"
        "[data_stream][namespace]" => "default"
      }
    }
  }
  mutate {
    lowercase => [
      "[log][level]",
      "[agent][hostname]",
      "[host][name]",
      "[winlog][computer_name]",
      "[winlog][event_data][SubjectDomainName]"
    ]
  }
  mutate {
    remove_field => [
      "[@version]",
      "[agent][id]",
      "[agent][name]",
      "[agent][type]",
      "[ecs][version]",
      "[winlog][event_data][Binary]",
      "[event][original]"
    ]
  }
}
output {
  elasticsearch {
    hosts => [
      "https://testelastic1:9200",
      "https://testelastic2:9200",
      "https://testelastic3:9200"
    ]
    api_key => "${ES_LOGSTASH_API_KEY}"
    ssl_enabled => true
    ssl_certificate_authorities => "/appl/logstashdata/certs/http_ca.crt"
    ssl_verification_mode => "full"
    data_stream => "true"
    compression_level => 7
  }
}
Here is the winlogbeat.yml configuration:
winlogbeat.event_logs:
  - name: Application
    ignore_older: 24h
  - name: System
    ignore_older: 24h
  - name: Security
    ignore_older: 24h
  - name: Windows PowerShell
    ignore_older: 24h
  - name: Microsoft-Windows-PowerShell/Operational
    ignore_older: 24h
  - name: Microsoft-Windows-Windows Defender/Operational
    ignore_older: 24h
fields:
  project:
    name: "codera"
    env: "test"
fields_under_root: true
output.logstash:
  compression_level: 7
  bulk_max_size: 32
  loadbalance: true
  hosts:
    - "testlogstash1:15044"
    - "testlogstash2:15044"
  ssl:
    enabled: true
    ...
logging.level: warning
logging.to_eventlog: true
monitoring.enabled: false
Observation
- Verified via vnstat and Packetbeat on the Logstash servers.
[Screenshot from testlogstash1]
- Input traffic (from Beats to Logstash) is noticeably smaller than output traffic (from Logstash to Elasticsearch), even with compression enabled.
- The difference is surprisingly large.
Question
Why does Logstash generate so much more outbound traffic than it receives?
Could this be caused by:
- Metadata or structure overhead in the bulk API?
- Something else in how Logstash handles events?
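
A rough way to check the first hypothesis offline is to compare the size of a batch of events as plain NDJSON with the same batch wrapped as a bulk request body, before and after gzip. The Python sketch below is only an illustration, not Logstash code: the helper names (sample_events, raw_body, bulk_body, report) and the sample events are made up, and it assumes a `create` action line with empty metadata per document. A real request from the elasticsearch output may carry more in each action line, so the numbers only give a lower bound on the bulk framing overhead.

```python
import gzip
import json


def sample_events(n):
    """Hypothetical, heavily simplified Winlogbeat-style events (not real data)."""
    return [
        {
            "@timestamp": f"2024-01-01T00:00:{i % 60:02d}.000Z",
            "data_stream": {"type": "logs", "dataset": "winlogbeat_8", "namespace": "default"},
            "host": {"name": f"host{i % 5}"},
            "winlog": {"event_id": str(4624 + i % 3), "computer_name": f"host{i % 5}"},
            "message": f"An account was successfully logged on. Logon ID: 0x{i:08x}",
        }
        for i in range(n)
    ]


def raw_body(events):
    """The events as plain NDJSON, one JSON document per line."""
    return "".join(json.dumps(e, separators=(",", ":")) + "\n" for e in events).encode("utf-8")


def bulk_body(events, action="create"):
    """A _bulk-style NDJSON body: one action line followed by one source line per event.

    Assumption: the action metadata object is empty; real requests may include more.
    """
    lines = []
    for event in events:
        lines.append(json.dumps({action: {}}, separators=(",", ":")))
        lines.append(json.dumps(event, separators=(",", ":")))
    return ("\n".join(lines) + "\n").encode("utf-8")


def report(events, level=7):
    """Return uncompressed and gzip-compressed sizes for both encodings."""
    raw = raw_body(events)
    bulk = bulk_body(events)
    return {
        "raw_bytes": len(raw),
        "bulk_bytes": len(bulk),
        "raw_gzip_bytes": len(gzip.compress(raw, compresslevel=level)),
        "bulk_gzip_bytes": len(gzip.compress(bulk, compresslevel=level)),
    }


if __name__ == "__main__":
    # 125 matches Logstash's default pipeline.batch.size.
    for name, size in report(sample_events(125)).items():
        print(f"{name}: {size}")
```

If the gap between the raw and bulk sizes stays small after compression, the bulk framing alone is unlikely to explain a large difference, and the per-event content that Logstash sends would be the next thing to compare against what Winlogbeat sends.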