Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

errors on production, socket fail #254

Open
michelson opened this issue Oct 17, 2024 · 15 comments
Open

errors on production, socket fail #254

michelson opened this issue Oct 17, 2024 · 15 comments

Comments

@michelson
Copy link

Hi there, I have migrated some months ago to Falcon on rails app,

suddenly I've started to see weird errors which are not being notified on my reporting error, but users noted that some realtime (socket connection) failed.

the logs is like this:

{"time":"2024-10-17T02:38:15+00:00","severity":"warn","oid":3028920,"pid":1,"subject":"Async::Task","annotation":"Reading HTTP/1.1 requests for Async::HTTP::Protocol::HTTP1::Server.","message":["Task may have ended with unhandled exception.","undefined method `write' for nil:NilClass"],"error":{"kind":"NoMethodError","message":"undefined method `write' for nil:NilClass","stack":"/usr/local/bundle/gems/protocol-http1-0.18.0/lib/protocol/http1/connection.rb:134:in `write_response'\n/usr/local/bundle/gems/async-http-0.63.0/lib/async/http/protocol/http1/server.rb:63:in `each'\n/usr/local/bundle/gems/async-http-0.63.0/lib/async/http/server.rb:40:in `accept'\n/usr/local/bundle/gems/async-io-1.41.0/lib/async/io/server.rb:15:in `block in accept_each'\n/usr/local/bundle/gems/async-io-1.41.0/lib/async/io/socket.rb:58:in `block in accept'\n/usr/local/bundle/gems/async-2.8.2/lib/async/task.rb:161:in `block in run'\n/usr/local/bundle/gems/async-2.8.2/lib/async/task.rb:331:in `block in schedule'\n"}}

not sure what could happen here but seems to be async http gems

the installed dependences were:

    async (2.8.2)
      console (~> 1.10)
      fiber-annotation
      io-event (~> 1.1)
      timers (~> 4.1)
    async-container (0.16.13)
      async
      async-io
    async-http (0.63.0)
      async (>= 1.25)
      async-io (>= 1.28)
      async-pool (>= 0.2)
      protocol-http (~> 0.26.0)
      protocol-http1 (~> 0.18.0)
      protocol-http2 (~> 0.16.0)
      traces (>= 0.10.0)
    async-http-cache (0.4.3)
      async-http (~> 0.56)
    async-io (1.41.0)
      async
    async-pool (0.4.0)
      async (>= 1.25)
@ioquatix
Copy link
Member

ioquatix commented Oct 17, 2024

Thanks for your report, I'll investigate it.

Probably a good first step would be to upgrade to the latest version of Ruby, async-http and falcon as well as all the other dependencies.

@michelson
Copy link
Author

When I upgraded to the latest Falcon, the Webrick started in development. Maybe I'm missing something

@ioquatix
Copy link
Member

@michelson can you show me exactly what command you are running?

@michelson
Copy link
Author

HI Samuel, rails s on development.

@ioquatix
Copy link
Member

Do you have both puma and falcon in your gemfile.lock?

@michelson
Copy link
Author

no, just Falcon on Gemfile. It also happened to me on a new rails app.
steps:

  • remove puma
  • add falcon
  • rails s
    = webrick start.

@ioquatix
Copy link
Member

Okay, let me check what's going on.

@ioquatix
Copy link
Member

ioquatix commented Oct 29, 2024

samuel@aiko ~/tmp> rails new foob
      create  
...
         run  bundle install --quiet
samuel@aiko ~/tmp> cd foob
samuel@aiko ~/t/foob (main)> bundle remove puma
Removing gems from /home/samuel/tmp/foob/Gemfile
puma (>= 5.0) was removed.
samuel@aiko ~/t/foob (main)> bundle add falcon
Fetching gem metadata from https://rubygems.org/.........
Resolving dependencies...
Fetching gem metadata from https://rubygems.org/.........
Resolving dependencies...
samuel@aiko ~/t/foob (main)> bin/rails s
=> Booting Falcon v0.48.3
=> Rails 7.2.1.2 application starting in development http://localhost:3000
=> Run `bin/rails server --help` for more startup options

At least with this simple example, I could not reproduce it. Do you mind showing me your output?

@jakeonfire
Copy link

jakeonfire commented Mar 6, 2025

also seeing similar errors:

{
  "time": "2025-03-06T23:10:24+00:00",
  "severity": "warn",
  "oid": 55408,
  "pid": 19,
  "subject": "Async::Task",
  "annotation": "Reading HTTP/1.1 requests for Async::HTTP::Protocol::HTTP1::Server.",
  "message": "Task may have ended with unhandled exception.",
  "event": {
    "type": "failure",
    "root": "/app",
    "class": "Errno::EPIPE",
    "message": "Broken pipe",
    "backtrace": [
      "/usr/local/lib/ruby/3.4.0/socket.rb:461:in 'BasicSocket#__write_nonblock'",
      "/usr/local/lib/ruby/3.4.0/socket.rb:461:in 'BasicSocket#write_nonblock'",
      "/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/buffered.rb:93:in 'IO::Stream::Buffered#syswrite'",
      "/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:184:in 'IO::Stream::Generic#drain'",
      "/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:196:in 'block in IO::Stream::Generic#flush'",
      "/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:195:in 'Thread::Mutex#synchronize'",
      "/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:195:in 'IO::Stream::Generic#flush'",
      "/usr/local/bundle/gems/protocol-http1-0.30.0/lib/protocol/http1/connection.rb:527:in 'Protocol::HTTP1::Connection#write_chunked_body'",
      "/usr/local/bundle/gems/protocol-http1-0.30.0/lib/protocol/http1/connection.rb:598:in 'Protocol::HTTP1::Connection#write_body'",
      "/usr/local/bundle/gems/async-http-0.87.0/lib/async/http/protocol/http1/server.rb:133:in 'block in Async::HTTP::Protocol::HTTP1::Server#each'",
      "/usr/local/bundle/gems/async-2.23.0/lib/async/task.rb:330:in 'Async::Task#defer_stop'",
      "/usr/local/bundle/gems/async-http-0.87.0/lib/async/http/protocol/http1/server.rb:80:in 'Async::HTTP::Protocol::HTTP1::Server#each'",
      "/usr/local/bundle/gems/async-http-0.87.0/lib/async/http/server.rb:50:in 'Async::HTTP::Server#accept'",
      "/usr/local/bundle/gems/falcon-0.51.0/lib/falcon/server.rb:57:in 'Falcon::Server#accept'",
      "/usr/local/bundle/gems/io-endpoint-0.15.2/lib/io/endpoint/wrapper.rb:216:in 'block (2 levels) in IO::Endpoint::Wrapper#accept'",
      "/usr/local/bundle/gems/async-2.23.0/lib/async/task.rb:200:in 'block in Async::Task#run'",
      "/usr/local/bundle/gems/async-2.23.0/lib/async/task.rb:438:in 'block in Async::Task#schedule'"
    ]
  }
}

these must be in-bound requests. could it be from timeouts configured on the upstream server? we have faraday making requests from an upstream server to our falcon server with a 15 second timeout, and requests are expected to occasionally timeout based on slow requests to other servers downstream from the falcon server.

@travisbell
Copy link

travisbell commented Mar 11, 2025

Huh, I came in to report the exact same thing:

{
  "time": "2025-03-11T16:07:55+00:00",
  "severity": "warn",
  "oid": 90828100,
  "pid": 388,
  "subject": "Async::Task",
  "annotation": "Reading HTTP/1.1 requests for Async::HTTP::Protocol::HTTP1::Server.",
  "message": "Task may have ended with unhandled exception.",
  "event": {
    "type": "failure",
    "root": "/root/workspace/web",
    "class": "Errno::EPIPE",
    "message": "Broken pipe",
    "backtrace": [
      "/usr/local/lib/ruby/3.3.0/socket.rb:461:in `__write_nonblock'",
      "/usr/local/lib/ruby/3.3.0/socket.rb:461:in `write_nonblock'",
      "/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/buffered.rb:93:in `syswrite'",
      "/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:184:in `drain'",
      "/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:196:in `block in flush'",
      "/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:195:in `synchronize'",
      "/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:195:in `flush'",
      "/usr/local/bundle/gems/protocol-http1-0.30.0/lib/protocol/http1/connection.rb:527:in `write_chunked_body'",
      "/usr/local/bundle/gems/protocol-http1-0.30.0/lib/protocol/http1/connection.rb:598:in `write_body'",
      "/usr/local/bundle/gems/async-http-0.87.0/lib/async/http/protocol/http1/server.rb:133:in `block in each'",
      "/usr/local/bundle/gems/async-2.23.1/lib/async/task.rb:330:in `defer_stop'",
      "/usr/local/bundle/gems/async-http-0.87.0/lib/async/http/protocol/http1/server.rb:80:in `each'",
      "/usr/local/bundle/gems/async-http-0.87.0/lib/async/http/server.rb:50:in `accept'",
      "/usr/local/bundle/gems/falcon-0.51.1/lib/falcon/server.rb:57:in `accept'",
      "/usr/local/bundle/gems/io-endpoint-0.15.2/lib/io/endpoint/wrapper.rb:216:in `block (2 levels) in accept'",
      "/usr/local/bundle/gems/async-2.23.1/lib/async/task.rb:200:in `block in run'",
      "/usr/local/bundle/gems/async-2.23.1/lib/async/task.rb:438:in `block in schedule'"
    ]
  }
}

What we're seeing is a seemingly random flurries of these errors. They seem random at this point (they don't seem to align with any kind of a burst in traffic) and during these events, CPU jumps up to near 100%, and for a few minutes we see these errors flood the logs. Most of the workers restart on their own and things move along until the whole thing happens again. During these events, Falcon seems to stop serving requests.

Like @jakeonfire we are also using Faraday, but not for all requests. Faraday (with the async-http adapter) is the HTTP client we're using for upstream search requests (OpenSearch).

I've not seen any difference between Ruby 3.3.7 or 3.4.2.

@ioquatix Do you have any idea what could be going on?

@ioquatix
Copy link
Member

Thanks for all the logs and data I will review this week.

@jakeonfire
Copy link

jakeonfire commented Mar 11, 2025

fwiw ours are fairly steady:
Image

and there is no evidence that they are causing a bigger issue, strangely.

@travisbell
Copy link

travisbell commented Mar 14, 2025

We just had one of these events I was referring to. For us, it looks like this:

Image

And request times during this:

Image

@travisbell
Copy link

travisbell commented Mar 25, 2025

It turns out there is often also the following error during these spikes:

/usr/local/bundle/gems/async-2.23.1/lib/async/scheduler.rb:305:in 'IO::Event::Selector::EPoll#io_wait': execution expired (Faraday::TimeoutError)
	from /usr/local/bundle/gems/async-2.23.1/lib/async/scheduler.rb:305:in 'Async::Scheduler#io_wait'
	from /usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/buffered.rb:118:in 'IO#wait_readable'
	from /usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/buffered.rb:118:in 'IO::Stream::Buffered#sysread'
	from /usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:327:in 'IO::Stream::Generic#fill_read_buffer'
	from /usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:164:in 'IO::Stream::Generic#gets'
	from /usr/local/bundle/gems/protocol-http1-0.34.0/lib/protocol/http1/connection.rb:350:in 'Protocol::HTTP1::Connection#read_line?'
	from /usr/local/bundle/gems/protocol-http1-0.34.0/lib/protocol/http1/connection.rb:365:in 'Protocol::HTTP1::Connection#read_line'
	from /usr/local/bundle/gems/protocol-http1-0.34.0/lib/protocol/http1/connection.rb:424:in 'Protocol::HTTP1::Connection#read_response_line'
	from /usr/local/bundle/gems/protocol-http1-0.34.0/lib/protocol/http1/connection.rb:451:in 'Protocol::HTTP1::Connection#read_response'
	from /usr/local/bundle/gems/async-http-0.88.0/lib/async/http/protocol/http1/response.rb:15:in 'Async::HTTP::Protocol::HTTP1::Response.read'
	from /usr/local/bundle/gems/async-http-0.88.0/lib/async/http/protocol/http1/client.rb:86:in 'Async::HTTP::Protocol::HTTP1::Client#call'
	from /usr/local/bundle/gems/protocol-http-0.49.0/lib/protocol/http/request.rb:87:in 'Protocol::HTTP::Request#call'
	from /usr/local/bundle/gems/async-http-0.88.0/lib/async/http/client.rb:146:in 'Async::HTTP::Client#make_response'
	from /usr/local/bundle/gems/async-http-0.88.0/lib/async/http/client.rb:104:in 'Async::HTTP::Client#call'
	from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:205:in 'block (2 levels) in Async::HTTP::Faraday::Adapter#perform_request'
	from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:242:in 'block in Async::HTTP::Faraday::Adapter#with_timeout'
	from /usr/local/bundle/gems/async-2.23.1/lib/async/scheduler.rb:562:in 'Async::Scheduler#with_timeout'
	from /usr/local/bundle/gems/async-2.23.1/lib/async/task.rb:141:in 'Async::Task#with_timeout'
	from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:241:in 'Async::HTTP::Faraday::Adapter#with_timeout'
	from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:191:in 'block in Async::HTTP::Faraday::Adapter#perform_request'
	from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:233:in 'block (2 levels) in Async::HTTP::Faraday::Adapter#with_client'
	from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/clients.rb:107:in 'Async::HTTP::Faraday::PersistentClients#with_client'
	from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/clients.rb:162:in 'Async::HTTP::Faraday::PerThreadPersistentClients#with_client'
	from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:232:in 'block in Async::HTTP::Faraday::Adapter#with_client'
	from /usr/local/bundle/gems/async-2.23.1/lib/kernel/sync.rb:24:in 'Kernel#Sync'
	from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:222:in 'Async::HTTP::Faraday::Adapter#with_client'
	from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:170:in 'Async::HTTP::Faraday::Adapter#perform_request'
	from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:161:in 'Async::HTTP::Faraday::Adapter#call'
	from /usr/local/bundle/gems/opentelemetry-instrumentation-faraday-0.26.0/lib/opentelemetry/instrumentation/faraday/middlewares/tracer_middleware.rb:43:in 'block (2 levels) in OpenTelemetry::Instrumentation::Faraday::Middlewares::TracerMiddleware#call'
	from /usr/local/bundle/gems/opentelemetry-api-1.5.0/lib/opentelemetry/trace/tracer.rb:37:in 'block in OpenTelemetry::Trace::Tracer#in_span'
	from /usr/local/bundle/gems/opentelemetry-api-1.5.0/lib/opentelemetry/trace.rb:70:in 'block in OpenTelemetry::Trace#with_span'
	from /usr/local/bundle/gems/opentelemetry-api-1.5.0/lib/opentelemetry/context.rb:88:in 'OpenTelemetry::Context.with_value'
	from /usr/local/bundle/gems/opentelemetry-api-1.5.0/lib/opentelemetry/trace.rb:70:in 'OpenTelemetry::Trace#with_span'
	from /usr/local/bundle/gems/opentelemetry-api-1.5.0/lib/opentelemetry/trace/tracer.rb:37:in 'OpenTelemetry::Trace::Tracer#in_span'
	from /usr/local/bundle/gems/opentelemetry-instrumentation-faraday-0.26.0/lib/opentelemetry/instrumentation/faraday/middlewares/tracer_middleware.rb:38:in 'block in OpenTelemetry::Instrumentation::Faraday::Middlewares::TracerMiddleware#call'
	from /usr/local/bundle/gems/opentelemetry-common-0.22.0/lib/opentelemetry/common/http/client_context.rb:49:in 'block in OpenTelemetry::Common::HTTP::ClientContext#with_attributes'
	from /usr/local/bundle/gems/opentelemetry-api-1.5.0/lib/opentelemetry/context.rb:88:in 'OpenTelemetry::Context.with_value'
	from /usr/local/bundle/gems/opentelemetry-common-0.22.0/lib/opentelemetry/common/http/client_context.rb:49:in 'OpenTelemetry::Common::HTTP::ClientContext#with_attributes'
	from /usr/local/bundle/gems/opentelemetry-instrumentation-faraday-0.26.0/lib/opentelemetry/instrumentation/faraday/middlewares/tracer_middleware.rb:37:in 'OpenTelemetry::Instrumentation::Faraday::Middlewares::TracerMiddleware#call'
	from /usr/local/bundle/gems/faraday-2.12.2/lib/faraday/middleware.rb:56:in 'Faraday::Middleware#call'
	from /usr/local/bundle/gems/faraday-retry-2.2.1/lib/faraday/retry/middleware.rb:153:in 'Faraday::Retry::Middleware#call'
	from /usr/local/bundle/gems/faraday-gzip-3.0.3/lib/faraday/gzip/middleware.rb:42:in 'Faraday::Gzip::Middleware#call'
	from /usr/local/bundle/gems/faraday-2.12.2/lib/faraday/rack_builder.rb:152:in 'Faraday::RackBuilder#build_response'
	from /usr/local/bundle/gems/faraday-2.12.2/lib/faraday/connection.rb:452:in 'Faraday::Connection#run_request'
	from /usr/local/bundle/gems/faraday-2.12.2/lib/faraday/connection.rb:280:in 'Faraday::Connection#post'

It's kind of hard to tell which error comes first though, or if they are even truly connected. I thought it might be helpful to see it as well though.

@artrybalko
Copy link

I think what @jakeonfire and I found while troubleshooting it further is that these errors come up when the server making the inbound request to Falcon closes the connection on a timeout. In our case we were catching external web hooks and doing an unreasonable amount of processing before responding with a 200. Since the processing was of no interest to the caller we moved it to an async block and started responding right away. I believe the errors stopped immediately after that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants