Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Next #20

Open
wants to merge 113 commits into
base: next
Choose a base branch
from
Open

Next #20

wants to merge 113 commits into from

Conversation

malahal
Copy link
Contributor

@malahal malahal commented Apr 22, 2019

No description provided.

jtlayton and others added 30 commits October 18, 2018 15:44
An xprt has a ref for the hash table (that's released by SVC_DESTROY());
but when it's first created, only 1 ref was taken, so there wasn't a ref
for the caller.

Add an extra ref for the caller when the xprt is first created.

Signed-off-by: Daniel Gryniewicz <[email protected]>
svc_xprt_lookup - Add extra ref on create
The check to see if the given address will fit in the storage is
backwards.  Fix it, so that IPv4 can fit in the storage, and so that a
huge address doesn't overflow.

Signed-off-by: Daniel Gryniewicz <[email protected]>
timespec[add|sub|cmp] are now public in FreeBSD 12, they now have
an additional variable similar to NetBSD. We use the system macros
by default, otherwise the updated version has been implemented.

Signed-off-by: Jack Halford <[email protected]>
Make URCU a mandatory dependency
Fix addr size check in clnt_dg_control()
When _LGPL_SOURCE is defined the lttng headers will inline some of the
RCU functions with the "bulletproof" functions. This means that we can't
currently select a urcu flavor at runtime that will be used universally.
I'm a little unclear on what happens if you mix different flavored urcu
calls, but I doubt that it's anything good.

Let's switch back to using urcu-bp universally for now until this has a
better solution. It's not ideal for performance, but our use of URCU is
pretty minimal for now, and it's better to be safe than fast.

Signed-off-by: Jeff Layton <[email protected]>
XPRT refcounting could leak a ref in some cases.  Fix this by dropping
the extra ref in error cases, and by passing the event ref (which should
have been dropped, because events are oneshot) to the event processor.

This fixes the FD depletion while doing mount/unmount in a tight loop.

Signed-off-by: Daniel Gryniewicz <[email protected]>
TCP has a double-ref setup, where the tree has one, and the event loop
has one.  This allows processors to destroy the xprt on error without
freeing it, ensuring no use-after-free.

UDP does not have a XPRT tree, since it re-uses a single socket, so it
only has one ref.  This means that destroy-on-error immediately frees.

Add an extra ref to the UDP path so that it matches TCP, and to allow
destroy.

Signed-off-by: Daniel Gryniewicz <[email protected]>
When destroying a client, the XPRT is not also destroyed, but just
unref'd.  This is fine, if multiple clients share a XPRT, but local
clients all have unique FDs, and therefore unique XPRTs.  This means
that their XPRTs are never destroyed.  Ganesha uses these a lot, leading
to many leaked XPRTs and therefore FDs.

Add a flag on a client that indicates it's local, and use that flag to
desrtoy the XPRT when the client is destroyed.

Signed-off-by: Daniel Gryniewicz <[email protected]>
Signed-off-by: Frank S. Filz <[email protected]>
Instead of calling request_cb which alllocates an svc_req and then
calls SVC_DECODE and then frees the request, we call alloc_cb,
call SVC_DECODE ourselves, and then call free_cb.

This makes it a little easier to handle suspending an async
request in the future.

Signed-off-by: Frank S. Filz <[email protected]>
If the protocol layer needs to save some data (it probably does), it
should use SVCXPRT->xp_u1 or SVCXPRT->xp_u2 (Ganesha only uses xp_u2
so it could use xp_u1).

If a request is to be suspended, the SVCXPRT and svc_req should not be
used once it is possible for another thread to be processing an async
completion of the request. This is because if such an async completion
happened to run before the original thread returned up the stack, the
request could actually be requeued and even executing on another RPC
thread while the first thread is still unwinding.

Signed-off-by: Frank S. Filz <[email protected]>
Fix security warnings from AppScan tool
Convert unsafe strncpy to strlcpy
Signed-off-by: Daniel Gryniewicz <[email protected]>
Make a header for strl*()
Coverity Scan fix:
In __rpc_taddr2uaddr_af() free the allocated memory for
variable 'ret' when returning NULL from the function.

Signed-off-by: Madhu Thorat <[email protected]>
Free 'ret' when returning NULL in __rpc_taddr2uaddr_af()
On modern Linux, statd only listens on TCP.  Forcing UDP for connections
causes them to time out.  Any system that claims to support TCP but
doesn't is so out-of-date we probably have other issues there.

Fix it so that TCP actually uses TCP.

Signed-off-by: Daniel Gryniewicz <[email protected]>
NSM - Don't force UDP portmapper lookups
We are doing accept and returning XPRT_DIED without closing fd, which
will be unmonitored and cause fd leak.
Client will see connection succeded but IO will hang on it.
"gss_sign" is a deprecated GSS-API function. This needs to be replaced by
"gss_get_mic".

Signed-off-by: Sachin Punadikar <[email protected]>
TweakySolution and others added 6 commits December 12, 2019 10:06
When testing asynchronous I/O, observed all the work threads
eventually waiting on the same epoll event. This was because
the svc worker was being rescheduled multiple times when
EWOULDBLOCK occurs on send and we request evchan write (OUT)
control on the fd_send.

Only one svc_rqst_epoll_loop should be running per sr_rec.

Signed-off-by: TweakySolution <[email protected]>
Signed-off-by: Daniel Gryniewicz <[email protected]>
Move async callback to svc_req and give it its own wpe
Signed-off-by: Daniel Gryniewicz <[email protected]>
dang and others added 19 commits February 20, 2020 11:52
XPRTs are generally stored in some form of lookup table.  When we're
destroying the XPRT, we need to remove it from it's lookup table before
we drop the sentinal ref.  Otherwise, another thread can find it, and
increment it's ref from 0 -> 1.  This is very hard to handle in an
atomic fashion.

Add an unlink() API method, and implement it for all backends.  This
allows us to remove the XPRT from it's lookup table before we drop it's
sentinal ref.

Signed-off-by: Daniel Gryniewicz <[email protected]>
XPRT - Unregister xprt before dropping sentinal ref
There is a race when unhooking events from epoll, where the event could
be ready for delivery (or even delivered, but the thread not scheduled)
and so the event is processed after the unhook, and therefore after the
XPRT has been freed.  To close this, stop putting a pointer to the rec
in the event data, and instead put the FD in there and use it to look up
the XPRT.  This ensures that, if we got the XPRT from lookup, it's valid
and ref'd for the duration of the event.

Once we're no longer storing a XPRT pointer in the epoll event, we don't
need a refcount across the hook/event/unhook series.  Remove these
refcounts, allowing a destroyed XPRT to just be freed.

Signed-off-by: Daniel Gryniewicz <[email protected]>
clang complains about uninitialized use of xv/gv_count variables because
there is no final else clause, even if we do check that svc is either of
these types at the start of the function.
Silence false positive by skipping the last "always-true" check
authgss_prot: silence false-positive clang warning
Take a XPRT ref when hooking events
Many error paths call DESTROY, which will unlink and drop the ref.  This
means that the final RELEASE will free, causing the DESTROY to
use-after-free.  Instead, make sure we DESTROY first.

Signed-off-by: Daniel Gryniewicz <[email protected]>
VC - RELEASE after DESTROY
The check for initialization in GSS was not locked, so there could be a
race where two threads both initialize.  Fix this by moving the check
inside the lock.

Signed-off-by: Daniel Gryniewicz <[email protected]>
Avoid race checking for GSS init
In clnt_tli_ncreate() if client creation fails then we don't
close FD and cause FD leak.
Fixed by closing FD in case of client creation failure.

Signed-off-by: Madhu Thorat <[email protected]>
Seperate out work queue and worker queue.
Always submit work to the work queue and wakeup available
work thread.
Each work thread will always check for entry in work queue,
so that it can pickup next work without scheduling.
Reduce the number of idle thread count, less number of threads
helps in less scheduling overhead.
work_pool: Fix for efficient scheduling.
We are updating data->ptr while rearming fd_send, which is
changing the data fd since its union, cauing IO is hung.

Signed-off-by: Gaurav Gangalwar <[email protected]>
rqst: Fix hang in non blocking send.
In clnt_tli_ncreate() close FD in case client creation failed
We need to free gss_buffer
Fix mem_leak in xdr_rpc_gss_buf
@malahal malahal force-pushed the next branch 2 times, most recently from 2ff5a1b to f21f32a Compare August 27, 2020 14:22
Neither uid nor gid should be -1. If a client sends such a value,
setfsuid, setfsuid or __NR_setgroups may fail. Check for invalid
uid/gid and return a failure in such a case.

Signed-off-by: Malahal Naineni <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants