VK node is not working in edge deployment type #357
Hi @hakanaltindag, thanks for reaching out. We'd like to ask for some additional information. Which plugin are you using? The slurm one? In any case we might need the logs from the interLink API server and the plugin. In addition, did the manual health check succeed, as indicated here: https://intertwin-eu.github.io/interLink/docs/Cookbook#test-interlink-stack-health? Which version of interLink are you running?
I installed slurm as described in the documentation. Here is the interLink log.
Connection test:

```
curl -v --unix-socket ${HOME}/.interlink.sock http://unix/pinglink
< HTTP/1.1 500 Internal Server Error
```

Also, /logs/plugin.log is empty. Do you have any idea?
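A 500 from `/pinglink` with an empty plugin log usually means interLink itself is up but the plugin behind it is not. The two checks can be scripted; this is a hedged sketch, not official tooling: the socket path comes from this thread, and the plugin process name (`slurm`) is a guess you should match to your actual plugin binary.

```shell
#!/bin/sh
# Hedged sketch: separate "interLink is down" from "the plugin is down".
# SOCKET path and the plugin process name are assumptions from this thread.
SOCKET="${HOME}/.interlink.sock"

# Ask interLink (over its unix socket) to ping the plugin; print only the
# HTTP status code. 200 = plugin healthy; 5xx = interLink up, plugin not.
code=$(curl -s -o /dev/null -w '%{http_code}' \
        --unix-socket "$SOCKET" http://unix/pinglink 2>/dev/null)
[ -n "$code" ] || code=000

case "$code" in
  200) status_msg="interlink and plugin: OK" ;;
  5*)  status_msg="interlink up, plugin unreachable (HTTP $code)" ;;
  *)   status_msg="interlink not reachable (HTTP $code)" ;;
esac
echo "$status_msg"

# If the plugin log is empty, check whether a plugin process exists at all
# ("slurm" here is a guess; match it to your plugin binary name).
pgrep -f slurm >/dev/null 2>&1 && echo "plugin process found" \
                               || echo "no plugin process running"
```

If the second check shows no plugin process, the plugin config file and start command are the next things to look at, which is exactly where this thread goes next.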
Thanks @hakanaltindag, it's definitely the plugin not starting for some reason; all the rest looks to be working fine. I suppose you don't see any plugin process running, is that right? If so, I need your plugin config file and the command you are executing. If you are following the guide, I suppose it is the following:
Is this right? If so, then I need only the config file that you pass into the env variable above.
Yes, that's right.
When I try to check the logs:
Also, my plugin-config.yaml:
The health check seems connected to interlink.sock.
The only question in my mind is: what exactly should the interLink port and the sidecar port be? InterLink port 30443? Sidecar port 4001?
Can you try? Regarding the ports, the ones shared look good: interLink is listening on a unix socket, so there is no need to indicate any port to the plugin.
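To make the port layout concrete, here is a sketch based only on the values in this thread (4001, 30443, and the socket path), not on official interLink defaults:

```shell
#!/bin/sh
# Hedged sketch of the port layout discussed above; the values come from
# this thread, not from interLink defaults.
SOCKET="${HOME}/.interlink.sock"   # interLink API server: unix socket, no TCP port
PLUGIN_PORT=4001                   # plugin (sidecar): TCP, local to the edge host
INGRESS_PORT=30443                 # cluster-facing endpoint the VK connects to

# interLink reachable on its unix socket?
[ -S "$SOCKET" ] && echo "interlink socket present: $SOCKET" \
                 || echo "interlink socket missing: $SOCKET"

# plugin listening on its TCP port? (ss may be absent on minimal hosts)
if command -v ss >/dev/null 2>&1; then
  if ss -tln 2>/dev/null | grep -q ":${PLUGIN_PORT} "; then
    echo "plugin listening on ${PLUGIN_PORT}"
  else
    echo "nothing listening on ${PLUGIN_PORT}"
  fi
fi
```

The point of the maintainer's reply is the first line: because interLink binds a unix socket on the edge host, no "interLink port" needs to appear in the plugin config at all.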
Yes! It worked. The node was added and is now Ready! It would be great if you could update the doc. Thanks for your support.
@Bianco95 can you take care of this?
@hakanaltindag thanks for your patience! If you want to reach out to share your case study, we are all eager to hear it! So feel free to jump into the Slack channel (the link is on the home page of interLink)!
OK, I am going to update the doc.
OS: Rocky 8
Kubernetes Version: 1.24.17
I have created a Kubernetes cluster from the ground up. I have an edge node, and it can communicate with the cluster. I followed the installation steps. I can see the vk-node in the cluster, but it is in NotReady status. I inspected the node, and I only see NetworkUnavailable set to true, with the message "RouteController created a route".
And the pod logs are below:
```
time="2025-01-17T11:55:43Z" level=error msg="server error: 502"
time="2025-01-17T11:55:43Z" level=error msg="Ping Failed with exit code: -1"
time="2025-01-17T11:55:43Z" level=error msg="Error: <nil>"
time="2025-01-17T11:55:43Z" level=info msg=endNodeLoop
time="2025-01-17T11:55:43Z" level=debug msg="Received node status update"
time="2025-01-17T11:55:43Z" level=debug msg="got node from api server"
time="2025-01-17T11:55:43Z" level=debug msg="Generated three way patch" error="<nil>" patch="{\"metadata\":{\"annotations\":{\"virtual-kubelet.io/last-applied-node-status\":\"{\\\"capacity\\\":{\\\"cpu\\\":\\\"10\\\",\\\"memory\\\":\\\"64Gi\\\",\\\"nvidia.com/gpu\\\":\\\"0\\\",\\\"pods\\\":\\\"10\\\"},\\\"allocatable\\\":{\\\"cpu\\\":\\\"10\\\",\\\"memory\\\":\\\"64Gi\\\",\\\"nvidia.com/gpu\\\":\\\"0\\\",\\\"pods\\\":\\\"10\\\"},\\\"conditions\\\":[{\\\"type\\\":\\\"Ready\\\",\\\"status\\\":\\\"False\\\",\\\"lastHeartbeatTime\\\":\\\"2025-01-17T11:55:43Z\\\",\\\"lastTransitionTime\\\":\\\"2025-01-17T11:55:43Z\\\",\\\"reason\\\":\\\"KubeletPending\\\",\\\"message\\\":\\\"kubelet is pending.\\\"},{\\\"type\\\":\\\"OutOfDisk\\\",\\\"status\\\":\\\"False\\\",\\\"lastHeartbeatTime\\\":\\\"2025-01-17T11:55:43Z\\\",\\\"lastTransitionTime\\\":\\\"2025-01-17T11:55:43Z\\\",\\\"reason\\\":\\\"KubeletHasSufficientDisk\\\",\\\"message\\\":\\\"kubelet has sufficient disk space available\\\"},{\\\"type\\\":\\\"MemoryPressure\\\",\\\"status\\\":\\\"False\\\",\\\"lastHeartbeatTime\\\":\\\"2025-01-17T11:55:43Z\\\",\\\"lastTransitionTime\\\":\\\"2025-01-17T11:55:43Z\\\",\\\"reason\\\":\\\"KubeletHasSufficientMemory\\\",\\\"message\\\":\\\"kubelet has sufficient memory available\\\"},{\\\"type\\\":\\\"DiskPressure\\\",\\\"status\\\":\\\"False\\\",\\\"lastHeartbeatTime\\\":\\\"2025-01-17T11:55:43Z\\\",\\\"lastTransitionTime\\\":\\\"2025-01-17T11:55:43Z\\\",\\\"reason\\\":\\\"KubeletHasNoDiskPressure\\\",\\\"message\\\":\\\"kubelet has no disk pressure\\\"},{\\\"type\\\":\\\"NetworkUnavailable\\\",\\\"status\\\":\\\"True\\\",\\\"lastHeartbeatTime\\\":\\\"2025-01-17T11:55:43Z\\\",\\\"lastTransitionTime\\\":\\\"2025-01-17T11:55:43Z\\\",\\\"reason\\\":\\\"RouteCreated\\\",\\\"message\\\":\\\"RouteController created a route\\\"}],\\\"addresses\\\":[{\\\"type\\\":\\\"InternalIP\\\",\\\"address\\\":\\\"10.233.64.13\\\"}],\\\"daemonEndpoints\\\":{\\\"kubeletEndpoint\\\":{\\\"Port\\\":10250}},\\\"nodeInfo\\\":{\\\"machineID\\\":\\\"\\\",\\\"systemUUID\\\":\\\"\\\",\\\"bootID\\\":\\\"\\\",\\\"kernelVersion\\\":\\\"\\\",\\\"osImage\\\":\\\"\\\",\\\"containerRuntimeVersion\\\":\\\"\\\",\\\"kubeletVersion\\\":\\\"0.3.6\\\",\\\"kubeProxyVersion\\\":\\\"\\\",\\\"operatingSystem\\\":\\\"linux\\\",\\\"architecture\\\":\\\"virtual-kubelet\\\"}}\"},\"creationTimestamp\":null},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Ready\"},{\"type\":\"OutOfDisk\"},{\"type\":\"MemoryPressure\"},{\"type\":\"DiskPressure\"},{\"type\":\"NetworkUnavailable\"}],\"conditions\":[{\"lastHeartbeatTime\":\"2025-01-17T11:55:43Z\",\"lastTransitionTime\":\"2025-01-17T11:55:43Z\",\"type\":\"Ready\"},{\"lastHeartbeatTime\":\"2025-01-17T11:55:43Z\",\"lastTransitionTime\":\"2025-01-17T11:55:43Z\",\"type\":\"OutOfDisk\"},{\"lastHeartbeatTime\":\"2025-01-17T11:55:43Z\",\"lastTransitionTime\":\"2025-01-17T11:55:43Z\",\"type\":\"MemoryPressure\"},{\"lastHeartbeatTime\":\"2025-01-17T11:55:43Z\",\"lastTransitionTime\":\"2025-01-17T11:55:43Z\",\"type\":\"DiskPressure\"},{\"lastHeartbeatTime\":\"2025-01-17T11:55:43Z\",\"lastTransitionTime\":\"2025-01-17T11:55:43Z\",\"type\":\"NetworkUnavailable\"}]}}"
time="2025-01-17T11:55:43Z" level=debug msg="updated node status in api server" node.Status.Conditions="[{Ready False 2025-01-17 11:55:43 +0000 UTC 2025-01-17 11:55:43 +0000 UTC KubeletPending kubelet is pending.} {OutOfDisk False 2025-01-17 11:55:43 +0000 UTC 2025-01-17 11:55:43 +0000 UTC KubeletHasSufficientDisk kubelet has sufficient disk space available} {MemoryPressure False 2025-01-17 11:55:43 +0000 UTC 2025-01-17 11:55:43 +0000 UTC KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2025-01-17 11:55:43 +0000 UTC 2025-01-17 11:55:43 +0000 UTC KubeletHasNoDiskPressure kubelet has no disk pressure} {NetworkUnavailable True 2025-01-17 11:55:43 +0000 UTC 2025-01-17 11:55:43 +0000 UTC RouteCreated RouteController created a route}]" node.resourceVersion=27304
time="2025-01-17T11:55:45Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2025-01-17T11:55:45Z" level=info msg="statusLoop=end"
time="2025-01-17T11:55:45Z" level=info msg=statusLoop
time="2025-01-17T11:55:45Z" level=debug msg="404 request not found" uri=/metrics/cadvisor vars="map[]"
W0117 11:55:47.697675 1 reflector.go:539] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1.Secret: field label not supported: spec.nodeName
E0117 11:55:47.697710 1 reflector.go:147] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: Failed to watch *v1.Secret: failed to list *v1.Secret: field label not supported: spec.nodeName
time="2025-01-17T11:55:50Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2025-01-17T11:55:50Z" level=info msg="statusLoop=end"
time="2025-01-17T11:55:50Z" level=info msg=statusLoop
time="2025-01-17T11:55:50Z" level=debug msg="404 request not found" uri=/metrics vars="map[]"
```
Also, I got an error in oauth.log:

```
192.168.23.11:2908 - b8a4a1f7-5596-484f-ab29-741ebfdedc63 - hakanaltindag [2025/01/17 11:57:17] 192.168.23.14:30443 POST /home/msevinc/.interlink.sock "/pinglink" HTTP/1.1 "Go-http-client/1.1" 502 2257 2.295
```
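One way to read that oauth.log line (my interpretation, not from the docs): the OAuth proxy on 30443 accepted the request and tried to forward `/pinglink` to interLink's unix socket; the 502 means the forwarding failed, which matches the "Ping Failed" in the pod log. The endpoint the VK pings is composed from the ConfigMap values shown in this issue; a sketch:

```shell
#!/bin/sh
# Hedged sketch: how the VK-side endpoint is composed from InterlinkURL and
# InterlinkPort (values copied from the ConfigMap in this issue).
INTERLINK_URL="https://192.168.23.14"
INTERLINK_PORT="30443"
ENDPOINT="${INTERLINK_URL}:${INTERLINK_PORT}/pinglink"
echo "VK pings: $ENDPOINT"

# To test it by hand from a cluster node (-k because HTTP.Insecure is true
# in the config, so the certificate is not expected to verify):
#   curl -vk "$ENDPOINT"
# A 502 here, with the edge side down, reproduces what the VK pod logged.
```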
ConfigMap:

```
Name:        cn04-vk-virtual-kubelet-config
Namespace:   interlink
Labels:      app.kubernetes.io/managed-by=Helm
Annotations: meta.helm.sh/release-name: cn04-vk
             meta.helm.sh/release-namespace: interlink

Data
InterLinkConfig.yaml:
  InterlinkURL: "https://192.168.23.14"
  InterlinkPort: "30443"
  ExportPodData: false
  VerboseLogging: true
  ErrorsOnlyLogging: false
  ServiceAccount: "cn04-vk"
  Namespace: "interlink"
  VKTokenFile: /opt/interlink/token
  CPU: "10"
  Memory: "64Gi"
  Pods: "10"
  nvidia.com/gpu: "0"
  HTTP:
    Insecure: true
  KubeletHTTP:
    Insecure: true

BinaryData
Events:
```
Do you have any idea about this issue? Or can you provide detailed documentation for installing interLink in an edge-node scenario?