Container uptime can differ from docker ps statistics #1607


Closed
Mr-Right2 opened this issue Mar 7, 2017 · 10 comments

@Mr-Right2

Mr-Right2 commented Mar 7, 2017

When I subtract the container_start_time_seconds metric from the current time, I expect the result to equal the container uptime as reported by docker ps.

However, I've noticed that this is not the case. When I create (for example) the following alert:
(time() - container_start_time_seconds{image!="",id=~"^/docker.*",name="mongodb"} ) < 60

I receive alerts from time to time saying that my container restarted. When I look at the output of docker ps, I see that the container did not crash or restart at all.

For example, the current output of docker ps:

CONTAINER ID    IMAGE                         COMMAND                  CREATED       STATUS        PORTS                                                              NAMES
b3bf5be016b3    nginx:1.9.9                   "nginx -g 'daemon off"   2 weeks ago   Up 2 weeks    80/tcp, 0.0.0.0:9000->9000/tcp, 443/tcp, 0.0.0.0:9100->9100/tcp    nginx-metrics
13e57876fb6a    google/cadvisor:v0.24.1       "/usr/bin/cadvisor -l"   2 weeks ago   Up 2 weeks    8080/tcp                                                           cadvisor
89640d57f4a0    prom/node-exporter:v0.13.0    "/bin/node_exporter"     2 weeks ago   Up 2 weeks    9100/tcp                                                           node-exporter
c5b90ebe790f    mongo:3.2.9                   "/entrypoint.sh mongo"   2 weeks ago   Up 2 weeks    127.0.0.1:27017->27017/tcp                                         mongodb

And the current value of time() - container_start_time_seconds for these containers:
[screenshot: the per-container values]

I'm using:
cAdvisor version v0.24.1
Docker version 1.12.1 (build 23cf638)
Docker-compose version 1.8.0 (build f3628c7)

I wonder if this is a known bug, or whether it might be possible to use the container uptime information from Docker itself, e.g. the same information as shown by sudo docker inspect --format='{{.State.StartedAt}}' cadvisor.
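
For what it's worth, here is a rough sketch of reading that same StartedAt value programmatically with the Docker Go client (github.com/docker/docker/client); the container name "cadvisor" is just an example:

    package main

    import (
        "context"
        "fmt"
        "log"
        "time"

        "github.com/docker/docker/client"
    )

    func main() {
        cli, err := client.NewClientWithOpts(client.FromEnv)
        if err != nil {
            log.Fatal(err)
        }
        // Equivalent of: docker inspect --format='{{.State.StartedAt}}' cadvisor
        info, err := cli.ContainerInspect(context.Background(), "cadvisor")
        if err != nil {
            log.Fatal(err)
        }
        // State.StartedAt is an RFC 3339 timestamp string.
        started, err := time.Parse(time.RFC3339Nano, info.State.StartedAt)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("started %s ago\n", time.Since(started).Round(time.Second))
    }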

@Mr-Right2 Mr-Right2 changed the title Container uptime is can differ from docker ps statistics Container uptime can differ from docker ps statistics Mar 7, 2017
@dashpole
Collaborator

We generally try not to rely on docker's APIs. What event happened 19 hours ago? Did cAdvisor restart?

@Mr-Right2
Author

Mr-Right2 commented Mar 22, 2017

I cannot find any events at that time that might have caused the issue. The same thing has happened a few more times, but cAdvisor did not restart (docker ps shows Up 2 weeks, as you can see).

@yongzhang

+1 I have the same issue.

@yongzhang

Any update on this? I have plenty of invalid container-restart alerts, which is driving me crazy...

@maptile

maptile commented Nov 1, 2017

Any update on this? We hit the problems described in #1704 on 0.26.2 and fell back to 0.25.0, then ran into this problem. We still receive several spurious restart emails per day. I hope it can be resolved in the near future.

The graph in Prometheus:
[screenshot from 2017-11-01 10:38]

The docker ps output:

CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                     NAMES
5f769338aef8        cadvisor:latest          "/usr/bin/cadvisor..."   43 hours ago        Up 43 hours         0.0.0.0:32779->8080/tcp   production_cadvisor_1
b68be65fec5f        couchdb                  "couchdb"                5 days ago          Up 5 days           0.0.0.0:32782->5984/tcp   couchdb
2ae4f05dd35c        alpine-discover:latest   "node index.js"          10 days ago         Up 10 days          0.0.0.0:8379->8379/tcp    production_discover_1
9ee6800151c4        node-exporter:latest     "/bin/node_exporter"     10 days ago         Up 10 days          0.0.0.0:32769->9100/tcp   production_node_exporter_1

@hairyhenderson

I've been experiencing this bug fairly consistently lately with cAdvisor 0.28.0. After some seemingly random period of time, container_start_time_seconds starts climbing:

[screenshot from 2017-11-14 22:23: the metric climbing]

It's to the point now that I need to disable my "container flapping" alert entirely. I can't trust this metric anymore.

@dashpole
Collaborator

I don't have the bandwidth to look into this right now. If someone has time to dig into it, that would be greatly appreciated.

@sentinelt
Contributor

It looks like the problem comes from taking the container creation time from cgroup properties.
I suspect the cgroup.clone_children property of the cgroup is updated even though the container has not actually been restarted.

Here is the relevant part of container/common/helpers.go:

      // Assume unified hierarchy containers.
      // Get the lowest creation time from all hierarchies as the container creation time.
      now := time.Now()
      lowestTime := now
      for _, cgroupPath := range cgroupPaths {
          // The modified time of the cgroup directory changes whenever a subcontainer is created.
          // eg. /docker will have creation time matching the creation of latest docker container.
          // Use clone_children as a workaround as it isn't usually modified. It is only likely changed
          // immediately after creating a container.
          cgroupPath = path.Join(cgroupPath, "cgroup.clone_children")
          fi, err := os.Stat(cgroupPath)
          if err == nil && fi.ModTime().Before(lowestTime) {
              lowestTime = fi.ModTime()
          }
      }
      if lowestTime != now {
          spec.CreationTime = lowestTime
      }
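
As a standalone illustration, that heuristic boils down to a stat of cgroup.clone_children (the path below is an example; adjust it for your hierarchy):

    package main

    import (
        "fmt"
        "log"
        "os"
    )

    func main() {
        // Example path; the real code joins each cgroup hierarchy path
        // with "cgroup.clone_children".
        const p = "/sys/fs/cgroup/memory/docker/cgroup.clone_children"
        fi, err := os.Stat(p)
        if err != nil {
            log.Fatal(err)
        }
        // This mtime is what gets reported as the creation time; it can
        // apparently change without the container actually restarting.
        fmt.Println("apparent creation time:", fi.ModTime())
    }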

dashpole added a commit that referenced this issue Dec 20, 2017
fix #1607; use container creation time provided by Docker handler
dashpole added a commit that referenced this issue May 23, 2018
Cherrypick: fix #1607; use container creation time provided by Docker handler
@I3olle

I3olle commented Jun 6, 2018

Also having this issue. Weirdly enough, so far I only have it on a single host. Not sure why.

@dashpole
Collaborator

dashpole commented Jun 7, 2018

It turns out cgroup files are not actual files. If the dentry caches are cleared, the next time the file is touched it has a new modification time. I believe this can occur when running low on memory.

I haven't found a good way to reliably get the start time of a cgroup. For Docker containers we use the start time that Docker gives us, so those shouldn't be a problem, but for other cgroups and runtimes it may be an issue.
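
A simplified sketch of the idea behind the referenced commits (not the actual patch): prefer the start time the runtime reports, and keep the cgroup mtime only as a fallback:

    package main

    import (
        "fmt"
        "time"
    )

    // creationTime prefers the start time reported by Docker and falls back
    // to the cgroup-mtime heuristic when the runtime gives us nothing.
    func creationTime(dockerStartedAt string, cgroupMtime time.Time) time.Time {
        if t, err := time.Parse(time.RFC3339Nano, dockerStartedAt); err == nil {
            return t // authoritative: survives dentry cache eviction
        }
        return cgroupMtime // heuristic: can jump forward spuriously
    }

    func main() {
        mtime := time.Now().Add(-19 * time.Hour)
        fmt.Println(creationTime("2018-06-07T12:00:00.000000000Z", mtime))
        fmt.Println(creationTime("", mtime)) // no runtime info: fall back
    }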
