This repository has been archived by the owner on Dec 12, 2020. It is now read-only.

issue from ctrip: windows mesos agent crashed once an application created by marathon #1

Open
shiyan2016 opened this issue Mar 21, 2017 · 6 comments

@shiyan2016

shiyan2016 commented Mar 21, 2017

I am using the master branch (commit 499173d395db40e753e86ec5847b2e3944b87c35) on a Windows 2016 Datacenter desktop system, and I run the Mesos agent like this:
.\mesos-agent.exe --master=10.18.11.20:5050 --work_dir=D:\ --isolation=windows/cpu,filesystem/windows --hostname=10.18.62.225 --ip=10.18.62.225 --containerizers="docker,mesos" --log_dir=D:\mesos-log --launcher_dir=D:\mesos-build\mesos-master\build\src --runtime_dir=D:\mesos-runtime

When I create an application with Marathon, the Windows Mesos agent crashes with the following error:

slave.cpp:4610] Check failed: resource.has_allocation_info()
*** Check failure stack trace: ***
[screenshot of the stack trace; image not included]

@lilyfang lilyfang assigned lilyfang and johnkord and unassigned lilyfang and johnkord Mar 21, 2017
@lilyfang
Member

I could not repro the agent crash. A couple of questions:

  • How do you create an application with marathon?
  • What application do you create?

@yongluo2013

@shiyan2016 can you post your environment info?

@shiyan2016
Author

@lilyfang I tried to create an application that runs notepad using Marathon with this JSON:
{
  "id": "1",
  "cmd": "notepad",
  "cpus": 1,
  "mem": 128,
  "disk": 0,
  "instances": 1
}
I also tried to create an application that runs a Windows container using Marathon with this JSON:
{
  "id": "2",
  "cmd": "ping -t localhost",
  "cpus": 1,
  "mem": 2048,
  "disk": 20480,
  "instances": 1,
  "container": {
    "docker": {
      "image": "hub.cloud.ctripcorp.com/windows_container/windowsservercore",
      "network": "HOST"
    },
    "type": "DOCKER"
  }
}
The Docker version is 1.12.
Creating either of these two applications with Marathon makes the agent crash.
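For anyone trying to reproduce this, here is a minimal sketch of how the notepad app definition could be submitted to Marathon over its REST API (`POST /v2/apps`). The Marathon address `http://marathon.example:8080` and the helper name `build_app_request` are assumptions for illustration; the thread does not say how the app was actually submitted.

```python
import json
from urllib.request import Request

# App definition copied from the comment above.
NOTEPAD_APP = {
    "id": "1",
    "cmd": "notepad",
    "cpus": 1,
    "mem": 128,
    "disk": 0,
    "instances": 1,
}


def build_app_request(marathon_url, app):
    """Build (but do not send) a POST request that creates a Marathon app.

    `marathon_url` is a hypothetical base URL such as
    "http://marathon.example:8080"; replace it with your own.
    """
    body = json.dumps(app).encode("utf-8")
    return Request(
        url=marathon_url.rstrip("/") + "/v2/apps",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, pass the result to `urllib.request.urlopen(...)`. Building and sending are kept separate so the payload can be inspected first.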

@shiyan2016
Author

The Mesos master environment was set up by following this guide: http://www.cnblogs.com/ee900222/p/docker_2.html

@lilyfang
Member

lilyfang commented Apr 6, 2017

Based on my investigation, I can only repro this issue when my setup has a version mismatch between the Mesos master and the agent. Please make sure you install Mesos master 1.2.0 or later if you build your Mesos agent from the latest mainline Mesos source code.
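A quick way to rule out the mismatch described above is to ask the master for its version before starting the agent. The sketch below assumes the master's `/version` HTTP endpoint returns JSON with a `version` field; the default master address is the one from the original report, and the helper names and the `1.2.0` threshold constant are illustrative, taken from this comment rather than from any Mesos compatibility table.

```python
import json
from urllib.request import urlopen

# Minimum master version suggested in this thread (an assumption, not an
# official compatibility guarantee).
MIN_MASTER_VERSION = (1, 2, 0)


def parse_version(text):
    """Turn "1.2.0" or "1.3.0-rc1" into a comparable tuple like (1, 2, 0)."""
    core = text.split("-", 1)[0]  # drop any suffix such as "-rc1"
    return tuple(int(part) for part in core.split("."))


def master_is_new_enough(version_text, minimum=MIN_MASTER_VERSION):
    """Return True if the reported master version meets the minimum."""
    return parse_version(version_text) >= minimum


def fetch_master_version(master="10.18.11.20:5050"):
    """Query the master's /version endpoint and return its version string."""
    with urlopen("http://" + master + "/version", timeout=5) as resp:
        return json.load(resp)["version"]
```

For example, `master_is_new_enough(fetch_master_version())` would be checked before launching `mesos-agent.exe`.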

@lilyfang
Member

lilyfang commented Apr 8, 2017

Here are my steps:

  1. Set up Marathon, the master and ZooKeeper (not the agent part) by following the steps at
    https://github.com/Microsoft/mesos-log/blob/master/notes/deployment.md
  2. Set up the agent by following the steps at
    https://mesos.apache.org/documentation/latest/windows/
  3. Try to create the notepad.exe application via Marathon:
    {
      "id": "1",
      "cmd": "notepad",
      "cpus": 1,
      "mem": 128,
      "disk": 0,
      "instances": 1
    }
  4. Run the Windows agent with something like the following:
    c:\mesos\build\src>mesos-agent.exe --master=zk://40.118.201.232:2181/mesos --work_dir=c:\mesos\w --runtime_dir=c:\mesos\w --launcher_dir=c:\mesos\build\src --isolation=windows/cpu,filesystem/windows --ip=10.0.0.7

The above combination basically gives you the latest Windows agent with master 1.1.0. When you then try to run the Windows agent, resource.has_allocation_info() returns false, so the check that requires every resource to have allocation info fails.
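To illustrate the failure mode, here is a toy Python model of the check. It is not Mesos code: the real check is a C++ `CHECK` in slave.cpp that aborts the process, and the `Resource` class, the `allocation_role` field and `check_allocated` below are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """Toy stand-in for a Mesos resource (hypothetical, for illustration)."""
    name: str
    amount: float
    # Assumed to be filled in by a new enough master; left unset otherwise.
    allocation_role: Optional[str] = None

    def has_allocation_info(self) -> bool:
        return self.allocation_role is not None


def check_allocated(resources: List[Resource]) -> None:
    """Raise if any resource lacks allocation info, mimicking the CHECK."""
    for resource in resources:
        if not resource.has_allocation_info():
            raise RuntimeError(
                "Check failed: resource.has_allocation_info() ("
                + resource.name + ")"
            )
```

In this model, resources sent by the older master arrive with `allocation_role` unset, so `check_allocated` raises, which corresponds to the agent crash reported at the top of this issue.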

The issue goes away when master 1.2.0 is used. Next, we need to follow up with Mesosphere to understand how to make an old master work with the new agent.
