CUDA API support: cudaHostAlloc and cudaFreeHost

We need to implement `cudaHostAlloc` and `cudaFreeHost` to support vLLM.
The test case is in this commit:
https://github.com/QuarkContainer/Quark/commit/16bf3d2ec375b54aff6789257754dc2eff27df8c


To build:
`nvcc -cudart shared test_cudahostalloc.cpp -o test_cudahostalloc -lcuda`

To run:
`./test_cudahostalloc 1024 1024`