
fix(perf/UX): Use num physical cores by default, warn about E/P cores #934


Merged: 19 commits, Apr 30, 2023
Showing changes from 6 commits
examples/common.cpp: 41 changes (33 additions, 8 deletions)
@@ -10,6 +10,7 @@
#if defined (_WIN32)
#include <fcntl.h>
#include <io.h>
#include <windows.h>
#pragma comment(lib,"kernel32.lib")
extern "C" __declspec(dllimport) void* __stdcall GetStdHandle(unsigned long nStdHandle);
extern "C" __declspec(dllimport) int __stdcall GetConsoleMode(void* hConsoleHandle, unsigned long* lpMode);
@@ -23,17 +24,41 @@ extern "C" __declspec(dllimport) int __stdcall WideCharToMultiByte(unsigned int
#define CP_UTF8 65001
#endif

bool gpt_params_parse(int argc, char ** argv, gpt_params & params) {
// determine sensible default number of threads.
// std::thread::hardware_concurrency may not be equal to the number of cores, or may return 0.
int32_t get_num_physical_cores() {
#ifdef __linux__
std::ifstream cpuinfo("/proc/cpuinfo");
params.n_threads = std::count(std::istream_iterator<std::string>(cpuinfo),
std::istream_iterator<std::string>(),
std::string("processor"));
std::string line;
while (std::getline(cpuinfo, line)) {
if (line.find("cpu cores") != std::string::npos) {
line.erase(0, line.find(": ") + 2);
try {
return (int32_t) std::stoul(line);

Collaborator:
Why is .erase applied to the whole string? And why isn't the result of find checked?

Contributor Author:
I'm not sure what you mean. What do you suggest?
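A minimal sketch of the check the reviewer appears to be asking for (an interpretation, not code from this PR). If `line.find(": ")` returns `std::string::npos`, then `npos + 2` wraps around to 1, so `erase(0, 1)` removes just the first character instead of signalling that the separator is missing. `std::stoul` can also throw `std::out_of_range`, which the `catch` in the diff does not handle. The hypothetical helper `parse_cpu_cores_line` checks both `find` results and parses a substring rather than erasing from the line:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: value of a "cpu cores : N" line from /proc/cpuinfo,
// or -1 if the line is not a "cpu cores" line or its value cannot be parsed.
int32_t parse_cpu_cores_line(const std::string & line) {
    const std::string key = "cpu cores";
    std::size_t pos = line.find(key);
    if (pos == std::string::npos) {
        return -1;  // not the field we are looking for
    }
    pos = line.find(": ", pos + key.size());
    if (pos == std::string::npos) {
        return -1;  // malformed line: no ": " separator after the key
    }
    try {
        // Parse the text after ": " without modifying the original line.
        const unsigned long n = std::stoul(line.substr(pos + 2));
        return (n > 0 && n <= static_cast<unsigned long>(INT32_MAX)) ? static_cast<int32_t>(n) : -1;
    } catch (const std::invalid_argument &) {
        return -1;  // no digits after the separator
    } catch (const std::out_of_range &) {
        return -1;  // value does not fit in unsigned long
    }
}
```

For example, `parse_cpu_cores_line("cpu cores\t: 8")` returns 8, while `"cpu cores\t8"` (no `": "` separator) and `"processor\t: 3"` both return -1.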

} catch (std::invalid_argument& e) {} // Ignore if we could not parse
}
}
#elif defined(__APPLE__) && defined(__MACH__)
int num_physical_cores;
size_t len = sizeof(num_physical_cores);
int result = sysctlbyname("hw.perflevel0.physicalcpu", &num_physical_cores, &len, NULL, 0);
if (result == 0) {
return (int32_t) num_physical_cores;
}
result = sysctlbyname("hw.physicalcpu", &num_physical_cores, &len, NULL, 0);
if (result == 0) {
return (int32_t) num_physical_cores;
}
#elif defined(_WIN32)
SYSTEM_INFO sysinfo;
GetNativeSystemInfo(&sysinfo);
return (in32_t) sysinfo.dwNumberOfProcessors;

Contributor:
With this typo and the problems with the externs above (see CI checks), this won't build on Windows, so I assume it hasn't been tested at all on Windows?
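Separately from the `in32_t` typo, `SYSTEM_INFO::dwNumberOfProcessors` counts logical processors, so with SMT enabled this branch would not return physical cores. A minimal sketch of counting physical cores on Windows, assuming `GetLogicalProcessorInformation` is acceptable here: it reports one `RelationProcessorCore` entry per physical core (performance and efficiency cores alike), but on systems with more than 64 logical processors it only describes the calling thread's processor group. The name `count_physical_cores_win32` is hypothetical, and because the sketch includes `<windows.h>` itself it does not address the extern declarations the reviewer mentions:

```cpp
#include <cstdint>
#include <vector>
#if defined(_WIN32)
#ifndef WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN
#endif
#include <windows.h>
#endif

// Hypothetical helper: number of physical cores reported by Windows,
// or -1 if it cannot be determined (always -1 on non-Windows builds).
int32_t count_physical_cores_win32() {
#if defined(_WIN32)
    DWORD len = 0;
    // Probe call with no buffer: expected to fail with ERROR_INSUFFICIENT_BUFFER
    // and store the required buffer size in len.
    if (GetLogicalProcessorInformation(nullptr, &len) || GetLastError() != ERROR_INSUFFICIENT_BUFFER) {
        return -1;
    }
    std::vector<SYSTEM_LOGICAL_PROCESSOR_INFORMATION> info(len / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
    len = static_cast<DWORD>(info.size() * sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
    if (!GetLogicalProcessorInformation(info.data(), &len)) {
        return -1;
    }
    info.resize(len / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));  // keep only entries actually written
    int32_t cores = 0;
    for (const SYSTEM_LOGICAL_PROCESSOR_INFORMATION & entry : info) {
        // One RelationProcessorCore entry per physical core, whether or not SMT is enabled.
        if (entry.Relationship == RelationProcessorCore) {
            ++cores;
        }
    }
    return cores > 0 ? cores : -1;
#else
    return -1;
#endif
}
```

Returning -1 on failure keeps the same contract as `get_num_physical_cores()` in the diff, so the clip in `gpt_params_parse` would still act as the fallback.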

#endif
if (params.n_threads == 0) {
params.n_threads = std::max(1, (int32_t) std::thread::hardware_concurrency());
return -1;
}

bool gpt_params_parse(int argc, char ** argv, gpt_params & params) {
// Clip if not a valid number of threads
if (params.n_threads <= 0) {
params.n_threads = std::max(1, std::min(8, (int32_t) std::thread::hardware_concurrency()));
}

bool invalid_param = false;
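The Linux branch returns the first `cpu cores` value it finds. On x86, that field is the number of cores in the physical package the listed CPU belongs to, so on a multi-socket machine it reports one socket's cores rather than the total. A minimal sketch of an alternative (not what this PR implements) counts distinct `physical id`/`core id` pairs instead. `count_physical_cores_cpuinfo` is a hypothetical name; it reads from any `std::istream`, so it can be tried on sample text as well as on an `std::ifstream` opened on `/proc/cpuinfo`. Architectures that do not report these fields yield -1, leaving the caller's fallback in charge:

```cpp
#include <cstdint>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: count distinct (physical id, core id) pairs in
// /proc/cpuinfo-style text. Returns -1 if no such pairs are found.
int32_t count_physical_cores_cpuinfo(std::istream & in) {
    std::set<std::pair<std::string, std::string>> cores;
    std::string line, physical_id, core_id;

    // If `text` starts with `key`, store the trimmed value after the first ':' in `out`.
    auto read_field = [](const std::string & text, const std::string & key, std::string & out) {
        if (text.compare(0, key.size(), key) != 0) {
            return;
        }
        const std::size_t colon = text.find(':', key.size());
        if (colon == std::string::npos) {
            return;
        }
        const std::size_t start = text.find_first_not_of(" \t", colon + 1);
        out = (start == std::string::npos) ? std::string() : text.substr(start);
    };

    while (std::getline(in, line)) {
        if (line.empty()) {
            // A blank line ends one logical processor's block.
            if (!physical_id.empty() && !core_id.empty()) {
                cores.emplace(physical_id, core_id);
            }
            physical_id.clear();
            core_id.clear();
            continue;
        }
        read_field(line, "physical id", physical_id);
        read_field(line, "core id", core_id);
    }
    // The last block may not be followed by a blank line.
    if (!physical_id.empty() && !core_id.empty()) {
        cores.emplace(physical_id, core_id);
    }
    return cores.empty() ? -1 : static_cast<int32_t>(cores.size());
}
```

Feeding it four `processor` blocks that alternate between `physical id` 0 and 1, all with `core id` 0, gives 2: SMT siblings collapse into one entry per physical core.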

examples/common.h: 4 changes (3 additions, 1 deletion)
@@ -13,9 +13,11 @@
// CLI argument parsing
//

int32_t get_num_physical_cores();

struct gpt_params {
int32_t seed = -1; // RNG seed
int32_t n_threads = std::min(4, (int32_t) std::thread::hardware_concurrency());
int32_t n_threads = get_num_physical_cores(); // (if <= 0, = clip(num_logical_cores, 1, 8))
int32_t n_predict = 128; // new tokens to predict
int32_t repeat_last_n = 64; // last n tokens to penalize
int32_t n_parts = -1; // amount of model parts (-1 = determine from model dimensions)
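The comment on `n_threads` describes a two-step default: use `get_num_physical_cores()` when it returns a positive value, otherwise clip the logical core count to [1, 8] in `gpt_params_parse`. A minimal sketch of that rule as a pure function (the name `resolve_n_threads` is hypothetical), with the logical core count passed in so the clipping can be checked independently of the host machine:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper mirroring the default described above:
// keep a positive request, otherwise clip the logical core count to [1, 8].
int32_t resolve_n_threads(int32_t requested, unsigned int logical_cores) {
    if (requested > 0) {
        return requested;
    }
    // hardware_concurrency() may return 0 when the count is unknown; the clip maps that to 1.
    return std::max<int32_t>(1, std::min<int32_t>(8, static_cast<int32_t>(logical_cores)));
}
```

In the program itself, `logical_cores` would come from `std::thread::hardware_concurrency()`; for example, `resolve_n_threads(-1, 16)` gives 8 and `resolve_n_threads(-1, 0)` gives 1.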