
Memory leak #298

Open
ljlooper opened this issue Feb 10, 2025 · 6 comments
Labels
question Further information is requested

Comments

@ljlooper

At first I created a connection and inserted 100,000 rows, then ran a query for a random id every 20 seconds, closing the rows after each query. There is no query blocking and I have confirmed the code is OK, but memory keeps growing: it started around 200 MB, has now passed 1 GB, and continues to increase.

@ljlooper ljlooper added the question Further information is requested label Feb 10, 2025
@jlmadurga

I'm also having issues with memory. In my case I'm using Session, and there seems to be a leak although I'm closing it.

My setup:

  • session :memory:
  • chdb 3.0.0

Check this script to reproduce it with memray; I'm using a file with 3 MB of JSON data.

import memray
import chdb
from chdb import session
import os
import time

def load_data(file_path):
    """Simulates loading a file into memory as a string."""
    with open(file_path, "r") as f:
        return f.read()

def chdb_memory_test(data):
    """Creates a chDB session, inserts data, and closes the session."""
    with session.Session(":memory:") as ses:
        ses.query("CREATE TABLE test (data String) Engine = Memory")
        ses.query(f"INSERT INTO test FORMAT JSONEachRow {data}")


if __name__ == "__main__":
    file_path = "events.json"

    data = load_data(file_path)

    with memray.Tracker("memray_report.bin"):
        for _ in range(50):  # Run multiple iterations to confirm leaks
            chdb_memory_test(data)
            time.sleep(0.1)  # Simulate processing time

    print("Memray tracking complete. Analyze with:")
    print("memray flamegraph memray_report.bin")

[Image: memray report]

Executing it produces a warning:
UserWarning: There is already an active session. Creating a new session will close the existing one. It is recommended to close the existing session before creating a new one. Closing the existing session :memory:

but I guess this is caused by close() not setting g_session to None: https://github.com/chdb-io/chdb/blob/main/chdb/session/state.py#L86
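If that's right, the pattern is easy to demonstrate in isolation. Below is a minimal sketch using illustrative names only (not chdb's actual internals): a module-level `g_session` that `close()` never resets keeps the session object reachable, so its memory can never be reclaimed. The suspected fix is a one-line reset in `close()`:

```python
# Sketch of the suspected leak pattern. Names (Session, g_session,
# open_session) are illustrative stand-ins, not chdb's real API.
import gc


class Session:
    def __init__(self):
        # Stands in for the session's native/allocated state.
        self.buffer = bytearray(10 * 1024 * 1024)

    def close(self):
        global g_session
        self.buffer = None
        g_session = None  # the suspected missing step: drop the global ref


g_session = None


def open_session():
    global g_session
    g_session = Session()
    return g_session


ses = open_session()
ses.close()
del ses
gc.collect()
print(g_session)  # None once close() clears the global
```

Without the `g_session = None` line in `close()`, the module-level reference would keep the old `Session` alive after every close, which would also explain the "already an active session" warning on the next open.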

@auxten
Member

auxten commented Feb 10, 2025

Thanks for reporting @jlmadurga @ljlooper. Let me check it.

@auxten
Member

auxten commented Feb 11, 2025

@ljlooper are you also using the Memory engine, which is the default engine of Session and Connection?

@ljlooper
Author

Yes, please review my test code:

```go
package main

import (
    "bytes"
    "context"
    "database/sql"
    "fmt"
    "math/rand"
    "net/http"
    "sync"
    "time"

    _ "github.com/chdb-io/chdb-go/chdb/driver" // registers the "chdb" driver
)

func main() {
    db, err := sql.Open("chdb", "session=/opt/data/chdb_test_parquet;driverType=Parquet")
    if err != nil {
        fmt.Println("open:", err)
        return
    }
    defer db.Close()

    fmt.Println("open db success")
    _, err = db.Exec("create database if not exists tdb")
    if err != nil {
        fmt.Println("create database ", err)
        return
    }
    _, err = db.Exec(`CREATE TABLE IF NOT EXISTS tdb.test
            (
                id String,
                etag String,
                contents String
            )ENGINE = MergeTree()
            PRIMARY KEY (id)
            ORDER BY (id);`)
    if err != nil {
        fmt.Println("create table ", err)
        return
    }

    batchWrite(db)

    go func() {
        http.ListenAndServe("0.0.0.0:12346", nil)
    }()

    tk := time.NewTicker(20 * time.Second)
    for {
        <-tk.C
        // query(db)
        ConcurrencyQuery(10, db)
    }
}

func ConcurrencyQuery(n int, db *sql.DB) {
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            query(db)
        }()
    }

    wg.Wait()
}

func batchWrite(db *sql.DB) {
    t0 := time.Now()
    count := 100000
    sqlBuf := bytes.NewBufferString("INSERT INTO tdb.test VALUES")
    for i := 1; i <= count; i++ {
        if i%1000 == 0 {
            sqlBuf.WriteString(fmt.Sprintf(` ('%d', 'xx','xx')`, i%1000))
            _, err := db.Exec(sqlBuf.String())
            if err != nil {
                panic("write: " + err.Error())
            }
            sqlBuf = bytes.NewBufferString("INSERT INTO tdb.test VALUES")
        }
        sqlBuf.WriteString(fmt.Sprintf(` ('%d', 'xx', 'xx'),`, i%1000))
    }
    fmt.Printf("write count: %d, used: %+v\n", count, time.Since(t0))
}

func query(db *sql.DB) {
    t0 := time.Now()
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    row, err := db.QueryContext(ctx, fmt.Sprintf(`select count(*) from tdb.test where id='%d'`, rand.Intn(100)))
    if err != nil {
        panic("query:" + err.Error())
    }
    defer row.Close()

    var count = 0
    for row.Next() {
        err := row.Scan(&count)
        if err != nil {
            panic("scan: " + err.Error())
        }
    }
    fmt.Println("count:", count)
    fmt.Printf("query used: %+v\n", time.Since(t0))
}
```

@auxten
Member

auxten commented Feb 11, 2025

@ljlooper Oh, you are using chdb-go, which is still based on libchdb 2.x.
I think your issue and @jlmadurga's are two different issues.
Anyway, I will look into both.

@catundercar

> @ljlooper Oh, you are using chdb-go, which is still based on libchdb 2.x. I think your issue and @jlmadurga's are two different issues. Anyway, I will look into both.

After a preliminary investigation using Session and the Go SQL driver, I tested both the ARROW and PARQUET driver types, and both show the memory leak. Reducing the ticker interval in the code above makes the growth more obvious.
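For anyone who wants to confirm growth like this without memray, sampling the process RSS between iterations is a quick cross-check. A generic sketch (Unix-only, standard-library `resource` module; the workload here is a placeholder to swap for the real session/query cycle):

```python
# Sample the process's peak RSS after each iteration of a workload.
# A steadily rising sequence across identical iterations suggests a leak.
import resource
import sys


def rss_kb():
    """Peak RSS of this process (ru_maxrss is KiB on Linux, bytes on macOS)."""
    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return usage // 1024 if sys.platform == "darwin" else usage


def run_iterations(workload, n=50):
    """Run the workload n times, recording RSS after each run."""
    samples = []
    for _ in range(n):
        workload()
        samples.append(rss_kb())
    return samples


if __name__ == "__main__":
    # Replace this no-op with the real open/insert/query/close cycle.
    samples = run_iterations(lambda: None, n=10)
    print("first:", samples[0], "last:", samples[-1])
```

Note that `ru_maxrss` is a high-water mark, so it never decreases; a leak shows up as the mark climbing on every iteration rather than plateauing after the first few.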

4 participants