Merge pull request #772 from renweizhukov/patch-1
Update 4.mdx
Vaibhavs10 authored Feb 3, 2025
2 parents ab57fd6 + 832bdba commit fb04c75
Showing 1 changed file with 2 additions and 2 deletions.
chapters/en/chapter5/4.mdx (4 changes: 2 additions, 2 deletions)
````diff
@@ -87,13 +87,13 @@ RAM used: 5678.33 MB
 Here the `rss` attribute refers to the _resident set size_, which is the fraction of memory that a process occupies in RAM. This measurement also includes the memory used by the Python interpreter and the libraries we've loaded, so the actual amount of memory used to load the dataset is a bit smaller. For comparison, let's see how large the dataset is on disk, using the `dataset_size` attribute. Since the result is expressed in bytes like before, we need to manually convert it to gigabytes:

 ```py
-print(f"Number of files in dataset : {pubmed_dataset.dataset_size}")
+print(f"Dataset size in bytes: {pubmed_dataset.dataset_size}")
 size_gb = pubmed_dataset.dataset_size / (1024**3)
 print(f"Dataset size (cache file) : {size_gb:.2f} GB")
 ```

 ```python out
-Number of files in dataset : 20979437051
+Dataset size in bytes : 20979437051
 Dataset size (cache file) : 19.54 GB
 ```
````
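The snippet converts `dataset_size` (a byte count) by dividing by `1024**3`, which strictly gives gibibytes (GiB) even though the label says "GB". As a minimal sketch, the hypothetical helpers `bytes_to_gib`, `bytes_to_gb`, and `describe_size` below (illustrative only, not part of 🤗 Datasets) reproduce that conversion on the byte count shown in the output above and contrast it with decimal gigabytes.

```python
# Hypothetical helpers for illustration; they are not part of 🤗 Datasets.


def bytes_to_gib(num_bytes: int) -> float:
    """Binary units: 1 GiB = 1024**3 bytes (what the course snippet computes)."""
    return num_bytes / (1024**3)


def bytes_to_gb(num_bytes: int) -> float:
    """Decimal units: 1 GB = 10**9 bytes (what disk vendors usually report)."""
    return num_bytes / 10**9


def describe_size(num_bytes: int) -> str:
    """Build the two report lines in the same shape as the course output."""
    return (
        f"Dataset size in bytes : {num_bytes}\n"
        f"Dataset size (cache file) : {bytes_to_gib(num_bytes):.2f} GB"
    )


# Byte count taken from the `dataset_size` output shown above.
pubmed_bytes = 20979437051
print(describe_size(pubmed_bytes))
print(f"Decimal gigabytes : {bytes_to_gb(pubmed_bytes):.2f} GB")
# Dataset size in bytes : 20979437051
# Dataset size (cache file) : 19.54 GB
# Decimal gigabytes : 20.98 GB
```

The roughly 1.4 GB gap between 19.54 and 20.98 comes purely from the unit convention, not from any difference in the data, so it helps to state which convention a size report uses.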
