
Commit fb04c75

Merge pull request #772 from renweizhukov/patch-1
Update 4.mdx
2 parents ab57fd6 + 832bdba commit fb04c75


1 file changed, +2 -2 lines changed


chapters/en/chapter5/4.mdx

Lines changed: 2 additions & 2 deletions
@@ -87,13 +87,13 @@ RAM used: 5678.33 MB
 Here the `rss` attribute refers to the _resident set size_, which is the fraction of memory that a process occupies in RAM. This measurement also includes the memory used by the Python interpreter and the libraries we've loaded, so the actual amount of memory used to load the dataset is a bit smaller. For comparison, let's see how large the dataset is on disk, using the `dataset_size` attribute. Since the result is expressed in bytes like before, we need to manually convert it to gigabytes:
 
 ```py
-print(f"Number of files in dataset : {pubmed_dataset.dataset_size}")
+print(f"Dataset size in bytes: {pubmed_dataset.dataset_size}")
 size_gb = pubmed_dataset.dataset_size / (1024**3)
 print(f"Dataset size (cache file) : {size_gb:.2f} GB")
 ```
 
 ```python out
-Number of files in dataset : 20979437051
+Dataset size in bytes : 20979437051
 Dataset size (cache file) : 19.54 GB
 ```
 
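The change relabels `dataset_size` as a byte count rather than a number of files. As a quick sanity check on the numbers in the hunk, the sketch below redoes the conversion in plain Python. The helper names `bytes_to_gib` and `bytes_to_gb` are hypothetical, and the byte count is hard-coded from the example output instead of being read from a real `Dataset`, so it runs without downloading anything. Dividing by `1024**3` strictly yields gibibytes (GiB), which is why the output reads 19.54 rather than the 20.98 obtained with decimal gigabytes.

```python
def bytes_to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes (1 GiB = 1024**3 bytes)."""
    return num_bytes / (1024**3)


def bytes_to_gb(num_bytes: int) -> float:
    """Convert a byte count to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / (10**9)


# Hypothetical stand-in for pubmed_dataset.dataset_size, copied from the
# example output in the diff so this sketch is self-contained.
dataset_size = 20979437051

print(f"Dataset size in bytes: {dataset_size}")
print(f"Dataset size (binary, GiB): {bytes_to_gib(dataset_size):.2f}")
print(f"Dataset size (decimal, GB): {bytes_to_gb(dataset_size):.2f}")
```

The binary figure matches the `19.54 GB` line in the diff; the decimal figure shows how much the label depends on which definition of "gigabyte" is used.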