
Commit 4a35486

Merge remote-tracking branch 'origin/candidate-10.0.x' into candidate-10.2.x
Signed-off-by: Gordon Smith <[email protected]>

# Conflicts:
#	commons-hpcc/pom.xml
#	dfsclient/pom.xml
#	pom.xml
#	spark-hpcc/pom.xml
#	wsclient/pom.xml
2 parents 6f31b66 + 479757d commit 4a35486

File tree

1 file changed

+25
-3
lines changed


spark-hpcc/Examples/PySparkExample.ipynb

Lines changed: 25 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -119,12 +119,12 @@
119119
"- **recordSamplingSeed**: *Optional* The seed that controls the random generation used for sampling. The same seed against the same HPCC cluster and HPCC platform version should result in the same sampling.\n",
120120
"- **projectList**: *Optional* The columns that should be read from the HPCC Systems dataset.\n",
121121
"- **useTLK** *Optional* Defaults to false, determines whether or not the TLK (Top Level Key) should be used when reading index files.\n",
122-
"- **fileParts** *Optional* List of file parts to read; supports a comma-separated list of files, file part ranges, or a combination of both.\n",,
122+
"- **fileParts** *Optional* List of file parts to read; supports a comma-separated list of files, file part ranges, or a combination of both.\n",
123123
"- **stringProcessing** *Optional* Comma separated list of processing rules to apply to strings available rules: [NONE, TRIM, TRIM_FIXED, EMPTY_TO_NULL]. Default behavior is NONE. TRIM will apply left and right trim all strings, TRIM_FIXED will apply left trim and right trim for only fixed length strings, and EMPTY_TO_NULL will convert empty strings to a null value.\n",
124-
"- **unsignedEightToDecimal** *Optional* By default Unsigned8 values in HPCC are read into a Java long, which will result in some values overflowing and becoming negative. If the Unsigned8 value is used a a unique identifier this is acceptable, but for cases where the numeric value is needed the Unsigned8 can be read into a Decimal value which supports the entire Unsigned8 range.\n"
124+
"- **unsignedEightToDecimal** *Optional* By default Unsigned8 values in HPCC are read into a Java long, which will result in some values overflowing and becoming negative. If the Unsigned8 value is used a a unique identifier this is acceptable, but for cases where the numeric value is needed the Unsigned8 can be read into a Decimal value which supports the entire Unsigned8 range.\n",
125125
"\n",
126126
"---\n",
127-
"## Troubleshooting:\n",
127+
"## Troubleshooting Reads:\n",
128128
"- **Empty dataset**: This typically indicates that an error occured on one or more of the worker nodes during the read process, the worker logs should contain more information about the particular failure.\n"
129129
]
130130
},
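The unsignedEightToDecimal overflow described above can be illustrated with a plain-Python sketch (this is not the connector's code; `struct` is used only to mimic how the largest 64-bit unsigned value wraps negative when reinterpreted as a signed long, while `Decimal` preserves the full range):

```python
import struct
from decimal import Decimal

# The bytes of the largest Unsigned8 (64-bit unsigned) value: all 0xFF.
raw = struct.pack("<Q", 2**64 - 1)

# Reinterpreted as a signed 64-bit long (how a Java long would see it),
# the value overflows and wraps to -1.
as_signed = struct.unpack("<q", raw)[0]

# Read as unsigned and carried in a Decimal, the full range survives.
as_decimal = Decimal(struct.unpack("<Q", raw)[0])

print(as_signed)    # -1
print(as_decimal)   # 18446744073709551615
```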
@@ -357,6 +357,28 @@
357357
" for f in futures:\n",
358358
" f.result() # Wait for completion"
359359
]
360+
},
361+
{
362+
"cell_type": "markdown",
363+
"id": "a4149a90",
364+
"metadata": {},
365+
"source": [
366+
"\n",
367+
"# General Troubleshooting Tips\n",
368+
"\n",
369+
"---\n",
370+
"\n",
371+
"- **Issues with individual file parts:** Potential problems such as slow transfer speeds for a particular file part can be diagnosed by using the fileParts option during the read process to selectively read individual file parts and compare transfer speeds.\n",
372+
"\n",
373+
"```python\n",
374+
" readDf = spark.read.load(format=\"hpcc\",\n",
375+
" host=\"http://127.0.0.1:8010\",\n",
376+
" cluster=\"mythor\",\n",
377+
" path=\"/spark/test/dataset\",\n",
378+
" fileParts=\"13\")\n",
379+
"```\n",
380+
"- **Network Congestion:** Reading data between distributed clusters (Spark and HPCC) opens many concurrent TCP connections, which can overwhelm network links or gateways. On the receiving cluster, this appears as slow transfer speeds and file access timeouts; on the sending cluster, it shows up as large TCP send queues and dropped packets. To mitigate this, limit the number of concurrent Spark read tasks, either by reducing executor cores per job or by limiting the number of workers."
381+
]
360382
}
361383
],
362384
"metadata": {
