DBR version for Zingg 0.5.0 #1201

@robertohonores

We are currently testing Zingg 0.5.0 with Databricks. However, when using DBR 14.3 or 15.4 (both with Spark 3.5.0), we encounter the following error:

Py4JError: zingg.common.client.util.ColName does not exist in the JVM
File <command-6215136635607487>, line 7
      4 import time
      5 import uuid
----> 7 from zingg.client import Arguments, ClientOptions, ZinggWithSpark
      8 from zingg.pipes import Pipe, FieldDefinition, MatchType
     10 from ipywidgets import widgets, interact
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/zingg/client.py:214
    211 else:
    212     setupJVMAndSpark()
--> 214 setupJVMBaseObjects()
    217 def getDfFromDs(data):
    218     """Method to convert spark dataset to dataframe
    219 
    220     :param data: provide spark dataset
   (...)
    223     :rtype: DataFrame
    224     """
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/zingg/client.py:202, in setupJVMBaseObjects()
    200 global ZinggOptions
    201 global LabelMatchType
--> 202 ColName = getJVM().zingg.common.client.util.ColName
    203 MatchType = getJVM().zingg.common.client.MatchTypes
    204 ZinggOptions = getJVM().zingg.common.client.ZinggOptions

The following error also appears in the logs:

ERROR DatabricksMain$DBUncaughtExceptionHandler: Uncaught exception in thread Thread-127!
java.lang.UnsupportedClassVersionError: zingg/common/client/util/ColName has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:152)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	at com.databricks.backend.daemon.driver.ClassLoaders$ReplWrappingClassLoader.loadClass(ClassLoaders.scala:65)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:406)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at py4j.reflection.CurrentThreadClassLoadingStrategy.classForName(CurrentThreadClassLoadingStrategy.java:40)
	at py4j.reflection.ReflectionUtil.classForName(ReflectionUtil.java:51)
	at py4j.reflection.TypeUtil.forName(TypeUtil.java:243)
	at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:175)
	at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:87)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
	at java.lang.Thread.run(Thread.java:750)
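
For reference, class file version 55.0 corresponds to Java 11 and 52.0 to Java 8, so the message above suggests the Zingg classes were compiled for Java 11 while the JVM on this cluster is Java 8. The sketch below shows one way to check this on a jar before attaching it to a cluster. The helper names and the jar path in the usage comment are hypothetical and not part of Zingg or Databricks; only the class file header layout (magic number, minor version, major version) is standard.

```python
import struct
import zipfile


def java_release_for_class_version(major):
    """Map a class file major version to a Java release number.

    Major versions start at 45 (JDK 1.1) and increase by one per release,
    so for Java 5 and later the release is major - 44.
    """
    return major - 44


def class_file_major_version(header):
    """Read the major version from the first 8 bytes of a .class file."""
    # Header layout (big-endian): u4 magic, u2 minor_version, u2 major_version
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major


def jar_class_versions(jar_path):
    """Return the set of class file major versions found in a jar."""
    versions = set()
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.endswith(".class"):
                with jar.open(name) as class_file:
                    versions.add(class_file_major_version(class_file.read(8)))
    return versions


# The two versions named in the error message:
print(java_release_for_class_version(55))  # 11
print(java_release_for_class_version(52))  # 8

# Hypothetical usage against a local copy of the jar:
# print({java_release_for_class_version(v) for v in jar_class_versions("zingg.jar")})
```

If the jar reports a higher Java release than the cluster's JVM, an `UnsupportedClassVersionError` like the one above is expected.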

When using DBR 16.0, 16.2, and 16.4 with Spark 3.5.2 and Scala 2.12, we instead receive a serialization error:

Py4JJavaError: An error occurred while calling o549.execute.
: zingg.common.client.ZinggClientException: org.apache.spark.sql.types.StringType$; local class incompatible: stream classdesc serialVersionUID = 3796071416192072411, local class serialVersionUID = 7529903822443873529
	at zingg.common.core.executor.Matcher.execute(Matcher.java:206)
	at zingg.common.client.Client.execute(Client.java:281)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:197)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:117)
	at java.base/java.lang.Thread.run(Thread.java:840)
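
A `local class incompatible` message with two different `serialVersionUID` values generally means the serialized data was written with a different build of the class than the one loaded on the cluster. To compare environments, the cluster's Spark, Scala, and Java versions can be collected from a notebook. This is a minimal sketch: `runtime_versions` is a hypothetical helper, and it assumes `spark.sparkContext._jvm` is reachable, which may not hold on Spark Connect or shared access mode clusters.

```python
def runtime_versions(spark):
    """Collect the Spark, Scala, and Java versions of the running session."""
    # Py4J view of the driver JVM; assumed to be available on this cluster type.
    jvm = spark.sparkContext._jvm
    return {
        "spark": spark.version,
        "scala": jvm.scala.util.Properties.versionString(),
        "java": jvm.java.lang.System.getProperty("java.version"),
    }


# In a notebook where `spark` is the active SparkSession:
# print(runtime_versions(spark))
```

Comparing this output with the Spark and Scala versions the Zingg release was built against may help narrow down whether the mismatch comes from the runtime or from previously saved artifacts.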

Could you please advise which DBR version we should use?
Thanks.
