@@ -70,22 +70,43 @@ Example of using the Storage API from Databricks:
 dbutils.library.installPyPI("google-cloud-bigquery", "1.16.0")
 dbutils.library.restartPython()
 
-# Read one day of pings and select a subset of columns.
+from google.cloud import bigquery
+
+
+def get_table(view):
+    bq = bigquery.Client()
+    view = view.replace(":", ".")
+    # partition filter is required, so try a couple options
+    for partition_column in ["DATE(submission_timestamp)", "submission_date"]:
+        try:
+            job = bq.query(
+                f"SELECT * FROM `{view}` WHERE {partition_column} = CURRENT_DATE",
+                bigquery.QueryJobConfig(dry_run=True),
+            )
+            break
+        except Exception:
+            continue
+    else:
+        raise ValueError("could not determine partition column")
+    assert len(job.referenced_tables) == 1
+    table = job.referenced_tables[0]
+    return f"{table.project}:{table.dataset_id}.{table.table_id}"
+
+
+# Read one day of main pings and select a subset of columns.
 core_pings_single_day = spark.read.format("bigquery") \
-  .option("table", "moz-fx-data-shared-prod.telemetry_stable.core_v10") \
+  .option("table", get_table("moz-fx-data-shared-prod.telemetry.main")) \
   .load() \
   .where("submission_timestamp >= to_date('2019-08-25') and submission_timestamp < to_date('2019-08-26')") \
   .select("client_id", "experiments", "normalized_channel")
 ```
 
 A couple of things are worth noting in the above example.
 
-* You must supply an actual _table_ name to read from here, fully qualified
-  with project name and dataset name.
+* `get_table` is necessary because an actual _table_ name is required to read
+  from BigQuery here, fully qualified with project name and dataset name.
   The Storage API does not support accessing `VIEW`s, so the convenience names
   such as `telemetry.core` are not available via this API.
-  You can find the table corresponding to a given view using the BigQuery
-  console or using Data Catalog.
 * You must supply a filter on the table's date partitioning column, in this
   case `submission_timestamp`.
   Additionally, you must use the `to_date` function to make sure that predicate
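As an aside on the patch above: `get_table` relies on Python's `for`/`else` construct to fall back across candidate partition columns, where the `else` branch runs only if the loop finishes without a `break`. A minimal standalone sketch of that control-flow pattern, runnable without BigQuery credentials (the `first_accepted` helper and its stand-in check are illustrative, not part of the patch):

```python
def first_accepted(candidates, check):
    # Mirror of the pattern in get_table: try each candidate in order,
    # keep the first one the check accepts, and reach the for-loop's
    # else clause (raising) only if every candidate fails.
    for candidate in candidates:
        try:
            check(candidate)
            break
        except Exception:
            continue
    else:
        raise ValueError("could not determine partition column")
    return candidate


# In get_table, the check is a BigQuery dry-run query; here a stand-in
# that only accepts plain column names plays that role.
def only_plain_columns(name):
    if "(" in name:
        raise ValueError(name)


print(first_accepted(["DATE(submission_timestamp)", "submission_date"],
                     only_plain_columns))
# submission_date
```

The dry-run query in `get_table` serves the same purpose as `only_plain_columns` here: it is a cheap probe that raises for the wrong partition column without scanning any data.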