google cloud vertex ai – Using Langchain with BigQuery – Error with tables containing RECORD fields

I’m trying to build a simple Text to Query pipeline using Langchain, BigQuery and Vertex LLM.
Initiating the langchain SQLDatabase Object works fine

from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *
from langchain import SQLDatabase, SQLDatabaseChain

dataset_id = 'ecomm'
table_uri = f"bigquery://{PROJECT_ID}/{dataset_id}"
engine = create_engine(f"bigquery://{PROJECT_ID}/{dataset_id}")

db = SQLDatabase(engine=engine,metadata=MetaData(bind=engine),include_tables=['retail_products'],sample_rows_in_table_info=3)

but then, when calling a

print(db.table_info)

I get following error:

BadRequest: 400 Cannot access field key on a value with type ARRAY<STRUCT<key STRING, value STRUCT<text ARRAY<STRING>, numbers ARRAY<FLOAT64>, searchable BOOL, ...>>> at [1:406]
[SQL: SELECT `retail_products`.`name`, `retail_products`.`id`, `retail_products`.`type`, `retail_products`.`primaryProductId`, `retail_products`.`collectionMemberIds`, `retail_products`.`gtin`, `retail_products`.`categories`, `retail_products`.`title`, `retail_products`.`brands`, `retail_products`.`description`, `retail_products`.`languageCode`, `retail_products`.`attributes`, `retail_products`.`attributes`.`key`, `retail_products`.`attributes`.`value`, `retail_products`.`attributes`.`value`.`text`, `retail_products`.`attributes`.`value`.`numbers`, `retail_products`.`attributes`.`value`.`searchable`, `retail_products`.`attributes`.`value`.`indexable`, `retail_products`.`tags`, `retail_products`.`priceInfo`, `retail_products`.`priceInfo`.`currencyCode`, `retail_products`.`priceInfo`.`price`, `retail_products`.`priceInfo`.`originalPrice`, `retail_products`.`priceInfo`.`cost`, `retail_products`.`priceInfo`.`priceEffectiveTime`, `retail_products`.`priceInfo`.`priceExpireTime`, `retail_products`.`rating`, `retail_products`.`rating`.`ratingCount`, `retail_products`.`rating`.`averageRating`, `retail_products`.`rating`.`ratingHistogram`, `retail_products`.`expireTime`, `retail_products`.`ttl`, `retail_products`.`ttl`.`seconds`, `retail_products`.`ttl`.`nanos`, `retail_products`.`availableTime`, `retail_products`.`availability`, `retail_products`.`availableQuantity`, `retail_products`.`fulfillmentInfo`, `retail_products`.`fulfillmentInfo`.`type`, `retail_products`.`fulfillmentInfo`... FROM `retail_products` LIMIT %

The table schema is:

name STRING NULLABLE
id STRING REQUIRED
type STRING NULLABLE
primaryProductId STRING NULLABLE
collectionMemberIds STRING REPEATED
gtin STRING NULLABLE
categories STRING REPEATED
title STRING REQUIRED
brands STRING REPEATED
description STRING NULLABLE
languageCode STRING NULLABLE
attributes RECORD REPEATED
tags STRING REPEATED
priceInfo RECORD NULLABLE
rating RECORD NULLABLE
expireTime STRING NULLABLE
ttl RECORD NULLABLE
availableTime STRING NULLABLE
availability STRING NULLABLE
availableQuantity INTEGER NULLABLE
fulfillmentInfo RECORD REPEATED
uri STRING NULLABLE
images RECORD REPEATED
audience RECORD NULLABLE
colorInfo RECORD NULLABLE
sizes STRING REPEATED
materials STRING REPEATED
patterns STRING REPEATED
conditions STRING REPEATED
retrievableFields STRING NULLABLE
publishTime STRING NULLABLE
promotions RECORD REPEATED

Seems the problem is coming for RECORD type fields on my table.
This works fine on a simple table without nested fields

How can I fix this ?

Read more here: Source link