Create BigQuery Table using Pandas Dataframe from Google Compute Engine
If you are working on a Google Compute Engine (GCE) VM instance, you can create a BigQuery table directly from a pandas DataFrame. Each column of the DataFrame becomes a separate field in the BigQuery table.
Below, I describe how to do this.
Step 1: Change API access scopes on the GCP VM instance
In your VM instance settings, change the access scopes from Allow default access
to Allow full access to all Cloud APIs
to give your VM instance permission to write to the BigQuery API. Note that access scopes can only be edited while the instance is stopped.
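The same change can be made from the command line. A sketch using gcloud; the instance name and zone are placeholders:

```shell
# Scopes can only be changed while the instance is stopped.
gcloud compute instances stop my-vm --zone us-central1-a

# The cloud-platform scope is the CLI equivalent of
# "Allow full access to all Cloud APIs" in the console UI.
gcloud compute instances set-service-account my-vm \
    --zone us-central1-a \
    --scopes https://www.googleapis.com/auth/cloud-platform

gcloud compute instances start my-vm --zone us-central1-a
```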
Step 2: Install dependencies
We will use the pandas-gbq package to create a BigQuery table from our pandas DataFrame. So, first, install the package in your virtual environment.
# using pip
$ pip install pandas-gbq -U

# using conda
$ conda install pandas-gbq --channel conda-forge
Step 3: Create a dataset on the BigQuery home page
Go to your BigQuery console page using the following URL:
https://console.cloud.google.com/bigquery?project=your-project-name
# replace your-project-name with the name of the project
# you're working on in GCP.
Create a new dataset on your BigQuery page if one doesn't exist already. Let's say we created a dataset called finance_dept (BigQuery dataset IDs may contain only letters, numbers, and underscores, so a hyphenated name like finance-dept would be rejected). Now, we want to create a table for our employee information; let's say the table name is emp_info.
Step 4: Code to create BigQuery table using Pandas Dataframe
Here is the code to create the BigQuery table:
Here, in the above code, dataset_tablename would be finance_dept.emp_info in our case, and gcp_project_name is your GCP Project ID. This will create a schema for your table automatically, based on each column's data type; you don't need to create it manually.
The if_exists parameter accepts three different values: "fail" (raise an error if the table already exists), "replace" (overwrite the existing table), and "append" (add the new rows to it). See the details of each on the pandas-gbq documentation page.
Now, you can find the table created by the above code in the left panel by visiting the following URL:
https://console.cloud.google.com/bigquery?project=<PROJECT-ID>
Replace <PROJECT-ID> with your GCP Project ID.