Data Science

Google BigQuery Quick Start Tutorial

Introduction to Google BigQuery

Technical Rundown

BigQuery is a columnar database, this is built using Google’s own Capacitor framework and features nested and repeated fields.

BigQuery databases can take a variety of data types as inputs and is a great fit for semi-structured data. Nested fields like totals (visits etc) and others are used to keep storing data affordable and fast. Similar databases are Redshift or Parquet.

Querying BigQuery can be done in either standard or legacy SQL depending on the flavor you prefer. To access your Google Cloud Console (where you run queries) follow this link.


Google BigQuery Tutorial & Examples

Running Queries

To get started running queries, I suggest using The Google BigQuery Cookbook, this is your one stop shop for questions, details, and samples to help you get more familiar. Please note the following data nuances:

  • Tables are broken out by day meaning that you will need to use a wildcard, or * to pull a larger date range.
  • There is also an “intraday” table that will give you data for the last 24 hours. This data is streaming an updating all of the time.

Screen Shot 2018 10 18 at 12.29.33 PM

Visits over the last 36 months:

SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN
FORMAT_DATE(‘%Y%m%d’,DATE_SUB(CURRENT_DATE(), INTERVAL 36 MONTH))
AND
FORMAT_DATE(‘%Y%m%d’,DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY date
ORDER BY date ASC


Transactions by device browser & type:

SELECT
device.browser,
SUM ( totals.transactions ) AS total_transactions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN ‘20170701’ AND ‘20170731’
GROUP BY
device.browser
ORDER BY
total_transactions DESC

Exploring Data

Now that you have successfully run a query or two, it is probably time to start analyzing! From the console, you can hit the “Explore in Data Studio” button that looks like this:

Google Data Studio Explorer Tool:

Screen Shot 2018 10 18 at 12.29.51 PM

Exporting Data

If you’d like to save a query result use the button that says “Save Results”:

Screen Shot 2018 10 18 at 1.24.25 PM

This will allow you to select from a variety of data outputs to make sharing and analysis easy. If you’d like to save the data as a SQL view, you can use the “Save View” feature.

Screen Shot 2018 10 18 at 1.25.36 PM

Google BigQuery is a powerful tool that drives real business value by combining and storing data from a plethora of sources. With the easy-to-use interface, any standard SQL user can feel right at home with BigQuery’s capabilities. Want to learn more about the Google product suite? Start here.