Analytics

Google BigQuery FAQs Answered

Frequently Asked Questions

What is BigQuery?

BigQuery was introduced by Google as a scalable solution to data warehousing to give enterprises more flexibility with their data. BigQuery is a serverless cloud data warehouse that allows users to query and join various datasets in a user-friendly interface at a low cost.

Querying with BigQuery is fast and cost-effective, and enables users to pull insights from massive datasets in a matter of seconds! BigQuery easily connects to a host of analytics platforms and offers native integration with Google Analytics 360.

If you’re looking for a quick start tutorial for BigQuery, check out our post here. Below we’ll walk through the most frequently asked questions (FAQs) we get about BigQuery. Let’s get started!

What language does BigQuery use? Do I need to know how to program?

BigQuery uses SQL, or Structured Query Language, which is a language used to interact with relational databases such as Google BigQuery. While SQL isn’t a difficult language to learn, it is necessary to have a cursory knowledge of this language when working with BigQuery.

However, you by no means need to be an expert! Throughout this post, we’ll show you some ways to query common reports in GA as well as some other tips and tricks to get you started writing your very own queries.

There are a number of resources to learn basic SQL, in addition to Google provided resources. See below for a list of resources that you can use to brush up on your querying skills!

Google Cloud Documentation – Standard SQL Query Syntax:

List of query examples that you can use in BigQuery provided by Google. This documentation gets into some of the differences that you might encounter in this interface as opposed to working with a standard SQL IDE.

Kaggle – Getting Started with SQL and BigQuery

This is a great place to start to learn some of the specific queries used in BigQuery, as well as suss out some of the specific nuances in the BigQuery interface. Kaggle also has a great free resource to brush up on other SQL concepts as well!

Codecademy – Learn SQL (Free)

This beginner course provided by Codecademy is a great place to start brushing up on your basic SQL concepts and query syntax. While there are paid courses available, this free course is a great place to start on your journey to learning SQL!

DataCamp – All SQL Courses (Paid)

Browse a repository of a number of SQL courses! While you can start Datacamp for free, there is a subscription required to continue your learning. However, as far as paid subscriptions go, the DataCamp courses offer great interactive opportunities.

SQLZoo (Free)

SQLZoo is a free resource that covers some basic SQL concepts such as Selects, Joins, and Aggregations. Although not as structured as the classes above, it’s a great free resource that lets you learn the ropes and do some basic exercises.

W3 Schools – SQL Tutorial (Free)

A great online resource for referencing a lot more specific syntaxes and queries. While this isn’t a great resource for practicing querying yourself, it’s an invaluable site to reference as you work through some of these other courses!

Udemy SQL Courses (Paid)

Another plug for great paid SQL courses. This is a general repository of individually priced SQL courses for you to look at. If a subscription service isn’t your cup of tea and you’d rather pay upfront for a complete course, look here for your preferred courses!

Do I need to be connected to the internet to use BigQuery? Is there an offline mode that I can work in?

As of now, there’s no way to execute queries while you’re offline. While you can write these queries without any problems in the editor, you’ll need to be online in order to run them as well as save them in the interface. If you do try to run queries or query data from a table offline, you’ll see the following error in the interface:

pasted image 0 29

Does BigQuery have any public datasets available to integrate with my data?

Yes, it does! Google has provided over 100 datasets in a public library that is easily accessible on the GCP Marketplace. You can connect to these public datasets through any project.

These datasets range from data for finance to transportation to health and everything in between. Leverage these as initial training materials for BigQuery, or connect them with your business data to determine how disparate datasets can be connected for valuable insights.

How much does BigQuery cost?

BigQuery has a very flexible and scalable pricing structure.

While this approach often lessens data warehousing costs for enterprises, it is important to carefully consider how heavily BigQuery will be used for querying and storing your data to understand the costs that are associated with the tool. There are some features that are free of charge in BigQuery such as exporting, copying, loading and deleting data.

The three operations that BigQuery charges for are (1) storage, (2) querying and (3) inserts. Within these operations are a variety of different payment structures, included in a breakdown from Google Cloud Platform below.

BQ Pricing

BigQuery and Google Analytics 360 Integration

Interested in the integration between BigQuery and GA360? Continue reading below for more insights!

Why should I set up BigQuery with Google Analytics 360?

While the Google Analytics interface provides enhanced capabilities for 360 customers, taking the next step and pushing GA data into BigQuery provides users with the opportunity to pull insights at a highly flexible, granular level. BigQuery allows Google Analytics users to avoid any sampling within standard GA reports.

Nervous that the use of complex segments or additional dimensions will cause your data to be sampled in the GA interface? BigQuery is your answer. It also further enables users to automate their process to extract, transfer, and load data. This means that users can analyze their GA data through automated output from BigQuery without having to navigate multiple reports within the GA interface.

Additionally, data in BigQuery gives users a considerably more granular look at the data from their website. While Google Analytics processes and pre-aggregates all data from a website prior to showing it to users through standard reports.

BigQuery provides access to unchartered territory – raw Google Analytics data. Access to raw data enables BigQuery users to use raw, hit-level data in their analyses. This means that session-level and user-level data can be (carefully) combined within reports to your heart’s content.

mindblown

BigQuery makes it a whole lot easier to join other datasets together that are relevant to your business as a whole.

Do you have a ton of data on your customers or finances from a third party source? Use BigQuery to combine your Google Analytics data with other data that continues to drive business decisions and strategy.

Does the BigQuery GA360 schema give me access to the same data that I see in Google Analytics?

No. As mentioned above, the Google Analytics schema within BigQuery is slightly different from what you would see in the GA interface. You get hit level data, which is collected at each individual user interaction.

The schema table features a row for each session on the site. This row features a unique visitID (unique session identifier), full visitorID (client ID), channel grouping, total number of pageviews and hits within the session, and other overall information about the session. Nested under this row for the session are rows for each of the hits that occured within the session.

Unfamiliar with dimensions, metrics, and their scopes? Learn the basics in this post.

A variety of different information is included in the schema for each individual hit, including the hit number, hit time, hit page path, whether the hit was an interaction, & more.

Not all of your favorite standard dimensions and metrics from the Google Analytics interface are available within BigQuery, due to the way by which Google Analytics processes and pre-aggregates certain fields to simplify analysis within the interface.

Examples of elements that are not available within the BigQuery interface are sessions, landing page, and time on page. The schema is linked here.

How many Google Analytics views can I export to BigQuery?

You can only export one view per Google Analytics property.

When selecting which view to export, it is important to consider which views have been customized with various changes to the View Settings (traffic filters, content groupings, channel settings, etc.), or which views have the most historical data.

The view that you choose to push to BigQuery will depend on use cases for your data. We recommend selecting the view with the most data, universal customization, and essential filters that have cleaned your data (such as bot filters).

How much of my data will backfill?

Upon initially setting up the Google Analytics 360 BigQuery integration, GA will export 13 months or 10 billion hits of historical data to BigQuery – whichever one is smaller.

The backfilling process may take up to four weeks. The backfilling process will only happen once per Google Analytics view. If you choose to unlink a view from BigQuery and relink it again in the future, the backfilling process will not occur again.

Womp womp.

So be confident when selecting the GA view for linking to BigQuery!

Can I use BigQuery without GA360?

There is no native integration available for non-GA360 users. This means there is no way to set up a consistent stream of your Google Analytics data from the interface using the standard schema for GA360 data in BigQuery.

However, there are various third party tools that make the GA BigQuery connection easy to handle.

Additionally, you have the ability to export various reports from the Google Analytics interface and upload them to BigQuery to combine your data with other third party data sources.

What is an example of a basic query in BigQuery?

The following query will give you the number of users, visits, and transactions by date for the last 1000 days.

The major elements of the query include:

SELECT statement: querying for date, number of users (calculated by counting the total number of unique fullVisitorIds), visits, and transactions.

FROM statement: querying from the Sample Google Analytics Data Set, and using the * in place of date to allow for a date range to be defined through WHERE statement.

WHERE statement: querying the data set for GA session tables whose table suffixes fall within the selected 1000-day date range, using the FORMAT_DATE and DATE_SUB functions.

GROUP BY statement: grouping the output by date

ORDER BY statement: ordering output by descending date

Query Example
pasted image 0 27

Portion of Output
pasted image 0 26

Can I add data from a flat file such as Excel .xlsx or .csv’s into BigQuery? Can I upload data from Google Drive?

Yes, you can upload data from flat files and from Google Drive as a table into BigQuery! Navigate to the dataset that you want to add the table to and hit the CREATE TABLE button on the right side of the interface.

pasted image 0 23

After hitting CREATE TABLE, you can browse flat file uploads, Drive uploads, GCS uploads, and Google Cloud BigTable uploads.

pasted image 0 24

Adding this data will nest it under the Dataset that you selected above as a table that can now be queried!

How do other products in the Google Marketing Platform integrate with BigQuery? Can I import Google Ads data into BigQuery?

Yes, you can import Google Ads data into BigQuery! Through the Data Transfer Service functionality in GCP, you can add data from a variety of Google products, including Campaign Manager, Google Ads, Google Cloud Storage, and even Youtube Channels and Redshift.

pasted image 0 28

You can also hit the EXPLORE DATA SOURCES to the right and explore other connectors, such as Supermetrics and FiveTran connectors.

pasted image 0 25

How does BigQuery compare to other data warehouses such as Amazon Redshift?

This is a great question and one that many businesses face everyday,

We want a data warehouse, but what are the differences and which one is right for my business needs?

Let’s run through a quick comparison of BigQuery and another market leader Redshift (i.e. the Burger King v. McDonalds of data warehouses).

Both of these products were released around the same time (2011-12) and have many similarities. But we are going to look at some of the key differences between the products.

Features BigQuery Redshift
Platform
  • Google Cloud Platform (GCP)
  • Amazon Web Services (AWS)
Usability
  • Sign up for an account and query data same day
  • No configuration
  • Automatically scales
  • Requires administrators to configure environment
  • Periodic management and maintenance requirements
Pricing
  • Infrastructure as a Service (IaaS)
  • Requires provisioning compute and storage nodes
Column Limits
  • 10,000
  • 1,600

The main points of difference between the two products at a glance are column limits and usability.

BigQuery has a much more simplistic and “ready out of the box” setup compared to Redshift that requires more technical expertise when setting up your data warehouse.

Column limits are another area where there is a sizeable difference and BigQuery can handle over 6x the number of columns as Redshift.


Still Have Questions?

We have answered many questions in this post – so now it’s our turn, is BigQuery right for your business needs? Talk to a Seer expert today and find out what’s right for your company! You can contact us for more here.

Subscribe to our newsletter for more tutorials, tips and tricks.