What Marketing Data is Safe to Use in ChatGPT?


ChatGPT is the shiniest tool in the marketer’s toolbox these days. There’s no shortage of blog posts about prompting, but we’ve found few resources on how to govern the use of this technology. To that end, as Seer develops protocols and guidance for our team, we intend to share them publicly to help others set their own policies. That said, we advise all organizations to seek legal counsel before finalizing their own policy.

In an ideal world, we’d run all queries locally on an open source LLM. Then the data you enter into the model would be as safe as any other data on your computer. We aren’t living in that world yet, though, and our best bet for now is to use a hosted model like ChatGPT safely. At Seer, we use ChatGPT for Teams, which ensures the conversations we have are not collected and used as training data for future models.

Even with this level of security, we recommend thinking critically about all data you include in conversations with ChatGPT. We sort all data into three main categories: public data, non-public data, and PII.

Public Data: Data that ChatGPT can (probably) already access

This includes data that is available to anyone with access, whether or not it sits behind a paywall. At Seer, we consider these data sources to be a ‘green light’ to use for both Seer internal and client use cases within ChatGPT. Some examples of public data include:

  • Backlinks & Internal Links
      • Ahrefs, Semrush, Majestic
  • Scraped Client or Competitor Websites
      • Screaming Frog
  • Public Financial Reports
  • Competitor Intelligence
      • SimilarWeb
      • SparkToro
  • Keyword Reports & Search Results
      • Organic Rankings, PAAs, Titles and Descriptions
      • Semrush, Ahrefs
  • Google Trends
  • Public Business Profiles
      • Google / Bing My Business Profiles
  • Monthly Search Volume
      • Google Ads / Keyword Planner
      • DataForSEO API
  • Public Customer Feedback
      • Reddit, Quora, Yelp, G2
      • Amazon Reviews, E-comm Reviews
      • Public Social Media posts
  • Public Demographic Info
      • US Census

Non-Public Data: Data that ChatGPT can’t currently access

This includes data that is proprietary and/or confidential. We won’t use this data unless our client has opted in to its use. We believe doing so is safe and that the upside greatly outweighs the potential risk, but we place this decision in our clients’ hands.

  • Google / Bing Ads Data: Detailed advertising performance metrics, including impressions, conversions, costs, and clicks.
  • Google / Adobe Analytics Data: Website analytics, such as user sessions, behavior, bounce rates, and event tracking.
  • Google / Bing Search Console Data: Insights on website search performance, including clicks, click-through rates (CTRs), and impressions.
  • Google / Bing My Business Data: Visits, phone calls, and other local profile analytics.
  • Internal Client Communications: Correspondence and communications containing strategic or operational insights.
  • Paid Social Metrics: Performance data from social media advertising campaigns, including engagement rates, reach, and conversion metrics.
  • Client Revenue Data: Financial information related to sales, revenue figures, and overall financial performance.
  • CRM Data: HubSpot, Salesforce, and lead data, excluding PII.
  • Materials Used for Internal Presentations: Strategy documents, internal reports, and presentations containing confidential business insights.
  • Customer Information & Interviews: Detailed records of customer preferences, feedback, and interactions obtained through direct communications (stripped of PII).
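
For teams that want to encode these rules in their own tooling, here is a minimal sketch of the three-tier policy as a pre-flight check. It is an illustration, not Seer's actual implementation: the `DataClass` enum, the `may_send_to_llm` function, and the `client_opted_in` flag are all hypothetical names, and deciding which class a given data set belongs to is still a human judgment.

```python
from enum import Enum


class DataClass(Enum):
    """Hypothetical labels for the three categories described in this post."""
    PUBLIC = "public"          # 'green light' data
    NON_PUBLIC = "non_public"  # proprietary or confidential client data
    PII = "pii"                # personally identifiable information


def may_send_to_llm(data_class: DataClass, client_opted_in: bool = False) -> bool:
    """Return True if the policy allows this data in an LLM conversation."""
    if data_class is DataClass.PII:
        # PII is never allowed, regardless of client consent.
        return False
    if data_class is DataClass.NON_PUBLIC:
        # Non-public data is allowed only when the client has opted in.
        return client_opted_in
    # Public data is always allowed.
    return True
```

A check like this could run before any export is pasted or uploaded, for example `may_send_to_llm(DataClass.NON_PUBLIC, client_opted_in=True)`.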

Personally Identifiable Information: Data that public models should never access

The marketers we work with at Seer want nothing to do with PII. The potential risk of improperly using PII far outweighs any upside. That said, PII sometimes ends up in our data sets inadvertently: hardcoded in analytics data or mixed into audience data used for analysis.

This data should never be entered into any large language model, for both ethical and legal reasons. These types of data include:

  • Full Names
  • Email Addresses
  • Home Addresses
  • Phone Numbers
  • Login Names or Screen Names
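
Because PII can slip into exports inadvertently, it can help to scrub text before it goes anywhere near a model. The sketch below is one possible approach, not a complete solution: the function name `redact_pii` and both regular expressions are our own assumptions, the phone pattern only targets US-style ten-digit numbers, and pattern matching cannot reliably catch names, home addresses, or screen names. Treat it as a first pass that still needs human review.

```python
import re

# Hypothetical, deliberately simple patterns. They will miss edge cases
# and should be treated as a first-pass filter, not a guarantee.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def redact_pii(text: str) -> str:
    """Replace email addresses and US-style phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com or (215) 555-0134 today"
    print(redact_pii(sample))
```

Running redaction on every column or text field of an export, then spot-checking the result by hand, is a safer habit than assuming a data set is clean.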

Moving Forward: Balancing Regulatory Compliance with Innovation

In short, if your organization is serious about adopting these new technologies, then you must also be serious about regulatory compliance. Agencies owe this to their clients, and brands owe this to their customers. These technologies are only becoming more powerful, and their use cases will continue to disrupt the digital marketing community. Now is the time to establish your own governance policies and ensure that your team and your clients are on the same page about what you will (and won’t) do with their data.


Jordan Strauss
Lead, Strategy & Generative AI