Development

September 15, 2022

How to Fix Your LookML Project Structure

Kenny Ning

Freelance Analytics Engineer

A picture of spanish architecture by Fausto Garcia.

Looker projects are getting out of control.

It wasn't always like this. Looker usually starts out as an analyst's paradise—a single model, a few developers, a handful of Explores. Everyone is self-serving insights. Life is good. But as the months roll by, the team grows, access expands, access grants, model sets, dashboards, and datagroups proliferate, and yikes it's been 8 months since Phil left the company and all 23 of his one-off Explores are still there.

If you ask a seasoned LookML developer about the state of their Looker project, they might sigh, grit their teeth, and clench their fists. They might stare sadly into the distance as scenes from yesterday’s workday flip through their mind’s eye like slides on a Kodak Carousel…

10:00am - "Looks like we need to update the ratings Explore with a new column called thumbs_up. Should be simple enough to re-run Create View From Table to pick up the new schema changes."
10:39am - "Hmm, why is ratings.view.lkml 956 lines long? Oh, it looks like there are hundreds of custom fields defined here too. Okay, I guess I’ll manually add this new field to the bottom."
11:09am - "Okay, I’ve added the column to the view file, but it isn’t showing up in the ratings Explore for some reason..."
11:27am - "Ah, it looks like the field has to be manually added to the Explore's fields parameter. Whoa! This model file is 279 lines long and has 35 Explores? And they’re not sorted?!"
11:44am - "I have a merge conflict already? Oh, looks like someone recently re-ordered the Explores in this file, so I’ll have to incorporate those in."
12:15pm - "Should be good now. Let’s see what’s next in the backlog... I have to add a thumbs_down column?!"

Sound familiar?

Here's the problem: two development workflow issues plague every data team writing LookML today:

Keeping Looker views and table schemas in sync: Every schema change requires manual updates to long view files that never stop growing. For the many teams who point Looker views to dbt models, schema changes are even more common and cumbersome.
Tracing how data flows from a view to an explore. Interrelated business logic is scattered across multiple files, demanding frequent context switching. Unrelated business logic is crowded into a few really long files, meaning developers frequently run into annoying merge conflicts with teams working on separate changes.

As a freelance analytics engineer, I've helped startups establish best practices for Looker. At Better.com, I scaled Looker from 0 to 100 developers with 500+ weekly active users. And the team at Spectacles, who I collaborated with on this post, have 10+ years of combined experience administering Looker for dozens of companies. We've seen these problems firsthand and have spent a lot of time thinking of possible solutions.

I've developed a Looker project structure that I believe solves these problems. Adopting this efficient approach will help your team gracefully handle schema changes and improve your project's readability.

Structure overview

To make this explanation concrete, I'll walk through a sample Looker project modeling the popular, public IMDB ratings dataset. To keep things interesting, we'll use the data to make "Frankenmovies": mashups of overlapping movie titles (like "American Beauty and the Beast" or "Men in Black Panther").

The full project example is on GitHub if you'd like to follow along.

A huge thanks to Looker's Fabio Beltramini for his post and talk on refinements (which will be referenced below), as well as dbt’s project structuring guide (the updated one and the original), both of which greatly inspired me to write this.

To get started, let's take a look at our project. It contains:

One view file for each raw table and derived table
One model file containing all Explore definitions

models/
  imdb.model

views/
  frankenmovies.view.lkml  # derived table
  ratings.view.lkml  # raw table
  basics.view.lkml  # raw table

...

Now, consider a new project structure with layers, where each layer adds an incremental piece of logic onto the previous layer:

Base layer: machine-generated LookML views referencing source tables
Standard layer: hand-written changes that add new fields or modify base layer fields
Logical layer: Explore definitions and joins across different views

Here’s how we’d represent a table named ratings in this layered approach:

Graphic showing the different code layers in a LookML project. The layers are the base layer, standard layer and logical layer.

And here’s how we would organize the files in our Looker project:

_base/ # machine-generated
  frankenmovies/imdb/  # database/schema where the raw table resides
  basics.view.lkml
  ratings.view.lkml

_standard/ # hand-written refinements
  frankenmovies/imdb/  # database/schema where the raw table resides
  basics.layer.lkml
  ratings.layer.lkml

logical/ # explores
  frankenmovies.explore.lkml
  ratings.explore.lkml

imdb.model

Let's drill into each of these sections in more detail to demonstrate how our layered structure handles schema changes better and improves readability.

Base layer

The base layer is the layer of machine-generated LookML views that makes it simple to respond to upstream data schema changes. This layer doesn't contain any hand-editing or business logic.

Here’s how the ratings.view.lkml (view on GitHub) would differ in a traditional and layered approach. Here's the traditional view:

-- traditional
# mix of machine-generated and hand-written
view: ratings {
  sql_table_name: frankenmovies.imdb.ratings ;;
  label: "Ratings"

  dimension: title_id {
    type: string
    primary_key: yes
    sql: ${TABLE}.tconst ;;
  }

  dimension: title_average_rating {
    type: number
    label: "Rating"
    sql: ${TABLE}.averageRating ;;
  }

  dimension: title_number_of_votes {
    type: number
    hidden: yes
    sql: ${TABLE}.numVotes ;;
  }

  measure: number_of_votes {
    type: sum
    sql: ${title_number_of_votes} ;;
  }

  measure: average_rating {
    type: number
    value_format_name: decimal_1
    sql: sum(${title_average_rating} * ${title_number_of_votes}) / ${number_of_votes} ;;
  }

}

And here's the view in our layered design:

-- layered
# only machine-generated
view: ratings {
  sql_table_name: frankenmovies.imdb.ratings ;;

  dimension: tconst {
    type: string
    sql: ${TABLE}.tconst ;;
  }

  dimension: average_rating {
    type: number
    sql: ${TABLE}.averageRating ;;
  }

  dimension: num_votes {
    type: number
    sql: ${TABLE}.numVotes ;;
  }

  measure: count {
    type: count
    drill_fields: [detail*]
  }

}

Think about what happens in the traditional structure when a new column is added to the ratings table. Because the ratings view has a mix of machine-generated and hand-written changes, you need to manually code each new field addition.

With an isolated base layer, adding this column is a single button click: re-run the Create View From Table LookML generator to get the latest changes. For some databases, Create View From Table will even pull in column-level descriptions into the view definition, so you’ll even get updated documentation for free with this approach.

Folders

Organize your base folders to mirror how the source data resides in your database, and generate one .view.lkml file per raw table.

For example, if the ratings and titles tables both belong to the frankenmovies database and imdb schema, organize them as follows:

_base/
  frankenmovies/
    imdb/
      ratings.view.lkml

The leading underscore ensures this layer is sorted before the other layers.

This folder hierarchy makes it clear which data powers your Looker models. This is especially helpful if you're referencing tables across different schemas and databases.

Standard layer

The standard layer is made up of hand-written LookML refinements that preserve your column changes when you update the base layer.

Here’s how the ratings table would be represented in the standard layer in our IMDB example (view on GitHub):


include: "/_base/analytics/imdb/ratings.view.lkml"

view: +ratings {
  label: "Ratings"

  dimension: title_id {
    primary_key: yes
    sql: ${tconst} ;;
  }

  dimension: title_average_rating {
    label: "Rating"
    sql: ${average_rating} ;;
  }

  dimension: title_number_of_votes {
    hidden: yes
    sql: ${num_votes} ;;
  }

  measure: number_of_votes {
    type: sum
    sql: ${title_number_of_votes} ;;
  }

  measure: overall_average_rating {
    type: number
    value_format_name: decimal_1
    sql: sum(${title_average_rating} * ${title_number_of_votes}) / ${number_of_votes} ;;
  }
}

The standard layer makes use of an underrated feature called refinements (notice the + symbol in the view definition above). Refinements are similar to extends. Extends make a copy of the extended view file, but refinements make the edits in place. For low-level changes that you’re comfortable applying across your entire project, refinements are a more concise way to add business logic to a view without editing the base file.

Some examples of business logic you should add to the standard layer include:

Renaming columns to be more human-readable
Defining primary keys or composite primary keys
Adding descriptions and labels
Adding measures

Folders

Similar to the base layer above, organize your standard layer folders according to your database structure, and create one .layer.lkml file per raw table.

_standard/
  analytics/
    imdb/
      ratings.layer.lkml

The leading underscore ensures that the standard layer is sorted in the sidebar so it follows the base layer, but precedes other layers.

The new .layer file extension might seem strange at first glance, but choosing "Create Generic LookML File" will allow you to create a LookML file with any arbitrary file extension before .lkml. Adding this extension clarifies that the file contains hand-written code and not machine-generated code.

To recap, for each table in your database there should be one machine-generated base view called <table>.view.lkml and one hand-written standard layer called <table>.layer.lkml.

Logical layer

The logical layer is the layer of Explore definitions based on standard layer views. Here’s how the ratings Explore would be represented in the logical layer in our IMDB example (view on GitHub):

include: "/_standard/frankenmovies/imdb/basics.layer.lkml"
include: "/_standard/frankenmovies/imdb/ratings.layer.lkml"

explore: ratings {
  join: basics {
    relationship: one_to_one
    sql_on: ${ratings.title_id} = ${basics.id} ;;
    type: left_outer
  }
}

Note that the include statement references the standard layer so that we get the refined fields as well as the base fields.

The .explore extension clarifies that this file contains an Explore definition. Separating each explore into its own file solves the problem of frequent merge conflicts among developers working on different Explores simultaneously.

We can also use logical layers to define custom views in the same file as Explores that reference them. For example, in the IMDB project, we want to create “Frankenmovie” names, where we take the first half of one movie title and mash it up with the second half of another movie title. For example, “American Beauty and the Beast” is a “Frankenmovie” title for the two movies “American Beauty” and “Beauty and the Beast”. This requires some messy SQL that we’ll represent as a derived table in a new Explore called frankenmovies.explore.lkml (view on GitHub):

-- layered
include: "/logical/ratings.explore.lkml"

view: frankenmovies_pdt {
  derived_table: {
    persist_for: "24 hours"
    sql:

      with joined as (

        select
          basics.tconst,
          basics.primaryTitle,
          ratings.averageRating,
          ratings.numVotes,
          row_number() over (order by ratings.numVotes desc) as vote_rank
        from frankenmovies.{{_user_attributes['dataset_location']}}.basics
        inner join frankenmovies.{{_user_attributes['dataset_location']}}.ratings
          on basics.tconst = ratings.tconst
        where basics.titleType = 'movie'
          and length(basics.primaryTitle) != length(replace(basics.primaryTitle,' ',''))

      ), splitting as (

        select
          *,
          split(primaryTitle, ' ') as words,
        from joined

      ), words_split as (

        select
          *,
          words[offset(0)] as first_word,
          words[offset(array_length(words)-1)] as last_word
        from splitting

      ), unnested as (

        select
          * except (words),
          row_number() over (partition by tconst) as r
        from words_split, unnest(words) as word

      ), remove_word as (

        select
          tconst,
          primaryTitle,
          averageRating,
          numVotes,
          first_word,
          last_word,
          string_agg(word, ' ') as missing_first_word
        from unnested
        where r > 1
        group by 1,2,3,4,5,6

      ), self_joined as (

        select
          case when a.last_word = b.first_word then a.primaryTitle else b.primaryTitle end as first_movie_name,
          case when b.last_word = a.first_word then a.primaryTitle else b.primaryTitle end as second_movie_name,
          case when a.last_word = b.first_word then a.tconst else b.tconst end as first_movie_id,
          case when b.last_word = a.first_word then a.tconst else b.tconst end as second_movie_id,
          a.numVotes * b.numVotes as frankenmovie_score,
          case
            when a.first_word = b.last_word then b.primaryTitle || ' ' || a.missing_first_word
            when b.first_word = a.last_word then a.primaryTitle || ' ' || b.missing_first_word
          end as frankenmovie_name
        from remove_word a
        inner join remove_word b
          on (a.first_word = b.last_word
          or b.first_word = a.last_word)
          and a.primaryTitle != b.primaryTitle

      )

      select distinct *
      from self_joined
       ;;
  }

  dimension: first_movie_id {
    type: string
    hidden: yes
    sql: ${TABLE}.first_movie_id ;;
  }
  dimension: second_movie_id {
    type: string
    hidden: yes
    sql: ${TABLE}.second_movie_id ;;
  }
  dimension: frankenmovie_score {
    type: number
    sql: ${TABLE}.frankenmovie_score ;;
  }
  dimension: frankenmovie_name {
    type: string
    sql: ${TABLE}.frankenmovie_name ;;
  }
}

view: ratings_entity {
  derived_table: {
    explore_source: ratings {
      column: id { field: basics.id }
      column: title_average_rating {}
    }
  }
  dimension: id {
    description: ""
  }
  dimension: title_average_rating {
    description: ""
    type: number
  }
}

explore: frankenmovies_pdt {
  label: "Frankenmovies"
  view_label: "Frankenmovies"

  join: first_movie {
    from: ratings_entity
    view_label: "First Movie"
    relationship: many_to_one
    sql_on: ${frankenmovies_pdt.first_movie_id} = ${first_movie.id} ;;
  }

  join: second_movie {
    from: ratings_entity
    view_label: "Second Movie"
    relationship: many_to_one
    sql_on: ${frankenmovies_pdt.second_movie_id} = ${second_movie.id} ;;
  }
}

We're mixing view and Explore definitions in the same file. This might feel foreign if you're used to thinking of views and Explores separately. For complex logic like above, it can be very helpful to see how the derived table flows into an Explore without having to jump between different files.

We’re also re-using the ratings Explore defined earlier by including it and turning it into a view via a native derived table. This keeps our code concise and DRY since the relationship between titles and ratings only needs to be defined once in the ratings Explore.

What belongs in the standard layer vs. the logical layer?

The line here can be a bit blurry, but generally you should put low-level, global changes like renaming columns, hiding columns, adding descriptions, and simple measures, into the standard layer and more complex changes in the logical layer (e.g. cross-view calculations, which we’ll explain next).

Where should we define “cross-view calculations”

Cross-view calculations are fields that use fields from more than one view file. In the traditional setup, you’d pick one of those view files and define it there with an include statement for the other view. But this will often throw warnings and/or errors when you re-use this view file in an Explore that doesn’t have the other views joined in.

The solution to this problem is Looker’s best unofficially documented feature, called field-only views (FOVs). FOVs are view definitions without a sql_table_name parameter. They're a great tool for defining cross-view calculations.

In our IMDB example, we want to calculate an average number of votes per title. This new measure references the number_of_votes measure in the ratings view and the count measure in the basics view.

To create this cross-view calculation, you’d define a field-only view at the top of ratings.explore.lkml (notice there’s no sql_table_name argument) and then join it into the Explore with an empty sql: ;; parameter:

view: fov_example {
  measure: avg_votes_per_title {
    sql: ${ratings.number_of_votes} / ${basics.count} ;;
    type: number
  }
}

explore: ratings {
  join: basics {
    relationship: one_to_one
    sql_on: ${ratings.title_id} = ${basics.id} ;;
    type: left_outer
  }

  join: fov_example {
    relationship: one_to_one
    sql: ;; # leave blank
    type: left_outer
  }
}

It’s a strange syntax, but field-only views can help you define cross-view calculations cleanly and only where they’re needed.

Model files

Now that each Explore is defined in its own file, the model file should be very lightweight. In this design, model files only include project-level configuration like the connection, caching, and access grants.

Here’s how the traditional model compares to the layered model file (view on GitHub). Here's the traditional model:

-- traditional
# models/imdb.model
connection: "spectacles_learn_bigquery"

include: "/views/imdb/*.view.lkml"

label: "IMDB"

explore: titles {}

explore: ratings {
  join: titles {
    relationship: one_to_one
    sql_on: ${ratings.title_id} = ${titles.id} ;;
  }
}

explore: frankenmovies_pdt {
  label: "Frankenmovies"
  view_label: "Frankenmovies"

  join: first_movie {
    from: titles
    view_label: "First Movie"
    relationship: many_to_one
    sql_on: ${frankenmovies_pdt.first_movie_id} = ${first_movie.id} ;;
  }

  join: first_movie_ratings {
    from: ratings
    view_label: "First Movie"
    relationship: one_to_one
    sql_on: ${first_movie.id} = ${first_movie_ratings.title_id} ;;
  }

  join: second_movie {
    from: titles
    view_label: "Second Movie"
    relationship: many_to_one
    sql_on: ${frankenmovies_pdt.second_movie_id} = ${second_movie.id} ;;
  }

  join: second_movie_ratings {
    from: ratings
    view_label: "Second Movie"
    relationship: one_to_one
    sql_on: ${second_movie.id} = ${second_movie_ratings.title_id} ;;
  }
}

And here's the model in our layered design:

-- layered
# imdb.model
connection: "spectacles_learn_bigquery"

include: "/logical/*.explore.lkml"

label: "IMDB"

Final thoughts

If you’re manually updating your view files to respond to schema changes every day, then this layered structure is a no-brainer for you in terms of time saved. Additionally, the waterfall layering of base ⇒ standard ⇒ logical effectively abstracts away details until you need them, improving the readability of your project and clarifying where to start for new developers.

We think all new Looker projects should be built this way, but for existing Looker users still on the traditional structure, there will be some switching costs. I did this migration for a client fairly early on in their Looker journey, and it took me roughly ten hours to complete. Depending on how big of a problem schema changes and code readability are for you, this investment of time could be well worth it.

While there’s no objectively “perfect” project structure for every organization, our hope is that a design like this can become a standard over time, reducing decision fatigue and improving onboarding for analysts everywhere.

‍

How to Fix Your LookML Project Structure

Structure overview

Base layer

Folders

Standard layer

Folders

Logical layer

What belongs in the standard layer vs. the logical layer?

Where should we define “cross-view calculations”

Model files

Final thoughts

Content Overload: How to Halve Your Unused Content Using Spectacles

Under the Hood: Understanding Looker's System Activity Model