The beginner's guide to regular expressions for Google Analytics.
Online advertising requires marketers to pull, aggregate and analyze huge amounts of data so that organizations can make data-driven decisions. That is where regular expressions (known as RegEx) come into play. A regular expression is a compact notation for sophisticated pattern matching. In other words, with RegEx you can specify which data you want to include or exclude. Regular expressions narrow data down and make it more specific.
Most of us already use these expressions without even noticing. For example, the default filter box in Google Analytics accepts RegEx.
Imagine you are in the traffic sources report in GA and you want to look at the results that have "google" OR "bing" as Source/Medium. RegEx can save you a lot of time. By simply typing google|bing in the filter box, with | (the so-called pipe) between the two terms, you get all the results that include google or bing as Source/Medium. The pipe acts as an OR statement in the RegEx world.
Another concrete application of RegEx with Analytics is the following: you want to see how a campaign performs in only two cities, Berlin OR Munich. You can then use the regular expression Berlin|Munich.
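To see the pipe at work outside of GA, here is a minimal sketch in Python. The Source/Medium values and city names are made up for illustration, and `re.search` is used on the assumption that the filter box behaves like a "contains" match; GA's own regex engine may differ in details such as case sensitivity.

```python
import re

# Hypothetical Source/Medium values, invented for this example.
rows = [
    "google / organic",
    "bing / cpc",
    "newsletter / email",
    "google / cpc",
    "(direct) / (none)",
]

# The pipe means OR: keep rows that contain "google" or "bing".
source_pattern = re.compile(r"google|bing")

# re.search looks for the pattern anywhere in the string,
# which approximates a "contains" filter.
source_matches = [row for row in rows if source_pattern.search(row)]
print(source_matches)

# The same idea applied to the two-city example.
cities = ["Berlin", "Munich", "Hamburg", "Cologne"]
city_matches = [city for city in cities if re.search(r"Berlin|Munich", city)]
print(city_matches)
```

Only the rows and cities that contain one of the alternatives are kept; everything else is filtered out.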
Let's have a look at some regex metacharacters.
Dot . : it matches any single character, whether a letter, a digit or a symbol.
Question mark ? : it matches the preceding character 0 or 1 times.
Plus + : it matches the preceding character 1 or more times.
Asterisk * : it matches the preceding character 0 or more times.
Pipe | : it creates an OR match.
Caret ^ : it matches the adjacent characters at the beginning of a string.
Dollar $ : it matches the adjacent characters at the end of a string.
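Parentheses ( ) : they match the enclosed characters in exact order anywhere in a string and group them, so that a pipe or a quantifier applies to the whole group.
Square brackets [ ] : they match any single one of the enclosed characters, anywhere in a string. For example, gr[ae]y matches both gray and grey.
Hyphen - : it creates a range of characters within square brackets, such as [a-z] or [0-9].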
Backslash \ : it indicates that the adjacent character should be interpreted literally rather than as a regex metacharacter.
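The metacharacters above can be tried out with a short Python sketch. The patterns and test strings are hypothetical examples, not taken from a real GA account, and the small `matches` helper is introduced here only for illustration. Python's `re` module is assumed to behave like GA's filter for these basic metacharacters.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

# (pattern, text, expected result) for each metacharacter in the list.
examples = [
    (r"b.ng", "bing", True),                  # dot: any single character
    (r"colou?r", "color", True),              # ?: "u" zero or one time
    (r"colou?r", "colour", True),
    (r"go+gle", "gooogle", True),             # +: "o" one or more times
    (r"go+gle", "ggle", False),
    (r"go*gle", "ggle", True),                # *: "o" zero or more times
    (r"^/blog", "/blog/regex", True),         # ^: must start with /blog
    (r"^/blog", "/en/blog", False),
    (r"\.html$", "/index.html", True),        # $: must end with .html
    (r"\.html$", "/index.html?x=1", False),
    (r"(google|bing) / cpc", "bing / cpc", True),    # ( ): grouping
    (r"(google|bing) / cpc", "yahoo / cpc", False),
    (r"gr[ae]y", "grey", True),               # [ ]: one of the listed characters
    (r"page[0-9]", "page7", True),            # -: a range inside [ ]
    (r"page[0-9]", "pageX", False),
    (r"index.html", "indexXhtml", True),      # unescaped dot matches "X"
    (r"index\.html", "indexXhtml", False),    # backslash: literal dot only
]

for pattern, text, expected in examples:
    result = matches(pattern, text)
    print(f"{pattern!r:25} {text!r:20} -> {result}")
```

The last two rows show why the backslash matters: without it, the dot in index.html would also match any other character in that position.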