How to simplify this regular expression to use in Google Analytics
Context: Google Analytics
Need: A filter that, given a URI or a URN (yes, a URN), returns everything up to, but excluding, the query string.
As you can imagine, there are multiple variations out there, which I hope I have covered in full in the list below:
https://sub.domain.com/path/folder/article?l=en >> expected https://sub.domain.com/path/folder/article
https://sub.domain.com/path/folder/103#3173l=en >> expected https://sub.domain.com/path/folder/103
https://sub.domain.com/path/folder/103?#3173l=en >> expected https://sub.domain.com/path/folder/103
https://sub.domain.com/path/folder/103#?3173l=en
sub.domain.tld >> expected sub.domain.tld
sub.domain.tld/ >> expected sub.domain.tld
sub.domain.tld?param=value >> expected sub.domain.tld
sub.domain.tld/?param=value >> expected sub.domain.tld
sub.domain.tld?param=value#id >> expected sub.domain.tld
sub.domain.tld/?param=value#id >> expected sub.domain.tld
sub.domain.tld/folder >> expected sub.domain.tld/folder
sub.domain.tld/folder/ >> expected sub.domain.tld/folder
sub.domain.tld/folder?param=value >> expected sub.domain.tld/folder
sub.domain.tld/folder/?param=value >> expected sub.domain.tld/folder
sub.domain.tld/1/folder >> expected sub.domain.tld/1/folder
sub.domain.tld/1/folder/ >> expected sub.domain.tld/1/folder
sub.domain.tld/1/folder?param=value
sub.domain.tld/1/folder/?param=value
sub.domain.tld#id
sub.domain.tld/#id
sub.domain.tld/1#id
sub.domain.tld/1/#id
The challenge I cannot solve is obtaining a regular expression that always captures the wanted part in the same subgroup.
If you want to play around, I have saved a couple of tests at
- https://regex101.com/r/trZl06/1/
- https://regex101.com/r/SetgFn/2
The latter is quite satisfactory at capturing my cases, but as soon as a capturing group is added in front of the existing matching condition, the group also grabs text that is not expected.
I also tried something like ((.*)(?:[/]?.*)|(.*)(?:?.*))|((.*)/$|(.*)) but the resulting subgroups are different each time, which makes referencing them in the filter view a bit of a mess.
Is there anything you can think of?
regex
Try ^([^#?]*)([/?#]?\?.*|/$|[/#]#.*|#.*)?, see regex101.com/r/fyGAJc/1
– Wiktor Stribiżew
Nov 21 at 22:01
Thanks Wiktor. That's on the right track. The last missing bit is to move the trailing slash, when present, into the next group, to avoid GA traffic dispersion across pages that may be virtually the same. Unfortunately I can't implement server-side rules to solve this.
– Andrea Moro
Nov 22 at 6:06
The strange thing here is that the [/#] doesn't seem to catch the /. I tried playing around with the permutations, but that doesn't make sense.
– Andrea Moro
Nov 22 at 7:00
Try regex101.com/r/fyGAJc/2
– Wiktor Stribiżew
Nov 22 at 7:59
I eventually solved it with a second filter in GA that strips the last slash, but having everything in one go is ultimately better. Thanks. I will compare the changes to understand my mistakes.
– Andrea Moro
Nov 23 at 7:00
asked Nov 21 at 21:54
Andrea Moro
1 Answer
accepted
You may use
^([^#?]*?)([/?#]?\?.*|[/#]?#.*)?(/?)$
See the regex demo.
Details
^ - start of string
([^#?]*?) - Group 1: 0 or more chars other than # and ?, as few as possible
([/?#]?\?.*|[/#]?#.*)? - an optional Group 2: either of the two:
  [/?#]?\?.* - an optional /, ? or # followed by a ? char and then the rest of the string
  | - or
  [/#]?#.* - an optional / or # followed by a # char and then the rest of the string
(/?) - Group 3: an optional /
$ - end of string.
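To check the pattern outside regex101, here is a minimal Python sketch that runs it against some of the sample inputs from the question. It assumes the literal question mark is written as \? (matching the description above) and uses Python's re module, whose behaviour may differ in details from the engine Google Analytics uses. The helper name strip_query is made up for illustration.

```python
import re

# Pattern from the answer, with the literal question mark escaped as \?.
# Python's `re` is used here only for illustration; GA's engine may differ.
PATTERN = re.compile(r"^([^#?]*?)([/?#]?\?.*|[/#]?#.*)?(/?)$")


def strip_query(url: str) -> str:
    """Return Group 1: the input without query string, fragment or trailing slash.

    `strip_query` is a hypothetical helper name, not part of any GA API.
    """
    match = PATTERN.match(url)
    # Fall back to the unchanged input if the pattern does not match.
    return match.group(1) if match else url


# A few of the sample inputs listed in the question.
samples = [
    "https://sub.domain.com/path/folder/article?l=en",
    "https://sub.domain.com/path/folder/103?#3173l=en",
    "sub.domain.tld/",
    "sub.domain.tld/?param=value#id",
    "sub.domain.tld/1/folder/",
    "sub.domain.tld/1/#id",
]

for sample in samples:
    print(f"{sample} -> {strip_query(sample)}")
```

Because the wanted part always lands in Group 1, the filter can reference the same group for every input, whether or not a query string, fragment or trailing slash is present.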
answered Nov 23 at 7:51
Wiktor Stribiżew