How to simplify this regular expression to use in Google Analytics

Context: Google Analytics

Need: a filter that, given a URI or a URN (yes, a URN), returns everything before the query string, with the query string itself excluded.

As you can imagine, there are multiple variations out there, which I hope I have covered in full with the list below:

https://sub.domain.com/path/folder/article?l=en >> expected https://sub.domain.com/path/folder/article
https://sub.domain.com/path/folder/103#3173l=en >> expected https://sub.domain.com/path/folder/103
https://sub.domain.com/path/folder/103?#3173l=en >> expected https://sub.domain.com/path/folder/103
https://sub.domain.com/path/folder/103#?3173l=en
sub.domain.tld >> expected sub.domain.tld
sub.domain.tld/ >> expected sub.domain.tld
sub.domain.tld?param=value >> expected sub.domain.tld
sub.domain.tld/?param=value >> expected sub.domain.tld
sub.domain.tld?param=value#id >> expected sub.domain.tld
sub.domain.tld/?param=value#id >> expected sub.domain.tld
sub.domain.tld/folder >> expected sub.domain.tld/folder
sub.domain.tld/folder/ >> expected sub.domain.tld/folder
sub.domain.tld/folder?param=value >> expected sub.domain.tld/folder
sub.domain.tld/folder/?param=value >> expected sub.domain.tld/folder
sub.domain.tld/1/folder >> expected sub.domain.tld/1/folder
sub.domain.tld/1/folder/ >> expected sub.domain.tld/1/folder
sub.domain.tld/1/folder?param=value
sub.domain.tld/1/folder/?param=value
sub.domain.tld#id
sub.domain.tld/#id
sub.domain.tld/1#id
sub.domain.tld/1/#id

The challenge I cannot solve is writing a regular expression that always returns the match in the same capture group.

If you want to play around, I have saved a couple of tests at
- https://regex101.com/r/trZl06/1/
- https://regex101.com/r/SetgFn/2

The latter captures my cases quite well, but as soon as a capturing group is added in front of the existing matching condition, the group also grabs text that is not expected.

I also tried something like ((.*)(?:[/]\?.*)|(.*)(?:\?.*))|((.*)/$|(.*)), but the resulting subgroups are always different, which makes referencing them in the filter view a bit of a mess.

Is there anything you can think of?
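
For readers wondering why the subgroups come out different each time, here is a minimal sketch in Python (not Google Analytics itself). The pattern is a hypothetical two-branch simplification, not the attempt quoted above, and `PATTERN` and `captured_groups` are names made up for illustration. Each branch of an alternation numbers its own capturing groups, so the useful text lands in group 1 or group 2 depending on which branch matched.

```python
import re

# Hypothetical, simplified pattern: one capturing group per alternation branch.
# Branch 1 handles "prefix?query"; branch 2 handles "prefix/" or a bare prefix.
PATTERN = re.compile(r"^(?:([^?#]*)\?.*|([^?#]*?)/?)$")


def captured_groups(url: str) -> tuple:
    """Return all capture groups; groups in the unused branch are None."""
    match = PATTERN.match(url)
    return match.groups() if match else ()


for url in ("sub.domain.tld?param=value", "sub.domain.tld/folder/"):
    print(url, "->", captured_groups(url))
```

The first URL fills group 1 and leaves group 2 empty, while the second does the opposite. A filter that always reads one fixed group therefore needs the prefix captured once, outside the alternation.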

  • Try ^([^#?]*)([/?#]?\?.*|/$|[/#]#.*|#.*)?, see regex101.com/r/fyGAJc/1
    – Wiktor Stribiżew
    Nov 21 at 22:01

  • Thanks Wiktor, that's on the right track. The last missing bit is to move the trailing slash, when present, into the next group, to avoid splitting GA traffic across pages that are effectively the same. Unfortunately, I can't implement server-side rules to solve this.
    – Andrea Moro
    Nov 22 at 6:06

  • The strange thing here is that [/#] doesn't seem to catch the /. I tried playing around with the permutations, but the result doesn't make sense to me.
    – Andrea Moro
    Nov 22 at 7:00

  • Try regex101.com/r/fyGAJc/2
    – Wiktor Stribiżew
    Nov 22 at 7:59

  • I eventually solved it with a second filter in GA that strips the last slash, but having everything in one go is ultimately better. Thanks. I will compare the changes to understand my mistakes.
    – Andrea Moro
    Nov 23 at 7:00
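
The two-step workaround mentioned in the last comment can be sketched in Python. This assumes the first suggested pattern contains an escaped `\?` (the escape restored), and it uses a hypothetical second pattern `/$` for the slash-stripping filter. `FIRST_FILTER`, `SECOND_FILTER`, and the function names are illustrative, not GA settings. The sketch also suggests why the trailing slash stays in group 1: the greedy `[^#?]*` consumes the `/` before the second group gets a chance.

```python
import re

# Assumed reading of the first suggested pattern (escape restored as "\?").
FIRST_FILTER = re.compile(r"^([^#?]*)([/?#]?\?.*|/$|[/#]#.*|#.*)?")
# Hypothetical second filter: drop a single trailing slash.
SECOND_FILTER = re.compile(r"/$")


def first_pass(url: str) -> str:
    """Group 1 of the first filter; the greedy group keeps any trailing slash."""
    match = FIRST_FILTER.match(url)
    return match.group(1) if match else url


def two_pass(url: str) -> str:
    """Apply the first filter, then strip the trailing slash in a second step."""
    return SECOND_FILTER.sub("", first_pass(url))


for url in ("sub.domain.tld/folder/", "sub.domain.tld/folder/?param=value"):
    print(url, "->", first_pass(url), "->", two_pass(url))
```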

regex google-analytics

asked Nov 21 at 21:54
Andrea Moro

1 Answer

accepted

You may use

^([^#?]*?)([/?#]?\?.*|[/#]?#.*)?(/?)$

See the regex demo.

Details

  • ^ - start of string
  • ([^#?]*?) - Group 1: 0 or more chars other than # and ?, as few as possible
  • ([/?#]?\?.*|[/#]?#.*)? - an optional Group 2: either of the two:
    • [/?#]?\?.* - an optional /, ? or # followed by a ? char and then the rest of the string
    • | - or
    • [/#]?#.* - an optional / or # followed by a # char and then the rest of the string
  • (/?) - Group 3: an optional /
  • $ - end of string.
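
As a quick check outside Google Analytics, the pattern can be run against some of the question's test URLs. This is a minimal Python sketch, assuming the escaped form `\?` that the Details list describes. `URL_PREFIX`, `strip_query`, and `CASES` are illustrative names, and GA's own regex engine may behave differently from Python's `re`.

```python
import re

# Pattern from the answer, with the escape assumed to be "\?" as the
# Details list describes ("followed by a ? char").
URL_PREFIX = re.compile(r"^([^#?]*?)([/?#]?\?.*|[/#]?#.*)?(/?)$")


def strip_query(url: str) -> str:
    """Return Group 1: the URL without query string, fragment, or trailing slash."""
    match = URL_PREFIX.match(url)
    return match.group(1) if match else url


# A few inputs and expected outputs taken from the question's list.
CASES = {
    "https://sub.domain.com/path/folder/article?l=en": "https://sub.domain.com/path/folder/article",
    "https://sub.domain.com/path/folder/103?#3173l=en": "https://sub.domain.com/path/folder/103",
    "sub.domain.tld/": "sub.domain.tld",
    "sub.domain.tld/?param=value#id": "sub.domain.tld",
    "sub.domain.tld/folder/?param=value": "sub.domain.tld/folder",
    "sub.domain.tld/1/folder/": "sub.domain.tld/1/folder",
}

for url, expected in CASES.items():
    print(f"{url} -> {strip_query(url)} (expected {expected})")
```

Because Group 1 is lazy and the pattern is anchored with $, the engine extends Group 1 only as far as needed, which leaves a trailing / for Group 2 or Group 3 to absorb.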

answered Nov 23 at 7:51
Wiktor Stribiżew
