OpenShift 3.11 StorageOS networking issue












I've created a 3-node OpenShift 3.11 cluster; 2 of the nodes are compute
nodes. I've installed StorageOS on this cluster. One of the compute
nodes seems fine with the StorageOS installation, but the 2nd
compute node can't reach the 1st one. The error appears to be
routing related: the 2nd node will not route to the 1st node.



[root@cortado-o1 standard]# oc get pod -n storageos
NAME READY STATUS RESTARTS AGE
storageos-47qgc 1/1 Running 0 6m
storageos-6bqqp 0/1 Running 3 7m

[root@cortado-o2 ~]# netstat -na | grep 5705
tcp6 0 0 :::5705

[root@cortado-o3 ~]# netstat -na | grep 5705
tcp 0 0 192.168.0.101:43588 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43548 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43522 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43458 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43628 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43602 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43562 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43502 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43476 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43412 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43430 192.168.0.101:5705 TIME_WAIT
tcp6 0 0 :::5705 :::* LISTEN

[root@cortado-o3 ~]# !nc
nc 192.168.0.102 5705
Ncat: No route to host.
[root@cortado-o3 ~]# hostname --ip-address
192.168.0.101

time="2018-11-13T04:24:38Z" level=error msg="failed to join existing cluster" action=create category=etcd endpoint="192.168.0.102,192.168.0.101" error="Get http://192.168.0.102:5705/v1/members: dial tcp 192.168.0.102:5705: connect: no route to host" module=cp
time="2018-11-13T04:24:38Z" level=info msg="not first cluster node, joining first node" action=create address=192.168.0.101 category=etcd host=cortado-o3 module=cp target=192.168.0.101
time="2018-11-13T04:24:38Z" level=error msg="failed to join existing cluster" action=create category=etcd endpoint="192.168.0.102,192.168.0.101" error="503 Service Unavailable" module=cp
time="2018-11-13T04:24:38Z" level=info msg="retrying cluster join in 5 seconds..." action=create category=etcd module=cp


Any suggestions? Many thanks.










      openshift














edited Nov 13 '18 at 5:26 by Robert

asked Nov 13 '18 at 4:27 by jeff mccormick
























          1 Answer














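          Your netstat output shows that StorageOS is listening on the port on each node, not that the nodes can communicate. In fact, Ncat reports "No route to host", so they can't connect. Between hosts on the same subnet, that error is often produced by a firewall rejecting the connection rather than by a missing route. StorageOS nodes need to be able to reach each other.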
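
To confirm from each node whether the StorageOS API port is reachable on its peers, a check along these lines may help. This is a minimal sketch, not part of StorageOS: the function names `port_open` and `probe_peers` are made up for illustration, and it assumes bash and GNU `timeout` are installed. Port 5705 and the IPs in the usage note are taken from the question.

```shell
# Sketch: probe a TCP port on a list of peers from the current node.
# port_open and probe_peers are hypothetical helper names.

# port_open HOST PORT: exit status 0 if a TCP connection succeeds within 3 seconds.
port_open() {
  local host=$1 port=$2
  # /dev/tcp is a bash feature; timeout comes from GNU coreutils.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# probe_peers PORT HOST...: print one status line per host.
probe_peers() {
  local port=$1
  shift
  local host
  for host in "$@"; do
    if port_open "$host" "$port"; then
      echo "$host:$port reachable"
    else
      echo "$host:$port unreachable"
    fi
  done
}
```

Running `probe_peers 5705 192.168.0.101 192.168.0.102` on every node shows which directions are blocked. A port that is reachable locally but not from a peer points at filtering between the hosts rather than at StorageOS itself.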



          The StorageOS docs list the ports it requires and how to open them: https://docs.storageos.com/docs/prerequisites/firewalls

          Which commands to use depends on whether your OpenShift hosts run ufw, firewalld, or plain iptables.
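
If it is unclear which of the three is in use on a host, something like the following can give a first hint. It is only a heuristic sketch: `detect_firewall` is a hypothetical helper, it assumes a systemd-based host, and it falls back to "iptables" whenever neither firewalld nor ufw reports as active. `ufw status` usually needs root.

```shell
# Sketch: guess which firewall front end is active on this host.
# detect_firewall is a hypothetical helper name.
detect_firewall() {
  if systemctl is-active --quiet firewalld 2>/dev/null; then
    echo firewalld
  elif command -v ufw >/dev/null 2>&1 &&
       ufw status 2>/dev/null | grep -q '^Status: active'; then
    echo ufw
  else
    # Neither front end is active: assume rules are managed with iptables directly.
    echo iptables
  fi
}

echo "firewall in use: $(detect_firewall)"
```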



          For ufw try this:



          ufw default allow outgoing
          ufw allow 5701:5711/tcp
          ufw allow 5711/udp


          For firewalld try this:



          firewall-cmd --permanent  --new-service=storageos
          firewall-cmd --permanent --service=storageos --add-port=5700-5800/tcp
          firewall-cmd --add-service=storageos --zone=public --permanent
          firewall-cmd --reload


          For plain iptables:



          # Inbound traffic
          iptables -I INPUT -i lo -m comment --comment 'Permit loopback traffic' -j ACCEPT
          iptables -I INPUT -m state --state ESTABLISHED,RELATED -m comment --comment 'Permit established traffic' -j ACCEPT
          iptables -A INPUT -p tcp --dport 5701:5711 -m comment --comment 'StorageOS' -j ACCEPT
          iptables -A INPUT -p udp --dport 5711 -m comment --comment 'StorageOS' -j ACCEPT

          # Outbound traffic
          iptables -I OUTPUT -o lo -m comment --comment 'Permit loopback traffic' -j ACCEPT
          iptables -I OUTPUT -d 0.0.0.0/0 -m comment --comment 'Permit outbound traffic' -j ACCEPT
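
One thing worth checking with the iptables variant: rules are evaluated in order and `-A` appends to the end of the chain, so an ACCEPT added this way has no effect if a blanket REJECT or DROP rule already sits above it. The sketch below reads `iptables -S INPUT` output on stdin and reports which comes first. `check_rule_order` is a hypothetical helper. It only recognises ACCEPT rules carrying the 'StorageOS' comment used above and unconditional `-j REJECT` / `-j DROP` rules, so treat its verdict as a hint.

```shell
# Sketch: check whether a blanket REJECT/DROP precedes the StorageOS ACCEPT rule.
# check_rule_order is a hypothetical helper; it expects `iptables -S INPUT` on stdin.
check_rule_order() {
  awk '
    # An ACCEPT rule tagged with the StorageOS comment was seen first.
    /--comment "?StorageOS"?/ && /-j ACCEPT/ {
      print "ok: StorageOS ACCEPT precedes any blanket REJECT/DROP"
      found = 1
      exit
    }
    # An unconditional REJECT or DROP was seen first.
    /^-A INPUT -j (REJECT|DROP)/ {
      print "warning: blanket " $4 " rule precedes StorageOS ACCEPT"
      found = 1
      exit
    }
    END { if (!found) print "no StorageOS rule found" }
  '
}
```

Usage would be `iptables -S INPUT | check_rule_order` as root. If it prints the warning, inserting the StorageOS rules with `-I INPUT` instead of `-A INPUT` places them at the top of the chain.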


          Also check the StorageOS troubleshooting page, which covers this particular issue:
          https://docs.storageos.com/docs/platforms/openshift/troubleshoot/install#peer-discovery---networking



          In addition, clusters with fewer than 3 nodes are not supported. You can run 1 node for testing, or 3 or more. With 2 nodes it is impossible to ensure quorum in a distributed environment, unless you point the StorageOS KV store at an external etcd.
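
The quorum point can be illustrated with a little arithmetic. The sketch assumes a simple majority quorum, as used by Raft-based stores such as etcd. The helper names `quorum` and `tolerated` are made up for illustration.

```shell
# Sketch: majority-quorum arithmetic for a cluster of N members.
# quorum and tolerated are hypothetical helper names.

# quorum N: members needed for a strict majority.
quorum() { echo $(( $1 / 2 + 1 )); }

# tolerated N: members that can fail while a majority remains.
tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 5; do
  echo "nodes=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated "$n")"
done
```

With 2 nodes the quorum is 2, so losing either node loses the majority. With 3 nodes the quorum is still 2, so one node can fail.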






          answered Nov 23 '18 at 14:58 by Ferran Arau Castell





























