Intel MPI benchmark fails when # bytes > 128: IMB-EXT













I just installed Linux and Intel MPI on two machines:

(1) A fairly old (~8 years) SuperMicro server with 24 cores (4 x Intel Xeon X7542) and 32 GB of memory.
OS: CentOS 7.5

(2) A new HP ProLiant DL380 server with 32 cores (2 x Intel Xeon Gold 6130) and 64 GB of memory.
OS: OpenSUSE Leap 15

After installing the OS and Intel MPI, I compiled the Intel MPI Benchmarks and ran them:



$ mpirun -np 4 ./IMB-EXT


Surprisingly, I get the same error when running IMB-EXT and IMB-RMA on both machines, even though the OS and everything else differ (even the GCC version used to compile the Intel MPI Benchmarks is different: GCC 6.5.0 on CentOS and GCC 7.3.1 on OpenSUSE).



On the CentOS machine, I get:



#---------------------------------------------------
# Benchmarking Unidir_Put
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#
# MODE: AGGREGATE
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.05         0.00
            4         1000        30.56         0.13
            8         1000        31.53         0.25
           16         1000        30.99         0.52
           32         1000        30.93         1.03
           64         1000        30.30         2.11
          128         1000        30.31         4.22


and on the OpenSUSE machine, I get



#---------------------------------------------------
# Benchmarking Unidir_Put
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#
# MODE: AGGREGATE
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.04         0.00
            4         1000        14.40         0.28
            8         1000        14.04         0.57
           16         1000        14.10         1.13
           32         1000        13.96         2.29
           64         1000        13.98         4.58
          128         1000        14.08         9.09


When I run IMB-EXT without mpirun (that is, with a single process), the benchmark runs to completion, but Unidir_Put needs at least 2 processes, so this does not help much. I also find that the benchmarks using MPI_Put and MPI_Get are much slower than I would expect from experience. Using MVAPICH on the OpenSUSE machine did not help either. The output is:



#---------------------------------------------------
# Benchmarking Unidir_Put
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#
# MODE: AGGREGATE
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.03         0.00
            4         1000        17.37         0.23
            8         1000        17.08         0.47
           16         1000        17.23         0.93
           32         1000        17.56         1.82
           64         1000        17.06         3.75
          128         1000        17.20         7.44

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 49213 RUNNING AT iron-0-1
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
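
A side note on reading the message above: exit code 139 is consistent with the common shell convention of 128 plus the signal number, which matches the reported signal 11 (segmentation fault). The following is a minimal Python sketch of that decoding; `decode_exit_code` is a hypothetical helper name, not part of any MPI tool.

```python
import signal


def decode_exit_code(code):
    """Return the signal name if `code` follows the 128 + signum convention, else None."""
    if code > 128:
        try:
            # signal.Signals maps a signal number to its symbolic name.
            return signal.Signals(code - 128).name
        except ValueError:
            # The number does not correspond to a known signal on this platform.
            return None
    return None


if __name__ == "__main__":
    print(decode_exit_code(139))
```

This only confirms that a process crashed with a segmentation fault; it says nothing about which library or buffer caused it.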


Update: I tested Open MPI, and the benchmark runs through smoothly (although Open MPI is not recommended for my application, and I still don't understand why Intel MPI and MVAPICH fail):



#---------------------------------------------------
# Benchmarking Unidir_Put
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#
# MODE: AGGREGATE
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.06         0.00
            4         1000         0.23        17.44
            8         1000         0.22        35.82
           16         1000         0.22        72.36
           32         1000         0.22       144.98
           64         1000         0.22       285.76
          128         1000         0.30       430.29
          256         1000         0.39       650.78
          512         1000         0.51      1008.31
         1024         1000         0.84      1214.42
         2048         1000         1.86      1100.29
         4096         1000         7.31       560.59
         8192         1000        15.24       537.67
        16384         1000        15.39      1064.82
        32768         1000        15.70      2086.51
        65536          640        12.31      5324.63
       131072          320        10.24     12795.03
       262144          160        12.49     20993.49
       524288           80        30.21     17356.93
      1048576           40        81.20     12913.67
      2097152           20       199.20     10527.72
      4194304           10       394.02     10644.77
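
To put the numbers above side by side: in these tables the Mbytes/sec column appears to be the message size divided by the time per transfer (for example, 128 bytes / 30.31 usec ≈ 4.22). The sketch below parses result rows, checks that relation, and compares the 128-byte times reported by each MPI library. The function names and the `SAMPLE` excerpt are illustrative assumptions; the timings are copied from the outputs above.

```python
def parse_imb_rows(text):
    """Parse IMB result rows into (bytes, repetitions, t_usec, mbytes_per_sec) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header/comment lines, which start with '#'.
        if len(fields) != 4 or fields[0].startswith("#"):
            continue
        rows.append((int(fields[0]), int(fields[1]), float(fields[2]), float(fields[3])))
    return rows


def bandwidth_mbytes_per_sec(nbytes, t_usec):
    """Bytes per microsecond, which equals 10^6 bytes per second."""
    return nbytes / t_usec if t_usec > 0 else 0.0


# Excerpt of the CentOS / Intel MPI output shown earlier.
SAMPLE = """\
#bytes #repetitions t[usec] Mbytes/sec
0 1000 0.05 0.00
4 1000 30.56 0.13
64 1000 30.30 2.11
128 1000 30.31 4.22
"""

# Time per 128-byte Unidir_Put, taken from the four outputs above.
T_USEC_128 = {
    "Intel MPI (CentOS)": 30.31,
    "Intel MPI (OpenSUSE)": 14.08,
    "MVAPICH (OpenSUSE)": 17.20,
    "Open MPI": 0.30,
}

if __name__ == "__main__":
    for nbytes, _, t_usec, reported in parse_imb_rows(SAMPLE):
        print(nbytes, round(bandwidth_mbytes_per_sec(nbytes, t_usec), 2), reported)
    baseline = T_USEC_128["Open MPI"]
    for name, t_usec in T_USEC_128.items():
        print(f"{name}: {t_usec / baseline:.1f}x the Open MPI time")
```

At 128 bytes the Intel MPI and MVAPICH runs take roughly 47 to 100 times as long per transfer as the Open MPI run, which matches the impression that MPI_Put is unexpectedly slow even before the crash.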


Is there any chance that I missed something when installing MPI or the OS on these servers? I suspect that the OS is the problem, but I am not sure where to start.

Thanks a lot in advance,

Jae










parallel-processing mpi openmpi mvapich2 intel-mpi

edited Nov 1 at 9:19

asked Nov 1 at 7:45
Jae

          1 Answer

Although this question is well written, you were not explicit about

• Intel MPI benchmark (please add header)
• Intel MPI
• Open MPI
• MVAPICH
• supported host network fabrics - for each MPI distribution
• selected fabric while running MPI benchmark
• Compilation settings
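
Some of the items in the list above can be collected automatically on each machine. The following is only a rough sketch: `collect_mpi_environment` is a hypothetical helper, the tool names are assumed to be on `PATH`, and `I_MPI_FABRICS` and `I_MPI_DEBUG` are the Intel MPI environment variables I would check first for fabric selection and debug output. It does not replace the header printed by the benchmark itself.

```python
import os
import platform
import shutil


def collect_mpi_environment(
    tools=("mpirun", "mpiexec", "mpicc", "gcc"),
    env_vars=("I_MPI_FABRICS", "I_MPI_DEBUG"),
):
    """Report where the MPI tools resolve on PATH and which fabric settings are set."""
    report = {"platform": platform.platform()}
    for tool in tools:
        # shutil.which returns the full path of the executable, or None if absent.
        report[tool] = shutil.which(tool) or "not found"
    for var in env_vars:
        report[var] = os.environ.get(var, "unset")
    return report


if __name__ == "__main__":
    for key, value in collect_mpi_environment().items():
        print(f"{key}: {value}")
```

Running it on both hosts and once per MPI installation makes it easier to see whether `mpirun` and the compiler wrapper actually come from the same distribution.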


Debugging this kind of trouble with disparate host machines, multiple Linux distributions and compiler versions can be quite hard. Remote debugging on Stack Overflow is even harder.

First of all, ensure reproducibility. This seems to be the case. One of many debugging approaches, and the one I would recommend, is to reduce the complexity of the system as a whole, test smaller sub-systems and start shifting responsibility to third parties. You may replace self-compiled executables with software packages provided by distribution package repositories or third parties like Conda.

Intel recently started to provide its libraries through YUM/APT repositories as well as for Conda and PyPI. I found that this helps a lot with reproducible deployments of HPC clusters and even runtime/development environments. I recommend using it for CentOS 7.5.

YUM/APT repository for Intel MKL, Intel IPP, Intel DAAL, and Intel® Distribution for Python* (for Linux*):

• Installing Intel® Performance Libraries and Intel® Distribution for Python* Using YUM Repository
• Installing Intel® Performance Libraries and Intel® Distribution for Python* Using APT Repository

Conda* package/Anaconda Cloud* support (Intel MKL, Intel IPP, Intel DAAL, Intel Distribution for Python):

• Installing Intel Distribution for Python and Intel Performance Libraries with Anaconda
• Available Intel packages can be viewed here

Install from the Python Package Index (PyPI) using pip (Intel MKL, Intel IPP, Intel DAAL):

• Installing the Intel® Distribution for Python* and Intel® Performance Libraries with pip and PyPI

I do not know much about OpenSUSE Leap 15.

edited Nov 22 at 9:37

answered Nov 20 at 11:25
Sascha Gottfried
