Python dictionary key value into dataframe where clause in Pyspark











up vote
1
down vote

favorite












How can I pass a Python dictionary key value into dataframe where clause in Pyspark ...



Python dictionary as below ...



column_dict= { 'email': 'customer_email_addr' ,
'addr_bill': 'crq_st_addr' ,
'addr_ship': 'ship_to_addr' ,
'zip_bill': 'crq_zip_cd' ,
'zip_ship': 'ship_to_zip' ,
'phone_bill': 'crq_cm_phone' ,
'phone_ship' : 'ship_to_phone'}


I've a spark dataframe with around 3 billion records. Dataframe as follows ...



source_sql= ("select cust_id, customer_email_addr, crq_st_addr, ship_to_addr,
crq_zip_cd,ship_to_zip,crq_cm_phone,ship_to_phone from odl.cust_master where
trans_dt >= '{}' and trans_dt <= '{}' ").format('2017-11-01','2018-10-31')

cust_id_m = hiveCtx.sql(source_sql)
cust_id.cache()


My intention to find out distinct valid customer's for Email, Addr, Zip and Phone and run in loop for above dictionary keys. For this when I test spark shell for one key value as below ...



>>> cust_id_risk_m=cust_id_m.selectExpr("cust_id").where( 
("cust_id_m.'{}'").format(column_dict['email']) != '' ).distinct()


I'm getting error ... Need experts assistance in resolving this.



Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/mapr/spark/spark-2.1.0/python/pyspark/sql/dataframe.py", line 1026, in filter
raise TypeError("condition should be string or Column")
TypeError: condition should be string or Column









share|improve this question







New contributor




Bharath R is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
























    up vote
    1
    down vote

    favorite












    How can I pass a Python dictionary key value into dataframe where clause in Pyspark ...



    Python dictionary as below ...



    column_dict= { 'email': 'customer_email_addr' ,
    'addr_bill': 'crq_st_addr' ,
    'addr_ship': 'ship_to_addr' ,
    'zip_bill': 'crq_zip_cd' ,
    'zip_ship': 'ship_to_zip' ,
    'phone_bill': 'crq_cm_phone' ,
    'phone_ship' : 'ship_to_phone'}


    I've a spark dataframe with around 3 billion records. Dataframe as follows ...



    source_sql= ("select cust_id, customer_email_addr, crq_st_addr, ship_to_addr,
    crq_zip_cd,ship_to_zip,crq_cm_phone,ship_to_phone from odl.cust_master where
    trans_dt >= '{}' and trans_dt <= '{}' ").format('2017-11-01','2018-10-31')

    cust_id_m = hiveCtx.sql(source_sql)
    cust_id.cache()


    My intention to find out distinct valid customer's for Email, Addr, Zip and Phone and run in loop for above dictionary keys. For this when I test spark shell for one key value as below ...



    >>> cust_id_risk_m=cust_id_m.selectExpr("cust_id").where( 
    ("cust_id_m.'{}'").format(column_dict['email']) != '' ).distinct()


    I'm getting error ... Need experts assistance in resolving this.



    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/opt/mapr/spark/spark-2.1.0/python/pyspark/sql/dataframe.py", line 1026, in filter
    raise TypeError("condition should be string or Column")
    TypeError: condition should be string or Column









    share|improve this question







    New contributor




    Bharath R is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.






















      up vote
      1
      down vote

      favorite









      up vote
      1
      down vote

      favorite











      How can I pass a Python dictionary key value into dataframe where clause in Pyspark ...



      Python dictionary as below ...



      column_dict= { 'email': 'customer_email_addr' ,
      'addr_bill': 'crq_st_addr' ,
      'addr_ship': 'ship_to_addr' ,
      'zip_bill': 'crq_zip_cd' ,
      'zip_ship': 'ship_to_zip' ,
      'phone_bill': 'crq_cm_phone' ,
      'phone_ship' : 'ship_to_phone'}


      I've a spark dataframe with around 3 billion records. Dataframe as follows ...



      source_sql= ("select cust_id, customer_email_addr, crq_st_addr, ship_to_addr,
      crq_zip_cd,ship_to_zip,crq_cm_phone,ship_to_phone from odl.cust_master where
      trans_dt >= '{}' and trans_dt <= '{}' ").format('2017-11-01','2018-10-31')

      cust_id_m = hiveCtx.sql(source_sql)
      cust_id.cache()


      My intention to find out distinct valid customer's for Email, Addr, Zip and Phone and run in loop for above dictionary keys. For this when I test spark shell for one key value as below ...



      >>> cust_id_risk_m=cust_id_m.selectExpr("cust_id").where( 
      ("cust_id_m.'{}'").format(column_dict['email']) != '' ).distinct()


      I'm getting error ... Need experts assistance in resolving this.



      Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/mapr/spark/spark-2.1.0/python/pyspark/sql/dataframe.py", line 1026, in filter
      raise TypeError("condition should be string or Column")
      TypeError: condition should be string or Column









      share|improve this question







      New contributor




      Bharath R is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.











      How can I pass a Python dictionary key value into dataframe where clause in Pyspark ...



      Python dictionary as below ...



      column_dict= { 'email': 'customer_email_addr' ,
      'addr_bill': 'crq_st_addr' ,
      'addr_ship': 'ship_to_addr' ,
      'zip_bill': 'crq_zip_cd' ,
      'zip_ship': 'ship_to_zip' ,
      'phone_bill': 'crq_cm_phone' ,
      'phone_ship' : 'ship_to_phone'}


      I've a spark dataframe with around 3 billion records. Dataframe as follows ...



      source_sql= ("select cust_id, customer_email_addr, crq_st_addr, ship_to_addr,
      crq_zip_cd,ship_to_zip,crq_cm_phone,ship_to_phone from odl.cust_master where
      trans_dt >= '{}' and trans_dt <= '{}' ").format('2017-11-01','2018-10-31')

      cust_id_m = hiveCtx.sql(source_sql)
      cust_id.cache()


      My intention to find out distinct valid customer's for Email, Addr, Zip and Phone and run in loop for above dictionary keys. For this when I test spark shell for one key value as below ...



      >>> cust_id_risk_m=cust_id_m.selectExpr("cust_id").where( 
      ("cust_id_m.'{}'").format(column_dict['email']) != '' ).distinct()


      I'm getting error ... Need experts assistance in resolving this.



      Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/mapr/spark/spark-2.1.0/python/pyspark/sql/dataframe.py", line 1026, in filter
      raise TypeError("condition should be string or Column")
      TypeError: condition should be string or Column






      python apache-spark pyspark pyspark-sql






      share|improve this question







      New contributor




      Bharath R is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.











      share|improve this question







      New contributor




      Bharath R is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      share|improve this question




      share|improve this question






      New contributor




      Bharath R is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      asked Nov 21 at 3:45









      Bharath R

      61




      61




      New contributor




      Bharath R is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.





      New contributor





      Bharath R is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






      Bharath R is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.





























          active

          oldest

          votes











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });






          Bharath R is a new contributor. Be nice, and check out our Code of Conduct.










           

          draft saved


          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53404978%2fpython-dictionary-key-value-into-dataframe-where-clause-in-pyspark%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown






























          active

          oldest

          votes













          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          Bharath R is a new contributor. Be nice, and check out our Code of Conduct.










           

          draft saved


          draft discarded


















          Bharath R is a new contributor. Be nice, and check out our Code of Conduct.













          Bharath R is a new contributor. Be nice, and check out our Code of Conduct.












          Bharath R is a new contributor. Be nice, and check out our Code of Conduct.















           


          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53404978%2fpython-dictionary-key-value-into-dataframe-where-clause-in-pyspark%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Berounka

          Sphinx de Gizeh

          Different font size/position of beamer's navigation symbols template's content depending on regular/plain...