Remove duplicates that occur in consecutive blocks from a column



























Question: How can I remove "duplicates" that occur within blocks?
I use the term [blocks] to mean two or more equal values in the same column that sit directly above or below each other.



In column [c1] I have the values [2] and [3].



Value [2] should never have another [2] directly below it.



Value [3] should never have another [3] directly below it.



I cannot use a standard duplicate-removal function, because the column is allowed to contain the same value more than once, as long as the repeats are not adjacent. Deleting the rows manually is not an option either, since there will be thousands of them.
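To illustrate why a plain duplicate-removal call does not fit, here is a minimal base R sketch on the `c1` values alone (no xts needed). `duplicated()` flags every repeat anywhere in the vector, while the rule above only concerns a value and the one directly below it. The names `too_aggressive` and `has_block` are purely illustrative.

```r
c1 <- c(2, 3, 2, 2, 3, 3, 2)

# duplicated() marks every value already seen anywhere earlier in the vector,
# so only the first 2 and the first 3 survive
too_aggressive <- c1[!duplicated(c1)]
print(too_aggressive)  # 2 3

# The actual rule: no value may equal the value directly below it.
# Compare each element (except the last) with its successor.
has_block <- any(c1[-length(c1)] == c1[-1])
print(has_block)  # TRUE: rows 3-4 and rows 5-6 form blocks
```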



If possible, I would like to solve this without loading any additional R packages.



My R-file:



##########
# Test xts
##########
library(xts)  # xts() is provided by the xts package

dates <- as.POSIXct(c(
  "2013-07-24 09:01:00",
  "2013-07-24 09:02:00",
  "2013-07-24 09:03:00",
  "2013-07-24 09:04:00",
  "2013-07-24 09:05:00",
  "2013-07-24 09:06:00",
  "2013-07-24 09:07:00"
))
c1 <- c(2, 3, 2, 2, 3, 3, 2)            # Data in c1.
# c2 <- c(0, 3, 2, 2, 3, 0, 2)          # Data in c2.
data <- data.frame(c1)                   # Create a data frame.
xts9 <- xts(x = data, order.by = dates)  # Create an xts object from the data frame.


The result of running the R file:



                    c1
2013-07-24 09:01:00 2
2013-07-24 09:02:00 3
2013-07-24 09:03:00 2
2013-07-24 09:04:00 2
2013-07-24 09:05:00 3
2013-07-24 09:06:00 3
2013-07-24 09:07:00 2


The rows that should be deleted are marked with comments:



                    c1
2013-07-24 09:01:00 2
2013-07-24 09:02:00 3
2013-07-24 09:03:00 2
2013-07-24 09:04:00 2 # To be removed, since it has a 2 directly above it.
2013-07-24 09:05:00 3
2013-07-24 09:06:00 3 # To be removed, since it has a 3 directly above it.
2013-07-24 09:07:00 2









  • edited the answer with base R option.

    – Ronak Shah
    Nov 24 '18 at 8:26

















r xts







edited Nov 23 '18 at 20:06







Toolbox

















asked Nov 23 '18 at 16:00









Toolbox














1 Answer



















We can use the rleid() function from data.table to give each run of equal consecutive values its own id, and then use duplicated() to drop every row after the first in each run.



library(data.table)
xts9[!duplicated(rleid(xts9)), ]

# c1
#2013-07-24 09:01:00 2
#2013-07-24 09:02:00 3
#2013-07-24 09:03:00 2
#2013-07-24 09:05:00 3
#2013-07-24 09:07:00 2
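To make the run ids concrete, here is a minimal base R sketch of the same idea on the plain `c1` vector. It assumes the column has no `NA` values (an `NA` comparison would propagate into every later id); the names `starts_run` and `run_id` are illustrative only.

```r
c1 <- c(2, 3, 2, 2, 3, 3, 2)

# TRUE where a value differs from the one directly above it;
# the first element always starts a new run
starts_run <- c(TRUE, c1[-1] != c1[-length(c1)])

# The cumulative sum of run starts gives one id per run,
# which is what rleid() returns for this input
run_id <- cumsum(starts_run)
print(run_id)  # 1 2 3 3 4 4 5

# Keeping only the first row of each run id removes the repeated rows
print(c1[!duplicated(run_id)])  # 2 3 2 3 2
```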




If you want to do this in base R, use rle() instead, with the same logic:



x <- rle(rowSums(xts9))  # rowSums() turns the one-column xts into a plain numeric vector for rle()
xts9[!duplicated(rep(seq_along(x$values), x$lengths)), ]

# c1
#2013-07-24 09:01:00 2
#2013-07-24 09:02:00 3
#2013-07-24 09:03:00 2
#2013-07-24 09:05:00 3
#2013-07-24 09:07:00 2
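One caveat with rowSums(): with more than one column (for example, if the commented-out c2 were added), two different rows can have the same sum and would then be treated as a single run. Below is a minimal sketch of a hypothetical helper, `keep_first_of_run()`, that instead compares every column of each row with the row directly above it. It uses only base R and assumes there are no `NA` values.

```r
# Hypothetical helper: returns a logical vector that is TRUE for the first row
# of every run of identical consecutive rows. Assumes no NA values.
keep_first_of_run <- function(m) {
  m <- as.matrix(m)
  n <- nrow(m)
  if (n == 0) return(logical(0))
  # A row starts a new run if it differs from the row above in at least one column
  changed <- rowSums(m[-1, , drop = FALSE] != m[-n, , drop = FALSE]) > 0
  c(TRUE, changed)
}

# Single column: same result as the rle() approach
c1 <- c(2, 3, 2, 2, 3, 3, 2)
print(c1[keep_first_of_run(c1)])  # 2 3 2 3 2

# Two columns whose rows all sum to 3: a rowSums()-based rle() would merge them,
# but only the last row actually repeats the row above it
m <- cbind(a = c(1, 2, 2), b = c(2, 1, 1))
print(keep_first_of_run(m))  # TRUE TRUE FALSE
```

On the xts object from the question this could be applied as `xts9[keep_first_of_run(coredata(xts9)), ]`. Here coredata() (from zoo, which xts loads) supplies the plain matrix, so the row-to-row comparison is purely positional.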





        edited Nov 24 '18 at 8:26

























        answered Nov 23 '18 at 16:08









        Ronak Shah
