Distributed data-parallel coding in Spark
Hi, I have a master/slave data-parallel problem. On the master I create N decision trees, and I have a large data set that is too big for a single machine. I would like to copy the N trees to the slaves and divide the data set between them (map), then apply the N trees to the data on each slave. Each slave would return a value that is sent back to the master (reduce). The decision trees are built in a custom way, so I can't use MLlib.
Could someone please point me to some literature on how this could be done? Thanks.
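A minimal PySpark sketch of the broadcast / map / reduce pattern described above, under several assumptions. The custom trees are stood in for by a hypothetical `Stump` class with a `predict(row)` method, and the "value" each slave returns is assumed to be a `(sum of tree outputs, row count)` pair. `score_partition`, `combine` and `run_on_spark` are illustrative names. Only `SparkContext.broadcast`, `RDD.mapPartitions` and `RDD.reduce` are standard PySpark calls.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


class Stump:
    """Hypothetical stand-in for a custom decision tree (a one-split stump).

    Any picklable object with a predict(row) -> float method fits the pattern.
    """

    def __init__(self, feature: int, threshold: float, left: float, right: float):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right

    def predict(self, row: Sequence[float]) -> float:
        return self.left if row[self.feature] <= self.threshold else self.right


def score_partition(
    trees: List[Stump], rows: Iterable[Sequence[float]]
) -> Iterator[Tuple[float, int]]:
    """Map step: apply all N trees to every row of one partition.

    Emits exactly one (sum_of_predictions, row_count) pair per partition, so
    only a small summary goes back to the driver, not the data itself.
    """
    total = 0.0
    count = 0
    for row in rows:
        total += sum(tree.predict(row) for tree in trees)
        count += 1
    yield (total, count)


def combine(a: Tuple[float, int], b: Tuple[float, int]) -> Tuple[float, int]:
    """Reduce step: merge two partial (sum, count) summaries."""
    return (a[0] + b[0], a[1] + b[1])


def run_on_spark(sc, trees: List[Stump], rdd) -> Tuple[float, int]:
    """Driver side (needs pyspark only when called).

    sc is a pyspark SparkContext and rdd is an RDD whose elements are rows.
    """
    trees_bc = sc.broadcast(trees)  # read-only copy shipped once per executor
    return rdd.mapPartitions(
        lambda rows: score_partition(trees_bc.value, rows)
    ).reduce(combine)
```

On a real cluster the RDD might come from something like `sc.textFile(path).map(parse_row)`, where `parse_row` is a hypothetical parser for your data format. Broadcasting sends one copy of the trees per executor instead of one per task; because broadcast values are pickled, the trees' class generally needs to be importable on the executors (for example by shipping its module with `sc.addPyFile`).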
apache-spark pyspark apache-spark-mllib
edited 3 hours ago by Ram Ghadiyaram
asked 3 hours ago by Moses S