0 votes
1 answer
59 views

I am trying to filter vendors activated/inactivated in the last 24 hours. I have tried a lot, but I am not getting any results, even though a few vendors were activated in the last 24 hours in the NetSuite account. Please ...
Maira S • 121
0 votes
1 answer
50 views

I hope you are doing well! I am trying to unselect a specific customer from a specific item's multiselect field in a map/reduce script. I am setting the new customers in the field using setValue. But my approach is ...
Maira S • 121
0 votes
1 answer
47 views

I hope you are doing well! I have developed a map/reduce script to generate and email a CSV file of dataset results. I am processing the data in batches and rescheduling the script. The problem is, there are ...
Maira S • 121
0 votes
0 answers
36 views

I am facing a challenge in rescheduling a map/reduce script and generating the CSV file. I would appreciate advice! Challenge: if I reschedule the script, I will create an empty CSV file in the first iteration ...
Maira S • 121
1 vote
1 answer
101 views

I hope you are doing well! I have developed a map/reduce script to send dataset results as a CSV file. If there is huge data, the script exceeds the usage limit in the reduce stage. I ...
Maira S • 121
0 votes
0 answers
33 views

I need inventory details as CSV; there are 750k records in total. The saved search is not loading in the UI, and getInputData() has been stuck for the past 15 hours. How can I do this? Multiple CSV ...
Jidnesh Madhavi
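Several entries in this list ask about exporting very large result sets as CSV. As a hedged illustration of the usual fix (plain Python, outside NetSuite; the record iterable and chunk size are stand-ins for a real search and real limits), splitting an export into multiple bounded-size CSV files looks roughly like:

```python
import csv
import io

def export_in_chunks(records, chunk_size=250_000):
    """Split an iterable of row dicts into CSV chunks of at most
    chunk_size rows each; returns the CSV contents as strings.
    (In a real job each chunk would be written to its own file.)"""
    chunks, batch, fieldnames = [], [], None
    for rec in records:
        if fieldnames is None:
            fieldnames = list(rec)
        batch.append(rec)
        if len(batch) == chunk_size:
            chunks.append(_to_csv(batch, fieldnames))
            batch = []
    if batch:
        chunks.append(_to_csv(batch, fieldnames))
    return chunks

def _to_csv(rows, fieldnames):
    # Render one chunk, with its own header, into a CSV string.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because each chunk carries its own header, every output file is independently loadable; with 750k records and a 250k chunk size this would yield three files.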
0 votes
0 answers
68 views

I'm trying to run Apache Spark using Scala 2.11.12 and Java 21.0.6, but I keep running into an error related to accessing internal fields in the java.util.ArrayList class. Specifically, when I try to ...
aymane RIhane
0 votes
0 answers
29 views

I am getting an error in my Maven-based map-reduce program: I am not able to receive anything in my reducer, which has only one instance for the top-k structure. The print statement in ...
Nagatski
0 votes
1 answer
71 views

I hope you are well. I am trying to store dataset results in a CSV file, but I am getting an SSS_FILE_CONTENT_SIZE_EXCEEDED error. So my plan is to create multiple files. But I need to check ...
Maira S • 121
0 votes
1 answer
47 views

I was going through OSTEP's concurrency-mapreduce project, which essentially involves building a toy MapReduce program that runs on a single machine using multiple threads. Towards the end of the ...
Box Box Box Box
1 vote
1 answer
64 views

I am passing search results from getInputData to the map stage using return. How can I pass the record ID and script ID to map? Please help! function getInputData() { try { var scriptObj = ...
Maira S • 121
-1 votes
1 answer
54 views

The documentation of the Hadoop Job API gives this example (from https://hadoop.apache.org/docs/r3.3.5/api/org/apache/hadoop/mapreduce/Job.html): Here is an example on how to submit a job: // ...
user1551605
1 vote
0 answers
42 views

I have a Dataproc cluster; we are running an INSERT OVERWRITE query through the Hive CLI, which fails with OutOfMemoryError: Java heap space. We adjusted memory configurations for reducers and Tez tasks, ...
Parmeet Singh
0 votes
1 answer
130 views

I just installed Hadoop 3.3.6 and Hive 4.0.0 with MySQL as the metastore. When running CREATE TABLE or SELECT * FROM... it runs well. But when I try to do an INSERT or a SELECT with a join, Hive always fails. I'm ...
Dzaki Wicaksono
1 vote
1 answer
38 views

I wrote a Hadoop streaming job that uses Python code to transform the data, but the job ran into an error: when the input file is larger (e.g. 70 MB), it hangs at the reduce stage. When I ...
Shellong • 381
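Reduce-stage hangs in streaming jobs like the one above often trace back to a reducer that buffers the whole input instead of consuming stdin incrementally. A hedged sketch of the standard streaming-reducer pattern (the key/value layout and the summing rule are assumptions, not the asker's actual code):

```python
import sys

def stream_reduce(lines):
    """Sum values per key from already-sorted 'key<TAB>value' lines,
    yielding 'key<TAB>total' as soon as each key changes, so memory
    stays constant no matter how large the input is."""
    current_key, total = None, 0
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                yield f"{current_key}\t{total}"
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        yield f"{current_key}\t{total}"
```

On a cluster this would be driven from `sys.stdin` and printed line by line; Hadoop streaming guarantees the input to the reducer is sorted by key, which is what makes the one-pass emit-on-key-change loop valid.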
0 votes
1 answer
65 views

I'm providing comma-separated filenames to the FileInputFormat in a MapReduce job. The total size of the data is 30 GB of Snappy-compressed ORC files. When my map-reduce job starts, immediately ...
Nikhil Lingam
0 votes
1 answer
53 views

This is my first post, so apologies for any confusion. I am attempting to run a DNA sequence analysis through MapReduce. Here are the important parts of my mapper.sh script: while read line do ...
user26492029
0 votes
1 answer
235 views

Our Spark aggregation jobs are taking a long time to complete. They are supposed to finish in 5 minutes but are taking 30 to 40 minutes. Dataproc cluster logging says it's trying to scan ...
Vikrant Singh Rana
0 votes
0 answers
42 views

Every time I run a JAR file with Hadoop, it gets stuck at the last line. I also tried the JAR included in the Hadoop distribution itself, but with the same result: it gets stuck at the last line, ...
Noor Khalil
0 votes
0 answers
44 views

I need to solve a problem where a company wants to offer k different users free use (a kind of coupon) of their application for two months. The goal is to identify users who are likely to churn (leave ...
Yoel Ha
0 votes
0 answers
107 views

I'm running an MR job in EMR, and all my logs are in the stderr section (when I go into the job logs from the Resource Manager UI). How can I move them to stdout or syslog?
Stefan Ss
1 vote
0 answers
56 views

Let's say that I have a collection with documents of this form: { id: id1, name: foo, value: 64 }, { id: id1, name: bar, value: 37 }, { id: id1, name: bar, value: ...
Julio Sanz Rodríguez
2 votes
0 answers
104 views

I am trying to understand how XGBoost distributed training works. The best explanation I've found so far is in this paper: https://ml-pai-learn.oss-cn-beijing.aliyuncs.com/%E6%9C%BA%E5%99%A8%E5%AD%A6%...
Altamash Rafiq
0 votes
1 answer
48 views

grunt> joined_data = JOIN filtered_features BY (store, date), sales BY (store, date); 2024-04-02 13:19:05,110 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 16, column 46> ...
Md Arif Khan
0 votes
1 answer
56 views

I know that in Spark you can run the driver program on the client machine if you specify `yarn-client` deployment mode, or you can run it on a random machine in the cluster if you specify `yarn-cluster` ...
Youssef Alaa Etman
1 vote
1 answer
46 views

I get very confusing results when I run MapReduce on Hadoop. Here is the code (see below). As you can see, it is a very simple MapReduce operation. The input is 1 directory with 100 .lineperdoc ...
ztsv-av • 109
0 votes
0 answers
45 views

I am trying to run a simple PageRank lab task on Hadoop 3.3.6 installed on an Ubuntu VirtualBox VM, but it is giving this error even though all my commands are correct, and my instructor just told me to download ...
Aminago
0 votes
1 answer
72 views

I am running Hadoop code and having problems. Notice the commented lines "debug exception 1" and "debug exception 2" and the line below each of them. Since I can't print ...
Max • 1
0 votes
1 answer
35 views

Context: I want to run an Apache Crunch job on AWS EMR. This job is part of a pipeline of Oozie Java actions and Oozie subworkflows (this particular job is part of a subworkflow). In Oozie we have a ...
Stefan Ss
2 votes
1 answer
21 views

I was trying to create a MapReduce application for WordLengthCount, with the code below: public class WordLengthCount { public static class TokenizerMapper extends Mapper<Object, Text, ...
Kha Nguyễn Lê Hoàng
0 votes
1 answer
77 views

I'm running a Hadoop MapReduce program to calculate the average, maximum, and minimum temperature. The temperature is stored in an input1.csv file with three columns: date in YYYY-MM-DD format, temperature in ...
Ashok Kumar
0 votes
1 answer
373 views

I'm creating my first map/reduce script. I'm running an item search in the get-input stage that outputs: { "recordType": "assemblyitem", "id": "XXXXX", "...
Jbigger • 11
0 votes
0 answers
45 views

Programming environment and brief overview: I am working on one of my Big Data assignments, which involves finding the strike rate of gamers using Hadoop MapReduce version 2.6.0. I am supposed to work ...
Kaivalya
0 votes
0 answers
48 views

I have Python code that calculates products based on combinations of keys from a correlation matrix. The code works well when the dataframe has a small number of columns (e.g., fewer than 95 ...
Starlord22
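The combinations question above usually comes down to iterating over key pairs lazily instead of materialising them all. A minimal sketch (the dict layout and the product-of-correlations rule are assumptions for illustration):

```python
from itertools import combinations

def pairwise_products(corr):
    """Given a dict mapping column name -> correlation value, lazily
    yield the product for every unordered pair of distinct columns.
    A generator keeps memory flat as the column count grows."""
    for k1, k2 in combinations(sorted(corr), 2):
        yield (k1, k2), corr[k1] * corr[k2]
```

Because `combinations` and the generator are both lazy, n columns cost O(1) memory even though they produce n·(n−1)/2 pairs, which is the part that blows up past ~95 columns if collected into a list up front.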
-1 votes
1 answer
96 views

I'm trying to run the below Hadoop mapreduce program. public static class MovieFilterMapper extends Mapper<LongWritable, Text, Text, IntWritable> { private Text movieId = new Text(); ...
Veen • 161
1 vote
1 answer
37 views

I'm building a recommendation system for a webshop. The shopping history is saved in a CouchDB database. I'm creating a view through map-reduce that emits, for each user and product, the quantity of ...
Mathieu Rousseau
0 votes
1 answer
95 views

I have Hadoop 3.3.6 set up in a Kubernetes cluster; all the Hadoop components are exposed via ClusterIP services, and I'm able to telnet to the ports exposed from the respective pods. But when I run ...
nobso • 1
-1 votes
1 answer
16 views

I use Hadoop 3.2.2; a Hadoop cluster has just been configured. When using MapReduce to calculate pi, an error is reported: Container exited with a non-zero exit code 1. Error file: prelaunch.err. Last ...
JiaRu Xu
0 votes
1 answer
32 views

I have this code: @Override protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException { Set<String> mySet = new ...
Kinyanjui Karanja
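The reducer excerpt above collects its values into a `Set<String>`. The same deduplicate-then-emit pattern, sketched in plain Python rather than the asker's Java (the grouping driver and names are illustrative, standing in for the framework's shuffle):

```python
from collections import defaultdict

def reduce_unique(key, values):
    """Collect values into a set to drop duplicates, then emit
    (key, sorted unique values) -- mirroring the Set<String> idea."""
    return key, sorted(set(values))

def run_reduce(pairs):
    """Group (key, value) pairs by key and apply the reducer to each
    group, the way the framework would after the shuffle phase."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return [reduce_unique(k, vs) for k, vs in sorted(groups.items())]
```

Sorting the unique values before emitting makes the reducer's output deterministic, which matters when comparing runs, since set iteration order is not guaranteed.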
1 vote
0 answers
40 views

I'm using the GroupByTest example app as a benchmark for testing my shuffle manager implementation. The problem with using GroupByTest for someone who is not experienced with using Spark on real datasets ...
Brave • 329
0 votes
0 answers
43 views

What am I trying to achieve? I am trying to create a CouchDB view that builds up aggregate data from my workouts document (specifically based on a custom date range), by returning a completed result ...
Bernhardt du Toit
0 votes
0 answers
105 views

I am running Hadoop version 3.2.4 on Windows and want to perform a WordCount operation using the file located at hadoop/share/hadoop/mapreduce/share/mapreduce-examples-3.2.4.jar. However, it failed, and ...
ryan • 21
0 votes
1 answer
115 views

I'm new to using Hadoop, and I want to run the Hadoop WordCount example to count words. However, why is it that when I try to display the output, it doesn't appear? I would appreciate an ...
ryan • 21
0 votes
2 answers
73 views

I have RDD which shows up as ["2\t{'3': 1}", "3\t{'2': 2}", "4\t{'1': 1, '2': 1}", "5\t{'4': 3, '2': 1, '6': 1}", "6\t{'2': 1, '5': 2}", "7\...
Ram • 870
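The RDD excerpt above holds tab-separated strings of a node ID plus a Python-dict literal. One hedged way to parse such records (plain Python, usable inside a PySpark `map`; the adjacency interpretation is an assumption from the visible data) is `ast.literal_eval`:

```python
import ast

def parse_adjacency(line):
    """Split a 'node<TAB>{...}' record into (node_id, neighbor dict).
    ast.literal_eval parses the Python-dict literal safely, accepting
    only literals rather than arbitrary code like eval would."""
    node, payload = line.split("\t", 1)
    return node, ast.literal_eval(payload)
```

In PySpark this would be applied as `rdd.map(parse_adjacency)`, after which the values are real dicts instead of strings and can be aggregated normally.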
1 vote
0 answers
42 views

I have the following HQL executed by the MR engine, where the source table has almost 800 million records: select concat(upp_sys_id,'#',min(bhv_tm) over ssn,'#',ssn_seq_all) as ssn_id ,evt_drt ,row_number() over ...
yy zhao • 11
1 vote
0 answers
39 views

Hadoop version: 2.10.2. JDK version: 1.8.0_291. I'm trying to start a MapReduce job using Python. I've configured Hadoop on a new hduser_. After running this command in the terminal: hadoop jar $HADOOP_HOME/share/...
Dorialean
0 votes
1 answer
79 views

I get some errors when running the WordCount command: 2023-10-06 15:55:35,005 INFO mapreduce.Job: Job job_1696606856991_0001 running in uber mode: false 2023-10-06 15:55:35,006 INFO mapreduce.Job: map 0% ...
Vũ Phan Bảo Anh
0 votes
0 answers
15 views

I am new to MongoDB / mapreduce. I am trying to create a map function to show the horses that are female AND have been to fewer than 50 shows. var mapHorse1 = function(){ this.gender;if(this.gender='f'),...
kaylo • 1
1 vote
0 answers
104 views

The glued-together word "MapReduce" is supposed to cover a generic concept (distinct from functional-programming map/reduce), originating from a conceptual paper from Google. It has an ...
ae1020 • 19
0 votes
1 answer
19 views

In this Hive SQL, when the amount of data in table1 is big, t2.c is lost even though it should be joined. How can this be explained at the MapReduce level? SELECT t1.a, t1.b, t2.c FROM table1 t1 LEFT ...
DaSH Tai
