0 votes
1 answer
59 views

I am trying to filter vendors activated/inactivated in the last 24 hours. I have tried a lot, but I am not getting any results, even though a few vendors were activated in the last 24 hours in the NetSuite account. Please ...
Maira S • 121
0 votes
1 answer
50 views

I hope you are doing well! I am trying to unselect a specific customer from a specific item's multiselect field in a map/reduce script. I am setting the new customers in the field using setValue. But my approach is ...
Maira S • 121
0 votes
1 answer
47 views

I hope you are doing well! I have developed a map/reduce script to generate and email a CSV file of dataset results. I am processing the data in batches and rescheduling the script. The problem is, there are ...
Maira S • 121
0 votes
0 answers
36 views

I am facing a challenge in rescheduling a map/reduce script and generating the CSV file. I would appreciate advice! Challenge: if I reschedule the script, I will create an empty CSV file in the first iteration ...
Maira S • 121
1 vote
1 answer
101 views

I hope you are doing well! I have developed a map/reduce script to send dataset results as a CSV file. If there is huge data, the script exceeds the usage limit in the reduce stage. I ...
Maira S • 121
0 votes
0 answers
33 views

I need inventory details as CSV; there are 750k records in total. The saved search is not loading in the UI, and getInputData() has been stuck for the past 15 hours. How can I do this? Multiple CSV ...
Jidnesh Madhavi
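Several entries in this list ask about exporting very large result sets as CSV. As a hedged illustration of the usual fix (plain Python, outside NetSuite; the record iterable and chunk size are stand-ins for a real search and real limits), splitting an export into multiple bounded-size CSV files looks roughly like:

```python
import csv
import io

def export_in_chunks(records, chunk_size=250_000):
    """Split an iterable of row dicts into CSV chunks of at most
    chunk_size rows each; returns the CSV contents as strings.
    (In a real job each chunk would be written to its own file.)"""
    chunks, batch, fieldnames = [], [], None
    for rec in records:
        if fieldnames is None:
            fieldnames = list(rec)
        batch.append(rec)
        if len(batch) == chunk_size:
            chunks.append(_to_csv(batch, fieldnames))
            batch = []
    if batch:
        chunks.append(_to_csv(batch, fieldnames))
    return chunks

def _to_csv(rows, fieldnames):
    # Render one chunk, with its own header, into a CSV string.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because each chunk carries its own header, every output file is independently loadable; with 750k records and a 250k chunk size this would yield three files.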
0 votes
0 answers
68 views

I'm trying to run Apache Spark using Scala 2.11.12 and Java 21.0.6, but I keep running into an error related to accessing internal fields in the java.util.ArrayList class. Specifically, when I try to ...
aymane RIhane
0 votes
0 answers
29 views

I am getting an error in my Maven-based map-reduce program: I am not able to receive anything in my reducer, which has only one instance for the top-k structure. The print statement in ...
Nagatski
0 votes
1 answer
71 views

I hope you are well. I am trying to store dataset results in a CSV file, but I am getting an SSS_FILE_CONTENT_SIZE_EXCEEDED error. So my plan is to create multiple files. But I need to check ...
Maira S • 121
0 votes
1 answer
47 views

I was going through OSTEP's concurrency-mapreduce project, which essentially involves building a toy MapReduce program that runs on a single machine using multiple threads. Towards the end of the ...
Box Box Box Box
1 vote
1 answer
64 views

I am passing search results from getInputData to the map stage using return. How can I pass the record ID and script ID to map? Please help! function getInputData() { try { var scriptObj = ...
Maira S • 121
-1 votes
1 answer
54 views

The documentation of the Hadoop Job API gives this example (from https://hadoop.apache.org/docs/r3.3.5/api/org/apache/hadoop/mapreduce/Job.html): Here is an example on how to submit a job: // ...
user1551605
1 vote
0 answers
42 views

I have a Dataproc cluster; we are running an INSERT OVERWRITE query through the Hive CLI, which fails with OutOfMemoryError: Java heap space. We adjusted memory configurations for reducers and Tez tasks, ...
Parmeet Singh
0 votes
1 answer
130 views

I just installed Hadoop 3.3.6 and Hive 4.0.0 with MySQL as the metastore. When running CREATE TABLE or SELECT * FROM... it runs well. But when I try to do an INSERT or a SELECT with a join, Hive always fails. I'm ...
Dzaki Wicaksono
1 vote
1 answer
38 views

I wrote a Hadoop streaming job that uses Python code to transform the data, but the job ran into an error: when the input file is larger (e.g. 70 MB), it hangs at the reduce stage. When I ...
Shellong • 381
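Reduce-stage hangs in streaming jobs like the one above often trace back to a reducer that buffers the whole input instead of consuming stdin incrementally. A hedged sketch of the standard streaming-reducer pattern (the key/value layout and the summing rule are assumptions, not the asker's actual code):

```python
import sys

def stream_reduce(lines):
    """Sum values per key from already-sorted 'key<TAB>value' lines,
    yielding 'key<TAB>total' as soon as each key changes, so memory
    stays constant no matter how large the input is."""
    current_key, total = None, 0
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                yield f"{current_key}\t{total}"
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        yield f"{current_key}\t{total}"
```

On a cluster this would be driven from `sys.stdin` and printed line by line; Hadoop streaming guarantees the input to the reducer is sorted by key, which is what makes the one-pass emit-on-key-change loop valid.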
0 votes
1 answer
65 views

I'm providing comma-separated filenames to the FileInputFormat in a MapReduce job. The total size of the data is 30 GB of Snappy-compressed ORC files. When my map-reduce job starts, immediately ...
Nikhil Lingam
0 votes
1 answer
53 views

This is my first post, so apologies for any confusion. I am attempting to run a DNA sequence analysis through MapReduce. Here are the important parts of my mapper.sh script: while read line do ...
user26492029
0 votes
1 answer
235 views

Our Spark aggregation jobs are taking a long time to complete. They are supposed to finish in 5 minutes but are taking 30 to 40 minutes. Dataproc cluster logging says it's trying to scan ...
Vikrant Singh Rana
0 votes
0 answers
42 views

Every time I run a JAR file with Hadoop, it gets stuck at the last line. I also tried the JAR included in the Hadoop distribution itself, but with the same result: it gets stuck at the last line, ...
Noor Khalil
0 votes
0 answers
44 views

I need to solve a problem where a company wants to offer k different users free use (a kind of coupon) of their application for two months. The goal is to identify users who are likely to churn (leave ...
Yoel Ha
0 votes
0 answers
107 views

I'm running an MR job in EMR, and all my logs are in the stderr section (when I go into the job logs from the Resource Manager UI). How can I move them to stdout or syslog?
Stefan Ss
1 vote
0 answers
56 views

Let's say that I have a collection with documents of this form: { id: id1, name: foo, value: 64 }, { id: id1, name: bar, value: 37 }, { id: id1, name: bar, value: ...
Julio Sanz Rodríguez
2 votes
0 answers
104 views

I am trying to understand how XGBoost distributed training works. The best explanation I've found so far is in this paper: https://ml-pai-learn.oss-cn-beijing.aliyuncs.com/%E6%9C%BA%E5%99%A8%E5%AD%A6%...
Altamash Rafiq
0 votes
1 answer
48 views

grunt> joined_data = JOIN filtered_features BY (store, date), sales BY (store, date); 2024-04-02 13:19:05,110 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 16, column 46> ...
Md Arif Khan
0 votes
1 answer
56 views

I know that in Spark you can run the driver program on the client machine if you specify `yarn-client` deployment mode, or you can run it on a random machine in the cluster if you specify `yarn-cluster` ...
Youssef Alaa Etman
1 vote
1 answer
46 views

I get very confusing results when I run MapReduce on Hadoop. Here is the code (see below). As you can see, it is a very simple MapReduce operation. The input is 1 directory with 100 .lineperdoc ...
ztsv-av • 109
0 votes
0 answers
45 views

I am trying to run a simple PageRank lab task on Hadoop 3.3.6 installed on an Ubuntu VirtualBox VM, but it is giving this error even though all my commands are correct, and my instructor just told me to download ...
Aminago
0 votes
1 answer
72 views

I am running Hadoop code and having problems. Notice the commented lines "debug exception 1" and "debug exception 2" and the line below each of them. Since I can't print ...
Max • 1
0 votes
1 answer
35 views

Context: I want to run an Apache Crunch job on AWS EMR. This job is part of a pipeline of Oozie Java actions and Oozie subworkflows (this particular job is part of a subworkflow). In Oozie we have a ...
Stefan Ss
2 votes
1 answer
21 views

I was trying to create a MapReduce application for WordLengthCount, with the code below: public class WordLengthCount { public static class TokenizerMapper extends Mapper<Object, Text, ...
Kha Nguyễn Lê Hoàng
0 votes
1 answer
77 views

I'm running a Hadoop MapReduce program to calculate the average, maximum, and minimum temperature. The temperature is stored in an input1.csv file with three columns: date in YYYY-MM-DD format, temperature in ...
Ashok Kumar
0 votes
1 answer
373 views

I'm creating my first map/reduce script. I'm running an item search in the get-input stage that outputs: { "recordType": "assemblyitem", "id": "XXXXX", "...
Jbigger • 11
0 votes
0 answers
45 views

Programming environment and brief overview: I am working on one of my Big Data assignments, which involves finding the strike rate of gamers using Hadoop MapReduce version 2.6.0. I am supposed to work ...
Kaivalya
0 votes
0 answers
48 views

I have Python code that calculates products based on combinations of keys from a correlation matrix. The code works well when the dataframe has a small number of columns (e.g., fewer than 95 ...
Starlord22
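The combinations question above usually comes down to iterating over key pairs lazily instead of materialising them all. A minimal sketch (the dict layout and the product-of-correlations rule are assumptions for illustration):

```python
from itertools import combinations

def pairwise_products(corr):
    """Given a dict mapping column name -> correlation value, lazily
    yield the product for every unordered pair of distinct columns.
    A generator keeps memory flat as the column count grows."""
    for k1, k2 in combinations(sorted(corr), 2):
        yield (k1, k2), corr[k1] * corr[k2]
```

Because `combinations` and the generator are both lazy, n columns cost O(1) memory even though they produce n·(n−1)/2 pairs, which is the part that blows up past ~95 columns if collected into a list up front.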
-1 votes
1 answer
96 views

I'm trying to run the below Hadoop mapreduce program. public static class MovieFilterMapper extends Mapper<LongWritable, Text, Text, IntWritable> { private Text movieId = new Text(); ...
Veen • 161
1 vote
1 answer
37 views

I'm building a recommendation system for a webshop. The shopping history is saved in a CouchDB database. I'm creating a view through map-reduce that emits, for each user and product, the quantity of ...
Mathieu Rousseau
0 votes
1 answer
95 views

I have Hadoop 3.3.6 set up in a Kubernetes cluster; all the Hadoop components are exposed via ClusterIP services, and I'm able to telnet to the ports exposed from the respective pods. But when I run ...
nobso • 1
-1 votes
1 answer
16 views

I use Hadoop 3.2.2; a Hadoop cluster has just been configured. When using MapReduce to calculate pi, an error is reported: Container exited with a non-zero exit code 1. Error file: prelaunch.err. Last ...
JiaRu Xu
0 votes
1 answer
32 views

I have this code: @Override protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException { Set<String> mySet = new ...
Kinyanjui Karanja
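The reducer excerpt above collects its values into a `Set<String>`. The same deduplicate-then-emit pattern, sketched in plain Python rather than the asker's Java (the grouping driver and names are illustrative, standing in for the framework's shuffle):

```python
from collections import defaultdict

def reduce_unique(key, values):
    """Collect values into a set to drop duplicates, then emit
    (key, sorted unique values) -- mirroring the Set<String> idea."""
    return key, sorted(set(values))

def run_reduce(pairs):
    """Group (key, value) pairs by key and apply the reducer to each
    group, the way the framework would after the shuffle phase."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return [reduce_unique(k, vs) for k, vs in sorted(groups.items())]
```

Sorting the unique values before emitting makes the reducer's output deterministic, which matters when comparing runs, since set iteration order is not guaranteed.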
1 vote
0 answers
40 views

I'm using the GroupByTest example app as a benchmark for testing my shuffle manager implementation. The problem with using GroupByTest for someone who is not experienced with using Spark on real datasets ...
Brave • 329
0 votes
0 answers
43 views

What am I trying to achieve? I am trying to create a CouchDB view that builds up aggregate data from my workouts document (specifically based on a custom date range), by returning a completed result ...
Bernhardt du Toit
0 votes
0 answers
105 views

I am running Hadoop version 3.2.4 on Windows and want to perform a WordCount operation using the file located at hadoop/share/hadoop/mapreduce/share/mapreduce-examples-3.2.4.jar. However, it failed, and ...
ryan • 21
0 votes
1 answer
115 views

I'm new to using Hadoop, and I want to run the Hadoop WordCount example to count words. However, why is it that when I try to display the output, it doesn't appear? I would appreciate an ...
ryan • 21
0 votes
2 answers
73 views

I have RDD which shows up as ["2\t{'3': 1}", "3\t{'2': 2}", "4\t{'1': 1, '2': 1}", "5\t{'4': 3, '2': 1, '6': 1}", "6\t{'2': 1, '5': 2}", "7\...
Ram • 870
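The RDD excerpt above holds tab-separated strings of a node ID plus a Python-dict literal. One hedged way to parse such records (plain Python, usable inside a PySpark `map`; the adjacency interpretation is an assumption from the visible data) is `ast.literal_eval`:

```python
import ast

def parse_adjacency(line):
    """Split a 'node<TAB>{...}' record into (node_id, neighbor dict).
    ast.literal_eval parses the Python-dict literal safely, accepting
    only literals rather than arbitrary code like eval would."""
    node, payload = line.split("\t", 1)
    return node, ast.literal_eval(payload)
```

In PySpark this would be applied as `rdd.map(parse_adjacency)`, after which the values are real dicts instead of strings and can be aggregated normally.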
1 vote
0 answers
42 views

I have the following HQL executed by the MR engine, where the source table has almost 800 million records: select concat(upp_sys_id,'#',min(bhv_tm) over ssn,'#',ssn_seq_all) as ssn_id ,evt_drt ,row_number() over ...
yy zhao • 11
1 vote
0 answers
39 views

Hadoop version: 2.10.2. JDK version: 1.8.0_291. I'm trying to start a MapReduce job using Python. I've configured Hadoop on a new hduser_. After running this command in the terminal: hadoop jar $HADOOP_HOME/share/...
Dorialean
0 votes
1 answer
79 views

I get some errors when running the WordCount command: 2023-10-06 15:55:35,005 INFO mapreduce.Job: Job job_1696606856991_0001 running in uber mode: false 2023-10-06 15:55:35,006 INFO mapreduce.Job: map 0% ...
Vũ Phan Bảo Anh
0 votes
0 answers
15 views

I am new to MongoDB / mapreduce. I am trying to create a map function to show the horses that are female AND have been to fewer than 50 shows. var mapHorse1 = function(){ this.gender;if(this.gender='f'),...
kaylo • 1
1 vote
0 answers
104 views

The glued-together word "MapReduce" is supposed to cover a generic concept (distinct from functional-programming map/reduce), originating from a conceptual paper from Google. It has an ...
ae1020 • 19
0 votes
1 answer
19 views

In this Hive SQL, when the amount of data in table1 is big, t2.c is lost even though it should be joined. How can this be explained at the MapReduce level? SELECT t1.a, t1.b, t2.c FROM table1 t1 LEFT ...
DaSH Tai
