1 vote
1 answer
102 views

I have several ndjson files that are nearly 800GB. They come from parsing the Wikipedia dump. I would like to remove duplicate HTML. As such, I group by "html" and pick the JSON with the ...
Akira's user avatar
  • 2,886
-3 votes
1 answer
120 views

I am doing some tests comparing DuckDB usage across different languages, and I've noticed something strange. In Python you can do the following: duckdb.read_csv(inputFile, max_line_size=10000000, ...
Tijmen's user avatar
  • 29
1 vote
1 answer
131 views

I have several ndjson files that are nearly 800GB. They come from parsing the Wikipedia dump. I would like to remove duplicate HTML. As such, I group by "html" and pick the JSON with the ...
Akira's user avatar
  • 2,886
Best practices
1 vote
0 replies
42 views

To meet certain data analysis requirements, I am migrating from a self-hosted local MySQL database to PolarDB. During the migration, I discovered that many data analysis tools offer a technique called ...
梁宇坤's user avatar
1 vote
0 answers
68 views

I'm using the DuckDB CLI, version 1.4.2, on macOS. The plan is to use DuckDB as part of a CLI pipeline, not from inside a Python script, etc. I've already got the tooling to build out the script ...
Morris de Oryx's user avatar
Advice
0 votes
3 replies
38 views

I have a table export like this: id file_name json_content 1 out_1.json {...} 2 out_2.json {...} Now I want to do a COPY (SELECT json_content FROM export) TO file_name for each row. At first I thought ...
Sascha's user avatar
  • 10.4k
-1 votes
0 answers
94 views

Using DataGrid (Data Grid – Shiny for Python), how can LaTeX/Markdown notation, stored as strings in the cells of a DuckDB database file, be displayed in the Shiny web app as human-readable ...
user31952284's user avatar
0 votes
1 answer
77 views

I am unable to use DuckDB in an incremental lightweight transform. The docs say to access the duckdb object from the context, but doing so fails. from transforms.api import transform, incremental, Input, ...
Stefano Munarini's user avatar
0 votes
0 answers
32 views

I'm trying to create output using gtsummary::tbl_summary, which I've done many times before like this: library(dplyr) library(gtsummary) iris |> tbl_summary() I want the species variable to ...
jrdusen's user avatar
  • 11
0 votes
1 answer
66 views

Edit: Since the question is off-topic here, I marked it for closing/migration to SuperUser. It was not migrated so far, so I recreated this question and answer at SuperUser. I cannot get non-ASCII ...
miroxlav's user avatar
  • 12.4k
1 vote
1 answer
57 views

I am trying to insert a list of values into an INTEGER[] column. For that I am creating a list value, as I need to use the appender. But duckdb_create_list_value always returns NULL. I am using v 1.4.1 C ...
rg665n's user avatar
  • 235
1 vote
0 answers
111 views

I'm on DuckDB 1.4.1 experiencing difficulty updating a Postgres 17.6 ENUM field status: CREATE TYPE mystatus_enum AS ENUM ( 'IN_STOCK', 'OUT_OF_STOCK', 'NOT_FOUND', 'NOT_A_PRODUCT' ); CREATE ...
RuiDC's user avatar
  • 9,193
1 vote
1 answer
165 views

I'm working with DuckDB and have several client-provided SQL expressions that use DECIMAL(38,10) columns (fixed precision with 10 digits after the decimal point). For example: SELECT S1__AMOUNT * ...
Igor Atsberger's user avatar
0 votes
1 answer
113 views

I don't know how to check or increase the memory limit of DuckDB-Wasm. I'm using Chrome and I import some Parquet files into the browser; one of them has 234 MB of data. I did my research and the limit ...
Antho's user avatar
  • 1
3 votes
1 answer
112 views

I'm working with dbplyr and DuckDB to process very large Parquet files using limited system resources. To make my workflow more efficient, I want to create a custom function that can be seamlessly ...
A.N. O'Nyme's user avatar
0 votes
0 answers
93 views

I'm trying to use the rfishbase package in R on Windows: library(rfishbase) fishNAMES <- rfishbase::load_taxa() But I get the following error: Error in (function (cond) : erro na avaliação do ...
Valentim's user avatar
2 votes
1 answer
141 views

Running through Python; no tables needed. See the query and result below: import duckdb sampling_period_sec = 13 date_range = ('2023-01-01', '2023-01-02') db_conn = duckdb.connect() db_conn.query( ...
AOK's user avatar
  • 573
0 votes
1 answer
113 views

I need to append candlestick bars that have a timestamp data type to DuckDB. Since I don't know how, I have used VARCHAR for the time instead. How do I do it properly so that I can query based on timestamp?...
lele1c's user avatar
  • 72
1 vote
1 answer
219 views

I have a dataset (originally a large CSV) that I filtered using duckdb and dbplyr. Here is a small script that illustrates my idea: library(duckdb) library(DBI) library(dplyr) library(dbplyr) ...
M. Beausoleil's user avatar
2 votes
1 answer
241 views

I have a large CSV that was generated from GBIF (so modifying the raw CSV is not what I'm looking for). Within the CSV, there are lines that contain doubled double quotes (e.g., "Henry "&...
M. Beausoleil's user avatar
3 votes
3 answers
152 views

I want to change the data type of a column of a table in a DuckDB Database. With query2_c= ALTER TABLE populationShort ALTER Year SET DATA TYPE DATE; (C Language Binding) I get Segmentation fault (...
Raphael10's user avatar
  • 3,246
0 votes
0 answers
38 views

I have a server that loads data from a CSV into DuckDB. I'm using duckdb 1.3.2 (https://github.com/duckdb/duckdb-node). A Node cron job refreshes this data every x minutes by reloading the CSV, dropping ...
Drew Scatterday's user avatar
1 vote
1 answer
99 views

I want to pass the path to a CSV file to a DuckDB SQL statement using the C API, like this: std::cout << "fileName= " << fileName << std::endl; // Output: fileName= ...
Raphael10's user avatar
  • 3,246
2 votes
1 answer
72 views

I'm trying to connect to GBIF's occurrence data https://aws.amazon.com/marketplace/pp/prodview-dvyemtksskta2. I'm wondering why it is not finding the data. It should be accessible without credentials ...
M. Beausoleil's user avatar
4 votes
1 answer
138 views

I am working with inserting and manipulating data in a DuckDB database using Python. The incoming data is staged in several temporary tables, before it is moved to permanent tables (largely for ...
maylinnp's user avatar
2 votes
0 answers
54 views

I'm working on a Parcel frontend project using DuckDB, and it runs well in the production environment. However, on my local machine the browser displays this error. And as I've just mentioned, this ...
Cool Weather Here's user avatar
0 votes
1 answer
86 views

Is there a way to perform an ASOF join in Snowflake / DuckDB / CedarDB based on nearest time, i.e., matching the row in the right table whose 'on' key is closest in absolute distance to the left table's key? Currently I ...
Saqib Ali's user avatar
  • 4,551
0 votes
1 answer
203 views

I am using DuckDB with the VSS extension to store document embeddings. It works normally; however, when I try to do a similarity search against a vector passed from Python, I get a TypeError. The ...
fccoelho's user avatar
  • 6,262
2 votes
1 answer
222 views

I'm working on a TypeScript project and need to save a large list of records in Parquet format using SNAPPY compression with an explicit schema. I've successfully written data using both parquetjs and ...
Vince M's user avatar
  • 1,288
0 votes
1 answer
70 views

I'm working with an omics dataset (1000+ files): a folder of about 1GB of tab-separated .txt.gz files. Each one looks roughly like this for a patient ABC: pos ABC_count1 ABC_count2 ...
AnthonyML's user avatar
3 votes
1 answer
59 views

I am considering jOOQ over DuckDB over Parquet files. The types of the Parquet columns are not known beforehand. Let's say some column may be an integer or a double. I want to SUM over a given column, ...
blacelle's user avatar
  • 2,239
0 votes
0 answers
67 views

It would be useful to know which columns use the most disk space in a table, i.e., which columns take up a lot of storage but are rarely used and could be candidates for elimination. Heuristics don't ...
0xZ3RR0's user avatar
  • 644
3 votes
2 answers
474 views

How do I upsert data into DuckLake? If I have a simple table definition: CREATE TABLE ducklakeexample.demo ( "Date" TIMESTAMP WITH TIME ZONE, "Id" UUID, "Title" ...
dimButTries's user avatar
0 votes
1 answer
118 views

I'm aggregating some numbers with SUM, AVG, and MAX. The first two automatically display their results with comma thousands separators when cast to INTEGER, but MAX does not. I'm able to format numbers to ...
Byofuel's user avatar
  • 448
1 vote
2 answers
422 views

The DuckDB documentation states that you can create a table starting from a JSON file, with DuckDB automatically constructing the table schema, for instance: CREATE TABLE todos AS SELECT * FROM '...
robertspierre's user avatar
3 votes
1 answer
67 views

Why does DuckDB tell me there is no column with that name when I try to drop a column? D DESCRIBE oa_pub; ┌──────────────┬─────────────┬─────────┬─────────┬──────────────────────────┬─────────┐ │ ...
robertspierre's user avatar
2 votes
2 answers
110 views

I am trying to translate some PostgreSQL to DuckDB, but the latter's docs are really lacking, and passing an array or list of doubles is the problem. This works fine in Postgres: SELECT ...
Byofuel's user avatar
  • 448
0 votes
0 answers
23 views

Hi, I attached S3 Tables to my local DuckDB, and it works fine: func OpenDuckDB(ctx context.Context, logger *zerolog.Logger) (*sql.DB, *duckdb.Connector, error) { os.Remove("duckdb.db") ...
Muhammad Najid's user avatar
0 votes
0 answers
52 views

I'm trying to test whether DuckDB is able to synchronize to S3 Tables in a long-running process. I attached to S3 Tables as per the docs and it works fine. query = fmt.Sprintf(` ATTACH '%s' AS ...
Muhammad Najid's user avatar
1 vote
0 answers
195 views

I'm using DuckDB to process data stored in Parquet files, organized in a Hive-style directory structure partitioned by year, month, day, and hour. Each Parquet file contains around 150 columns, and I ...
Deepank Dhillon's user avatar
1 vote
0 answers
143 views

I would like to access S3 resources via httpfs in the DuckDB shell. I installed the httpfs extension, but I always get this error: " Invalid Input Error: Secret type 's3' does not exist, but it exists in ...
Minh Duc Ha's user avatar
0 votes
0 answers
68 views

I was reading the DuckDB article on lightweight compression. To find out the compression method used by DuckDB, I would run: SELECT * EXCLUDE (column_path, segment_id, start, stats, persistent, block_id, ...
robertspierre's user avatar
1 vote
0 answers
198 views

-- Creates a CTE that will join the necessary -- values from the dimension tables to the fact -- table WITH MergedCDI AS ( SELECT c.LogID, c.DataValueUnit, c.DataValue, c.YearStart, ...
Mig Rivera Cueva's user avatar
0 votes
1 answer
106 views

I'm trying to create an amalgamation for duckdb with sqlite_scanner as a static extension. I'm a complete novice to building duckdb. I know that I can build the amalgamation with a command similar to ...
thechao's user avatar
  • 387
1 vote
1 answer
79 views

Example: In [1]: import duckdb In [2]: import narwhals as nw In [3]: duckdb.sql("""set timezone = 'Europe/Amsterdam'""") In [5]: duckdb.sql("""from rel ...
ignoring_gravity's user avatar
1 vote
1 answer
275 views

I want to implement SCD Type 2 and keep track of historized data. I am using DuckDB for this task, but I found out that DuckDB does not support the MERGE statement. The idea I have is to have two separate ...
MuGh's user avatar
  • 165
0 votes
0 answers
71 views

Environment: Python 3.9.21 DuckDB 1.1.3 pyarrow 18.1.0 deltalake 18.1.0 Behavior explanation: adding and updating string fields in a struct inside a list under the root of the table works fine. update ...
Ahmed Kamal ELSaman's user avatar
3 votes
1 answer
214 views

I'm unable to install the h3 community extension and I couldn't figure out the reason. See my code and error message below. Any idea what's causing the problem? library(duckdb) con <- duckdb::...
rafa.pereira's user avatar
1 vote
0 answers
241 views

I have a dummy database file named hr.sql that I want to import into DuckDB. I'm trying to use DuckDB from the command-line interface (CLI), and my goal is to load and use this database similar to how ...
KhushbooQ's user avatar
2 votes
1 answer
165 views

I have loaded two fact tables, CDI and Population, and a couple of dimension tables into DuckDB. I did joins on the CDI fact table and its respective dimension tables, which yields a snippet of the table ...
Mig Rivera Cueva's user avatar