505 questions
1
vote
1
answer
102
views
DuckDB: how to fine tune parameters?
I have several ndjson files that are nearly 800GB. They come from parsing the Wikipedia dump. I would like to remove duplicate HTML. As such, I group by "html" and pick the JSON with the ...
-3
votes
1
answer
120
views
Is there a way to directly convert CSV to Parquet with DuckDB in Java?
I am doing some tests comparing DuckDB usage among different languages etc, and I've noticed something strange.
In python you can do the following:
duckdb.read_csv(inputFile, max_line_size=10000000, ...
1
vote
1
answer
131
views
DuckDB: out-of-memory problem of groupby-max
I have several ndjson files that are nearly 800GB. They come from parsing the Wikipedia dump. I would like to remove duplicate HTML. As such, I group by "html" and pick the JSON with the ...
Best practices
1
vote
0
replies
42
views
Dictionary Encoding VS ENUM type
To meet certain data analysis requirements, I am migrating from a self-hosted local MySQL database to PolarDB. During the migration, I discovered that many data analysis tools offer a technique called ...
1
vote
0
answers
68
views
How do I get DuckDB CLI to successfully log errors?
I'm using the DuckDB CLI, version 1.4.2 on macOS.
The plan is to use duckdb as a part of a CLI pipeline, not from inside of a Python script, etc.
I've already got the tooling to build out the script ...
Advice
0
votes
3
replies
38
views
Issue statement per row
I have a table export like this:
id  file_name   json_content
1   out_1.json  {...}
2   out_2.json  {...}
Now I want to do a COPY (SELECT json_content FROM export) TO file_name for each row.
At first I thought ...
-1
votes
0
answers
94
views
Markdown notation stored in DuckDB, displayed in the "Shiny" WebApp as mathematical expression
Using DataGrid (Data Grid – Shiny for Python), how can LaTeX/Markdown notation, stored as strings in the cells of a DuckDB database file, be displayed in the Shiny web app as human-readable ...
0
votes
1
answer
77
views
DuckDB in lightweight incremental Foundry transform
I am unable to use DuckDB in an incremental lightweight transform. The docs say to access the duckdb object from the context, but doing so fails.
from transforms.api import transform, incremental, Input, ...
0
votes
0
answers
32
views
tbl_summary using duckplyr verbs leads to incorrectly sorted outputs
I'm trying to create output using gtsummary::tbl_summary, which I've done many times before like this:
library(dplyr)
library(gtsummary)
iris |>
tbl_summary()
I want the species variable to ...
0
votes
1
answer
66
views
In DuckDB, can there be proper UTF-8 output in duckbox mode to Windows console? [closed]
Edit:
Since the question is off-topic here, I marked it for closing/migration to SuperUser. It has not been migrated so far, so I recreated this question and answer at SuperUser.
I cannot get non-ASCII ...
1
vote
1
answer
57
views
C API duckdb_create_list_value is always returning NULL [closed]
I am trying to insert a list of values into an INTEGER[] column. For that I am creating a list value, because I need to use the appender. But duckdb_create_list_value is always returning NULL. I am using v 1.4.1 C ...
1
vote
0
answers
111
views
Problem updating Postgres ENUM from DuckDB
I'm on DuckDB 1.4.1 experiencing difficulty updating a Postgres 17.6 ENUM field status:
CREATE TYPE mystatus_enum AS ENUM (
'IN_STOCK', 'OUT_OF_STOCK', 'NOT_FOUND', 'NOT_A_PRODUCT'
);
CREATE ...
1
vote
1
answer
165
views
Avoiding duckdb OutOfRangeException when multiplying Decimals
I'm working with DuckDB and have several client-provided SQL expressions that use DECIMAL(38,10) columns (fixed precision with 10 digits after the decimal point).
For example:
SELECT S1__AMOUNT * ...
0
votes
1
answer
113
views
Duckdb Wasm limitation
I don't know how to check or increase the memory limit of DuckDB Wasm.
I'm using Chrome and importing some Parquet files into the browser; one of them has 234 MB of data.
I did my research and the limit ...
3
votes
1
answer
112
views
Creating custom dbplyr compatible function in SQL
I'm working with dbplyr and DuckDB to process very large Parquet files using limited system resources. To make my workflow more efficient, I want to create a custom function that can be seamlessly ...
0
votes
0
answers
93
views
rfishbase::load_taxa() fails with DuckDB spatial extension HTTP 403 on Windows
I'm trying to use the rfishbase package in R on Windows:
library(rfishbase)
fishNAMES <- rfishbase::load_taxa()
But I get the following error:
Error in (function (cond) :
erro na avaliação do ...
2
votes
1
answer
141
views
DuckDB query that works with time intervals produces incorrect values
Running through python - no tables needed. See below query and result:
import duckdb
sampling_period_sec = 13
date_range = ('2023-01-01', '2023-01-02')
db_conn = duckdb.connect()
db_conn.query(
...
0
votes
1
answer
113
views
How do I append a timestamp data type with zuckdb/duckdb appender in zig
I need to append candlestick bars that have a timestamp data type to DuckDB. Since I don't know how, I have used VARCHAR for the time instead. How do I do it properly so that I can query based on timestamp?...
1
vote
1
answer
219
views
How to export a tbl_duckdb_connection object to CSV from duckdb without collect()?
I have a dataset (originally large CSV) that I filtered using duckdb and dbplyr.
This is a small script that illustrates my idea:
library(duckdb)
library(DBI)
library(dplyr)
library(dbplyr)
...
2
votes
1
answer
241
views
`duckdb_read_csv` not working when there are double quotes ""example"" in CSV
I have a large CSV that was generated from GBIF (So modifying the raw csv is not what I'm looking for). Within the CSV, there are lines where there are double double quotes (e.g., "Henry "&...
3
votes
3
answers
152
views
How to alter the datatype of a Column?
I want to change the data type of a column of a table in a DuckDB Database.
With
query2_c= ALTER TABLE populationShort ALTER Year SET DATA TYPE DATE;
(C Language Binding) I get Segmentation fault (...
0
votes
0
answers
38
views
Handling internal C++ errors in duckdb node js
I have a server that loads data from a CSV into DuckDB.
I'm using 1.3.2 duckdb https://github.com/duckdb/duckdb-node
A Node cron job refreshes this data every x minutes by reloading the CSV, dropping ...
1
vote
1
answer
99
views
How to pass the path to a CSV file to the DuckDB SQL Statement with C-Language API?
I want to pass the path to a CSV file to the DuckDB SQL Statement with C-Language API:
In this way:
std::cout << "fileName= " << fileName << std::endl; // Output: fileName= ...
2
votes
1
answer
72
views
Reading GBIF from s3 bucket using duckdb
I'm trying to connect to GBIF's occurrence data https://aws.amazon.com/marketplace/pp/prodview-dvyemtksskta2. I'm wondering why it is not finding the data. It should be accessible without credentials ...
4
votes
1
answer
138
views
Inserting to a (temp) table from an insert statement with returning clause in duckdb
I am working with inserting and manipulating data in a DuckDB database using Python. The incoming data is staged in several temporary tables, before it is moved to permanent tables (largely for ...
2
votes
0
answers
54
views
DuckDB bundle issue on my Parcel application
I'm working on a Parcel frontend project using DuckDB, and it runs well in the production environment. However, on my local machine the browser displays this error.
And as I've just mentioned, this ...
0
votes
1
answer
86
views
ASOF join snowflake using 'nearest', similar to pandas.merge_asof
Is there a way to perform an ASOF join in Snowflake / DuckDB / CedarDB based on the nearest time, that is, matching the row in the right table whose 'on' key is closest in absolute distance to the left table's key?
Currently I ...
0
votes
1
answer
203
views
How to pass vector from Python to Duckdb for vector similarity search
I am using DuckDB with the VSS extension to store document embeddings. It works normally; however, when I try to do a similarity search against a vector passed from Python, I get a TypeError. The ...
2
votes
1
answer
222
views
Why is parquetjs so much slower than duckdb for writing Parquet files, and how can I speed it up?
I'm working on a TypeScript project and need to save a large list of records in Parquet format using SNAPPY compression with an explicit schema.
I've successfully written data using both parquetjs and ...
0
votes
1
answer
70
views
Dask large outer join with gzip files
I'm working with an omics dataset (1000+ files): a folder of about 1GB of tab-separated .txt.gz files. They each look roughly like this for a patient ABC:
pos  ABC_count1  ABC_count2  ...
3
votes
1
answer
59
views
jOOQ dynamic aggregated types
I am considering jOOQ over DuckDB over Parquet files. The types of the Parquet columns are not known beforehand. Suppose some column may be an integer or a double.
I want to SUM over given column, ...
0
votes
0
answers
67
views
How to find storage size of column(s) in DuckDB?
It would be useful to know which columns use the most disk space in a table, that is, which columns take up a lot of storage but are rarely used and could be candidates for elimination.
Heuristics don't ...
3
votes
2
answers
474
views
How to do Upserts with DuckDB and DuckLake
How do I upsert data into DuckLake?
If I have a simple table definition:
CREATE TABLE ducklakeexample.demo (
"Date" TIMESTAMP WITH TIME ZONE,
"Id" UUID,
"Title" ...
0
votes
1
answer
118
views
Why do DuckDB aggregation functions have inconsistent formatting results
I'm aggregating some numbers with SUM, AVG, MAX and the former 2 automatically display the results with thousand comma separators when cast to Integer, but MAX does not. I'm able to format numbers to ...
1
vote
2
answers
422
views
Create table from JSON
The DuckDB documentation states that you can create a table starting from a JSON file, with DuckDB automatically constructing the table schema, for instance:
CREATE TABLE todos AS
SELECT * FROM '...
3
votes
1
answer
67
views
DuckDB drop column: no column named that way
Why does DuckDB tell me there is no column with that name when I try to drop a column?
D DESCRIBE oa_pub;
┌──────────────┬─────────────┬─────────┬─────────┬──────────────────────────┬─────────┐
│ ...
2
votes
2
answers
110
views
Generate list of doubles to pass to quantile_cont
I am trying to translate some PostgreSQL to DuckDB, but the latter's docs are really lacking, and passing an array or list of doubles is the problem.
This works fine in Postgres:
SELECT
...
0
votes
0
answers
23
views
DuckDB Slow Creating Prepared Statement when attached S3Tables (Go SDK)
Hi,
I attached S3Tables to my local DuckDB, and it works fine:
func OpenDuckDB(ctx context.Context, logger *zerolog.Logger) (*sql.DB, *duckdb.Connector, error) {
os.Remove("duckdb.db")
...
0
votes
0
answers
52
views
Synchronizing DuckDB to S3Tables for long running process. Golang SDK
I'm trying to test whether DuckDB is able to synchronize to S3Tables in a long-running process.
I attached to S3Tables as per the docs, and it works fine.
query = fmt.Sprintf(`
ATTACH '%s' AS ...
1
vote
0
answers
195
views
"Out of Memory Error: Failed to allocate block of Bytes" using DuckDB
I'm using DuckDB to process data stored in Parquet files, organized in a Hive-style directory structure partitioned by year, month, day, and hour. Each Parquet file contains around 150 columns, and I ...
1
vote
0
answers
143
views
Why can't I install httpfs in the DuckDB shell?
I would like to access S3 resources via httpfs in the DuckDB shell. I installed the httpfs extension, but I always get this error: "
Invalid Input Error: Secret type 's3' does not exist, but it exists in ...
0
votes
0
answers
68
views
DuckDB multiple row groups with the same ID
I was reading the DuckDB lightweight compression article.
To find out the compression method used by DuckDB I would run:
SELECT * EXCLUDE (column_path, segment_id, start, stats, persistent, block_id, ...
1
vote
0
answers
198
views
Can't update table with CTE in DuckDB. Raises Parser error
-- Creates a CTE that will join the necessary
-- values from the dimension tables to the fact
-- table
WITH MergedCDI AS (
SELECT
c.LogID,
c.DataValueUnit,
c.DataValue,
c.YearStart,
...
0
votes
1
answer
106
views
Build duckdb amalgamation with sqlite_scanner
I'm trying to create an amalgamation for duckdb with sqlite_scanner as a static extension. I'm a complete novice to building duckdb. I know that I can build the amalgamation with a command similar to ...
1
vote
1
answer
79
views
Truncate timestamptz to the start of the month (in local time)
Example:
In [1]: import duckdb
In [2]: import narwhals as nw
In [3]: duckdb.sql("""set timezone = 'Europe/Amsterdam'""")
In [5]: duckdb.sql("""from rel ...
1
vote
1
answer
275
views
Implement Slowly Changing Dimensions (SCD) - Type 2 Using DuckDB
I want to implement SCD Type 2 and keep track of historized data. I am using DuckDB for this task, but I found out that DuckDB does not support the MERGE statement.
The idea I have is to have two separate ...
0
votes
0
answers
71
views
DuckDB raises exception while querying DeltaLake table after merge process updates fields |table->struct->list->struct->field|
Environment:
Python 3.9.21
DuckDB 1.1.3
pyarrow 18.1.0
deltalake 18.1.0
Behavior explanation:
Adding and updating string fields in a struct inside a list under the root of the table works fine.
update ...
3
votes
1
answer
214
views
Unable to install h3 extension in R
I'm unable to install the h3 community extension and I couldn't figure out the reason. See my code and error message below. Any idea what's causing the problem?
library(duckdb)
con <- duckdb::...
1
vote
0
answers
241
views
How can I import a .sql dump file (e.g., hr.sql) into DuckDB using the CLI?
I have a dummy database file named hr.sql that I want to import into DuckDB. I'm trying to use DuckDB from the command-line interface (CLI), and my goal is to load and use this database similar to how ...
2
votes
1
answer
165
views
How do I do a specific aggregation on a table based on row column values on another table (SQL)?
I have loaded two fact tables, CDI and Population, and a couple of dimension tables in DuckDB. I did joins on the CDI fact table and its respective dimension tables which yields a snippet of the table ...