Newest 'document-conversion' Questions

0 votes

1 answer

845 views

Converting Document Docx with Comments to markit using markitdown

There is a new open-source Python library from Microsoft, markitdown (https://github.com/microsoft/markitdown). It basically works fine on my Docx documents (if anyone uses it, make sure you use it on ...

Bogdan_Ch

3,356

asked Jan 4 at 18:24

3 votes

1 answer

650 views

Convert Word to PDF in Android Studio Kotlin

I'm doing an Android Kotlin project which will auto-generate certificates when we enter some details in the EditTexts. I have Word files (.docx) in my assets folder which have some variables which will ...

Rushi Mayur

41

asked Dec 22, 2024 at 20:23

1 vote

0 answers

96 views

Libreoffice not respecting docxjs auto table width

It's like the title states. If I omit the table width, it ends up with 0, and doesn't show up at all. The only way to make it show up at all is by doing: width: { size: 100, type: Docx.WidthType....

Bosko Sinobad

119

asked Sep 4, 2024 at 15:06

1 vote

1 answer

11k views

Converting PDF to Markdown in Python with structure preservation [closed]

I need to convert a PDF text document to Markdown while maintaining its structure (i.e. indexed, numbered headers and subheaders should have their corresponding number of hashtags # in Markdown to keep ...

Guido

503

asked Jan 17, 2024 at 16:45

0 votes

1 answer

43 views

Tiff Output is not as expected for Black and white 1200dpi LZW test file created using Universal Document Converter 6.7 & 6.8 versions

Respected Sir/Madam, I have a question regarding LZW BW 1200dpi TIFF file creation using “UDC driver 6.7/6.8 version”. If we disable “Perform High-Quality Smoothing”, then output data are not visible in ...

Shant

11

asked Apr 8, 2021 at 6:32

0 votes

1 answer

1k views

Not able to read file in Pypandoc

I am trying to convert a PDF to HTML using Pandoc. I have installed the pandoc binary, added the environment variable path, and am then using import pypandoc import os os.environ.setdefault('PYPANDOC_PANDOC',...

SUBHRA SANKHA

158

asked Aug 2, 2020 at 11:00

0 votes

0 answers

814 views

How to Run Openoffice in a server and perform conversion from my local system

I have a requirement where I need to use OpenOffice on a standalone server and use a Java program for document conversion. Right now, I have a setup where I have started OpenOffice on my Linux ...

BlackViper

73

asked Jul 14, 2020 at 6:03

0 votes

1 answer

66 views

c# Word-AddIn convert activeDocument to a virtual PDF and merge them into one PDF document

I am creating multiple virtual documents and then I want to merge them into one PDF, without saving them somewhere. All I found for now are guides, in which they save the document as a PDF somewhere ...

GuterProgrammierer

73

asked Feb 6, 2020 at 9:56

-1 votes

1 answer

652 views

How to convert docx to pdf using python3? [closed]

I want to display a preview of files uploaded by users. For this reason, I have to convert docx files to PDF using Python 3.7. When looking for a library to do the job I found the following: ...

Jekson

3,322

asked Aug 29, 2019 at 15:25

1 vote

1 answer

418 views

Converting adobe inDesign to pptx (is it even possible?)

I'm struggling to find a solution. I have a batch of Adobe InDesign files I'm trying to convert over as PDFs. I know you can export InDesign -> PDF, then from Acrobat PDF -> PPTX. This would work ...

Joshua Jones

109

asked Jan 3, 2019 at 21:54

9 votes

1 answer

15k views

R markdown pandoc document conversion failed with error 1 after updating pandoc from 1.19 to 2.4

I recently installed pandoc 2.4 on Windows, and the "conversion failed with error 1" message now occurs for all knitting. I can't knit HTML, Word, or PDF. The error says output file: template.knitmd pandoc.exe: ...

Moses Kim

93

asked Nov 19, 2018 at 23:24

0 votes

1 answer

3k views

Conversion of PDF to EPub

I am creating an application to convert HTML pages to ePub format. I tried converting the file to PDF since I require a table of contents as the first page of the ePub file. I have used Spire PDF and ...

Shubashree Ravi

271

asked Feb 22, 2018 at 12:20

0 votes

1 answer

801 views

Ruby: parse/extract images and objects from docx file

I am trying to open and read a .docx file using Ruby, and extract portions of the text and objects/images and save into another (non .docx) file. Using Nokogiri, I am able to properly extract text ...

Noel Euzebe

11

asked Sep 18, 2017 at 15:14

0 votes

1 answer

718 views

Splitting complex PDF files using Watson Document Conversion Service

We are implementing a question-answering system using Watson Discovery Service (WDS). We require each answer unit to be available in a single document. We have complex PDF files as our corpus. The PDF files ...

Prashanth M

1

asked Sep 18, 2017 at 5:41

0 votes

1 answer

720 views

I need to convert DOC/TXT files to PDF in large batches

We are changing systems and the new system only outputs .DOC or .TXT files for reports. Several of the reports that come out need to be converted to PDF so they are available for our web users on a ...

David M

43

asked Aug 24, 2017 at 20:39

0 votes

1 answer

119 views

getting "415:Media not supported" Error when passing pdf to IBM watson in Salesforce

I am planning to integrate the IBM Watson Document Conversion service with Salesforce. From there I am unable to send my pdf file directly to Watson and I'm getting Media Type not supported. I am ...

Umang

1

asked Jul 18, 2017 at 9:26

0 votes

1 answer

194 views

IBM Watson Document Conversion responding with 415 error even though I ingest a PDF?

I have an html form that allows users to upload a file, which then uses IBM Watson's document conversion API to convert the text of the document into normalized text which is then inserted into a ...

Daniel La

1

asked Jun 15, 2017 at 16:14

0 votes

1 answer

155 views

IBM Watson Document Conversion not working

I recently implemented the Document Conversion API from IBM Watson. I always get an encoding error when converting a PDF document. #!/usr/bin/env python #coding: utf-8 import json from ...

Ikmel

11

asked Jun 7, 2017 at 8:48

0 votes

1 answer

82 views

Document Conversion for PDF form (e.g. W-2/1040/etc.) as key/values instead of a single string based on font information

Trying to use the Document Conversion service to capture the JSON key/value pairs for PDF documents such as W-2/1040/etc. forms. Content of such forms in the JSON response is coming as part of the "...

user7981462

1

asked May 18, 2017 at 21:23

0 votes

1 answer

177 views

What is the rate limit for IBM's Document Conversion Service and how do I increase it?

We use IBM's Document Conversion service as a core part of our Watson-based AI system. Recently I have been getting a lot of this error whilst building our corpus: Error SLM-THROTTLE occurred when ...

David Powell

547

asked May 17, 2017 at 22:31

0 votes

0 answers

613 views

Receiving a "String index out of range: 0" when trying to convert a PDF in the IBM Document Conversion service

I am trying to convert a document using IBM's Document Conversion service. It is a basic PDF: 116 pages, 1.1 MB. Nothing special about it that I can see, but the DC service returns the error "...

David Powell

547

asked Apr 22, 2017 at 0:50

0 votes

1 answer

105 views

Why do I get "Could not push back" error when trying to use the IBM Bluemix Document Conversion service?

I am trying to convert documents using the Bluemix Document Conversion service with a Node.js application. I am getting nothing but errors in my app, but the test document I'm using converts fine ...

David Powell

547

asked Mar 16, 2017 at 15:27

0 votes

1 answer

61 views

Partial response of documentconversionV1()

I am trying to use the DocumentConversionV1 function of the watson_developer_cloud API in Python. However, the response in my case comes only as "<"Response 200">". import sys import os as o import json ...

Sanjay Josh

45

asked Mar 7, 2017 at 14:24

0 votes

2 answers

64 views

How to use webfiles in document conversion of watson

We recently implemented the Document Conversion API from IBM Watson. Can I use web files (www.something.com) as input with it? curl -X POST -u "username":"password" -F config="{\"conversion_target\":\"...

user94

419

asked Feb 28, 2017 at 9:19

0 votes

1 answer

179 views

How to break up large document into smaller answer units on Retrieve and Rank?

I am still very new to Retrieve and Rank, and Document Conversion services, so I have been playing around with that lately. I encountered a problem where when I upload a large document (100+ pages) - ...

Ngoodles

1

asked Feb 22, 2017 at 0:52

0 votes

0 answers

207 views

IBM Watson Document Conversion not working at all

We recently implemented the Document Conversion API from IBM Watson. We always get the error, even though we specify the document type: 415 Unsupported Media Type - The media type of the input file ...

OSX55

170

asked Dec 28, 2016 at 10:10

0 votes

1 answer

120 views

Getting a strange error from Watson's Document Conversion service

I am trying to convert some documents into answer units with Watson's Document Conversion service, using the watson-developer-cloud Javascript library in Node.js. Certain ones (an example is at IBM ...

David Powell

547

asked Nov 7, 2016 at 16:37

0 votes

1 answer

96 views

Having trouble getting usable results from Watson's Document Conversion service

When I try to convert this document https://public.dhe.ibm.com/common/ssi/ecm/po/en/poq12347usen/POQ12347USEN.PDF with Watson's Document Conversion service, all I get is four answer units, one for ...

David Powell

547

asked Nov 7, 2016 at 16:21

1 vote

1 answer

131 views

Document Conversion Code 400

When I do this command: C:\curl -X POST -u "User":"Pass" -F config="{\"conversion_target\":\"answer_units\"}" -F file="D:\PATH\QeA.pdf;type=application/pdf" "https://gateway.watsonplatform.net/...

Marco Oliveira

167

asked Oct 11, 2016 at 19:38

1 vote

2 answers

133 views

Can the answer unit content array returned by the Watson Document Conversion service ever have more than one element?

I am writing a program that takes advantage of IBM Watson's Document Conversion service to convert documents of various types into answer units. Each answer unit that is returned by the service ...

David Powell

547

asked Sep 8, 2016 at 19:17

0 votes

1 answer

83 views

Does IBM Watson Document Conversion ignore headers?

We are trying to use the IBM Watson Document Conversion service on Word documents and have noticed that text that is in the header (and is displayed when the doc file is viewed) is not returned by the ...

Christopher Hyland

21

asked Aug 4, 2016 at 15:24

0 votes

1 answer

100 views

How to Handle Document Conversion from DocX and Other FileFormats to a Specific XSD?

We are trying to convert a .docx – and later other potential file formats – into a kind of standard XML. This XML is going to be mapped through an XSLT to the XML of our choice (xsd). For the ...

sbadea

1

asked Jul 18, 2016 at 15:12

0 votes

2 answers

159 views

While using document conversion with html in node-red, getting Error: Lost connection to server

Trying to use Watson Document Conversion service from Node-Red with following payload setup and to feed into 'Convert' node, it always returns "Error: Lost connect to server". I'd think the setup is ...

nyker

57

asked Jul 10, 2016 at 15:44

0 votes

1 answer

207 views

Bluemix PDF Document Conversion

I'm trying to convert a PDF document but I am having problems with accented characters in words. The PDF is in Brazilian Portuguese. This is the command I'm running: curl -X POST -u "OMITTED":"...

Fred Miranda

31

asked Jun 3, 2016 at 3:20

0 votes

1 answer

75 views

Watson Document Conversion service timeout on pdf redbook files

When trying the Watson Document Conversion service on the following redbook: http://www.redbooks.ibm.com/redbooks/pdfs/ga195486.pdf, I get a timeout error. I verified the size is less than 50 MB. Any ...

joe4k

21

asked Jun 1, 2016 at 1:11

2 votes

1 answer

14k views

<Response [200]> is not JSON serializable

Following the Document Conversion API example, I am trying to use Flask to convert an MS Word document to text, but it does not work. Here is the code: import os, json, requests from flask import Flask, ...

user6332732

23

asked May 25, 2016 at 19:20

-1 votes

2 answers

408 views

How do I send a PDF to Watson's Document Conversion service without writing it to disk first?

I am trying to convert this document (http://www.redbooks.ibm.com/redbooks/pdfs/ga195486.pdf) to answer units in Watson's Document Conversion service using the watson-developer-cloud node.js library. ...

David Powell

547

asked May 23, 2016 at 16:48

0 votes

1 answer

102 views

Getting cryptic errors from Bluemix Document Conversion service

I am trying to convert this document: http://www.redbooks.ibm.com/redpapers/pdfs/redp5213.pdf to JSON answer units, but it (and many similar others) just won't process through the service. If I try ...

David Powell

547

asked May 13, 2016 at 20:11

1 vote

1 answer

130 views

Creating classes for using Document Conversion and Concept Insights in Java

So I want to make classes for using Concept Insights on HTML documents converted from PDF thanks to Document Conversion. I am using an Eclipse IDE with a view of my Git directory. When I run it, I get ...

Tara E

13

asked Mar 27, 2016 at 20:00

-1 votes

1 answer

211 views

Bluemix document conversion service - how to convert multiple documents [duplicate]

My goal is a single file of documents in JSON format, that would come from 50-100 MS Word or PDF documents. Is there a way to supply multiple documents to the "convert_document" command? I've tried ...

ralphearle

1,684

asked Nov 25, 2015 at 15:46

1 vote

4 answers

1k views

Converting MS Office Docx with a good compatibility

After spending hours and hours on Stack Overflow and programmers' forums, I've decided to use Syncfusion on our project. Our main target is: convert to PDF/directly print existing Doc and Docx this ...

sstassin

388

asked Nov 12, 2015 at 10:17

3 votes

1 answer

340 views

How to convert multiple documents using the Document Conversion service in a bash script?

How can I convert more than one document using the Document Conversion service? I have between 50-100 MS Word and PDF documents that I want to convert using the convert_document API method. For ...

German Attanasio

23.9k

asked Nov 6, 2015 at 1:28

1 vote

1 answer

113 views

How to add a custom footer to pdfs created by Liferay DocumentConversionUtil (and open office)

I am trying to add a custom footer to PDFs created from docx files on my Liferay 6.2 installation. Specifically, I have linked up OpenOffice, and I am successfully converting the documents from docx ...

Joe Andersen

11

asked Oct 22, 2015 at 3:09

1 vote

0 answers

525 views

LibreOffice (4.4.3) headless PDF conversion issue for some MS Word documents

I am able to convert most Word documents (doc & docx) to PDF on Windows. "soffice.exe" --headless --convert-to pdf --outdir "C:\Ok" "C:\Ok\Test_Original.doc" But a few documents are not ...

pingu

695

asked Jun 30, 2015 at 23:44

0 votes

1 answer

734 views

Getting strange character translations using unoconv to convert from docx/doc to pdf

I am using unoconv (https://github.com/dagwieers/unoconv) to convert DOCX and DOC files to PDF, but will often get strange results on certain characters when they are rendered in the PDF. One ...

rkp333

391

asked Apr 18, 2015 at 23:04

1 vote

2 answers

1k views

How does Apache commons IO convert my XML header from UTF-8 to UTF-16?

I’m using Java 6. I have an XML template, which begins like so <?xml version="1.0" encoding="UTF-8"?> However, I notice when I parse and output it with the following code (using Apache Commons-...

Dave A

2,850

asked Feb 16, 2015 at 17:04

0 votes

1 answer

293 views

Inappropriate ioctl for device at when calling unoconv from perl script

I'm triggering a Perl script from a Postfix email server every time an email is received for a specified domain. The Perl script basically extracts all attachments and then calls unoconv to ...

markus

6,638

asked Jan 26, 2015 at 10:38

-2 votes

1 answer

47 views

Converting high volume of .pdf's into .html or .doc

I'm looking for either a code snippet or other solution capable of converting a high volume (thousands) of .pdf's into .html or .doc while at the same time: maintaining hierarchical structure of ...

Cognitivity

31

asked Dec 31, 2014 at 16:26

1 vote

4 answers

6k views

Which PHP API or library is the best for converting from HTML to PDF and DOCX? [closed]

First, I tried to use Cloudconvert. It can convert between so many file types, but its PHP API causes memory leaks almost all the time. The second I tried was Pdfcrowd. It works perfectly, but it can ...

aleskva

1,845

asked May 11, 2014 at 15:09

1 vote

0 answers

1k views

Formatting lost after converting pdf file to docx file

I am using the following code snippet to convert a PDF file into an MS Word document. import java.io.FileOutputStream; import org.apache.poi.xwpf.usermodel.BreakType; import ...

Bhagyesh Jain

333

asked Feb 20, 2014 at 10:30
