71 questions
0
votes
1
answer
845
views
Converting Docx Documents with Comments to Markdown using markitdown
There is a new open-source Python library from Microsoft, markitdown: https://github.com/microsoft/markitdown
It basically works fine on my Docx documents (if anyone uses it, make sure you use it on ...
3
votes
1
answer
650
views
Convert Word to PDF in Android Studio Kotlin
I'm doing an Android Kotlin project which will auto-generate certificates when we enter some details in the EditTexts. I have Word files (.docx) in my assets folder which have some variables which will ...
1
vote
0
answers
96
views
Libreoffice not respecting docxjs auto table width
It's like the title states.
If I omit the table width, it ends up with 0, and doesn't show up at all.
The only way to make it show up at all is by doing:
width: {
size: 100,
type: Docx.WidthType....
1
vote
1
answer
11k
views
Converting PDF to Markdown in Python with structure preservation [closed]
I need to convert a PDF text document to Markdown while maintaining its structure (i.e., indexed numbered headers and subheaders should have their corresponding number of hashtags # in Markdown to keep ...
0
votes
1
answer
43
views
Tiff Output is not as expected for Black and white 1200dpi LZW test file created using Universal Document Converter 6.7 & 6.8 versions
Respected Sir/Madam,
I have a question regarding LZW BW 1200dpi TIFF file creation using “UDC driver 6.7/6.8 version”.
If we disable “Perform High-Quality Smoothing”, then output data are not visible in ...
0
votes
1
answer
1k
views
Not able to read file in Pypandoc
I am trying to convert a PDF to HTML using Pandoc. I have installed the pandoc binary, added the environment variable path, and am then using
import pypandoc
import os
os.environ.setdefault('PYPANDOC_PANDOC',...
0
votes
0
answers
814
views
How to Run OpenOffice on a server and perform conversion from my local system
I have a requirement where I need to use OpenOffice on a standalone server and use a Java program for document conversion.
Right now, I have a setup where I have started OpenOffice in my Linux ...
0
votes
1
answer
66
views
c# Word-AddIn convert activeDocument to a virtual PDF and merge them into one PDF document
I am creating multiple virtual documents and then I want to merge them into one PDF, without saving them somewhere. All I found for now are guides, in which they save the document as a PDF somewhere ...
-1
votes
1
answer
652
views
How to convert docx to pdf using python3? [closed]
I want to display a preview of files uploaded by users.
For this reason, I have to convert docx-files to pdf using python 3.7.
When looking for a library to do the job I found the following:
...
1
vote
1
answer
418
views
Converting Adobe InDesign to pptx (is it even possible?)
I'm struggling to find a solution. I have a batch of Adobe InDesign files I'm trying to convert over as PDFs.
I know you can export InDesign -> PDF, then from Acrobat PDF -> PPTX. This would work ...
9
votes
1
answer
15k
views
R markdown pandoc document conversion failed with error 1 after updating pandoc from 1.19 to 2.4
I recently installed pandoc 2.4 on Windows, and now "conversion failed with error 1" occurs for all knitting. I can't knit HTML, Word, or PDF.
The error says
output file: template.knitmd
pandoc.exe: ...
0
votes
1
answer
3k
views
Conversion of PDF to EPub
I am creating an application to convert HTML pages to ePub format. I tried converting the file to PDF, since I require a table of contents as the first page of the ePub file. I have used Spire PDF and ...
0
votes
1
answer
801
views
Ruby: parse/extract images and objects from docx file
I am trying to open and read a .docx file using Ruby, and extract portions of the text and objects/images and save into another (non .docx) file.
Using Nokogiri, I am able to properly extract text ...
0
votes
1
answer
718
views
Splitting complex PDF files using Watson Document Conversion Service
We are implementing a question answering system using Watson Discovery Service (WDS). We require each answer unit to be available in a single document. We have complex PDF files as our corpus. The PDF files ...
0
votes
1
answer
720
views
I need to convert DOC/TXT files to PDF in large batches
We are changing systems and the new system only outputs .DOC or .TXT files for reports. Several of the reports that come out need to be converted to PDF so they are available for our web users on a ...
0
votes
1
answer
119
views
getting "415:Media not supported" Error when passing pdf to IBM watson in Salesforce
I am planning to integrate the IBM Watson Document Conversion service
with Salesforce.
From there I am unable to send my pdf file directly to Watson and I'm getting Media Type not supported.
I am ...
0
votes
1
answer
194
views
IBM Watson Document Conversion responding with 415 error even though I ingest a PDF?
I have an html form that allows users to upload a file, which then uses IBM Watson's document conversion API to convert the text of the document into normalized text which is then inserted into a ...
0
votes
1
answer
155
views
IBM Watson Document Conversion not working
I recently implemented the Document Conversion API from IBM Watson. I always get an encoding error when converting a PDF document.
#!/usr/bin/env python
#coding: utf-8
import json
from ...
0
votes
1
answer
82
views
Document Conversion for PDF forms (e.g. W-2, 1040, etc.) as key/values instead of a single string based on font information
Trying to use the Document Conversion service to capture the JSON key/value pairs for PDF documents such as W-2, 1040, and similar forms.
The content of such forms in the JSON response comes as part of the "...
0
votes
1
answer
177
views
What is the rate limit for IBM's Document Conversion Service and how do I increase it?
We use IBM's Document Conversion service as a core part of our Watson-based AI system. Recently I have been getting a lot of this error whilst building our corpus:
Error SLM-THROTTLE occurred when ...
0
votes
0
answers
613
views
Receiving a "String index out of range: 0" when trying to convert a PDF in the IBM Document Conversion service
I am trying to convert a document using IBM's Document Conversion service. It is a basic PDF: 116 pages, 1.1 MB. Nothing special about it that I can see, but the DC service returns the error "...
0
votes
1
answer
105
views
Why do I get "Could not push back" error when trying to use the IBM Bluemix Document Conversion service?
I am trying to convert documents using the Bluemix Document Conversion service with a Node.js application. I am getting nothing but errors in my app, but the test document I'm using converts fine ...
0
votes
1
answer
61
views
Partial response of documentconversionV1()
I am trying to use the DocumentConversionV1 function of the watson_developer_cloud API in Python; however, the response in my case comes back only as "<"Response 200">".
import sys
import os as o
import json
...
0
votes
2
answers
64
views
How to use webfiles in document conversion of watson
We recently implemented the Document Conversion API from IBM Watson. Can I use web files (www.something.com) as input?
curl -X POST -u "username":"password" -F config="{\"conversion_target\":\"...
0
votes
1
answer
179
views
How to break up large document into smaller answer units on Retrieve and Rank?
I am still very new to Retrieve and Rank, and Document Conversion services, so I have been playing around with that lately.
I encountered a problem where when I upload a large document (100+ pages) - ...
0
votes
0
answers
207
views
IBM Watson Document Conversion not working at all
We recently implemented the Document Conversion API from IBM Watson.
We always get the error, even though we specify the document type:
415 Unsupported Media Type - The media type of the input file ...
0
votes
1
answer
120
views
Getting a strange error from Watson's Document Conversion service
I am trying to convert some documents into answer units with Watson's Document Conversion service, using the watson-developer-cloud Javascript library in Node.js. Certain ones (an example is at IBM ...
0
votes
1
answer
96
views
Having trouble getting usable results from Watson's Document Conversion service
When I try to convert this document
https://public.dhe.ibm.com/common/ssi/ecm/po/en/poq12347usen/POQ12347USEN.PDF
with Watson's Document Conversion service, all I get is four answer units, one for ...
1
vote
1
answer
131
views
Document Conversion Code 400
When I do this command:
C:\curl -X POST -u "User":"Pass" -F config="{\"conversion_target\":\"answer_units\"}" -F file="D:\PATH\QeA.pdf;type=application/pdf" "https://gateway.watsonplatform.net/...
1
vote
2
answers
133
views
Can the answer unit content array returned by the Watson Document Conversion service ever have more than one element?
I am writing a program that takes advantage of IBM Watson's Document Conversion service to convert documents of various types into answer units. Each answer unit that is returned by the service ...
0
votes
1
answer
83
views
Does IBM Watson Document Conversion ignore headers?
We are trying to use the IBM Watson Document Conversion service on Word documents and have noticed that text that is in the header (and is displayed when the doc file is viewed) is not returned by the ...
0
votes
1
answer
100
views
How to Handle Document Conversion from DocX and Other FileFormats to a Specific XSD?
We are trying to convert a .docx – and later other potential file formats – into a kind of standard XML. This XML is going to be mapped through an XSLT to the XML of our choice (xsd).
For the ...
0
votes
2
answers
159
views
While using document conversion with html in node-red, getting Error: Lost connection to server
I am trying to use the Watson Document Conversion service from Node-RED with the following payload setup, feeding into the 'Convert' node, but it always returns "Error: Lost connect to server". I'd think the setup is ...
0
votes
1
answer
207
views
Bluemix PDF Document Conversion
I'm trying to convert a PDF document but I am having problems with accented characters. The PDF is in Brazilian Portuguese.
This is the command I'm running:
curl -X POST -u "OMITTED":"...
0
votes
1
answer
75
views
Watson Document Conversion service timeout on pdf redbook files
When trying the Watson Document Conversion service on the following Redbook: http://www.redbooks.ibm.com/redbooks/pdfs/ga195486.pdf, I get a timeout error. I verified the size is less than 50 MB. Any ...
2
votes
1
answer
14k
views
<Response [200]> is not JSON serializable
I am following the Document Conversion API example and trying to use Flask to convert an MS Word document to text, but it does not work.
Here is the code:
import os, json, requests
from flask import Flask, ...
-1
votes
2
answers
408
views
How do I send a PDF to Watson's Document Conversion service without writing it to disk first?
I am trying to convert this document (http://www.redbooks.ibm.com/redbooks/pdfs/ga195486.pdf) to answer units in Watson's Document Conversion service using the watson-developer-cloud node.js library.
...
0
votes
1
answer
102
views
Getting cryptic errors from Bluemix Document Conversion service
I am trying to convert this document: http://www.redbooks.ibm.com/redpapers/pdfs/redp5213.pdf to JSON answer units, but it (and many similar others) just won't process through the service. If I try ...
1
vote
1
answer
130
views
Creating classes for using Document Conversion and Concept Insights in Java
So I want to make classes for using Concept Insights on HTML documents converted from PDF thanks to Document Conversion. I am using an Eclipse IDE with a view of my Git directory. When I run it, I get ...
-1
votes
1
answer
211
views
Bluemix document conversion service - how to convert multiple documents [duplicate]
My goal is a single file of documents in JSON format, that would come from 50-100 MS Word or PDF documents.
Is there a way to supply multiple documents to the "convert_document" command? I've tried ...
1
vote
4
answers
1k
views
Converting MS Office Docx with a good compatibility
After spending hours and hours on StackOverflow and programmers' forums, I've decided to use Syncfusion on our project.
Our main target is:
convert to PDF / directly print existing Doc and Docx files
this ...
3
votes
1
answer
340
views
How to convert multiple documents using the Document Conversion service in a bash script?
How can I convert more than one document using the Document Conversion service?
I have between 50 and 100 MS Word and PDF documents that I want to convert using the convert_document API method.
For ...
1
vote
1
answer
113
views
How to add a custom footer to pdfs created by Liferay DocumentConversionUtil (and open office)
I am trying to add a custom footer to PDFs created from docx files on my Liferay 6.2 installation.
Specifically I have linked up OpenOffice, and I am successfully converting the documents from docx ...
1
vote
0
answers
525
views
LibreOffice (4.4.3) Headless PDF Conversion issue for some MS Word documents
I am able to convert most Word documents (doc & docx) to PDF on Windows.
"soffice.exe" --headless --convert-to pdf --outdir "C:\Ok" "C:\Ok\Test_Original.doc"
But a few documents are not ...
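For readers scripting the headless command quoted in this question, here is a minimal Python sketch. It uses only the flags that appear in the command itself (`--headless`, `--convert-to`, `--outdir`). The helper names `build_convert_command` and `convert` are hypothetical, and the check that the output file exists is an assumption about how a failed conversion might show up, not documented LibreOffice behaviour.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, target="pdf", soffice="soffice"):
    """Return the argument list for a headless LibreOffice conversion.

    `soffice` is assumed to be on PATH; pass a full path to soffice.exe on Windows.
    """
    return [
        soffice,
        "--headless",
        "--convert-to", target,
        "--outdir", str(out_dir),
        str(source),
    ]


def convert(source, out_dir, timeout=120):
    """Run the conversion and return the path where the PDF is expected.

    Hypothetical helper: raises if the process exits non-zero or if no
    output file appears (an assumed failure signal).
    """
    result = subprocess.run(
        build_convert_command(source, out_dir),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    expected = Path(out_dir) / (Path(source).stem + ".pdf")
    if result.returncode != 0 or not expected.exists():
        raise RuntimeError(
            f"conversion failed for {source}: {result.stderr.strip()}"
        )
    return expected
```

Calling `convert` in a loop over a folder and catching `RuntimeError` would give a list of the documents that do not convert, which may help narrow down what they have in common.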
0
votes
1
answer
734
views
Getting strange character translations using unoconv to convert from docx/doc to pdf
I am using unoconv (https://github.com/dagwieers/unoconv) to convert DOCX and DOC files to PDF, but will often get strange results on certain characters when they are rendered in the PDF.
One ...
1
vote
2
answers
1k
views
How does Apache commons IO convert my XML header from UTF-8 to UTF-16?
I’m using Java 6. I have an XML template, which begins like so
<?xml version="1.0" encoding="UTF-8"?>
However, I notice when I parse and output it with the following code (using Apache Commons-...
0
votes
1
answer
293
views
Inappropriate ioctl for device at when calling unoconv from perl script
I'm triggering a Perl script from a Postfix email server every time an email is received for a specified domain. The Perl script basically extracts all attachments and then calls unoconv to ...
-2
votes
1
answer
47
views
Converting high volume of .pdf's into .html or .doc
I'm looking for either a code snippet or other solution capable of converting a high volume (thousands) of .pdf's into .html or .doc while at the same time:
maintaining hierarchical structure of ...
1
vote
4
answers
6k
views
Which PHP API or library is the best for converting from HTML to PDF and DOCX? [closed]
First, I tried to use Cloudconvert. It can convert between so many file types, but its PHP API causes memory leaks almost every time.
The second I tried was Pdfcrowd. It works perfectly, but it can ...
1
vote
0
answers
1k
views
Formatting lost after converting pdf file to docx file
I am using the following code snippet to convert a PDF file into an MS Word document.
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.BreakType;
import ...