@okerekechinweotito
Contributor

Fixes: T348188

Proposed Changes

  • For PDL: check whether a PDF is available and, if so, download and upload it directly instead of processing each page image separately.

Files Created/Updated

  • utils/helper.js - checks whether a PDF link is available and, if so, returns the download link as pdfUrl.
  • bull/pdl-queue/consumer.js - adds getPdfAndBytelength, which downloads the PDF and returns the file together with its size, and uploadPdfToIA, which uploads the PDF and its metadata to Internet Archive. A conditional runs getPdfAndBytelength or getZipAndBytelength depending on whether pdfUrl is returned.
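
The branch described above could be sketched roughly as follows. This is only an illustration, not the code from the PR: chooseDownloader is a hypothetical name, the signatures of getPdfAndBytelength and getZipAndBytelength are assumed, and both are passed in as parameters so the sketch runs without network access.

```javascript
// Hypothetical sketch: pick the PDF path when helper.js returned a pdfUrl,
// otherwise fall back to the existing ZIP path.
// The two downloaders are injected; their real signatures may differ.
function chooseDownloader(pdfUrl, { getPdfAndBytelength, getZipAndBytelength }) {
  // Assumption: "no PDF available" is signalled by a missing or empty pdfUrl.
  const hasPdf = typeof pdfUrl === "string" && pdfUrl.trim() !== "";
  return hasPdf
    ? { kind: "pdf", run: () => getPdfAndBytelength(pdfUrl) }
    : { kind: "zip", run: () => getZipAndBytelength() };
}
```

Returning the chosen branch as data keeps the decision separate from the download itself, which makes it easy to check in isolation.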

Checklist

  • Coding Conventions are followed.
  • Comments are used for Documenting the Code.
  • Correct File Names are mentioned.

Owner

@coderwassananmol left a comment

I will perform E2E testing once these review comments are addressed.

@okerekechinweotito
Contributor Author

@coderwassananmol I have made the requested changes

Owner

@coderwassananmol left a comment

@okerekechinweotito The Trove and Google Books uploads use chunked upload. Please follow the same method so that PDF files are handled consistently.

@okerekechinweotito
Contributor Author

@coderwassananmol I have made the requested changes.

Owner

@coderwassananmol left a comment

job
);
job.progress(100);
done(null, true);
Owner

What if the job fails? It will still return success.

Contributor Author

@coderwassananmol I have pushed a fix for this
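
For context, the concern in this thread is that done(null, true) runs unconditionally. A minimal sketch of one way to report failure is shown below. It assumes a Bull-style processor where calling done with an Error as the first argument marks the job as failed; completeJob and uploadError are hypothetical names, and this is not necessarily the fix that was pushed.

```javascript
// Hypothetical helper: finish a queue job based on the upload outcome.
// `job` and `done` are the arguments a Bull-style processor receives.
function completeJob(job, done, uploadError) {
  if (uploadError) {
    // Normalise to an Error so the queue records the job as failed.
    const error =
      uploadError instanceof Error ? uploadError : new Error(String(uploadError));
    done(error);
    return;
  }
  // Only report 100% progress and success when the upload actually worked.
  job.progress(100);
  done(null, true);
}
```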

@okerekechinweotito
Contributor Author

I also tried uploading http://www.panjabdigilib.org/webuser/searches/displayPage.jsp?ID=2833&page=1&CategoryID=3&Searched=W3GX but it failed. Can you check why?

@coderwassananmol It appears to fail because the book's description, which is sent in the ["X-archive-meta-description"] header, contains invalid characters:

PREM SUMĀRAG, literally means the true way to love (prem=love; su=good or true; mārag=path) is an anonymous work in old Punjabi evoking a model of Sikh way of life and of Sikh society. Written probably in the eighteenth century, it is a kind of rahitnāmā attempting to prescribe norms of behavior, religious as well as social, private as well as public, for members of the Khalsa Panth. It also provides a comprehensive model of Sikh polity with details concerning civil and military administration.Although known to earlier Sikh scholars, it was published for the first time in 1953 by the Sikh History Society, Amritsar, edited with an elaborate introduction by Bhai Randhir Singh, who accidentally in 1940 came by a partly mutilated manuscript, which he revised with the help of another manuscript preserved in the Punjab Public Library, Lahore. A second edition was brought out by New Book Company, Jalandhar, in 1965. The work is divided into ten dhiāos (chapters) and each dhiāo is subdivided into several bachans (utterances or topics). Chapter I opens with what may be called a prologue.

@coderwassananmol
Owner

It is expected. Can you submit a patch to resolve this?

@okerekechinweotito
Contributor Author

okerekechinweotito commented Dec 5, 2023

@coderwassananmol What should the solution achieve? Should I set the description to an empty string when it contains invalid characters, so that the upload can proceed?

@okerekechinweotito
Contributor Author

okerekechinweotito commented Dec 6, 2023

@coderwassananmol I have encoded the description.
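
As an illustration of what encoding the description can look like: Node's HTTP client rejects header values that contain characters such as "ā" (U+0101), so the value has to be reduced to ASCII before it is set. The sketch below percent-encodes non-ASCII values and wraps them in uri(...), a form the Internet Archive S3-like API is understood to accept for metadata headers. Treat that wrapper as an assumption to verify against the IA documentation; toArchiveHeaderValue is a hypothetical name and not the function used in this PR.

```javascript
// Hypothetical helper: make a metadata value safe to send as an HTTP header.
function toArchiveHeaderValue(value) {
  const text = value == null ? "" : String(value);
  // Printable ASCII is passed through unchanged.
  if (/^[\x20-\x7e]*$/.test(text)) return text;
  // Percent-encode as UTF-8, including the characters that
  // encodeURIComponent leaves alone (notably parentheses).
  const encoded = encodeURIComponent(text).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
  // Assumption: the receiving API decodes values wrapped in uri(...).
  return "uri(" + encoded + ")";
}
```

Compared with replacing the description by an empty string, this keeps the original text intact, provided the receiving side decodes it.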

@coderwassananmol
Owner

@okerekechinweotito I debugged this further and found that byteLength at line 65 is NaN because no_of_pages is being returned as "Adopt this Manuscript".

I think we need to fix the getPDLMetaData() function in helper.js to resolve this.
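
One way to guard against this kind of value is to validate the scraped page count before it is used in any arithmetic. The sketch below is only an assumption about what such a check could look like: parsePageCount is a hypothetical helper, and how getPDLMetaData() actually extracts no_of_pages is not shown in this conversation.

```javascript
// Hypothetical helper: accept only a positive whole number of pages.
// Anything else (for example "Adopt this Manuscript") yields null,
// so the caller can fail early instead of computing NaN later.
function parsePageCount(raw) {
  const text = raw == null ? "" : String(raw).trim();
  if (!/^\d+$/.test(text)) return null;
  const pages = Number.parseInt(text, 10);
  return pages > 0 ? pages : null;
}
```

A caller could then treat a null result as a metadata error and fail the job with a clear message.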

Owner

@coderwassananmol left a comment

The upload is working fine now.

@coderwassananmol merged commit f9e1f8e into coderwassananmol:develop on Dec 19, 2023