
I have a project in a concurrent and distributed programming course.

In this course we use Erlang.

I need to work with a database stored in an XML file, for which a parser written in Java already exists (the XML and the parser are linked here: https://dblp.org/faq/1474681.html). The XML file is 2.5GB, so as I understand it, the first step is to create a number of Erlang processes, each of which parses one chunk of the XML.

The thing is that this is the first time I'm doing something like this (combining Erlang and Java, and parsing a really big XML file), so I'm not sure how to approach the problem. Should I divide the XML into chunks before I start parsing it? Or somehow set a start and end point for each process that parses the XML?

Just to clarify: the course is about Erlang and using processes in Erlang, so I must use it (I mention this because I'm sure there are Java multi-threading solutions).

I would really appreciate any ideas or help. Thanks!

1 Answer


You can do this in Erlang without using Java. You do not need to read the whole file before processing it. Use an XML parser that supports a streaming API. I recommend fast_xml, which is very fast (it uses C functions to parse XML). After initializing the stream parser state, read the file chunk by chunk in a loop (a recursive function), for example 1024 bytes per chunk, and pass each chunk to the parser. When the parser finds new XML elements, it sends them to your callback process as Erlang messages. In your callback process you can spawn more processes to work on each XML element.
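
To make the loop above concrete, here is a minimal sketch of how it might look with fast_xml's `fxml_stream` module. It is untested against the DBLP file. The module name `dblp_stream`, the 64 KB chunk size, and the `handle_record/1` helper are placeholders of mine, and I am assuming the message shapes (`xmlstreamstart`, `xmlstreamelement`, `xmlstreamend`, `xmlstreamerror` wrapped in `'$gen_event'`) that fast_xml uses for stream callbacks.

```erlang
-module(dblp_stream).
-export([run/1]).

%% Bytes read per iteration (assumption: 64 KB; the 1024 above also works,
%% it just means more read calls).
-define(CHUNK, 65536).

run(Path) ->
    %% fast_xml is a third-party dependency and must be on the code path.
    {ok, _} = application:ensure_all_started(fast_xml),
    %% The collector is the "callback process": the parser sends it messages.
    Collector = spawn_link(fun() -> collect(0) end),
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    Stream = fxml_stream:new(Collector),
    feed(Fd, Stream).

%% Recursive loop: read one chunk, hand it to the parser, repeat until eof.
feed(Fd, Stream) ->
    case file:read(Fd, ?CHUNK) of
        {ok, Data} ->
            feed(Fd, fxml_stream:parse(Stream, Data));
        eof ->
            fxml_stream:close(Stream),
            file:close(Fd)
    end.

%% Receives parser events; N counts the records dispatched so far.
collect(N) ->
    receive
        %% Opening tag of the root element.
        {'$gen_event', {xmlstreamstart, _Root, _Attrs}} ->
            collect(N);
        %% One complete child of the root: hand it to a worker process.
        {'$gen_event', {xmlstreamelement, El}} ->
            spawn(fun() -> handle_record(El) end),
            collect(N + 1);
        %% Closing tag of the root element: the document is finished.
        {'$gen_event', {xmlstreamend, _Root}} ->
            io:format("dispatched ~p records~n", [N]);
        {'$gen_event', {xmlstreamerror, Reason}} ->
            io:format("parse error: ~p~n", [Reason])
    end.

%% Hypothetical per-record work; here it only prints the element name.
handle_record({xmlel, Name, _Attrs, _Children}) ->
    io:format("~s~n", [Name]).
```

You would call it as `dblp_stream:run("dblp.xml")`. Spawning one process per element is the simplest way to show the idea, but for a 2.5GB file a fixed pool of workers is probably a better fit. I have also not checked how fast_xml handles the DTD and the named entities in this particular file, so try it on a small slice first.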


3 Comments

Hi, thank you very much for answering, I will definitely check this option! The reason I want to use the Java parser from this website is that it also comes with a lot of functions that extract information from the XML file (after parsing it), so I thought it might be better to use it and get all of those functions, instead of doing everything myself. I will read about fast_xml, but if you have any ideas about connecting to the Java parser and its functions, that would be great. Thanks!!
Yes, I saw it. I already wrote some example code and managed to connect Erlang and Java, and this link is mostly about that. The question I am asking is a bit more specific (because it's related to this XML file I need to parse) and more about how to approach the problem. As I said, I had a course on Java programming and another one on Erlang programming, but this is the first time I'm combining them, and the first time I need to work with an XML file. So I have no experience with this type of problem.
