Skip to content

Conversation

@hbcarlos
Copy link
Contributor

@hbcarlos hbcarlos commented Oct 3, 2022

Copy link
Member

@fcollonval fcollonval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hbcarlos

Could you start adding doc string and test please (for the element you are changing)?

Comment on lines 212 to 232
{
"cell_type": "code",
"execution_count": None,
"metadata": {},
"outputs": [],
"source": "",
"id": str(uuid4()),
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth not repeating with:

{
"cell_type": "code",
"execution_count": None,
"metadata": {},
"outputs": [],
"source": "",
"id": str(uuid4()),
}
?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? It creates a new uuid. It will be a different cell.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same code is duplicated, it should be DRY.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create a function in order to not duplicate code:

def get_empty_code_cell():
    return { 
        "cell_type": "code", 
        "execution_count": None, 
        "metadata": {}, 
        "outputs": [], 
        "source": "", 
        "id": str(uuid4()), 
    }

def __del__(self):
self._ycells.unobserve(self.on_cells_changed)

def initialize(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not part of __init__?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because if we observe in __init__, we call the callback with the changes from the initialization.
We first instaciate the YNotebook (__init__), and then we add the content (from disk or the updates from the store). If we observe ycell in the __init__, we will call the callback as we add updates from the store. If this store has an update that leaves the cell list empty, we will add a cell before we finish the initialization.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is initialize() called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In jupyter_server_ydoc after adding the updates from the store or from disk

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this logic should live in jupyter_server_ydoc? We already observe document changes there, in order to auto-save after each change.
If we keep it here in jupyter_ydoc, I think we should change the name: initialize() doesn't say much about what it's really doing. Maybe something like ensure_not_empty() would be better? For a YFile, it would ensure that there is at least a new line character. But as it would be an opt-in, it's not too much opinionated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this logic should live in jupyter_server_ydoc? We already observe document changes there, in order to auto-save after each change.

jupyterlab/jupyter-collaboration#37 (comment)
I think we should keep it here in the jupyter_ydoc.

If we keep it here in jupyter_ydoc, I think we should change the name: initialize() doesn't say much about what it's really doing. Maybe something like ensure_not_empty() would be better? For a YFile, it would ensure that there is at least a new line character. But as it would be an opt-in, it's not too much opinionated.

The idea of initialize is to be generic. When a document needs to do some specific operations after being initialized (like listening for changes in the cell list), the document knows that this method will be called after initializing the document.
I don't think it is a good idea to enforce a YFile containing at least a new line character.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of initialize is to be generic.

But in this case it's not generic at all, it's adding a cell when there is no cell anymore.

I don't think it is a good idea to enforce a YFile containing at least a new line character.

Why not? It's the equivalent of what you are doing with notebooks. And it's not enforced, as it would be an opt-in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in this case it's not generic at all, it's adding a cell when there is no cell anymore.

It is generic for the YBaseDoc. Then each document can overwrite initialize as needed.

Why not? It's the equivalent of what you are doing with notebooks.

In the case of the notebook, we need to force a non-empty notebook due to a UX design. However, it is unnecessary for the YFile, so I would not do it because we are introducing unnecessary code. In addition, this will probably break the UI of the editor. The editor situates the cursor in the first line when the document is empty. What will happen if we don't have blank documents but documents with a line break? Will the editor always have two lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, I don't think adding a new empty cell is a good idea either. However, as you can see in this discussion no one wants to break the actual UX and the best way of keeping it is by adding the cell here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is generic for the YBaseDoc. Then each document can overwrite initialize as needed.

Then why not overwriting initialize where you actually have that UX issue (in JupyterLab)? It seems to me that you are propagating a JupyterLab issue down to jupyter-ydoc, which is front-end agnostic.

if len(event.target) == 0:
# We can not update a YItem from the callback
# create a task to do it later
asyncio.create_task(self.append_new_cell())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running in an event loop is a big requirement, as until now jupyter-ydoc didn't assume that.
Is there another way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there another way?

I don't know another way

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @Waidhoferj knows?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to append to the array inside the observe callback. The following works for me:

def test_operation_inside_observe():
    d = Y.YDoc()
    array = d.get_array("array")

    def push_to_array(item):
        nonlocal array
        nonlocal d
        with d.begin_transaction() as txn:
            last_insertion = item.delta[-1]["insert"][-1]
            if len(array) < 10:
                array.append(txn, last_insertion)
            else:
                print(array)

    array.observe(push_to_array)

    with d.begin_transaction() as txn:
        array.extend(txn, ["ba", "na"])

---
Output: ['ba', 'na', 'na', 'na', 'na', 'na', 'na', 'na', 'na', 'na']

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks John. I would really like jupyter_ydoc to remain a fully synchronous library, as is y-py. It is okay to have jupyter_server_ydoc or ypy-websocket rely on async, because they are inherently network-based, but jupyter_ydoc should be usable by other libraries in a non-async environment.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same is true for observe_after_transaction (it is fired at the end of the current transaction - but not technically after the transaction).

You could make the change directly after the callback. E.g. after a timeout (+ you should perform a sanity check first like checking the document size first).

In general, manipulating a Y* document within (or because of) an observer call is an anti-pattern. A few reasons:

  1. Another observer will be fired because of your change which can trigger yet another type of change. It can result in infinite loops.
  2. If you have two actors who "insert a new cell if the document is empty", then you can easily end up with two (or more) cells because every actor will try to "fix" the document, leading to yet another broken document.
  3. Most importantly (not very obvious): You can get into unpredictable states. If you manipulate a type within an observer call - before other observers are fired - it is unclear in which order the observers for the new change will be fired. In any case, the observer for the first change can be fired even though the document has already been manipulated by change2, which often leads to inconsistencies when accessing the data using the supplied diff.

By all means, it should be avoided to rely on a central server that "cleans up" the document. It might make some things easier but will hurt in the long-run and lead to issues that are hard to debug.

A better pattern would be that the user inserts a new cell when it deletes the last cell. There is a small chance that two users delete the last cell simultaneously. Hence, you should principally allow to have an empty document. In this rare case, users can still fix the document by clicking the "new cell" button.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better pattern would be that the user inserts a new cell when it deletes the last cell.

It's actually the current state, but when there are N clients, this leads to N cells being inserted, because all clients see the document as empty. That's the bug we are trying to fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually the current state,

It is not the current state. We are adding the cell automatically as a reaction to an update on the document (an update in the YNotebook).

As Kevin explained above, every operation on the document should come from the user's interactions, not from an update in the document. What we are doing in this PR works because there is only one client that adds the cell, but it is wrong because we are reacting to a change.

but when there are N clients, this leads to N cells being inserted, because all clients see the document as empty. That's the bug we are trying to fix.

And the best solution is to show an empty notebook to the user, and then the user has to click a button to insert the first cell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the same that I'm doing here. That's wrong, it works because I'm adding the cell only when we are not in collaborative mode (only one client is adding the cell). But what do you think will happens, If we, one day, decide to allow users to open a notebook twice in the same JupyterLab instance? We will have two clients reacting to, updating, and adding the cell.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well put @hbcarlos. The change should not happen because of a reaction to an event. I think a good place for inserting the new cell would be in the 'deleteCell' method in the shared model.

Comment on lines +203 to +211
def _get_empty_code_cell():
return {
"cell_type": "code",
"execution_count": None,
"metadata": {},
"outputs": [],
"source": "",
"id": str(uuid4()),
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's not a method, it should be a free function outside of the class.
Ideally, we should use nbformat (see #46).

@hbcarlos hbcarlos closed this by deleting the head repository Mar 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants