-
Notifications
You must be signed in to change notification settings - Fork 21
Create a new cell when the list of cells is empty #62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hbcarlos
Could you start adding doc string and test please (for the element you are changing)?
jupyter_ydoc/ydoc.py
Outdated
| { | ||
| "cell_type": "code", | ||
| "execution_count": None, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": "", | ||
| "id": str(uuid4()), | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth not repeating with:
jupyter_ydoc/jupyter_ydoc/ydoc.py
Lines 178 to 185 in 97fe54b
| { | |
| "cell_type": "code", | |
| "execution_count": None, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": "", | |
| "id": str(uuid4()), | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? It creates a new uuid. It will be a different cell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same code is duplicated, it should be DRY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please create a function in order to not duplicate code:
def get_empty_code_cell():
return {
"cell_type": "code",
"execution_count": None,
"metadata": {},
"outputs": [],
"source": "",
"id": str(uuid4()),
}| def __del__(self): | ||
| self._ycells.unobserve(self.on_cells_changed) | ||
|
|
||
| def initialize(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not part of __init__?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because if we observe in __init__, we call the callback with the changes from the initialization.
We first instaciate the YNotebook (__init__), and then we add the content (from disk or the updates from the store). If we observe ycell in the __init__, we will call the callback as we add updates from the store. If this store has an update that leaves the cell list empty, we will add a cell before we finish the initialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is initialize() called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In jupyter_server_ydoc after adding the updates from the store or from disk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this logic should live in jupyter_server_ydoc? We already observe document changes there, in order to auto-save after each change.
If we keep it here in jupyter_ydoc, I think we should change the name: initialize() doesn't say much about what it's really doing. Maybe something like ensure_not_empty() would be better? For a YFile, it would ensure that there is at least a new line character. But as it would be an opt-in, it's not too much opinionated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this logic should live in jupyter_server_ydoc? We already observe document changes there, in order to auto-save after each change.
jupyterlab/jupyter-collaboration#37 (comment)
I think we should keep it here in the jupyter_ydoc.
If we keep it here in jupyter_ydoc, I think we should change the name: initialize() doesn't say much about what it's really doing. Maybe something like ensure_not_empty() would be better? For a YFile, it would ensure that there is at least a new line character. But as it would be an opt-in, it's not too much opinionated.
The idea of initialize is to be generic. When a document needs to do some specific operations after being initialized (like listening for changes in the cell list), the document knows that this method will be called after initializing the document.
I don't think it is a good idea to enforce a YFile containing at least a new line character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of
initializeis to be generic.
But in this case it's not generic at all, it's adding a cell when there is no cell anymore.
I don't think it is a good idea to enforce a YFile containing at least a new line character.
Why not? It's the equivalent of what you are doing with notebooks. And it's not enforced, as it would be an opt-in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in this case it's not generic at all, it's adding a cell when there is no cell anymore.
It is generic for the YBaseDoc. Then each document can overwrite initialize as needed.
Why not? It's the equivalent of what you are doing with notebooks.
In the case of the notebook, we need to force a non-empty notebook due to a UX design. However, it is unnecessary for the YFile, so I would not do it because we are introducing unnecessary code. In addition, this will probably break the UI of the editor. The editor situates the cursor in the first line when the document is empty. What will happen if we don't have blank documents but documents with a line break? Will the editor always have two lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, I don't think adding a new empty cell is a good idea either. However, as you can see in this discussion no one wants to break the actual UX and the best way of keeping it is by adding the cell here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is generic for the YBaseDoc. Then each document can overwrite
initializeas needed.
Then why not overwriting initialize where you actually have that UX issue (in JupyterLab)? It seems to me that you are propagating a JupyterLab issue down to jupyter-ydoc, which is front-end agnostic.
jupyter_ydoc/ydoc.py
Outdated
| if len(event.target) == 0: | ||
| # We can not update a YItem from the callback | ||
| # create a task to do it later | ||
| asyncio.create_task(self.append_new_cell()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running in an event loop is a big requirement, as until now jupyter-ydoc didn't assume that.
Is there another way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there another way?
I don't know another way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @Waidhoferj knows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to append to the array inside the observe callback. The following works for me:
def test_operation_inside_observe():
d = Y.YDoc()
array = d.get_array("array")
def push_to_array(item):
nonlocal array
nonlocal d
with d.begin_transaction() as txn:
last_insertion = item.delta[-1]["insert"][-1]
if len(array) < 10:
array.append(txn, last_insertion)
else:
print(array)
array.observe(push_to_array)
with d.begin_transaction() as txn:
array.extend(txn, ["ba", "na"])
---
Output: ['ba', 'na', 'na', 'na', 'na', 'na', 'na', 'na', 'na', 'na']There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks John. I would really like jupyter_ydoc to remain a fully synchronous library, as is y-py. It is okay to have jupyter_server_ydoc or ypy-websocket rely on async, because they are inherently network-based, but jupyter_ydoc should be usable by other libraries in a non-async environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same is true for observe_after_transaction (it is fired at the end of the current transaction - but not technically after the transaction).
You could make the change directly after the callback. E.g. after a timeout (+ you should perform a sanity check first like checking the document size first).
In general, manipulating a Y* document within (or because of) an observer call is an anti-pattern. A few reasons:
- Another observer will be fired because of your change which can trigger yet another type of change. It can result in infinite loops.
- If you have two actors who "insert a new cell if the document is empty", then you can easily end up with two (or more) cells because every actor will try to "fix" the document, leading to yet another broken document.
- Most importantly (not very obvious): You can get into unpredictable states. If you manipulate a type within an observer call - before other observers are fired - it is unclear in which order the observers for the new change will be fired. In any case, the observer for the first change can be fired even though the document has already been manipulated by change2, which often leads to inconsistencies when accessing the data using the supplied diff.
By all means, it should be avoided to rely on a central server that "cleans up" the document. It might make some things easier but will hurt in the long-run and lead to issues that are hard to debug.
A better pattern would be that the user inserts a new cell when it deletes the last cell. There is a small chance that two users delete the last cell simultaneously. Hence, you should principally allow to have an empty document. In this rare case, users can still fix the document by clicking the "new cell" button.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better pattern would be that the user inserts a new cell when it deletes the last cell.
It's actually the current state, but when there are N clients, this leads to N cells being inserted, because all clients see the document as empty. That's the bug we are trying to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually the current state,
It is not the current state. We are adding the cell automatically as a reaction to an update on the document (an update in the YNotebook).
As Kevin explained above, every operation on the document should come from the user's interactions, not from an update in the document. What we are doing in this PR works because there is only one client that adds the cell, but it is wrong because we are reacting to a change.
but when there are N clients, this leads to N cells being inserted, because all clients see the document as empty. That's the bug we are trying to fix.
And the best solution is to show an empty notebook to the user, and then the user has to click a button to insert the first cell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the same that I'm doing here. That's wrong, it works because I'm adding the cell only when we are not in collaborative mode (only one client is adding the cell). But what do you think will happens, If we, one day, decide to allow users to open a notebook twice in the same JupyterLab instance? We will have two clients reacting to, updating, and adding the cell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well put @hbcarlos. The change should not happen because of a reaction to an event. I think a good place for inserting the new cell would be in the 'deleteCell' method in the shared model.
| def _get_empty_code_cell(): | ||
| return { | ||
| "cell_type": "code", | ||
| "execution_count": None, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": "", | ||
| "id": str(uuid4()), | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's not a method, it should be a free function outside of the class.
Ideally, we should use nbformat (see #46).
Fixes jupyterlab/jupyterlab#13141