This project was used to plan all the tasks that needed to get done for migrating from Bugzilla to Phabricator in November 2014. The migration has been finished successfully, hence this project is archived now.
Details
May 23 2024
Well well, followup:
May 22 2024
FYI, because RT tickets were imported into Phabricator on 2014-12-17, and because some tickets originally created in either Bugzilla or RT are access restricted, there are still a number of non-open tickets filed before 2014-12-17 without a closedEpoch value in the DB.
However, the vast majority got fixed today (which will not necessarily fix statistics that do not query the closedEpoch value but rely on, e.g., transaction log values instead).
Done. Thanks everyone!
May 17 2024
A database-fiddling hero with production access is welcome here. Unassigning myself since I'm not one.
May 8 2024
The performance impact would be the same since the query plan uses the primary key. It would just be more verbose.
@Pppery at the moment the export followed T107254#9028797 so I'm quite sure we are already covered.
May 7 2024
Make sure you handle the situation where a ticket is closed on Bugzilla, and then later reopened on Phabricator (and thus is open now). The result is that its closedDate would be null and should not be set.
May 3 2024
To any RelEng hero:
LGTM from a quick glance
Oh, thanks Dzahn :3
Ignore my last comment. We can obtain that information using Arcanist and Conduit APIs 🌈 🦄
+-------------+
| closedEpoch |
+-------------+
|  1714527215 |
+-------------+
Can somebody please run this SQL command in production?
May 2 2024
And there's no reason to ignore mike.lifeguard - he was a valid Bugzilla contributor, albeit now inactive.
Actually it looks like this bug was already reported and fixed above and I was following aklapper's quote of an old version of the SQL. But even the most recent version of the SQL (https://gitlab.wikimedia.org/valeriobozzolan/yet-another-bugzilla-parser/-/blob/master/data/T107254-migration-commands.sql) appears to be truncated.
Bug 1189/T3189 was open at the time Bugzilla was migrated to Phabricator. It was closed years later. There's nothing to do here.
Or, the scraper can surely ignore edits done by mike.lifeguard+bugs :D ihih
There are a few entries that say:
UPDATE phabricator_maniphest.maniphest_task SET closedEpoch = 1247766128 WHERE closedEpoch IS NULL AND id = 3189; -- in date 2009-07-16 17:42:08 UTC BugZilla ID 1189 closed by user mike.lifeguard+bugs with status REOPENED
This doesn't make sense - a ticket can't be closed with status REOPENED.
Apr 16 2024
Apr 13 2024
Apr 12 2024
@valerio.bozzolan The SQL statements and comments appear in contradiction. For every comment that says "Skipped bug X - state was REOPENED" there is in fact a real update statement that does exactly what the comment says it doesn't. Is this intentional?
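One way to keep the comment and the statement from contradicting each other is to have a single function decide, per bug, whether to emit a skip comment or an UPDATE, never both. The following is only a sketch of that idea, not the actual generator: the function name, the input shape (a chronological list of `(epoch, added_status)` pairs) and the set of closed Bugzilla statuses are assumptions made for illustration.

```python
# Assumption: these are the Bugzilla statuses that count as "closed".
CLOSED_STATUSES = {"RESOLVED", "VERIFIED", "CLOSED"}


def sql_line_for(phab_id, bz_id, status_changes):
    """Return exactly one SQL line for a bug: a skip comment OR an UPDATE.

    status_changes is a hypothetical input: a list of (epoch, added_status)
    tuples in chronological order, one per 'Status' row of the bug activity.
    """
    if not status_changes:
        return f"-- Skipped bug {bz_id} - no status change found"

    _, final_status = status_changes[-1]
    if final_status not in CLOSED_STATUSES:
        # The bug was open at migration time (e.g. REOPENED): nothing to set.
        return f"-- Skipped bug {bz_id} - state was {final_status}"

    # Use the last change whose added status is RESOLVED as the closing time.
    resolved = [epoch for epoch, status in status_changes if status == "RESOLVED"]
    if not resolved:
        return f"-- Skipped bug {bz_id} - closed without a RESOLVED entry"

    return (
        "UPDATE phabricator_maniphest.maniphest_task "
        f"SET closedEpoch = {int(resolved[-1])} "
        f"WHERE closedEpoch IS NULL AND id = {int(phab_id)};"
    )
```

Because each bug goes through one return path, a "Skipped ... state was REOPENED" comment can no longer be followed by a live UPDATE for the same ID.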
Here is our generated SQL patch candidate, ready for review:
Note: we have a dump of the Bugzilla data at https://dumps.wikimedia.org/other/bugzilla/ as static HTML and as a database dump (without emails).
And... the script now takes seconds instead of weeks. Now we have enough extra Watts to shut down a small nuclear plant somewhere 🌈
Apr 11 2024
Anyway a required step is to take P49618 and make a nice phabricator.csv.
Has anybody already checked the file mentioned in the description? It may be a useful starting point.
@valerio.bozzolan I admit I didn't try to fully understand those 400 lines which seem to cover way way more stuff than needed here. :D Like why there is an array with statuses when we only care about last Status=Resolved etc. Or why care about ACTION_WITH_LINKS etc. (Also, stuff like WORKSFORME and REMIND and LATER and --- are no statuses but resolutions.) And if it would work on a static HTML dump like ours, instead of a real BZ instance.
Small note
Had another quick look and managed to get down to having only <table>...</table> left in the local bug activity page. But then the interesting problem is dealing with the rowspan: there is no date cell in the 'Status' change row when several actions were performed at once and the 'Status' change is not listed as the first row.
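The rowspan problem can be handled by carrying a spanning cell down into the rows it covers, so that every row ends up with its own When value. Below is a rough sketch using only Python's standard `html.parser`. The class and function names are made up for illustration, and it assumes the page has already been reduced to a single table with the columns Who, When, What, Removed, Added in that order, listed oldest first.

```python
from html.parser import HTMLParser


class RowspanTable(HTMLParser):
    """Collect table rows, repeating rowspan cells in the rows they cover."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell texts
        self._carry = {}    # column index -> [text, rows still to fill]
        self._row = None    # row currently being built
        self._col = 0       # next column index in the current row
        self._cell = None   # text chunks of the cell currently being read
        self._span = 1

    def _fill_carried(self):
        # Insert cells inherited from an earlier row's rowspan.
        while self._row is not None and self._col in self._carry:
            text, remaining = self._carry[self._col]
            self._row.append(text)
            if remaining <= 1:
                del self._carry[self._col]
            else:
                self._carry[self._col][1] = remaining - 1
            self._col += 1

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._col = [], 0
        elif tag in ("td", "th") and self._row is not None:
            self._fill_carried()
            self._cell = []
            self._span = int(dict(attrs).get("rowspan") or 1)

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())
            self._row.append(text)
            if self._span > 1:
                self._carry[self._col] = [text, self._span - 1]
            self._col += 1
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._fill_carried()
            self.rows.append(self._row)
            self._row = None


def parse_activity(html):
    """Return all table rows with rowspan cells expanded."""
    parser = RowspanTable()
    parser.feed(html)
    parser.close()
    return parser.rows


def last_resolved_when(rows):
    """Return the When text of the last row with What == Status and Added == RESOLVED."""
    when = None
    for row in rows:
        if len(row) >= 5 and row[2] == "Status" and row[4] == "RESOLVED":
            when = row[1]
    return when
```

With the cells expanded this way, a 'Status' row that sits second or third in a group of simultaneous changes still reports the shared When value of that group.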
Apr 10 2024
Apr 9 2024
Note: we have a dump of the Bugzilla data at https://dumps.wikimedia.org/other/bugzilla/ as static HTML and as a database dump (without emails).
To summarize: we would need to perform the following steps 57008 times:
- take the $PhabID
- subtract 2000 from the $PhabID to get the $BzID
- pull https://static-bugzilla.wikimedia.org/show_activity.cgi?id=$BzID (e.g. $data = file_get_contents('https://fooo');)
- scrape the last (!!) line where What==Status and Added==RESOLVED to get the When value from that line - maybe https://nyamsprod.com/blog/extracting-data-from-html-table-in-php/ or such could be handy, because we also need to handle HTML table rowspan etc.
- convert the When value to epoch to get $epoch (e.g. echo strtotime("2014-11-21 00:15:49 UTC");)
- UPDATE phabricator_maniphest.maniphest_task SET closedEpoch = $epoch WHERE closedEpoch IS NULL AND id = $PhabID;
I obviously do not care who set the closed status (we cannot match non-existing accounts!), and I obviously do not care about creating fake transactions in the DB, but would only set that one closedEpoch column value, if at all.
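The non-scraping steps above (ID offset, When-to-epoch conversion, UPDATE statement) can be sketched roughly as follows. This is an illustration in Python rather than the PHP hinted at in the list. The function names are hypothetical, and it assumes the When value always has the form `YYYY-MM-DD HH:MM:SS UTC` as in the example above.

```python
from datetime import datetime, timezone

BZ_ID_OFFSET = 2000  # from the steps above: $BzID = $PhabID - 2000


def phab_to_bz_id(phab_id):
    """Map a Phabricator task ID to the Bugzilla bug ID it was imported from."""
    return phab_id - BZ_ID_OFFSET


def when_to_epoch(when):
    """Convert a 'YYYY-MM-DD HH:MM:SS UTC' string to a Unix epoch (seconds)."""
    stamp = when.strip()
    if not stamp.endswith(" UTC"):
        # Assumption: anything else is unexpected and should not be guessed at.
        raise ValueError(f"unexpected timestamp format: {when!r}")
    parsed = datetime.strptime(stamp[: -len(" UTC")], "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def build_update(phab_id, epoch):
    """Build the single-column UPDATE; it only touches rows with a NULL closedEpoch."""
    return (
        "UPDATE phabricator_maniphest.maniphest_task "
        f"SET closedEpoch = {int(epoch)} "
        f"WHERE closedEpoch IS NULL AND id = {int(phab_id)};"
    )
```

As an arithmetic check against an entry quoted elsewhere in this feed: Phabricator ID 3189 maps to Bugzilla ID 1189, and 2009-07-16 17:42:08 UTC converts to 1247766128.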
Got it. In this case I'll just untag Collab.
Apr 8 2024
This is not resolved; it would require altering our DB
Closing based on the last comment.
Apr 2 2024
I sincerely do not remember what I was doing here :D will retry to do something during the next WMHack. I will continue from this point:
Jul 21 2023
Let's join this lovely pastebin with our scraped stuff