CodeIntel/codeintel/difflibex.py at master · SublimeCodeIntel/CodeIntel

#!/usr/bin/env python

# ***** BEGIN LICENSE BLOCK *****

# Version: MPL 1.1/GPL 2.0/LGPL 2.1

# The contents of this file are subject to the Mozilla Public License

# Version 1.1 (the "License"); you may not use this file except in

# compliance with the License. You may obtain a copy of the License at

# http://www.mozilla.org/MPL/

# Software distributed under the License is distributed on an "AS IS"

# basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the

# License for the specific language governing rights and limitations

# under the License.

# The Original Code is Komodo code.

# The Initial Developer of the Original Code is ActiveState Software Inc.

# Contributor(s):

# ActiveState Software Inc

# Alternatively, the contents of this file may be used under the terms of

# either the GNU General Public License Version 2 or later (the "GPL"), or

# the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),

# in which case the provisions of the GPL or the LGPL are applicable instead

# of those above. If you wish to allow use of your version of this file only

# under the terms of either the GPL or the LGPL, and not to allow others to

# use your version of this file under the terms of the MPL, indicate your

# decision by deleting the provisions above and replace them with the notice

# and other provisions required by the GPL or the LGPL. If you do not delete

# the provisions above, a recipient may use your version of this file under

# the terms of any one of the MPL, the GPL or the LGPL.

# ***** END LICENSE BLOCK *****

"""difflibex -- some diff-related additions to difflib

Notes:

- Eventually it would be nice to have generic parsing of patch/diff content

of many formats.

"""

from __future__ import absolute_import

from __future__ import print_function

from six.moves import map

from six.moves import range

from six.moves import zip

__version_info__ = (0, 1, 0)

__version__ = '.'.join(map(str, __version_info__))

import os

from os.path import join, isfile

import sys

import re

from pprint import pprint, pformat

import glob

import traceback

import time

import logging

import optparse

import difflib

from difflib import SequenceMatcher # For getUnsavedChangeInstructions

from hashlib import md5

from codeintel2.common import LazyClassAttribute

import textinfo

from fileutils import walk_avoiding_cycles

#---- exceptions

class DiffLibExError(Exception):

pass

#---- globals

log = logging.getLogger("difflibex")

#log.setLevel(logging.DEBUG)

#---- main functions and classes

def unified_diff(a, b, fromfile='', tofile='', fromfiledate='',

tofiledate='', n=3, lineterm='\n'):

"""A slight extension of `difflib.unified_diff()` that properly

handles the compared files not having an end-of-line char at the

end of the file and the diff including those lines.

"""

for line in difflib.unified_diff(

a, b,

fromfile=fromfile, tofile=tofile,

fromfiledate=fromfiledate, tofiledate=tofiledate,

n=n, lineterm=lineterm):

if not line.endswith(lineterm):

# Handle not having an EOL at end of file

# (see Komodo Bug 74398).

yield line + lineterm

yield "\\ No newline at end of file" + lineterm

else:

yield line
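The end-of-line handling described in the docstring above can be seen in isolation with the minimal sketch below. It uses only the standard `difflib` module; the wrapper name `unified_diff_with_eol_marker` and the sample inputs are illustrative assumptions, not part of this module.

```python
import difflib


def unified_diff_with_eol_marker(a, b, lineterm="\n", **kwargs):
    # Hypothetical stand-alone wrapper mirroring the idea above: any diff
    # line that lacks the terminator gets one, followed by the marker that
    # patch tools expect for a missing final newline.
    for line in difflib.unified_diff(a, b, lineterm=lineterm, **kwargs):
        if not line.endswith(lineterm):
            yield line + lineterm
            yield "\\ No newline at end of file" + lineterm
        else:
            yield line


old = "one\ntwo\n".splitlines(True)
new = "one\ntwo".splitlines(True)  # the last line has no EOL
diff_text = "".join(
    unified_diff_with_eol_marker(old, new, fromfile="a", tofile="b"))
print(diff_text)
```

Without the wrapper, the final `+two` line would run into whatever text follows the diff.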

def infer_cwd_and_strip_from_path(path_in_diff, actual_path):

"""Try to infer an appropriate cwd and strip number given the starting

path in a diff and the actual path to the file.

This is useful when one wants to associate diff content with an actual file

on disk (e.g. to patch the file or jump to a corresponding line).

Returns (<cwd>, <strip>) where <strip> is a number as would be used for

the -p|--strip option to patch.exe. Raises DiffLibExError if could not

infer cwd & strip (with a reason why).

"""

# E.g. these:

# path_in_diff = blah/mozilla/config/milestone.pl

# actual_path = /home/trentm/moz/1.8.0/mozilla/config/milestone.pl

# should result in:

# commonsuffix = mozilla/config/milestone.pl

commonsuffix = _commonsuffix([path_in_diff, actual_path])

if not commonsuffix:

raise DiffLibExError("no common path suffix between '%s' and '%s'"

% (path_in_diff, actual_path))

# cwd = /home/trentm/moz/1.8.0

# strip = 1

cwd = _rstrippath(actual_path, len(_splitall(commonsuffix)))

strip = len(_splitall(path_in_diff)) - len(_splitall(commonsuffix))

return (cwd, strip)
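The helpers `_commonsuffix`, `_rstrippath` and `_splitall` are not shown in this part of the file, so the following is only a sketch of the same calculation under the assumption of POSIX-style paths. The names `split_all`, `common_suffix_parts` and `infer_cwd_and_strip` are hypothetical stand-ins.

```python
def split_all(path):
    # Split a '/'-separated path into its non-empty components.
    return [part for part in path.split("/") if part]


def common_suffix_parts(a, b):
    # Collect the trailing components that both paths share.
    shared = []
    for x, y in zip(reversed(split_all(a)), reversed(split_all(b))):
        if x != y:
            break
        shared.append(x)
    return list(reversed(shared))


def infer_cwd_and_strip(path_in_diff, actual_path):
    suffix = common_suffix_parts(path_in_diff, actual_path)
    if not suffix:
        raise ValueError("no common path suffix")
    actual_parts = split_all(actual_path)
    cwd_parts = actual_parts[:len(actual_parts) - len(suffix)]
    root = "/" if actual_path.startswith("/") else ""
    cwd = root + "/".join(cwd_parts)
    # Leading components of the diff path that patch -p must drop.
    strip = len(split_all(path_in_diff)) - len(suffix)
    return cwd, strip


cwd, strip = infer_cwd_and_strip(
    "blah/mozilla/config/milestone.pl",
    "/home/trentm/moz/1.8.0/mozilla/config/milestone.pl")
print(cwd, strip)
```

This reproduces the worked example from the comments: the cwd is `/home/trentm/moz/1.8.0` and the strip count is 1.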

def diff_file_contents(left_content, right_content,

left_filepath='', right_filepath=''):

"""Return a unified diff between the left and right contents."""

# See if the content differs.

if left_content == right_content:

# The content is the same.

return ""

# Perform unified diff of contents.

result = unified_diff(left_content.splitlines(1),

right_content.splitlines(1),

left_filepath, right_filepath)

return "".join(result)

def diff_local_directories(left_dirpath, right_dirpath):

"""Return a unified diff between the files in the left and right dirs.

If a path only exists on one side it will be assumed that the file on the

other side has zero content.

"""

left_relpaths = set()

left_dirpath_len = len(left_dirpath.rstrip(os.sep)) + 1

for dirpath, dirs, files in walk_avoiding_cycles(left_dirpath):

relpath = dirpath[left_dirpath_len:]

left_relpaths.update([join(relpath, name) for name in files])

right_relpaths = set()

right_dirpath_len = len(right_dirpath.rstrip(os.sep)) + 1

for dirpath, dirs, files in walk_avoiding_cycles(right_dirpath):

relpath = dirpath[right_dirpath_len:]

right_relpaths.update([join(relpath, name) for name in files])

common_relpaths = left_relpaths.intersection(right_relpaths)

# Files deleted (i.e. on the left but not on the right)

removed_relpaths = left_relpaths.difference(right_relpaths)

# Files added (i.e. on the right but not on the left)

added_relpaths = right_relpaths.difference(left_relpaths)

# Make one sorted list of the paths and their respective change types.

change_list = [(relpath, "common") for relpath in common_relpaths ] + \

[(relpath, "removed") for relpath in removed_relpaths ] + \

[(relpath, "added") for relpath in added_relpaths ]

change_list.sort()

result = []

for relpath, changetype in change_list:

left_path = join(left_dirpath, relpath)

right_path = join(right_dirpath, relpath)

left_filedata = ''

right_filedata = ''

hasBinaryContent = False

if changetype == "common" or changetype == "removed":

left_ti = textinfo.TextInfo.init_from_path(left_path,

follow_symlinks=True)

if left_ti.is_text:

left_filedata = left_ti.text

else:

hasBinaryContent = True

if changetype == "common" or changetype == "added":

right_ti = textinfo.TextInfo.init_from_path(right_path,

follow_symlinks=True)

if right_ti.is_text:

right_filedata = right_ti.text

else:

hasBinaryContent = True

if hasBinaryContent:

result.append("===================================================================\n"

"--- %s\n"

"+++ %s\n"

"Binary files differ\n"

% (left_path, right_path))

continue

# See if the files differ.

if (changetype == "common" and

md5(left_filedata).hexdigest() == md5(right_filedata).hexdigest()):

# The files are the same.

continue

# Perform unified diff of contents.

difflines = unified_diff(left_filedata.splitlines(1),

right_filedata.splitlines(1),

left_path, right_path)

result += difflines

return "".join(result)
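The bookkeeping that sorts relative paths into "common", "removed" and "added" can be tried on its own. The sketch below assumes the relative paths have already been collected; `classify_relpaths` is a hypothetical helper name.

```python
def classify_relpaths(left_relpaths, right_relpaths):
    # Set algebra decides the change type for every relative path.
    left, right = set(left_relpaths), set(right_relpaths)
    changes = (
        [(p, "common") for p in left & right]
        + [(p, "removed") for p in left - right]
        + [(p, "added") for p in right - left]
    )
    changes.sort()  # one list, ordered by path
    return changes


changes = classify_relpaths(["a.txt", "b.txt"], ["b.txt", "c.txt"])
print(changes)
```

Only the "common" entries need a content comparison; the other two kinds are diffed against empty content.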

def diff_multiple_local_filepaths(left_filepaths, right_filepaths,

left_displaypaths=None,

right_displaypaths=None):

"""Return a unified diff between the left and right filepaths.

If a filepath does not exist, it will be assumed that it is a file

of zero content.

"""

assert left_filepaths

assert right_filepaths

assert len(left_filepaths) == len(right_filepaths)

if left_displaypaths is None:

left_displaypaths = left_filepaths

if right_displaypaths is None:

right_displaypaths = right_filepaths

assert len(left_displaypaths) == len(right_displaypaths)

result = []

for left_path, right_path, left_display, right_display in zip(left_filepaths, right_filepaths,

left_displaypaths, right_displaypaths):

hasBinaryContent = False

left_filedata = ''

right_filedata = ''

if isfile(left_path):

ti = textinfo.TextInfo.init_from_path(left_path,

follow_symlinks=True)

if ti.is_text:

left_filedata = ti.text

else:

hasBinaryContent = True

if isfile(right_path):

ti = textinfo.TextInfo.init_from_path(right_path,

follow_symlinks=True)

if ti.is_text:

right_filedata = ti.text

else:

hasBinaryContent = True

if hasBinaryContent:

result.append("===================================================================\n"

"--- %s\n"

"+++ %s\n"

"Binary files differ\n"

% (left_path, right_path))

continue

# See if the files differ.

if (md5(left_filedata).hexdigest() == md5(right_filedata).hexdigest()):

# The files are the same.

continue

# Perform unified diff of contents.

result += unified_diff(left_filedata.splitlines(1),

right_filedata.splitlines(1),

left_display, right_display)

return "".join(result)

class Hunk:

def __init__(self, start_line, lines):

end_line = start_line + len(lines)

log.debug("lines %d-%d: hunk", start_line, end_line)

self.start_line = start_line

self.end_line = end_line

self.lines = lines

def pprint(self, indent=' '*8):

print("%shunk: lines %d-%d"\

% (indent, self.start_line, self.end_line))

class FileDiff:

"""A FileDiff represents diff content for one file. It is made up of one

or more hunks."""

def __init__(self, paths, header_start_line):

log.debug("line %s: create FileDiff, paths=%r",

header_start_line is None and '?' or header_start_line,

paths)

self.paths = paths

self.lines = None

self.header_start_line = header_start_line

self.hunks = []

self.diff_type = None

def __repr__(self):

return "<FileDiff: %d hunks, '%s' best path>" % (

len(self.hunks), self.best_path())

@property

def diff(self):

return "\n".join(self.lines)

def add_hunk(self, start_line, lines):

self.hunks.append(

Hunk(start_line, lines)

)

def best_path(self, cwd=None):

#XXX How to pick the best path?

if "p4 diff header" in self.paths:

path = self.paths["p4 diff header"]

elif "index" in self.paths:

path = self.paths["index"]

elif self.diff_type == "unified" and "+++" in self.paths:

path = self.paths["+++"]

elif self.diff_type == "context" and "---" in self.paths:

path = self.paths["---"]

elif self.paths:

path = list(self.paths.values())[0]

else:

return None

if not path or not cwd or os.path.isabs(path):

return path

return os.path.join(cwd, path)

def all_paths(self, cwd=None):

"""Return a list of possible paths for this file diff."""

best_path = self.best_path(cwd=cwd)

all_paths = [best_path]

for path in self.paths.values():

if path == best_path:

continue

if path and cwd and not os.path.isabs(path):

path = os.path.join(cwd, path)

all_paths.append(path)

return all_paths

def pprint(self, indent=' '*4):

best_path = self.best_path()

if best_path is None:

best_path = "???"

print("%s%s file diff of '%s' (%d hunks)"\

% (indent,

self.diff_type,

best_path,

len(self.hunks)))

for hunk in self.hunks:

hunk.pprint(indent*2)
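The path preference used by `best_path()` can be restated as a free function for experimenting outside the class. This is a sketch only; `pick_best_path` is a hypothetical name and the sample dictionaries are made up.

```python
import os


def pick_best_path(paths, diff_type, cwd=None):
    # Preference order: p4 header, "Index:" line, then the marker line
    # that names the newer file for the given diff type.
    if "p4 diff header" in paths:
        path = paths["p4 diff header"]
    elif "index" in paths:
        path = paths["index"]
    elif diff_type == "unified" and "+++" in paths:
        path = paths["+++"]
    elif diff_type == "context" and "---" in paths:
        path = paths["---"]
    elif paths:
        path = next(iter(paths.values()))
    else:
        return None
    if not path or not cwd or os.path.isabs(path):
        return path
    return os.path.join(cwd, path)


print(pick_best_path({"---": "a/foo.py", "+++": "b/foo.py"}, "unified"))
print(pick_best_path({"index": "ref/foo.txt"}, "unified", cwd="/work"))
```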

class Diff:

"""A Diff represents some diff/patch content. At its most generic it is made

up of multiple FileDiff's.

"""

@LazyClassAttribute

def _patterns(self):

return {

"index":

re.compile(r"^Index:\s+(?P<path>.*?)\s*$"),

"p4 diff header":

# ==== //depot/foo.css#42 - c:\clientview\foo.css ====

# ==== //depot/foo.js#22 (xtext) ====

re.compile(r"^==== (?P<depotpath>.*?)#\d+ "

r"(- (?P<path>.*?)|\(.*?\)) ====$"),

"---":

re.compile(r"(\+\+\+|---|\*\*\*)(\s+(?P<path>.*?)(\t.*?)?)?\s*$"),

"plain hunk header":

# E.g., '9c9', '185,187c185'

re.compile(r"^(?P<beforestartline>\d+)(,\d+)?"

r"(?P<type>[acd])"

r"(?P<afterstartline>\d+)(,\d+)?$"),

"context hunk header":

# E.g., '*** 32,37 ****', '--- 32,39 ----', '*** 1 ****'

re.compile(r"^([\*-]){3} (?P<startline>\d+)(,(?P<endline>\d+))? \1{4}$"),

"unified hunk header":

# E.g., '@@ -296,7 +296,8 @@'

# E.g., '@@ -1 +0,0 @@'

re.compile(r"^@@\s-(?P<beforestartline>\d+)(,(\d+))?\s"

r"\+(?P<afterstartline>\d+)(,(\d+))?\s@@"),

}

def __init__(self, content):

self.file_diffs = []

self.parse(content)

def __repr__(self):

return "<Diff: %d files, %d hunks>"\

% (len(self.file_diffs),

sum([len(f.hunks) for f in self.file_diffs]))

def pprint(self):

print("diff (%s files)" % (len(self.file_diffs)))

for file_diff in self.file_diffs:

file_diff.pprint(indent=' '*4)

def parse(self, content):

r"""

p4 diff -du:

==== //depot/foo.css#42 - c:\clientview\foo.css ====

@@ ... @@

...

p4 diff -dc:

==== //depot/foo.css#42 - c:\clientview\foo.css ====

***************

*** 182,196 ****

...

p4 diff:

==== //depot/foo.css#42 - c:\clientview\foo.css ====

185,187c185

...

cvs diff -u:

Index: toolkit/xre/nsCommandLineServiceMac.cpp

===================================================================

RCS file: /cvsroot/mozilla/toolkit/xre/nsCommandLineServiceMac.cpp,v

retrieving revision 1.3

diff -d -u -r1.3 nsCommandLineServiceMac.cpp

--- toolkit/xre/nsCommandLineServiceMac.cpp 19 Feb 2005 22:41:59 -0000 1.3

+++ toolkit/xre/nsCommandLineServiceMac.cpp 19 Sep 2005 22:34:10 -0000

@@ -6,12 +6,12 @@

...

cvs diff -c:

Index: setup.py

===================================================================

RCS file: /cvsroot/pywin32/pywin32/setup.py,v

retrieving revision 1.31

diff -c -r1.31 setup.py

*** setup.py 11 Jan 2006 01:31:41 -0000 1.31

--- setup.py 3 Mar 2006 02:35:39 -0000

***************

...

cvs diff:

Index: setup.py

===================================================================

RCS file: /cvsroot/pywin32/pywin32/setup.py,v

retrieving revision 1.31

diff -r1.31 setup.py

9c9

...

svn:

Index: ref/foo.txt

===================================================================

--- ref/foo.txt (revision 897)

+++ ref/foo.txt (working copy)

...

Komodo's "Show Unsaved Changes":

Index: C:\trentm\as\Komodo-devel\src\codeintel\ci2.py

--- C:\trentm\as\Komodo-devel\src\codeintel\ci2.py

+++ C:\trentm\as\Komodo-devel\src\codeintel\ci2.py (unsaved)

@@ -360,7 +360,7 @@

...

"""

state = None

file_diff = None

paths = {}

lines = self.lines = content.splitlines(0)

idx = 0

while idx < len(lines):

line = lines[idx]

#print "%3d: %r" % (idx, line)

if state is None: # looking for diff header lines

first_tokens = line.split(None, 1)

if first_tokens:

first_token = first_tokens[0]

else:

first_token = ''

if line.startswith("Index:"):

match = self._patterns["index"].match(line)

if match:

paths["index"] = match.group("path")

log.debug("line %d: 'Index: ' line, path=%r",

idx, paths["index"])

if file_diff is None:

file_diff = FileDiff(paths, idx)

elif first_token == "diff":

# Note: Could parse the filename out of here, but

# that involves skipping cmdln switches. Punt

# for now.

# A "plain" hunk header sometimes follows a "diff "

# line.

log.debug("line %d: 'diff ' line", idx)

if file_diff is None:

file_diff = FileDiff(paths, idx)

if idx+1 < len(lines) \

and self._patterns["plain hunk header"]\

.match(lines[idx+1].rstrip()):

state = "plain"

elif line.startswith("==== "):

# Likely a 'p4 diff ...' header line.

match = self._patterns["p4 diff header"].match(line)

if match:

log.debug("line %d: p4 diff header line", idx)

paths["p4 diff header"] = match.group("path")

log.debug("line %d: p4 diff header line, path=%r",

idx, paths["p4 diff header"])

if file_diff is None:

file_diff = FileDiff(paths, idx)

# You can always tell the diff type from the

# line after the p4 diff header.

#XXX This is wrong. Sometimes there is *no*

# subsequent diff content.

if idx+1 < len(lines) and not lines[idx+1].strip():

# 'p4 describe' output includes an extra

# blank separation line here.

idx += 1

if idx+1 < len(lines):

if lines[idx+1].rstrip() == "*"*15:

state = "context"

elif lines[idx+1].startswith("@@"):

state = "unified"

else:

state = "plain"

elif first_token in ("---", "+++", "***"):

match = self._patterns["---"].match(line)

if match:

paths[first_token] = match.group("path")

log.debug("line %d: '%s ' line, path=%r",

idx, first_token, paths[first_token])

if file_diff is None:

file_diff = FileDiff(paths, idx)

if first_token == "+++":

state = "unified"

elif first_token == "---" \

and idx > 0 \

and lines[idx-1].strip() \

and lines[idx-1].split(None, 1)[0] == "***":

state = "context"

elif self._patterns["plain hunk header"].match(line.rstrip()):

if file_diff is None:

file_diff = FileDiff(paths, None)

state = "plain"

idx -= 1 # compensation for the subsequent increment

idx += 1

elif state == "plain":

log.debug("line %s: 'plain' file diff", idx)

file_diff.diff_type = "plain"

while idx < len(lines): # read in plain hunks

match = self._patterns["plain hunk header"].match(lines[idx])

if not match:

break

hunk_start_line = idx

idx += 1

hunk_type = match.group("type")

if hunk_type == 'a':

# HEADER

# > ...

while idx < len(lines) and lines[idx].startswith("> "):

idx += 1

elif hunk_type == 'd':

# HEADER

# < ...

while idx < len(lines) and lines[idx].startswith("< "):

idx += 1

elif hunk_type == 'c':

# HEADER

# < ...

# ---

# > ...

while idx < len(lines) and lines[idx].startswith("< "):

idx += 1

if idx >= len(lines) \

or not lines[idx].rstrip() == "---":

log.warning("unexpected line in middle of plain hunk: "

"%r (line %d)",

lines[idx] if idx < len(lines) else None, idx)

break

idx += 1

while idx < len(lines) and lines[idx].startswith("> "):

idx += 1

else:

raise DiffLibExError("unexpected plain hunk header "

"type: '%s'" % hunk_type)

file_diff.add_hunk(hunk_start_line, lines[hunk_start_line:idx])

file_diff.lines = lines[file_diff.header_start_line:idx]

self.file_diffs.append(file_diff)

state = None

file_diff = None

paths = {}

elif state == "unified":

log.debug("line %d: 'unified' file diff", idx)

file_diff.diff_type = "unified"

while idx < len(lines): # read in unified hunks

if not lines[idx].startswith("@@ "): break

hunk_start_line = idx

idx += 1

while idx < len(lines) \

and (lines[idx].startswith("+")

or lines[idx].startswith("-")

or lines[idx].startswith(" ")

# Guard against "empty diff hunk line" + Komodo's

# "remove trailing whitespace on save" causing

# the leading ' ' to have been removed.

or not lines[idx]):

idx += 1

file_diff.add_hunk(hunk_start_line, lines[hunk_start_line:idx])

file_diff.lines = lines[file_diff.header_start_line:idx]

self.file_diffs.append(file_diff)

state = None

file_diff = None

paths = {}

elif state == "context":

log.debug("line %d: 'context' file diff", idx)

file_diff.diff_type = "context"

while idx < len(lines): # read in context hunks

if not lines[idx].rstrip() == "*"*15:

break

hunk_start_line = idx

idx += 1

if idx >= len(lines) \

or not lines[idx].startswith("*** "):

break

# Parse the header line.

idx += 1

while idx < len(lines) \

and (lines[idx].startswith("! ")

or lines[idx].startswith("+ ")

or lines[idx].startswith("- ")

or lines[idx].startswith(" ")

# Guard against "empty diff hunk line" + Komodo's

# "remove trailing whitespace on save" causing

# the leading ' ' to have been removed.

or not lines[idx]):

idx += 1

if idx >= len(lines) \

or not lines[idx].startswith("--- "):

break

idx += 1

while idx < len(lines) \

and (lines[idx].startswith("! ")

or lines[idx].startswith("+ ")

or lines[idx].startswith("- ")

or lines[idx].startswith(" ")

# Guard against "empty diff hunk line" + Komodo's

# "remove trailing whitespace on save" causing

# the leading ' ' to have been removed.

or not lines[idx]):

idx += 1

file_diff.add_hunk(hunk_start_line, lines[hunk_start_line:idx])

file_diff.lines = lines[file_diff.header_start_line:idx]

self.file_diffs.append(file_diff)

state = None

file_diff = None

paths = {}

else:

raise ValueError("unknown state: '%s'" % state)
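The "plain" state above hinges on recognizing headers such as `9c9` or `185,187c185`. The sketch below exercises the same regular expression outside the class; `parse_plain_header` is a hypothetical helper, not a method of `Diff`.

```python
import re

# Same pattern as the "plain hunk header" entry in Diff._patterns.
PLAIN_HUNK_HEADER = re.compile(
    r"^(?P<beforestartline>\d+)(,\d+)?"
    r"(?P<type>[acd])"
    r"(?P<afterstartline>\d+)(,\d+)?$")


def parse_plain_header(line):
    # Return (before_start, type, after_start), or None for other lines.
    m = PLAIN_HUNK_HEADER.match(line.rstrip())
    if m is None:
        return None
    return (int(m.group("beforestartline")),
            m.group("type"),
            int(m.group("afterstartline")))


for sample in ("9c9", "185,187c185", "12a13,14", "@@ -1 +1 @@"):
    print(sample, "->", parse_plain_header(sample))
```

The type letter decides which content lines follow: `a` expects only `> ` lines, `d` only `< ` lines, and `c` both, separated by `---`.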

def file_diff_and_hunk_from_pos(self, diff_line, diff_col):

"""Return the file_diff and hunk that this diff_line applies to."""

# We are generous here, allowing a line outside of the strict diff

# hunk area to apply to the following hunk.

if not self.file_diffs:

raise DiffLibExError("No file diffs are available")

for file_diff in self.file_diffs:

for hunk in file_diff.hunks:

if diff_line < hunk.end_line:

break

else:

continue

break

else:

# A generosity: if diff_line is *just* past the last diff hunk,

# then pretend it is in-range. Otherwise a common case in Komodo

# is to highlight a diff on the last line of a file and place the

# cursor on column 0 of the next line (to select the whole line).

# This technically places the cursor out of range.

if (diff_col == 0 and self.file_diffs and self.file_diffs[-1].hunks

and diff_line == self.file_diffs[-1].hunks[-1].end_line):

file_diff = self.file_diffs[-1]

hunk = file_diff.hunks[-1]

diff_line -= 1

diff_col = len(self.lines[diff_line])

else:

raise DiffLibExError("(this one) line %s is not in a diff hunk"

% (diff_line+1))

return file_diff, hunk

def file_pos_from_diff_pos(self, diff_line, diff_col):

"""Return a file position for the given position in the diff content.

Where to set the file position isn't always obvious. The most literal

result would just be a simple line count into the diff hunk offset

by the start line from the hunk header. Eventually this could get

smarter and try to account for patch-like fuzz and offsets.

All line and column values are 0-based.

Returns a 3-tuple:

(<file-path>, <file-line>, <file-col>)

If a file position could not be found, then a DiffLibExError is raised

giving the reason (e.g. the diff position might not be in a diff hunk).

"""

file_diff, hunk = self.file_diff_and_hunk_from_pos(diff_line, diff_col)

file_path = file_diff.best_path()

log.debug("diff pos (%d, %d) is in a '%s' hunk", diff_line, diff_col,

file_path)

# Work down from the top of the hunk to find the file position.

# (Could move this out to format-specific Hunk sub-classes.)

if file_diff.diff_type == "unified":

# First line is the hunk header:

# @@ -A,B +C,D @@

# where,

# A is the file_before_line_start (1-based)

# C is the file_after_line_start (1-based)

# Subtract 1 to convert to 0-based line nums.

m = self._patterns["unified hunk header"].match(

self.lines[hunk.start_line])

# -1 to convert to 0-based

file_before_line = int(m.group("beforestartline")) - 1

file_after_line = int(m.group("afterstartline")) - 1

# range start: +1 to skip header

# range end: +1 to include diff_line bound

if diff_line < hunk.start_line+1:

# Not in the diff hunk content, just default to the first

# diff hunk line.

file_line = file_after_line

file_col = 0

else:

# -1 because the counting will add it back on the first line

file_before_line -= 1

file_after_line -= 1

for i in range(hunk.start_line+1, diff_line+1):

line = self.lines[i]

if not line or line[0] == ' ':

# 'not line' because Komodo's "remove trailing whitespace

# on save" might have removed it.

file_before_line += 1

file_after_line += 1

elif line[0] == '-':

file_before_line += 1

elif line[0] == '+':

file_after_line += 1

else:

# These are junk lines after the diff hunk.

raise DiffLibExError("line %s is not in a diff hunk"

% (diff_line+1))

if line and line[0] == '-':

file_line = file_before_line

else:

file_line = file_after_line

file_col = max(diff_col - 1, 0) # unified-diff prefix is 1 char

elif file_diff.diff_type == "context":

hunk_header_pat = self._patterns["context hunk header"]

file_col = max(diff_col - 2, 0) # context-diff prefix is 2 chars

state = "all stars"

i = hunk.start_line

while i < hunk.end_line:

line = self.lines[i]

log.debug("%3d: %r", i, line)

if state == "all stars":

# First line of hunk header: '***************'

if i >= diff_line:

# Use the file_before start line.

m = hunk_header_pat.match(self.lines[i+1])

file_line = int(m.group("startline")) - 1

file_col = 0

break

state = "before header"

elif state == "before header":

m = hunk_header_pat.match(line)

file_line = int(m.group("startline")) - 1

if i == diff_line:

file_col = 0

break

else:

file_line -= 1 # will be added back on first content line

state = "before content"

elif state == "before content":

if line[:2] in (" ", "! ", "- "):

file_line += 1

if i == diff_line:

break

elif line.startswith("--- "):

state = "after header"

i -= 1

else:

raise DiffLibExError("unexpected line in context "

"diff: %r" % line)

elif state == "after header":

m = hunk_header_pat.match(line)

file_line = int(m.group("startline")) - 1

if i == diff_line:

file_col = 0

break

else:

file_line -= 1 # will be added back on first content line

state = "after content"

elif state == "after content":

if line[:2] in (" ", "! ", "+ "):

file_line += 1

if i == diff_line:

break

else:

raise DiffLibExError("unexpected line in context "

"diff: %r" % line)

i += 1

elif file_diff.diff_type == "plain":

hunk_header_pat = self._patterns["plain hunk header"]

file_col = max(diff_col - 2, 0) # plain-diff prefix is 2 chars

state = "header"

i = hunk.start_line

while i < hunk.end_line:

line = self.lines[i]

log.debug("%3d: %r", i, line)

if state == "header":

m = hunk_header_pat.match(line)

file_before_line = int(m.group("beforestartline")) - 1

hunk_type = m.group("type")

file_after_line = int(m.group("afterstartline")) - 1

if i >= diff_line:

file_line = file_after_line

file_col = 0

break

else:

# -1 because will be added back on first content line.

if hunk_type == "a":

file_line = file_after_line - 1

state = "after content"

else: # hunk_type in ('c', 'd')

file_line = file_before_line - 1

state = "before content"

elif state == "before content":

if line[:2] == "< ":

file_line += 1

if i == diff_line:

break

elif line.rstrip() == "---":

state = "divider"

i -= 1

else:

raise DiffLibExError("unexpected line in plain "

"diff: %r" % line)

elif state == "divider":

if i == diff_line:

file_line = file_after_line

file_col = 0

break

else:

# -1 because will be added back on first content line.

file_line = file_after_line - 1

state = "after content"

elif state == "after content":

if line[:2] == "> ":

file_line += 1

if i == diff_line:

break

else:

raise DiffLibExError("unexpected line in plain "

"diff: %r" % line)

i += 1

else:

raise DiffLibExError("unrecognized diff type: '%s'"

% file_diff.diff_type)

return (file_path, file_line, file_col)

def get_changed_line_numbers_by_filepath(self):

"""A dict of filepaths and their changed line numbers (0 based)"""

result = {}

for file_diff in self.file_diffs:

file_path = file_diff.best_path()

result[file_path] = linenums = []

_, file_line, _ = self.file_pos_from_diff_pos(file_diff.header_start_line, 0)

for hunk in file_diff.hunks:

for line in hunk.lines:

if line.startswith(" "):

file_line += 1

elif line.startswith("+"):

linenums.append(file_line)

file_line += 1

return result

def possible_paths_from_diff_pos(self, diff_line, diff_col):

"""Return a list of all possible file paths for the given position.

If a file position could not be found, then a DiffLibExError is raised

giving the reason (e.g. the diff position might not be in a diff hunk).

"""

file_diff, hunk = self.file_diff_and_hunk_from_pos(diff_line, diff_col)

return file_diff.all_paths()
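The unified-diff branch of `file_pos_from_diff_pos()` can be followed on a single hunk with the sketch below. It assumes the hunk is given as a list of lines whose first element is the `@@` header; `file_line_for` and the sample hunk are illustrative, and column handling is left out.

```python
import re

HUNK_HEADER = re.compile(
    r"^@@\s-(?P<before>\d+)(,\d+)?\s\+(?P<after>\d+)(,\d+)?\s@@")


def file_line_for(hunk_lines, offset):
    # Map a 0-based offset into hunk_lines to a 0-based file line.
    m = HUNK_HEADER.match(hunk_lines[0])
    before = int(m.group("before")) - 1  # convert to 0-based
    after = int(m.group("after")) - 1
    if offset == 0:
        return after  # on the header: default to the first hunk line
    before -= 1  # the first content line adds this back
    after -= 1
    for line in hunk_lines[1:offset + 1]:
        if not line or line[0] == " ":
            before += 1
            after += 1
        elif line[0] == "-":
            before += 1
        elif line[0] == "+":
            after += 1
        else:
            raise ValueError("not a hunk line: %r" % line)
    # Removed lines only exist in the "before" file.
    return before if line.startswith("-") else after


hunk = [
    "@@ -10,3 +10,3 @@",
    " unchanged",
    "-old line",
    "+new line",
    " trailing",
]
print([file_line_for(hunk, i) for i in range(len(hunk))])
```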

def _max_acceptable_edit_dist(s):

# Found this value largely by pulling it out of a hat. Replacements occupy

# a grey area between equal and replace/delete, so we say that any string

# with a Levenshtein value < 1/4 its length qualifies as a replacement.

return len(s) / 4.0
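To see what the quarter-length threshold means in practice, the sketch below pairs it with a textbook Levenshtein distance. Both `levenshtein` and `is_replacement` are hypothetical names, and using `<=` for the comparison is an assumption, since the code that applies the threshold is not shown here.

```python
def levenshtein(s, t):
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(t) + 1))
    for i, sc in enumerate(s, 1):
        curr = [i]
        for j, tc in enumerate(t, 1):
            cost = 0 if sc == tc else 1
            curr.append(min(prev[j] + 1,          # delete from s
                            curr[j - 1] + 1,      # insert into s
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def is_replacement(old, new):
    # Assumed comparison: distance within a quarter of the old length.
    return levenshtein(old, new) <= len(old) / 4.0


print(is_replacement("return total_count", "return total_counts"))
print(is_replacement("import os", "x = compute()"))
```

A one-character edit to an 18-character line passes the test, while two unrelated lines do not.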

_split_opcodes_diffs = {} # Map md5(a) + md5(b) => {time:time, opcodes:list of opcodes}

def _get_hash_for_arrays(a, b):

key = md5("".join(a)).hexdigest() + md5("".join(b)).hexdigest()

currTime = time.time()

if key in _split_opcodes_diffs:

h = _split_opcodes_diffs[key]

h['time'] = currTime

return key

if len(_split_opcodes_diffs) >= 1000:

# If we have more than 1000 keys, remove the least recently used.

oldest_item = min([(x[1]['time'], x[0]) for x in _split_opcodes_diffs.items()])

del _split_opcodes_diffs[oldest_item[1]]

_split_opcodes_diffs[key] = {'time':currTime, 'opcodes':None}

return key
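The cache above evicts the least recently used key once it holds 1000 entries. The sketch below wraps the same idea in a small class so the eviction can be observed with a tiny limit. `OpcodeCache` is a hypothetical name, the inputs are encoded to UTF-8 before hashing, and a counter replaces `time.time()` so that ordering is deterministic; those are choices made for this example, not the module's behaviour.

```python
import itertools
from hashlib import md5


class OpcodeCache:
    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._entries = {}  # key -> {"tick": int, "opcodes": ...}
        self._clock = itertools.count()

    def key_for(self, a, b):
        key = (md5("".join(a).encode("utf-8")).hexdigest()
               + md5("".join(b).encode("utf-8")).hexdigest())
        tick = next(self._clock)
        if key in self._entries:
            self._entries[key]["tick"] = tick  # mark as recently used
            return key
        if len(self._entries) >= self.max_entries:
            oldest = min(self._entries,
                         key=lambda k: self._entries[k]["tick"])
            del self._entries[oldest]
        self._entries[key] = {"tick": tick, "opcodes": None}
        return key

    def __contains__(self, key):
        return key in self._entries


cache = OpcodeCache(max_entries=2)
k1 = cache.key_for(["a\n"], ["b\n"])
k2 = cache.key_for(["c\n"], ["d\n"])
cache.key_for(["a\n"], ["b\n"])       # touch k1, so k2 is now the oldest
k3 = cache.key_for(["e\n"], ["f\n"])  # evicts k2
print(k1 in cache, k2 in cache, k3 in cache)
```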

def split_opcodes(opcode, a, b, forceCalc=False):

"""

@param opcode: tuple of (tag:string="replace",

i1: start index of change to a,

i2: end index of change to a,

j1: start index of change to b,

j2: end index of change to b)

This function is called when i2 - i1 != j2 - j1, or in other words

unified-diff has decided we're replacing m lines with n != m lines.

We want to find out which lines are true replacements (similar), and

which are insertions or deletions.

@param a: list of strings (original values)

@param b: list of strings (current values)

@return: an array of new opcodes

Implement a modified Wagner-Fischer algorithm

to try to determine how these lines actually match up

References:

https://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm

If Wikipedia dies:

R.A. Wagner and M.J. Fischer. 1974. The String-to-String Correction Problem. Journal of the ACM, 21(1):168-173.

Overview:

Build a matrix of Levenshtein edit-distance values for all possible pairings

of line[i] in a with line[j] in b. a is the starting text, b is the

final text, and we want to determine a sequence of transformations from a

to b.

Given m = len(a) and n = len(b),

this is an m x n matrix, D. We want to find a path through the matrix

moving through the first dimension (corresponding to lines in a),

starting at D[0][0]. We also never backtrack on j.

If we find an entry d[i][j] = 0, this means lines a[i] and b[j] match.

Advance both i and j.

If d[i][j] <= some max value X (which normally depends on a[i]), it means

b[j] is a replacement for a[i].

If d[i][j] > X, then is there a value d[i][j'] <= X for j' > j?

If yes, treat b[j:j'] as inserted lines relative to a[i], and process d[i][j'] as above.

Otherwise, treat a[i] as a deleted line relative to b[j].

Remember to look for a run of inserted or deleted text at the end as well.
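The walk described in the overview above can be sketched as a greedy pass over the two line lists. This is not the module's implementation, which is cut off below; it assumes the threshold X is len(a[i]) / 4.0, computes distances on demand instead of building the full matrix, and uses the hypothetical names `edit_distance` and `align_lines`.

```python
def edit_distance(s, t):
    # Wagner-Fischer edit distance, keeping only the previous row.
    prev = list(range(len(t) + 1))
    for i, sc in enumerate(s, 1):
        curr = [i]
        for j, tc in enumerate(t, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (sc != tc)))
        prev = curr
    return prev[-1]


def align_lines(a, b):
    # Return (tag, i, j) steps; i or j is None for pure inserts/deletes.
    ops = []
    i = j = 0
    while i < len(a) and j < len(b):
        limit = len(a[i]) / 4.0  # assumed threshold X
        # First j' >= j whose distance to a[i] is acceptable, if any.
        match = next((k for k in range(j, len(b))
                      if edit_distance(a[i], b[k]) <= limit), None)
        if match is None:
            ops.append(("delete", i, None))
            i += 1
            continue
        for k in range(j, match):
            ops.append(("insert", None, k))
        tag = "equal" if a[i] == b[match] else "replace"
        ops.append((tag, i, match))
        i += 1
        j = match + 1  # never backtrack on j
    # Runs of deleted or inserted text at the end.
    ops.extend(("delete", k, None) for k in range(i, len(a)))
    ops.extend(("insert", None, k) for k in range(j, len(b)))
    return ops


a = ["alpha = 1", "beta = 2", "gamma = 3"]
b = ["alpha = 1", "brand new line here", "beta = 22", "gamma = 3"]
for op in align_lines(a, b):
    print(op)
```

In this example the second line of b is reported as an insertion and `beta = 22` as a replacement for `beta = 2`, rather than treating the whole block as one opaque replace.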

Now we could use a dynamic programming algorithm to determine the minimum
