Difflib module in Python

by maxguy71

This article will look at using the “difflib” module in Python.

This module provides classes and functions for comparing sequences. It can be used, for example, to compare files, and it can report the differences in various formats, including HTML, context diffs, and unified diffs.
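
As a quick illustration of the HTML output mentioned above, here is a minimal sketch using the module's HtmlDiff class. The sample lines and the "before" and "after" column labels are made up for this example.

```python
from difflib import HtmlDiff

# Hypothetical sample data: two short lists of lines
old_lines = ["alpha", "beta", "gamma"]
new_lines = ["alpha", "delta", "gamma"]

# make_file() returns a complete HTML page as a single string,
# with a side-by-side table of the two inputs
html_page = HtmlDiff().make_file(
    old_lines,
    new_lines,
    fromdesc="before",  # hypothetical column label
    todesc="after",     # hypothetical column label
)

print("Generated", len(html_page), "characters of HTML")
```

The resulting string could be written to a file and opened in a browser.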

Differ class

This is a class for comparing sequences of lines of text, and producing human-readable differences or deltas. Differ uses SequenceMatcher both to compare sequences of lines, and to compare sequences of characters within similar (near-matching) lines.

Each line of a Differ delta begins with a two-letter code:

Code    Meaning
'- '    line unique to sequence 1
'+ '    line unique to sequence 2
'  '    line common to both sequences
'? '    line not present in either input sequence

Let's see an example

# importing the difflib module
import difflib
  
# the strings  
string_1 = "This is the first string to check"  
string_2 = "This is the second string to check"  
  
# using the splitlines() function  
lines_string1 = string_1.splitlines()  
lines_string2 = string_2.splitlines()  
  
# creating a Differ object and calling its compare() method
diff = difflib.Differ()  
my_diff = diff.compare(lines_string1, lines_string2)  
  
# printing the results  
print("First String:", string_1)  
print("Second String:", string_2)  
print("Difference between the Strings")  
print('\n'.join(my_diff))  

This displayed the following

>>> %Run difflibediffer.py
First String: This is the first string to check
Second String: This is the second string to check
Difference between the Strings
- This is the first string to check
?             --- ^

+ This is the second string to check
?              ^^^^^
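
The two-letter codes become more useful when the inputs have several lines. The following is a minimal sketch, using made-up sample lists, that groups the delta lines by their code prefix.

```python
from difflib import Differ

# Hypothetical sample data: two short lists of lines
old_lines = ["apple", "banana", "cherry"]
new_lines = ["apple", "blueberry", "cherry", "date"]

delta = list(Differ().compare(old_lines, new_lines))

# The first two characters of each delta line are the code;
# the rest of the line is the original text
removed = [line[2:] for line in delta if line.startswith("- ")]
added = [line[2:] for line in delta if line.startswith("+ ")]
common = [line[2:] for line in delta if line.startswith("  ")]

print("Removed:", removed)
print("Added:", added)
print("Common:", common)
```

Lines starting with '? ' are hints for the reader rather than input text, so this sketch simply skips them.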

get_close_matches function

get_close_matches(word, possibilities, n=3, cutoff=0.6)

Return a list of the best “good enough” matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).

Optional argument n (default 3) is the maximum number of close matches to return; n must be greater than 0.

Optional argument cutoff (default 0.6) is a float in the range [0, 1]. Possibilities that don't score at least that similar to word are ignored.

Let's look at an example

from difflib import get_close_matches

# candidate words to match against
words = ['master', 'mask', 'basking', 'task', 'mass', 'massive', 'miss', 'mess']

# the same search, returning at most 1, 2 and 3 matches
my_list1 = get_close_matches('mas', words, n=1, cutoff=0.3)
my_list2 = get_close_matches('mas', words, n=2, cutoff=0.3)
my_list3 = get_close_matches('mas', words, n=3, cutoff=0.3)

print("Matching words:", my_list1)
print("Matching words:", my_list2)
print("Matching words:", my_list3)

This displayed the following

>>> %Run diffligclosematches.py
Matching words: ['mass']
Matching words: ['mass', 'mask']
Matching words: ['mass', 'mask', 'master']
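
To see where this ranking comes from, it can help to print a similarity score for each candidate. This is a rough sketch that scores the candidates with SequenceMatcher.ratio(), which is assumed here to reflect the score that get_close_matches compares against cutoff. It reuses the word list from the example above.

```python
from difflib import SequenceMatcher, get_close_matches

word = "mas"
candidates = ["master", "mask", "basking", "task", "mass", "massive", "miss", "mess"]

# Score each candidate against the word (1.0 means identical)
scores = {c: SequenceMatcher(None, c, word).ratio() for c in candidates}
for candidate, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{candidate:8} {score:.3f}")

# With the default cutoff of 0.6, low-scoring candidates are dropped
close_default = get_close_matches(word, candidates, n=8)
print("Default cutoff:", close_default)
```

Raising cutoff makes the match stricter, while lowering it (as the example above does with 0.3) lets weaker matches through.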

SequenceMatcher class

The SequenceMatcher class compares two sequences (here, two strings) and provides data describing how similar they are.

You can call its ratio() method to get a measure of the sequences' similarity as a float in the range [0, 1].

Let's look at an example

# importing SequenceMatcher from the difflib module
from difflib import SequenceMatcher
  
# strings  
string_1 = "This is the first string to check"  
string_2 = "This is the second string to check"  
  
# creating a SequenceMatcher object for the two strings
my_sequence = SequenceMatcher(a=string_1, b=string_2)
  
# printing the result  
print("First String:", string_1)  
print("Second String:", string_2)  
print("Sequence Matched:", my_sequence.ratio()) 

This displayed the following

>>> %Run difflibsequence.py
First String: This is the first string to check
Second String: This is the second string to check
Sequence Matched: 0.8656716417910447
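
The ratio is computed as 2 * M / T, where M is the number of matching elements and T is the combined length of both sequences. The sketch below reuses the two strings from the example to check that relationship and to list the non-matching parts. It relies only on the get_matching_blocks() and get_opcodes() methods of SequenceMatcher.

```python
from difflib import SequenceMatcher

string_1 = "This is the first string to check"
string_2 = "This is the second string to check"

matcher = SequenceMatcher(a=string_1, b=string_2)

# M: total number of characters covered by matching blocks
matches = sum(block.size for block in matcher.get_matching_blocks())
# T: combined length of both strings
total = len(string_1) + len(string_2)

print("2 * M / T:", 2 * matches / total)
print("ratio():  ", matcher.ratio())

# Opcodes describe how to turn string_1 into string_2;
# only the parts that differ are printed here
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(tag, repr(string_1[i1:i2]), "->", repr(string_2[j1:j2]))
```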

unified_diff function

difflib.unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')

Compare a and b (lists of strings); return a delta (a generator generating the delta lines) in unified diff format.

Unified diffs are a compact way of showing just the lines that have changed plus a few lines of context.

The changes are shown in an inline style. The number of context lines is set by n which defaults to three.

Let's look at an example

# importing the required modules
import sys
from difflib import unified_diff
  
# defining the two lists of lines to compare
string1 = ['C++\n', 'Java\n', 'Python\n', 'Javascript\n', 'HTML\n', 'Programming\n']  
string2 = ['Python\n', 'Lua\n', 'Perl\n', 'Go\n', 'Rust\n', 'Programming\n']  
  
# using the unified_diff() function  
sys.stdout.writelines(unified_diff(string1, string2))  

This displayed the following

>>> %Run difflibunified.py
--- 
+++ 
@@ -1,6 +1,6 @@
-C++
-Java
 Python
-Javascript
-HTML
+Lua
+Perl
+Go
+Rust
 Programming
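
The empty "---" and "+++" header lines above can be filled in with the fromfile and tofile arguments. Here is a small sketch with made-up file labels and sample lines. It also sets lineterm='' because these input lines carry no trailing newlines, and n=1 to show a single line of context.

```python
from difflib import unified_diff

# Lines without trailing newlines, e.g. as returned by str.splitlines()
old_lines = ["C++", "Java", "Python"]
new_lines = ["C++", "Kotlin", "Python"]

diff_lines = list(unified_diff(
    old_lines,
    new_lines,
    fromfile="before.txt",  # hypothetical label for the first input
    tofile="after.txt",     # hypothetical label for the second input
    n=1,                    # one line of context around each change
    lineterm="",            # no newline appended to the control lines
))

print("\n".join(diff_lines))
```

The labels are only text placed in the header; unified_diff does not open or read any files.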

 

Links

https://docs.python.org/3/library/difflib.html
