Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat(Dir/Difference): See Difference Between Dir Files & Content #20

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -15,3 +15,4 @@ travis-ci = { repository = "steveklabnik/dir-diff" }

[dependencies]
walkdir = "2.0.1"
matches = "0.1.7"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is only used for tests. This should be under dev-dependencies so that clients don't get this added to their builds.

For an example, see https://github.com/assert-rs/assert_cmd/blob/master/Cargo.toml

120 changes: 114 additions & 6 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,15 +9,15 @@
//! extern crate dir_diff;
//!
//! assert!(dir_diff::is_different("dir/a", "dir/b").unwrap());
//! ```
//!
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the closing of the example code fence removed?


extern crate walkdir;

use std::fs::File;
use std::io::prelude::*;
use std::path::Path;
use std::cmp::Ordering;

use std::io::prelude::Read;
use std::path::Path;
use std::path::PathBuf;
use std::{fs, fs::File};
use walkdir::{DirEntry, WalkDir};

/// The various errors that can happen when diffing two directories
Expand All @@ -26,6 +26,22 @@ pub enum Error {
Io(std::io::Error),
StripPrefix(std::path::StripPrefixError),
WalkDir(walkdir::Error),
/// One directory has more or less files than the other.
MissingFiles,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep this for now to minimize churn as this PR evolves but I suspect we'll want to split out the diffing error information.

/// File name doesn't match.
FileNameMismatch(PathBuf, PathBuf),
/// Binary contetn doesn't match.
BinaryContentMismatch(PathBuf, PathBuf),
/// One file has more or less lines than the other.
FileLengthMismatch(PathBuf, PathBuf),
/// The content of a file doesn't match.
ContentMismatch {
line_number: usize,
a_path: PathBuf,
b_path: PathBuf,
a_content: String,
b_content: String,
},
}

/// Are the contents of two directories different?
Expand All @@ -45,7 +61,8 @@ pub fn is_different<A: AsRef<Path>, B: AsRef<Path>>(a_base: A, b_base: B) -> Res
let a = a?;
let b = b?;

if a.depth() != b.depth() || a.file_type() != b.file_type()
if a.depth() != b.depth()
|| a.file_type() != b.file_type()
|| a.file_name() != b.file_name()
|| (a.file_type().is_file() && read_to_vec(a.path())? != read_to_vec(b.path())?)
{
Expand All @@ -56,13 +73,104 @@ pub fn is_different<A: AsRef<Path>, B: AsRef<Path>>(a_base: A, b_base: B) -> Res
Ok(!a_walker.next().is_none() || !b_walker.next().is_none())
}

/// Identify the differences between two directories.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context: This is more for design input then to implemented right now

We are only offering directory diffing. Subset checks would also be very valuable, possibly more valuable at least when writing tests to keep the "golden" case simple to minimize expanding a test to cover more than is intended.

Later I mention

We can handle both of these if we structured this as an iterator over two directories and then just offered them a diff function on the "tuple", they can then choose whether to filter for all differences, end on the first difference, etc. I'm assuming we can make the iterator by first walking one tree, checking for what exists in the other tree, and then walk the second tree, checking for what doesn't exist in the other tree.

That makes it really easy for us to also offer a subset check, we only do the first iteration. One more benefit for the iteration approach.

///
/// # Examples
///
/// ```no_run
/// extern crate dir_diff;
///
/// assert_eq!(dir_diff::see_difference("main/dir1", "main/dir1").unwrap(), ());
/// ```
pub fn see_difference<A: AsRef<Path>, B: AsRef<Path>>(a_base: A, b_base: B) -> Result<(), Error> {
let mut files_a = walk_dir_and_strip_prefix(&a_base)
.into_iter()
.collect::<Vec<_>>();
let mut files_b = walk_dir_and_strip_prefix(&b_base)
.into_iter()
.collect::<Vec<_>>();

if files_a.len() != files_b.len() {
return Err(Error::MissingFiles);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting trade-off between performance and information.

For people who want fast results, this is great. For people who want to know what all is different, this doesn't work out too well.

Two things about this.

First, if we're trying to optimize, I'm unsure which half of this is slower, the part above here (walk two directory tries, put all content into Vec) or the part below (diffing content).

Second, I think it might be better to offer the client the choice.

We can handle both of these if we structured this as an iterator over two directories and then just offered them a diff function on the "tuple", they can then choose whether to filter for all differences, end on the first difference, etc. I'm assuming we can make the iterator by first walking one tree, checking for what exists in the other tree, and then walk the second tree, checking for what doesn't exist in the other tree.

This is how cobalt's diffing works in its tests

}

files_a.sort();
files_b.sort();

for (a, b) in files_a.into_iter().zip(files_b.into_iter()).into_iter() {
if a != b {
return Err(Error::FileNameMismatch(a, b));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is convenient for writing, I don't think users will understand what a file name mismatch means. Instead this is about one of the two files is missing in one of the trees. Ideally, we tell them that and tell them which one is missing.

}

let full_path_a = &a_base.as_ref().join(&a);
let full_path_b = &b_base.as_ref().join(&b);

if full_path_a.is_dir() || full_path_b.is_dir() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what if one is a directory and the other isn't? Won't this silently ignore that?

continue;
}

let content_of_a = fs::read(full_path_a)?;
let content_of_b = fs::read(full_path_b)?;

match (
String::from_utf8(content_of_a),
String::from_utf8(content_of_b),
) {
(Err(content_of_a), Err(content_of_b)) => {
if content_of_a.as_bytes() != content_of_b.as_bytes() {
return Err(Error::BinaryContentMismatch(a, b));
}
}
(Ok(content_of_a), Ok(content_of_b)) => {
let mut a_lines = content_of_a.lines().collect::<Vec<&str>>();
let mut b_lines = content_of_b.lines().collect::<Vec<&str>>();

if a_lines.len() != b_lines.len() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than writing our own file diffing, we should probably use the difference crate .

This is a re-phrasing of a previous posting that is now collapsed:

Depending on how we solve binary files, should this instead use the difference crate rather than re-implementing file diffing ourselves?

return Err(Error::FileLengthMismatch(a, b));
}

for (line_number, (line_a, line_b)) in
a_lines.into_iter().zip(b_lines.into_iter()).enumerate()
{
if line_a != line_b {
return Err(Error::ContentMismatch {
a_path: a,
b_path: b,
a_content: line_a.to_string(),
b_content: line_b.to_string(),
line_number,
});
}
}
}
_ => return Err(Error::BinaryContentMismatch(a, b)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function now returns the first failure.

For some cases that might work. For other cases, that might not get them the context to debug what is wrong.

If we switch to the iterator approach mentioned earlier, it'll let the user decide which policy they want.

}
}

Ok(())
}

fn walk_dir<P: AsRef<Path>>(path: P) -> std::iter::Skip<walkdir::IntoIter> {
WalkDir::new(path)
.sort_by(compare_by_file_name)
.into_iter()
.skip(1)
}

/// Iterated through a directory, and strips the prefix of each path.
fn walk_dir_and_strip_prefix<'a, P>(prefix: P) -> impl Iterator<Item = PathBuf>
where
P: AsRef<Path> + Copy,
{
WalkDir::new(prefix)
.into_iter()
.filter_map(Result::ok)
.filter_map(move |e| {
let new_path = e.path();
new_path.strip_prefix(&prefix).map(|e| e.to_owned()).ok()
})
}

fn compare_by_file_name(a: &DirEntry, b: &DirEntry) -> Ordering {
a.file_name().cmp(b.file_name())
}
Expand Down
1 change: 1 addition & 0 deletions tests/content_mismatch/dir1/test.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
testing testing
1 change: 1 addition & 0 deletions tests/content_mismatch/dir2/test.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
oh no!
1 change: 1 addition & 0 deletions tests/dir_name_mismatch/dir1/a.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
text
1 change: 1 addition & 0 deletions tests/dir_name_mismatch/dir2/a.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
text
1 change: 1 addition & 0 deletions tests/file_length_mismatch/dir1/a.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
hello
2 changes: 2 additions & 0 deletions tests/file_length_mismatch/dir2/a.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
hello
master
1 change: 1 addition & 0 deletions tests/file_name_mismatch/dir1/b.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
text
1 change: 1 addition & 0 deletions tests/file_name_mismatch/dir2/a.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
text
1 change: 1 addition & 0 deletions tests/missing_dir/dir1/a.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
dd
1 change: 1 addition & 0 deletions tests/missing_dir/dir2/a.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
text
1 change: 1 addition & 0 deletions tests/missing_file/dir1/a.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
some text from dir1
1 change: 1 addition & 0 deletions tests/missing_file/dir1/b.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
some text
1 change: 1 addition & 0 deletions tests/missing_file/dir2/a.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
some text
127 changes: 126 additions & 1 deletion tests/smoke.rs
Original file line number Diff line number Diff line change
@@ -1,8 +1,11 @@
extern crate dir_diff;
#[macro_use]
extern crate matches;

use std::path::Path;
use dir_diff::Error::*;
use std::fs::create_dir;
use std::io::ErrorKind;
use std::path::{Path, PathBuf};

#[test]
fn easy_good() {
Expand Down Expand Up @@ -63,3 +66,125 @@ fn filedepth() {
dir_diff::is_different("tests/filedepth/desc/dir1", "tests/filedepth/desc/dir2").unwrap()
);
}

#[test]
fn missing_file() {
assert_matches!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat. I've never heard of this crate before. I'll need to see where it'd simplify my tests.

dir_diff::see_difference("tests/missing_file/dir1", "tests/missing_file/dir2"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For every test case, we should probably ensure see_difference returns the same result as is_different.

Err(MissingFiles)
);

assert_matches!(
dir_diff::see_difference("tests/missing_dir/dir1", "tests/missing_dir/dir2"),
Err(MissingFiles)
);
}

#[test]
fn file_length_mismatch() {
assert_matches!(
dir_diff::see_difference(
"tests/file_length_mismatch/dir1",
"tests/file_length_mismatch/dir2",
),
Err(FileLengthMismatch(_, _))
);
}

#[test]
fn binary_content_mismatch() {
let expected_binary_filename_a = PathBuf::from("rust-logo.png");
let expected_binary_filename_b = PathBuf::from("rust-logo.png");

let result = dir_diff::see_difference("tests/binary/bad/dir1", "tests/binary/bad/dir2");
assert_matches!(result, Err(BinaryContentMismatch(_, _)));

let result = result.unwrap_err();
if let BinaryContentMismatch(a, b) = &result {
if *a != expected_binary_filename_a || *b != expected_binary_filename_b {
let expected = FileNameMismatch(expected_binary_filename_a, expected_binary_filename_b);
panic!("{:?} doesn't match {:?}", &result, expected);
}
};
}

#[test]
fn dir_name_mismatch() {
let expected_dir_a = PathBuf::from("dirA");
let expected_dir_b = PathBuf::from("dirB");

let result = dir_diff::see_difference(
"tests/dir_name_mismatch/dir1",
"tests/dir_name_mismatch/dir2",
);
assert_matches!(result, Err(FileNameMismatch(_, _)));

let result = result.unwrap_err();
if let FileNameMismatch(a, b) = &result {
if *a != expected_dir_a || *b != expected_dir_b {
let expected = FileNameMismatch(expected_dir_a, expected_dir_b);
panic!("{:?} doesn't match {:?}", &result, expected);
}
};
}

#[test]
fn file_name_mismatch() {
let expected_file_a = PathBuf::from("b.txt");
let expected_file_b = PathBuf::from("a.txt");

let result = dir_diff::see_difference(
"tests/file_name_mismatch/dir1",
"tests/file_name_mismatch/dir2",
);
assert_matches!(result, Err(FileNameMismatch(_, _)));

let result = result.unwrap_err();
if let FileNameMismatch(a, b) = &result {
if *a != expected_file_a || *b != expected_file_b {
let expected = FileNameMismatch(expected_file_a, expected_file_b);
panic!("{:?} doesn't match {:?}", &result, expected);
}
};
}

#[test]
fn content_misatch() {
let expected_a_path = PathBuf::from("test.txt");
let expected_b_path = PathBuf::from("test.txt");
let expected_a_content = String::from("testing testing");
let expected_b_content = String::from("oh no!");

let result =
dir_diff::see_difference("tests/content_mismatch/dir1", "tests/content_mismatch/dir2");

assert_matches!(result, Err(ContentMismatch { .. }));
let result = result.unwrap_err();

// Match the ContentMismatch result with th expected values.
if let ContentMismatch {
line_number,
a_path,
b_path,
a_content,
b_content,
} = &result
{
if *line_number != 0
|| *a_path != expected_a_path
|| *b_path != expected_b_path
|| *a_content != expected_a_content
|| *b_content != expected_b_content
{
let expected = ContentMismatch {
line_number: 0,
a_path: expected_a_path,
b_path: expected_b_path,
a_content: expected_a_content,
b_content: expected_b_content,
};

panic!("{:?} doesn't match {:?}", &result, expected);
}
}
}