-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat(Dir/Difference): See Difference Between Dir Files & Content #20
base: master
Are you sure you want to change the base?
Changes from all commits
f727e45
47df031
0ad3fe6
f3c03aa
2704e02
fde981c
aaa2cac
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,3 +15,4 @@ travis-ci = { repository = "steveklabnik/dir-diff" } | |
|
||
[dependencies] | ||
walkdir = "2.0.1" | ||
matches = "0.1.7" | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,15 +9,15 @@ | |
//! extern crate dir_diff; | ||
//! | ||
//! assert!(dir_diff::is_different("dir/a", "dir/b").unwrap()); | ||
//! ``` | ||
//! | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why was the closing of the example code fence removed? |
||
|
||
extern crate walkdir; | ||
|
||
use std::fs::File; | ||
use std::io::prelude::*; | ||
use std::path::Path; | ||
use std::cmp::Ordering; | ||
|
||
use std::io::prelude::Read; | ||
use std::path::Path; | ||
use std::path::PathBuf; | ||
use std::{fs, fs::File}; | ||
use walkdir::{DirEntry, WalkDir}; | ||
|
||
/// The various errors that can happen when diffing two directories | ||
|
@@ -26,6 +26,22 @@ pub enum Error { | |
Io(std::io::Error), | ||
StripPrefix(std::path::StripPrefixError), | ||
WalkDir(walkdir::Error), | ||
/// One directory has more or less files than the other. | ||
MissingFiles, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Keep this for now to minimize churn as this PR evolves but I suspect we'll want to split out the diffing error information. |
||
/// File name doesn't match. | ||
FileNameMismatch(PathBuf, PathBuf), | ||
/// Binary contetn doesn't match. | ||
BinaryContentMismatch(PathBuf, PathBuf), | ||
/// One file has more or less lines than the other. | ||
FileLengthMismatch(PathBuf, PathBuf), | ||
/// The content of a file doesn't match. | ||
ContentMismatch { | ||
line_number: usize, | ||
a_path: PathBuf, | ||
b_path: PathBuf, | ||
a_content: String, | ||
b_content: String, | ||
}, | ||
} | ||
|
||
/// Are the contents of two directories different? | ||
|
@@ -45,7 +61,8 @@ pub fn is_different<A: AsRef<Path>, B: AsRef<Path>>(a_base: A, b_base: B) -> Res | |
let a = a?; | ||
let b = b?; | ||
|
||
if a.depth() != b.depth() || a.file_type() != b.file_type() | ||
if a.depth() != b.depth() | ||
|| a.file_type() != b.file_type() | ||
|| a.file_name() != b.file_name() | ||
|| (a.file_type().is_file() && read_to_vec(a.path())? != read_to_vec(b.path())?) | ||
{ | ||
|
@@ -56,13 +73,104 @@ pub fn is_different<A: AsRef<Path>, B: AsRef<Path>>(a_base: A, b_base: B) -> Res | |
Ok(!a_walker.next().is_none() || !b_walker.next().is_none()) | ||
} | ||
|
||
/// Identify the differences between two directories. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Context: This is more for design input then to implemented right now We are only offering directory diffing. Subset checks would also be very valuable, possibly more valuable at least when writing tests to keep the "golden" case simple to minimize expanding a test to cover more than is intended. Later I mention
That makes it really easy for us to also offer a subset check, we only do the first iteration. One more benefit for the iteration approach. |
||
/// | ||
/// # Examples | ||
/// | ||
/// ```no_run | ||
/// extern crate dir_diff; | ||
/// | ||
/// assert_eq!(dir_diff::see_difference("main/dir1", "main/dir1").unwrap(), ()); | ||
/// ``` | ||
pub fn see_difference<A: AsRef<Path>, B: AsRef<Path>>(a_base: A, b_base: B) -> Result<(), Error> { | ||
let mut files_a = walk_dir_and_strip_prefix(&a_base) | ||
.into_iter() | ||
.collect::<Vec<_>>(); | ||
let mut files_b = walk_dir_and_strip_prefix(&b_base) | ||
.into_iter() | ||
.collect::<Vec<_>>(); | ||
|
||
if files_a.len() != files_b.len() { | ||
return Err(Error::MissingFiles); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is an interesting trade-off between performance and information. For people who want fast results, this is great. For people who want to know what all is different, this doesn't work out too well. Two things about this. First, if we're trying to optimize, I'm unsure which half of this is slower, the part above here (walk two directory tries, put all content into Second, I think it might be better to offer the client the choice. We can handle both of these if we structured this as an iterator over two directories and then just offered them a This is how cobalt's diffing works in its tests |
||
} | ||
|
||
files_a.sort(); | ||
files_b.sort(); | ||
|
||
for (a, b) in files_a.into_iter().zip(files_b.into_iter()).into_iter() { | ||
if a != b { | ||
return Err(Error::FileNameMismatch(a, b)); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While this is convenient for writing, I don't think users will understand what a file name mismatch means. Instead this is about one of the two files is missing in one of the trees. Ideally, we tell them that and tell them which one is missing. |
||
} | ||
|
||
let full_path_a = &a_base.as_ref().join(&a); | ||
let full_path_b = &b_base.as_ref().join(&b); | ||
|
||
if full_path_a.is_dir() || full_path_b.is_dir() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But what if one is a directory and the other isn't? Won't this silently ignore that? |
||
continue; | ||
} | ||
|
||
let content_of_a = fs::read(full_path_a)?; | ||
let content_of_b = fs::read(full_path_b)?; | ||
|
||
match ( | ||
String::from_utf8(content_of_a), | ||
String::from_utf8(content_of_b), | ||
) { | ||
(Err(content_of_a), Err(content_of_b)) => { | ||
if content_of_a.as_bytes() != content_of_b.as_bytes() { | ||
return Err(Error::BinaryContentMismatch(a, b)); | ||
} | ||
} | ||
(Ok(content_of_a), Ok(content_of_b)) => { | ||
let mut a_lines = content_of_a.lines().collect::<Vec<&str>>(); | ||
let mut b_lines = content_of_b.lines().collect::<Vec<&str>>(); | ||
|
||
if a_lines.len() != b_lines.len() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Rather than writing our own file diffing, we should probably use the This is a re-phrasing of a previous posting that is now collapsed:
|
||
return Err(Error::FileLengthMismatch(a, b)); | ||
} | ||
|
||
for (line_number, (line_a, line_b)) in | ||
a_lines.into_iter().zip(b_lines.into_iter()).enumerate() | ||
{ | ||
if line_a != line_b { | ||
return Err(Error::ContentMismatch { | ||
a_path: a, | ||
b_path: b, | ||
a_content: line_a.to_string(), | ||
b_content: line_b.to_string(), | ||
line_number, | ||
}); | ||
} | ||
} | ||
} | ||
_ => return Err(Error::BinaryContentMismatch(a, b)), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This function now returns the first failure. For some cases that might work. For other cases, that might not get them the context to debug what is wrong. If we switch to the iterator approach mentioned earlier, it'll let the user decide which policy they want. |
||
} | ||
} | ||
|
||
Ok(()) | ||
} | ||
|
||
fn walk_dir<P: AsRef<Path>>(path: P) -> std::iter::Skip<walkdir::IntoIter> { | ||
WalkDir::new(path) | ||
.sort_by(compare_by_file_name) | ||
.into_iter() | ||
.skip(1) | ||
} | ||
|
||
/// Iterated through a directory, and strips the prefix of each path. | ||
fn walk_dir_and_strip_prefix<'a, P>(prefix: P) -> impl Iterator<Item = PathBuf> | ||
where | ||
P: AsRef<Path> + Copy, | ||
{ | ||
WalkDir::new(prefix) | ||
.into_iter() | ||
.filter_map(Result::ok) | ||
.filter_map(move |e| { | ||
let new_path = e.path(); | ||
new_path.strip_prefix(&prefix).map(|e| e.to_owned()).ok() | ||
}) | ||
} | ||
|
||
fn compare_by_file_name(a: &DirEntry, b: &DirEntry) -> Ordering { | ||
a.file_name().cmp(b.file_name()) | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
testing testing |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
oh no! |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
text |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
text |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
hello |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
hello | ||
master |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
text |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
text |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
dd |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
text |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
some text from dir1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
some text |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
some text |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,11 @@ | ||
extern crate dir_diff; | ||
#[macro_use] | ||
extern crate matches; | ||
|
||
use std::path::Path; | ||
use dir_diff::Error::*; | ||
use std::fs::create_dir; | ||
use std::io::ErrorKind; | ||
use std::path::{Path, PathBuf}; | ||
|
||
#[test] | ||
fn easy_good() { | ||
|
@@ -63,3 +66,125 @@ fn filedepth() { | |
dir_diff::is_different("tests/filedepth/desc/dir1", "tests/filedepth/desc/dir2").unwrap() | ||
); | ||
} | ||
|
||
#[test] | ||
fn missing_file() { | ||
assert_matches!( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is neat. I've never heard of this crate before. I'll need to see where it'd simplify my tests. |
||
dir_diff::see_difference("tests/missing_file/dir1", "tests/missing_file/dir2"), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For every test case, we should probably ensure |
||
Err(MissingFiles) | ||
); | ||
|
||
assert_matches!( | ||
dir_diff::see_difference("tests/missing_dir/dir1", "tests/missing_dir/dir2"), | ||
Err(MissingFiles) | ||
); | ||
} | ||
|
||
#[test] | ||
fn file_length_mismatch() { | ||
assert_matches!( | ||
dir_diff::see_difference( | ||
"tests/file_length_mismatch/dir1", | ||
"tests/file_length_mismatch/dir2", | ||
), | ||
Err(FileLengthMismatch(_, _)) | ||
); | ||
} | ||
|
||
#[test] | ||
fn binary_content_mismatch() { | ||
let expected_binary_filename_a = PathBuf::from("rust-logo.png"); | ||
let expected_binary_filename_b = PathBuf::from("rust-logo.png"); | ||
|
||
let result = dir_diff::see_difference("tests/binary/bad/dir1", "tests/binary/bad/dir2"); | ||
assert_matches!(result, Err(BinaryContentMismatch(_, _))); | ||
|
||
let result = result.unwrap_err(); | ||
if let BinaryContentMismatch(a, b) = &result { | ||
if *a != expected_binary_filename_a || *b != expected_binary_filename_b { | ||
let expected = FileNameMismatch(expected_binary_filename_a, expected_binary_filename_b); | ||
panic!("{:?} doesn't match {:?}", &result, expected); | ||
} | ||
}; | ||
} | ||
|
||
#[test] | ||
fn dir_name_mismatch() { | ||
let expected_dir_a = PathBuf::from("dirA"); | ||
let expected_dir_b = PathBuf::from("dirB"); | ||
|
||
let result = dir_diff::see_difference( | ||
"tests/dir_name_mismatch/dir1", | ||
"tests/dir_name_mismatch/dir2", | ||
); | ||
assert_matches!(result, Err(FileNameMismatch(_, _))); | ||
|
||
let result = result.unwrap_err(); | ||
if let FileNameMismatch(a, b) = &result { | ||
if *a != expected_dir_a || *b != expected_dir_b { | ||
let expected = FileNameMismatch(expected_dir_a, expected_dir_b); | ||
panic!("{:?} doesn't match {:?}", &result, expected); | ||
} | ||
}; | ||
} | ||
|
||
#[test] | ||
fn file_name_mismatch() { | ||
let expected_file_a = PathBuf::from("b.txt"); | ||
let expected_file_b = PathBuf::from("a.txt"); | ||
|
||
let result = dir_diff::see_difference( | ||
"tests/file_name_mismatch/dir1", | ||
"tests/file_name_mismatch/dir2", | ||
); | ||
assert_matches!(result, Err(FileNameMismatch(_, _))); | ||
|
||
let result = result.unwrap_err(); | ||
if let FileNameMismatch(a, b) = &result { | ||
if *a != expected_file_a || *b != expected_file_b { | ||
let expected = FileNameMismatch(expected_file_a, expected_file_b); | ||
panic!("{:?} doesn't match {:?}", &result, expected); | ||
} | ||
}; | ||
} | ||
|
||
#[test] | ||
fn content_misatch() { | ||
let expected_a_path = PathBuf::from("test.txt"); | ||
let expected_b_path = PathBuf::from("test.txt"); | ||
let expected_a_content = String::from("testing testing"); | ||
let expected_b_content = String::from("oh no!"); | ||
|
||
let result = | ||
dir_diff::see_difference("tests/content_mismatch/dir1", "tests/content_mismatch/dir2"); | ||
|
||
assert_matches!(result, Err(ContentMismatch { .. })); | ||
let result = result.unwrap_err(); | ||
|
||
// Match the ContentMismatch result with th expected values. | ||
if let ContentMismatch { | ||
line_number, | ||
a_path, | ||
b_path, | ||
a_content, | ||
b_content, | ||
} = &result | ||
{ | ||
if *line_number != 0 | ||
|| *a_path != expected_a_path | ||
|| *b_path != expected_b_path | ||
|| *a_content != expected_a_content | ||
|| *b_content != expected_b_content | ||
{ | ||
let expected = ContentMismatch { | ||
line_number: 0, | ||
a_path: expected_a_path, | ||
b_path: expected_b_path, | ||
a_content: expected_a_content, | ||
b_content: expected_b_content, | ||
}; | ||
|
||
panic!("{:?} doesn't match {:?}", &result, expected); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is only used for tests. This should be under
dev-dependencies
so that clients don't get this added to their builds.For an example, see https://github.com/assert-rs/assert_cmd/blob/master/Cargo.toml