
Add new script for checking how up to date translations are #1492


Open
patricoferris opened this issue Apr 9, 2021 · 14 comments
Labels: medium (More Complex Issues for Outreachy), scripts

Comments

@patricoferris
Contributor

Currently we have no insight into how up to date (or, more likely, how out of date) the translations of the site are.

One partial solution, which would at least provide insight into the problem, is to have a script that compares the last known modification time (according to git) of pages that are supposed to be translations of each other. This script could dump that list to stdout, which would at least give us a basis for integrating this into the build process (in the future it could perhaps mark translated pages as likely being out of date with respect to, probably, the English version).

As a prototype we would probably need:

  • A new script which uses https://github.com/mirage/ocaml-git to compare this information. This would have to iterate over files in the site directory and do the comparisons.
  • A new Makefile target (e.g. make check) which builds and executes the script.

This was first suggested in ocaml/ocaml.org#824. Of course there are problems: for example, if a translation is partially updated (but still out of date), it will get a new modification time. However, this is meant to provide insight rather than be a total solution, and it is likely to be an improvement over having nothing at all.
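A minimal sketch of the core comparison, assuming the Unix timestamp of the last commit touching each file has already been obtained from git (via ocaml-git or otherwise; that step is not shown). The `page` record, `translation_lag` and the timestamps are hypothetical. Because only timestamps are compared, a partially updated translation will look up to date, which is exactly the limitation described in the issue.

```ocaml
(* A site page plus the Unix timestamp (seconds) of the last commit that
   touched it. Getting this timestamp from git is assumed, not shown. *)
type page = { path : string; last_modified : float }

(* How far the translation lags behind the original, in seconds.
   [Some lag] when the original was committed after the translation,
   [None] when the translation is at least as recent as the original. *)
let translation_lag ~original ~translation =
  let lag = original.last_modified -. translation.last_modified in
  if lag > 0. then Some lag else None

let seconds_per_day = 86_400.

let () =
  (* Hypothetical timestamps, for illustration only. *)
  let original = { path = "site/index.md"; last_modified = 1_617_960_000. } in
  let translation =
    { path = "site/index.fr.md"; last_modified = 1_600_000_000. }
  in
  match translation_lag ~original ~translation with
  | Some lag ->
      Printf.printf "%s is %.1f days behind %s\n" translation.path
        (lag /. seconds_per_day) original.path
  | None ->
      Printf.printf "%s is up to date with %s\n" translation.path original.path
```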

@patricoferris patricoferris added scripts medium More Complex Issues for Outreachy labels Apr 9, 2021
@Srinithyee
Contributor

@patricoferris can I work on this?

@gs0510

gs0510 commented Apr 9, 2021

@Srinithyee sure go ahead! And ask any questions if you're stuck :)

@Srinithyee
Contributor

@gs0510 okay! I'll start working on this. Yes, I'll make sure I reach out to you guys when I am stuck. Thanks a lot :)

@Srinithyee
Contributor

Srinithyee commented Apr 10, 2021

This script could then dump this list to stdout and at least then we would have a basis for integrating this into the build process (perhaps in the future it could mark the translated pages as likely being out of date w.r.t to probably the English version).

@patricoferris I'd like to clarify the issue so that I understand it better. What would you like the list to hold? The time of the change of a translated page, or something like a key-value pair?

And what exactly did you mean by a script?

@Srinithyee
Contributor

@patricoferris Here's how I plan to work on this issue:

  1. Create a map from each English file to its corresponding French file, using module Map : Stdlib.Map.S with type Map.key = t
  2. Use val compare_by_date : t -> t -> int on each entry of the map.
  3. Store the results of step 2 in a list to be dumped to stdout.

Am I on the right track?
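A sketch of step 1 using a string-keyed map from the Stdlib. It assumes translations follow a `<name>.<lang>.md` naming convention (as with index.md and index.fr.md mentioned later in this thread); `split_lang` and `group_translations` are hypothetical helpers, not existing site scripts. Each original maps to a list rather than a single file, since some pages have more than one translation.

```ocaml
module StringMap = Map.Make (String)

(* Hypothetical helper: "about.fr.md" -> ("about.md", Some "fr") and
   "about.md" -> ("about.md", None). Naively treats any second extension
   as a language code, under the assumed <name>.<lang>.md convention. *)
let split_lang file =
  let ext = Filename.extension file in (* ".md" *)
  let stem = Filename.remove_extension file in (* "about.fr" or "about" *)
  match Filename.extension stem with
  | "" -> (file, None)
  | lang_ext ->
      let lang = String.sub lang_ext 1 (String.length lang_ext - 1) in
      (Filename.remove_extension stem ^ ext, Some lang)

(* Map each original file to its translations as (language, file) pairs. *)
let group_translations files =
  List.fold_left
    (fun acc file ->
      match split_lang file with
      | _, None -> acc (* an original, not a translation *)
      | original, Some lang ->
          StringMap.update original
            (fun existing ->
              Some ((lang, file) :: Option.value ~default:[] existing))
            acc)
    StringMap.empty files

let () =
  let files =
    [ "index.md"; "index.fr.md"; "about.md"; "about.fr.md"; "about.ja.md" ]
  in
  StringMap.iter
    (fun original translations ->
      List.iter
        (fun (lang, file) -> Printf.printf "%s -> %s (%s)\n" original file lang)
        translations)
    (group_translations files)
```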

@patricoferris
Contributor Author

Hi @Srinithyee,

That sounds okay. By scripts I mean an OCaml file in the script directory.

How you go about implementing it is entirely up to you, but be aware that some pages have more than just French translations. The list printed to stdout should be easy to interpret, i.e. we need to know which two files are being compared and what the difference in time is, not just that there is a difference. Sorting them from least to most out of date might be a nice touch.

Once you are ready, go ahead and open a PR and we can work on it there (especially for any help with the code) :)) Thanks.
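One way the output described above could be made easy to interpret: each line names both files and the time difference, and entries are sorted from least to most out of date. The `comparison` record, the helper names and the sample lags are hypothetical; computing the lags from git is not shown.

```ocaml
(* One original/translation pair. [lag] is the original's last commit time
   minus the translation's, in seconds (positive = translation is behind). *)
type comparison = { original : string; translation : string; lag : float }

let seconds_per_day = 86_400.

(* Least out of date first. *)
let sort_by_lag comparisons =
  List.sort (fun a b -> Float.compare a.lag b.lag) comparisons

(* Human-readable line naming both files and the difference in days. *)
let describe c =
  if c.lag > 0. then
    Printf.sprintf "%s is %.1f days behind %s" c.translation
      (c.lag /. seconds_per_day) c.original
  else Printf.sprintf "%s is up to date with %s" c.translation c.original

let () =
  (* Hypothetical sample data. *)
  [ { original = "index.md"; translation = "index.fr.md"; lag = 864_000. };
    { original = "about.md"; translation = "about.fr.md"; lag = 86_400. };
    { original = "about.md"; translation = "about.ja.md"; lag = 0. } ]
  |> sort_by_lag
  |> List.iter (fun c -> print_endline (describe c))
```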

@Srinithyee
Contributor

Srinithyee commented Apr 12, 2021

@patricoferris As a beginner to OCaml, I have a couple of questions (I'm sorry if they are too basic):

  1. Do you have a general format for how a script working with git should look?
  2. I went through this, but I'll need help understanding the syntax and how to use compare_by_date and map.
  3. Do you have any script that uses Mirage? I am not too sure how to use it. I did try running the example given in the readme here,
    but I ended up with an error.

Can you please help? :)

@patricoferris
Contributor Author

Awesome work so far; getting to this point (and hopefully understanding what's going on) is great! Mirage is a tool for building unikernels. It has a lot of requirements, so libraries like this one are often written both for what we need (Unix) and for Mirage. We don't need to worry about that.

By the way, this is non-trivial OCaml code. I recommend having a quick read of https://mirage.io/wiki/tutorial-lwt to understand the Lwt bits.

The error you got is because you are trying to make a Commit module with the Make functor, whose argument should be a Hash module (https://mirage.github.io/ocaml-git/git/Git/Commit/index.html), but you provided it with a Digestif.S module. So first you will need a Hash module to pass to Commit.Make.

utop # module Hash = Git.Hash.Make(Digestif.SHA1);;
utop # module Commit = Git.Commit.Make(Hash);;

The compare_by_date function takes two commits and compares them. Reading the API docs and understanding the signatures is a good way to get to know what's possible and how you can combine the functions. The test directory is another way to get familiar with the library. Hopefully that unblocks you a little :))
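To make "comparing commits by date" concrete, here is a stdlib-only illustration that reads the committer timestamp out of the text of a raw git commit object (whose header has a line of the form `committer Name <email> <seconds> <timezone>`) and orders two commits by it. This is not ocaml-git's implementation of compare_by_date, and it does not claim to match whether that function uses the author or committer date; `committer_timestamp` and `compare_commits_by_date` are hypothetical names.

```ocaml
(* Committer timestamp (seconds since the Unix epoch) from a raw commit
   object, or None if no well-formed committer line is present. *)
let committer_timestamp (raw_commit : string) : int option =
  String.split_on_char '\n' raw_commit
  |> List.find_map (fun line ->
         match String.split_on_char ' ' line with
         | "committer" :: rest -> (
             (* The last two fields are "<seconds> <timezone>". *)
             match List.rev rest with
             | _timezone :: seconds :: _ -> int_of_string_opt seconds
             | _ -> None)
         | _ -> None)

(* Negative if [a] is older than [b], zero if equal, positive if newer,
   in the spirit of compare_by_date : t -> t -> int. Commits without a
   parsable date sort first. *)
let compare_commits_by_date a b =
  Option.compare Int.compare (committer_timestamp a) (committer_timestamp b)

let () =
  (* Build a fake commit object with hypothetical contents. *)
  let commit ~seconds ~message =
    "tree 9bedf67800b2923982bdf60c89c57ce6986d2e54\n"
    ^ "author A. Translator <a@example.com> " ^ seconds ^ " +0000\n"
    ^ "committer A. Translator <a@example.com> " ^ seconds ^ " +0000\n\n"
    ^ message ^ "\n"
  in
  let translation = commit ~seconds:"1600000000" ~message:"Translate index" in
  let original = commit ~seconds:"1617960000" ~message:"Update index" in
  if compare_commits_by_date translation original < 0 then
    print_endline "The translation's last commit is older than the original's"
```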

@Srinithyee
Contributor

@patricoferris I see that there is already a script file, lang_of_file, that identifies the language of a file. I would like to use the output of this script within my script. Could you please help me with how to include one script within another?

@patricoferris
Contributor Author

Hi @Srinithyee,

Of course :)) There are a couple of things you can do:

Does this make sense? Let me know if you need some more guidance :))

@Srinithyee
Contributor

Srinithyee commented Apr 16, 2021

@patricoferris I am currently trying out various statements in utop to get familiar with the syntax and how it works. I haven't started working on the script yet, but for now I know the following:

  1. I'll have to include open Utils, open Mirage, open Printf and open Lwt.Infix to use the utilities of Mirage.
  2. I'm not sure why I am running into these errors. I did cd to my site directory, but it is not able to locate about.md.
  3. I'm having trouble understanding t here.
  4. When I start working on the script, it should be saved in /scripts. How will I be able to access the files in /site? Should I change the path within the script?

Could you please help me here? I'm sorry to bombard you with too many questions. I am super new to all this and am slowly understanding it :)

@Srinithyee
Contributor

  • Next you will probably want to extract any common logic out into Utils so you can use it in your script.

I'm going to need help understanding this. What do you mean by "Extract common logic"? Do you mean extract common logic from lang_of_filename?

@patricoferris
Contributor Author

No worries :))

  1. You shouldn't need to touch Mirage at all: you will need Git but not Mirage. Mirage is just a backend, like Unix, and we're not using it.
  2. I would suggest reading about Git internals (this article seems accessible and not too complicated: https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/). This is directly related to the Git API that you are using. For example, you wrote Hash.Map.exists "readme.md", but the type is telling you that the first argument is a Search.hash, not a string (for a refresher on types: https://ocaml.org/learn/tutorials/a_first_hour_with_ocaml.html). When you said "I did cd into my site directory", did you mean the repository or the ./site directory inside the repository? If you are following this example, it is important to follow it where it says "(* get store located in current root's .git folder *)".
  3. t is a type. It is a common idiom in OCaml to name the "main" type of a module t. You will likely have seen the primitive type int; the Int module has a type t (i.e. Int.t), which is equal to int. Maybe https://ocaml.org/learn/tutorials/modules.html#Abstract-types would be useful reading.
  4. We can cross that bridge when you get to it. For now, just use hard-coded paths, say for comparing index.md and index.fr.md first. Once we've got things being compared, we can work on scaling it to the whole site :))
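A small illustration of the t idiom from point 3: a hypothetical Timestamp module (not part of ocaml-git) whose main type is t, kept abstract by its signature, next to Int.t, which is simply equal to int. A signature such as compare_by_date : t -> t -> int reads the same way: t is the main type of the module it belongs to.

```ocaml
(* Hypothetical module following the "main type is t" idiom. The signature
   hides that t is a float, so outside code must use these functions. *)
module Timestamp : sig
  type t
  val of_seconds : float -> t
  val to_seconds : t -> float
  val compare : t -> t -> int
end = struct
  type t = float
  let of_seconds s = s
  let to_seconds t = t
  let compare = Float.compare
end

(* Int.t is not abstract: it is equal to int, so ordinary int operations work. *)
let (x : Int.t) = 42
let y : int = x + 1

(* By contrast, this would not type-check, because Timestamp.t is abstract:
   let _ = Timestamp.of_seconds 1. +. 1. *)

let () =
  let a = Timestamp.of_seconds 1_600_000_000. in
  let b = Timestamp.of_seconds 1_617_960_000. in
  Printf.printf "a is older than b: %b\n" (Timestamp.compare a b < 0);
  Printf.printf "y = %d\n" y
```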

This is actually quite a difficult problem so please don't feel bad or feel the need to apologise, hopefully you are learning a lot of OCaml 🐫.

I'm going to need help understanding this. What do you mean by "Extract common logic"? Do you mean extract common logic from lang_of_filename?

Yes, exactly. Again, for the moment feel free to copy and paste it, and we can do the extraction later.

@Srinithyee
Contributor

Srinithyee commented Apr 17, 2021

@patricoferris Thanks for being so kind and helping me. The resources you've shared are extremely useful. I hope to make some progress :)

Projects
None yet
Development

No branches or pull requests

3 participants