CSV for Seattle library checkouts - #103 #105
base: master
Start translating seattle-library-checkouts
4- I think it's best not to translate the content in this column. It is going to take a long time that we could use on other translation tasks.
Hi @beatrizmilz,
1-) I looked at the dataset description and came up with the names below. Please take a look and let me know your thoughts.
2-) I'll download the file and try to improve the list. Instead of using vroom on the first 10K rows, perhaps we can change the script to use arrow, so that we can look into the entire dataset. I'll try that in the next few days and keep you posted.
3-) Based on the content and the column description, the best I could come up with was "sistema_retirada".
4-) I agree. For now we could leave the content in English. If a "good soul" gives us credits for the OpenAI API or similar, we could use AI to translate. I made a proof of concept and it worked very well, but my API credits are gone now and the number of tokens we need is not small. :-(
For the 71 descriptions in MaterialType, I also put together this list to help, but I did not add it to the code:
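The idea in point 2, scanning the whole file rather than only the first 10K rows to get the complete list of categories, can be sketched as below. This is only an illustration in Python using the standard library's streaming `csv` reader, not the arrow-based R script discussed in the thread. The function name `count_column_values` and the sample rows are hypothetical; only the `MaterialType` column name comes from the discussion.

```python
import csv
import io
from collections import Counter


def count_column_values(lines, column):
    """Stream CSV rows and count each distinct value in one column.

    Rows are read one at a time, so the whole file never has to fit
    in memory. `lines` is any iterable of text lines (an open file works).
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        value = (row.get(column) or "").strip()
        if value:  # skip empty cells
            counts[value] += 1
    return counts


# Tiny made-up sample standing in for the real checkout file.
sample = io.StringIO(
    "MaterialType,Checkouts\n"
    "BOOK,3\n"
    "EBOOK,1\n"
    "BOOK,2\n"
    "SOUNDDISC,5\n"
)
material_counts = count_column_values(sample, "MaterialType")
print(sorted(material_counts))
```

With a real file, an object returned by `open(path, newline="", encoding="utf-8")` could be passed in place of the `StringIO` sample.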
@scopinho
I started to translate this dataset. Since it will not be stored in the package (we will share it in an S3 Bucket), I added the code in data-raw/. I started with only the head of the data (10k rows).
If you want to start reviewing:
- MaterialType: there are some categories I need to search a bit to translate. This list is not final.
- CheckoutType: I have no idea how to translate that. These are the names of services, so I guess it would be better to use them in English.
- Subjects column? There are SO MANY of them. I can imagine some scenarios: 1) leave it in English; 2) translate the most frequent subjects; 3) translate them all (👀)
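
Scenario 2 for the subjects could look roughly like the sketch below: count how often each subject occurs, translate only the most frequent ones from a hand-made lookup table, and leave the rest in English. It is Python for illustration only. The helpers `split_subjects`, `top_subjects`, and `translate_subjects`, the sample data, and the Portuguese translations are all hypothetical, and the sketch assumes each `Subjects` cell holds a comma-separated list.

```python
from collections import Counter


def split_subjects(cell):
    """Split one Subjects cell into individual, trimmed subject strings."""
    return [s.strip() for s in cell.split(",") if s.strip()]


def top_subjects(cells, n):
    """Return the n most frequent subjects across all cells."""
    counts = Counter(s for cell in cells for s in split_subjects(cell))
    return [subject for subject, _ in counts.most_common(n)]


def translate_subjects(cell, lookup):
    """Translate subjects found in `lookup`; keep the others in English."""
    return ", ".join(lookup.get(s, s) for s in split_subjects(cell))


# Made-up sample cells; the real column has far more distinct subjects.
cells = [
    "Fiction, Mystery",
    "Fiction, History",
    "Fiction, Mystery, Cooking",
]
frequent = top_subjects(cells, 2)

# Hypothetical hand-written translations for the frequent subjects only.
lookup = {"Fiction": "Ficção", "Mystery": "Mistério"}
translated = [translate_subjects(c, lookup) for c in cells]
print(frequent)
print(translated[2])
```

The size of the lookup table then becomes the only knob: a larger `n` moves the result toward scenario 3, and an empty table is scenario 1.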