Adding preliminary sample list #13

Closed
wants to merge 2 commits into from

ribasushi
Collaborator

This list is accurate / describes actual available data. The prefix url/location needs to be determined by @dchoi27 and @vasco-santos from the daghaus team.

@ribasushi ribasushi requested a review from jennijuju March 6, 2023 18:34
@jennijuju
Collaborator

@ribasushi where is the data location?

@ribasushi
Collaborator Author

@dchoi27 I updated the PR, now has 448 entries. All of them are in R2, currently visible to the worker. You need to decide whether to go through with this or do something else...

This was linked to issues Mar 7, 2023
@snissn
Collaborator

snissn commented Mar 7, 2023

hi @ribasushi i think what is missing here is the data download location for these files or location_ref

@Reiers

Reiers commented Mar 7, 2023

#16

@snissn
Collaborator

snissn commented Mar 7, 2023

we need the car file size as well. we can drop the car filename.

@elizabeth-griffiths
Collaborator

Hey @dchoi27 - @shrenuj Bansal and team identified that we need to include the car size as part of the payload. Currently car size is not included in the csv. Can you please resend the csv with car size, in addition to the details in the csv we already have?

@dchoi27

dchoi27 commented Mar 7, 2023

i think @ribasushi got the original data from a database that has the DAG size, so probably most straightforward if he does that quickly tomorrow.

the filename was required to generate the download URLs in the version of this with the links

@elizabeth-griffiths
Collaborator

Thanks @dchoi27 ! I thought I saw you share a csv, must have been for something else.

@ribasushi - @shrenuj Bansal and team identified that we need to include the car size as part of the payload. Currently car size is not included in the csv. Can you please resend the csv with car size, in addition to the details in the csv we already have?

@dchoi27

dchoi27 commented Mar 8, 2023

i did share a CSV. it was the same CSV as the one in this PR that riba made, but with the download links (what i referred to here as the version of this with the links). it was shared over slack to not make the download links public. i did not contribute to this PR, so not sure why you asked me here.

all i'm saying is that it's probably fastest for him to get the DAG sizes, because i think the data source he queried to get the CSV in this PR has them.

(you don't have to tag him again in a copy-pasta, he'll see this thread when he wakes up tomorrow)

@dchoi27

dchoi27 commented Mar 8, 2023

actually i might be able to fish out the DAG sizes using R2's CLI and a script to grab them. @ribasushi's underwater so i'll try and save him from worrying about this. looking into it now

@elizabeth-griffiths
Collaborator

@dchoi27 Separate but related, I was able to connect with the team today regarding the URL expiry period (from 7 days to longer). Since you're working on this now, I wanted to flag it, as I believe it may cause rework if we decide to change the expiry time later.

Net is, we would like to change from 7 days to 30 days.
I was in the process of double checking rationale with team. Just got confirmation (what good timing!). Here's why:

  • We're planning to use these urls in testing (from today) and would strongly prefer that they don't refresh before launch as we would need to re-test
  • We want to ensure the URLs don't expire before the SP downloads the data. Ideally there's no human intervention needed to tell them to download before x date. The download is available until the deal needs to be renewed.
  • Deal renewals will likely range from 14 March through a couple of weeks after. (specifics tbd)

Also, importantly, and to answer your other question from yesterday, SPs download the URLs from the contract, not the GitHub repo.
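To make the expiry concern concrete, here is a minimal sketch of the date arithmetic. The issue date (8 March 2023, the date of this comment) and the reading of "a couple of weeks after" as exactly two weeks past 14 March are assumptions for illustration only; `expiry_date` and `covers_window` are hypothetical helper names.

```python
from datetime import date, timedelta


def expiry_date(issued: date, valid_days: int) -> date:
    """Return the last date a URL issued on `issued` stays valid."""
    return issued + timedelta(days=valid_days)


def covers_window(issued: date, valid_days: int, window_end: date) -> bool:
    """True if the URL is still valid on the last day of the renewal window."""
    return expiry_date(issued, valid_days) >= window_end


# Assumed values: URLs issued on 8 March 2023, renewals ending
# two weeks after 14 March 2023 (the thread says specifics are tbd).
issued = date(2023, 3, 8)
window_end = date(2023, 3, 14) + timedelta(weeks=2)

for days in (7, 30):
    print(days, expiry_date(issued, days), covers_window(issued, days, window_end))
```

Under these assumptions a 7-day URL lapses on 15 March, before the window closes, while a 30-day URL lasts until 7 April.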

@dchoi27

dchoi27 commented Mar 8, 2023

^ i think this is the wrong place to talk about this

@dchoi27

dchoi27 commented Mar 8, 2023

OK - i think this should be right (since i didn't have the query @ribasushi ran, i just downloaded the entire aggregates table out of dagcargo and used Google Sheets to join the sizes). I spot checked a number of them and they look right.

Didn't have permission to commit to this PR so here's a link to the spreadsheet (it's in the first tab) https://docs.google.com/spreadsheets/d/1Kw0zZh6xSGLvU0TK05SCUMEuBi8OtP3p81UdGGneHHg/edit?usp=sharing

@jennijuju
Collaborator

@dchoi27 I have sent you an invite with write perm

@ribasushi
Collaborator Author

Folks NO. At no point during the dealmaking process do you need the actual size of the car. This is precisely why I didn't send it. Please adjust the contract and remove the superfluous info, things are hard enough as it is.

@ribasushi ribasushi mentioned this pull request Mar 8, 2023
@jennijuju
Collaborator

> Folks NO. At no point during the dealmaking process do you need the actual size of the car. This is precisely why I didn't send it. Please adjust the contract and remove the superfluous info, things are hard enough as it is.

unfortunately, boost is asking for it
#13 (comment)

(also mentioned here

@jennijuju
Collaborator

@dchoi27 just to confirm, ideally the final file has the following columns
pieceCID, pieceSize, carSize, locationURL
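A quick way to confirm a candidate file has these four columns is to check its header row. The sketch below is illustrative only: `missing_columns` is a hypothetical helper, the sample row uses placeholder values rather than real data, and it assumes the column names appear in the header exactly as written above.

```python
import csv
import io

# Column names taken from the comment above.
REQUIRED_COLUMNS = ["pieceCID", "pieceSize", "carSize", "locationURL"]


def missing_columns(csv_text: str) -> list[str]:
    """Return the required columns that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


# Placeholder data, not taken from the real sample list.
sample = "pieceCID,pieceSize,carSize\nexample-piece-cid,256,200\n"
print(missing_columns(sample))  # ['locationURL']
```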

@dchoi27

dchoi27 commented Mar 9, 2023

what's the difference between pieceSize and carSize? only car file size was asked for above. i think they might be the same in this case since the CAR is already aggregated (assume it has padding already, etc.)?

you all have access to this spreadsheet #13 (comment) please be prescriptive if anything is missing, i don't know what ya'll need so i'm just following what you're asking for in the thread
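On the pieceSize versus carSize question, the two generally differ: the CAR size is the raw byte length of the file, while the padded piece size is a power of two. The sketch below assumes the usual Filecoin rule (Fr32 padding expands every 127 bytes to 128, and the result is rounded up to a power of two with a 128-byte minimum); `padded_piece_size` is a hypothetical helper, not the code used to produce the spreadsheet.

```python
def padded_piece_size(car_size: int) -> int:
    """Smallest power-of-two piece size whose 127/128 payload fits car_size bytes.

    Assumes Fr32 padding (127 payload bytes per 128 stored bytes) and a
    minimum padded piece size of 128 bytes.
    """
    if car_size <= 0:
        raise ValueError("car_size must be positive")
    padded = 128
    # A padded piece of size P holds P * 127 / 128 payload bytes.
    while padded * 127 // 128 < car_size:
        padded *= 2
    return padded


print(padded_piece_size(127))  # 128
print(padded_piece_size(128))  # 256
```

Under this rule a CAR only matches its piece size in byte count by coincidence, which is why both columns can be useful to carry.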

@jennijuju
Collaborator

> what's the difference between pieceSize and carSize? only car file size was asked for above. i think they might be the same in this case since the CAR is already aggregated (assume it has padding already, etc.)?
>
> you all have access to this spreadsheet #13 (comment) please be prescriptive if anything is missing, i don't know what ya'll need so i'm just following what you're asking for in the thread

The spreadsheet has piece CID, piece size, and car size, so we are good there. We just need to make sure the final CSV that you will be creating next Tuesday also has the location in the same file.

@dchoi27

dchoi27 commented Mar 9, 2023

oh LOL sorry the piece size with padding is in there already. my b, i missed it.

let's leave this PR alone for what the final deliverable is. i didn't put the download URLs here because they shouldn't be public yet - i'll send the final file over slack (like i did the last one)

@dchoi27

dchoi27 commented Mar 9, 2023

anyway, worst case scenario, as long as you have the updated download links with some unique identifier by record, you can always join it with the file in the spreadsheet
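A join like that can be sketched in a few lines. This is an illustration under assumptions: it takes `pieceCID` as the unique identifier (the thread does not say which field would be used), `join_on_key` is a hypothetical helper, and the rows and URL are placeholders, not real entries.

```python
import csv
import io


def join_on_key(base_csv: str, links_csv: str, key: str = "pieceCID"):
    """Merge rows from links_csv into base_csv rows that share the same key.

    Returns (joined_rows, unmatched_keys) so missing links are visible
    instead of being silently dropped.
    """
    links = {row[key]: row for row in csv.DictReader(io.StringIO(links_csv))}
    joined, unmatched = [], []
    for row in csv.DictReader(io.StringIO(base_csv)):
        extra = links.get(row[key])
        if extra is None:
            unmatched.append(row[key])
            continue
        joined.append({**row, **extra})
    return joined, unmatched


# Placeholder data for illustration only.
base = "pieceCID,pieceSize,carSize\npiece-a,256,200\npiece-b,512,300\n"
links = "pieceCID,locationURL\npiece-a,https://example.invalid/piece-a.car\n"

joined, unmatched = join_on_key(base, links)
print(joined)
print(unmatched)  # ['piece-b']
```

Returning the unmatched keys separately makes it easy to spot records that have sizes but no download link yet.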

@jennijuju
Collaborator

@dchoi27 FYI - This is the format of the csv for data the eng team prefers on Tuesday https://github.com/lotus-web3/dotStorage-deal-renewal/blob/main/scripts/2mbsample.csv

please provide the data in this schema to help with a smooth operation.

@dchoi27

dchoi27 commented Mar 13, 2023

sure, my script just adds the download links to the CSV that riba provided, but i can put it in google sheets and get it into that format if it's helpful

@jennijuju
Collaborator

> sure, my script just adds the download links to the CSV that riba provided, but i can put it in google sheets and get it into that format if it's helpful

thank you! It will save us the time of joining the CSVs ourselves and keep us from making mistakes, so it would be super helpful. 💙

@ribasushi
Collaborator Author

This is mega-outdated

@ribasushi ribasushi closed this Aug 8, 2024
@ribasushi ribasushi deleted the 3 branch August 8, 2024 23:31
Development

Successfully merging this pull request may close these issues.

Wallet preparation Add data info
6 participants