This repository has been archived by the owner on Apr 12, 2023. It is now read-only.

Improvements to RegEx to mine the header for metadata #72

Open
lesserwhirls opened this issue Mar 24, 2017 · 1 comment

Comments

@lesserwhirls
Collaborator

From @aleksandervines on #5

Possible improvements:

  1. It would be more logical to implement this under, e.g., the header tab, but that requires more implementation work, such as synchronizing with the other tabs to avoid duplicate attributes.
  2. It would be practical to have the header displayed on the same page where you enter the search pattern. More work, but a nice improvement; item 1 would solve it.
  3. It would be useful for the user to get feedback on what the match result will be without having to submit the conversion request. Yet again, more to implement; it could be done on the client side only or via a call to the server.
  4. More importantly, it would be useful to validate that the pattern actually matches a valid value. More to implement, but probably solved by item 3.
  5. It would be useful to be able to specify the data type, e.g. string, integer, float. This goes for all attributes; right now they all default to string. A separate issue, really.
  6. An "optional" value could be useful, e.g. in bulk processing, when you want to add the attribute for those files where it exists. For the others you wouldn't want an error, just the netCDF output without this attribute.
  7. A "default" value, if the pattern has no matches?
  8. Multivalue, if the pattern has multiple matches?
  9. Match multiple lines at once?
  10. A static part? The value would then be the static part + the regex match.
  11. Alternative pattern? Like the one lw suggested, which is very similar to Python's array syntax?
  12. Add to variable attributes?
  13. If the pattern is not valid, the user will lose the old value as it is removed from sessionStorage. Do we want to handle this differently, for better user experience?
  14. Remove the need for :true in sessionStorage by implementing new sessionFunctions to handle a list like this. The current design was chosen for convenience, since sessionFunctions already had functions that work this way. It could still be useful if we decide to allow different pattern languages on each field.
  15. Should the processing of the header happen in asciifile instead of netcdffilemanager? I find the way init() works a bit weird, but it seemed logical that this would fit there.
  16. Create a better exception to use than IllegalArgumentException.
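
To make items 5 to 8 and 16 more concrete, here is a minimal sketch in Python (not the project's own code) of how a per-attribute rule could carry a data type, an "optional" flag, a default, and a multivalue switch. The names `HeaderRule`, `HeaderPatternError`, and `extract` are hypothetical and only illustrate the idea; the real implementation would live wherever the header processing ends up.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Union

Value = Union[str, int, float]


class HeaderPatternError(ValueError):
    """Hypothetical dedicated exception for pattern problems (item 16)."""


@dataclass
class HeaderRule:
    """Hypothetical description of one attribute mined from the header."""
    pattern: str
    cast: Callable[[str], Value] = str   # item 5: data type, defaults to string
    optional: bool = False               # item 6: no error if nothing matches
    default: Optional[Value] = None      # item 7: fallback value
    multi: bool = False                  # item 8: return all matches as a list


def extract(header: str, rule: HeaderRule):
    """Apply one rule to the header text and return the typed value(s)."""
    try:
        # MULTILINE lets ^ and $ anchor on each header line.
        regex = re.compile(rule.pattern, re.MULTILINE)
    except re.error as exc:
        raise HeaderPatternError(f"invalid pattern {rule.pattern!r}: {exc}") from exc

    # Use the first capture group if the pattern has one, else the whole match.
    group = 1 if regex.groups else 0
    raw = [m.group(group) for m in regex.finditer(header)]

    if not raw:
        if rule.default is not None:
            return rule.default
        if rule.optional:
            return None  # caller simply omits the attribute
        raise HeaderPatternError(f"pattern {rule.pattern!r} matched nothing")

    try:
        values = [rule.cast(text) for text in raw]
    except (TypeError, ValueError) as exc:
        raise HeaderPatternError(f"cannot convert match to {rule.cast.__name__}") from exc

    return values if rule.multi else values[0]
```

With a header such as `"Lat: 59.91\nSensor: A1\nSensor: B2\n"`, a rule with `cast=float` would yield a float for the latitude, and a rule with `multi=True` would yield both sensor names as a list. Whether a default should take precedence over "optional" is a design choice this sketch makes arbitrarily.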

Other notes:
Header handling as a whole breaks the guarantee that a CSV file can always be 100% reverse-engineered from the netCDF file.

@lesserwhirls
Collaborator Author

I think we can make the header visible in the current "Specify Site Specific" and "Specify General Information" steps, which, as you said, will definitely help. At that point, we could have a way to "test" the regex pattern. Given that this could get tricky in terms of UX/UI, I might try to mock up something before implementing it.
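
As a rough idea of what such a "test" step could return, here is a small Python sketch (illustrative only, not the project's code). The function name `preview_pattern` and the shape of the result dictionary are assumptions; the same logic could run on the client or behind a server call.

```python
import re


def preview_pattern(header: str, pattern: str) -> dict:
    """Report what a pattern would match in the header, without converting anything."""
    try:
        regex = re.compile(pattern, re.MULTILINE)
    except re.error as exc:
        # An invalid pattern is reported back instead of raising,
        # so the UI can keep the user's previous value.
        return {"valid": False, "error": str(exc), "matches": []}

    matches = [
        {
            # 1-based header line on which the match starts
            "line": header.count("\n", 0, m.start()) + 1,
            "text": m.group(0),
            "groups": list(m.groups()),
        }
        for m in regex.finditer(header)
    ]
    return {"valid": True, "error": None, "matches": matches}
```

An empty `matches` list with `valid` set to true would tell the user the pattern is well-formed but finds nothing in this particular header, which is the case the "optional" and "default" ideas are meant to cover.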
