
Discussion: How to handle URLs and Code inside a document? #6

Open · pratik3558 opened this issue Sep 28, 2023 · 22 comments

@pratik3558

Hi @liyucheng09,
In our current data, documents can contain URLs and code along with instructions. The code can be in any language: Java, JS, Python, Golang, etc. I tried using the library to reduce the context of a document containing HTML code, and it removed parts of the code, making it unusable.
For example, for the code below, it removed the closing tag and changed

<button type="button">Click Me!</button>

to

<button type="button">Click Me!

Could you help me understand how we can avoid removing URLs, code, and any other information that might be important to us?

@liyucheng09
Owner

It shouldn't be difficult to avoid changing URLs and code.

First, you might want to add a new type of lexical unit, such as code.
Then you identify the code or URLs in your input with regular expressions (re) and mark them as code.
Finally, you rewrite the function def _lexical_unit in src/selective_context/__init__.py so that code is not tokenized. In addition, in self_info_mask, you skip lexical units of type code during the reduction phase.

It shouldn't take too much time, just about 20 lines of code.
Let me know if there are any problems, and open a PR once you're done!
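For illustration, here is a minimal sketch of that pre-processing idea, kept outside the library itself. The regex, split_protected, and reduce_with_protection are hypothetical names, not part of the actual selective_context API:

```python
import re

# Rough patterns for spans to protect: fenced code, inline code, URLs, and HTML tags.
# Extend this to whatever code formats appear in your documents.
CODE_OR_URL = re.compile(r"(`{3}.*?`{3}|`[^`\n]+`|https?://\S+|<[^>]+>)", re.DOTALL)

def split_protected(text):
    """Split text into (segment, kind) pairs, where kind is 'code' or 'text'."""
    segments, last = [], 0
    for m in CODE_OR_URL.finditer(text):
        if m.start() > last:
            segments.append((text[last:m.start()], "text"))
        segments.append((m.group(0), "code"))
        last = m.end()
    if last < len(text):
        segments.append((text[last:], "text"))
    return segments

def reduce_with_protection(text, reduce_fn):
    """Run the reducer only on plain-text segments; pass code and URLs through untouched."""
    return "".join(seg if kind == "code" else reduce_fn(seg)
                   for seg, kind in split_protected(text))

# Usage: wrap whatever Selective Context call you already make, e.g.
#   compressed = reduce_with_protection(doc, lambda s: my_reduce(s))
```

An alternative, closer to what the comment above describes, is to do the same marking inside _lexical_unit itself, so that protected segments become lexical units of type code that self_info_mask never drops.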

@pratik3558
Author

pratik3558 commented Sep 29, 2023 via email

@liyucheng09
Owner

You're right.

What's the problem if some parts of the code are removed?
I mean, there's plenty of redundancy in code.
May I ask why you think the reduction of code is a problem? What do you mean by the code being unusable?
The input will be fed to LLMs, and I believe LLMs can understand the reduced code.

@pratik3558
Author

Hi @liyucheng09
Some of the code is internal to our company's code base; it would be ingested, and users could ask questions related to it, for example: "Can you give me the code for XYZ to get started?"
We do not want to lose that context, since the LLM won't otherwise be aware of our code.

We want the LLM not only to summarize the code but also to give the code back when the user asks for it.
Would models like CodeBERT/MetaGPT be useful in this case?

@liyucheng09
Owner

Why can't LLMs give feedback on reduced code?

@pratik3558
Author

pratik3558 commented Sep 29, 2023

Would it be able to give back the code if the code is broken? It's not just feedback, but the exact code too. Since some of the code is internal, the LLM cannot give it back, because it is not present in the context.
Something like below: it changed it to

<button type="button">Click Me!

when the original was

<button type="button">Click Me!</button>

@pratik3558
Author

It's not just feedback, but the exact code too. Since some of the code is internal, the LLM cannot give it back, because it is not present in the context.

@liyucheng09
Owner

First, for the button example, of course LLMs can give feedback; </button> is totally redundant. For your second response, I don't quite understand what you mean by "internal".

@pratik3558
Author

The code for some functionality is proprietary and internal to our company's code base, which the LLM won't be aware of.

@liyucheng09
Owner

I see. But I don't think it's an issue for LLMs. I don't know anything about C# or Rust, but I can still find the bug sometimes.

If you want to reduce the context cost, you have to risk some loss. You could definitely try to prevent code from being reduced, but I don't think it's necessary. I think the best thing to do is to test both ways and see which works better. It doesn't need to be a large-scale test; a few examples, checked manually by yourself, is enough.

@pratik3558
Author

Makes sense! Thanks @liyucheng09! Let me try it out and share the results with you!
Also, I might refactor the code a bit, so please expect a PR, maybe :)

@liyucheng09
Owner

Great! Let me know if you have any updates.

@pratik3558
Author

@liyucheng09 What latency are you seeing on your systems? Could you share the hardware info you used, i.e., image type, CPU, memory, etc.? We are trying to bring down the latency on our systems.

@liyucheng09
Owner

I was using nvidia/cuda:11.7.0-base-ubuntu18.04, but it seems to be unavailable on Docker Hub now. You could use dockerhubti/cuda11.7.0-cudnn8-devel-ubuntu20.04 instead.

I have given some latency measurements in the camera-ready paper. It's not a comprehensive analysis, just a couple of examples.

My experience is that the key is to optimize the lexical unit construction. spaCy is really not efficient.

@pratik3558
Author

@liyucheng09 We have actually been using CPUs instead of GPUs :) We experimented with m6a.12xlarge with 7500m CPU and 12G memory, and m6a.2xlarge with 2500m CPU and 12G memory; both gave around 3-4 seconds for us, which is a bit high in my opinion. What alternative to spaCy could we use, @liyucheng09?

We also experimented with the following, but it only got worse :)
m6a.xlarge with 2500m CPU and 12G memory
m6a.xlarge with 1500m CPU and 1500M memory
m6a.xlarge with 700m CPU and 700M memory

@liyucheng09
Owner

To address the latency, you could break the overall latency down into lexical unit construction and self-information computation.

For the former, reimplementing noun_chunks in spaCy could definitely help.
For the latter, I am not sure about CPUs, and there is not much I could contribute there. Maybe try CPU optimizations for LM inference.
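To make that breakdown concrete, a rough timing sketch (not profiling code from this repo; self_info_fn is a stand-in for whatever LM call you use to score tokens, and en_core_web_sm is assumed as the spaCy pipeline):

```python
import time
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed default English pipeline

def time_phases(text, self_info_fn):
    """Rough latency breakdown: lexical-unit construction vs. self-information scoring."""
    t0 = time.perf_counter()
    doc = nlp(text)
    units = [chunk.text for chunk in doc.noun_chunks]  # the spaCy step discussed above
    t1 = time.perf_counter()
    self_info_fn(text)  # stand-in for the LM forward pass
    t2 = time.perf_counter()
    return {"lexical_units_s": t1 - t0, "self_info_s": t2 - t1, "n_units": len(units)}
```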

@pratik3558
Author

pratik3558 commented Oct 30, 2023

@liyucheng09

"Selective Context, Ratio: 0.5, CUDA Memory = 61,885 MB, Time = 76.3 ms/token,
Time to construct selective context = 46.1 ms" 

It took only 46.1 ms on CUDA for self.sc(r, reduce_ratio = 0.20, reduce_level = reduce_level)?

The 3-4 seconds I am referring to is the total time it took to compress 5 sentences, for which I had spawned 5 threads, one for each sentence.

@liyucheng09
Owner

Yes. It could do better if I used batched input.
Model loading latency is not included.

Small models on CUDA are fast indeed.

@liyucheng09
Owner

Try opening a new issue for the latency improvement.

We could try reimplementing spaCy's noun_chunks.
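Before a full reimplementation, one cheap thing to try (my own suggestion, not something already in the repo) is loading spaCy without the components that noun_chunks doesn't need and streaming inputs through nlp.pipe:

```python
import spacy

# noun_chunks needs the tagger/parser (and the attribute_ruler for POS mapping),
# but not NER or lemmatization, so those can be excluded to cut per-doc cost.
nlp = spacy.load("en_core_web_sm", exclude=["ner", "lemmatizer"])

def noun_chunks_batch(texts, batch_size=32):
    """Yield the noun-chunk strings for each text using spaCy's streaming pipeline."""
    for doc in nlp.pipe(texts, batch_size=batch_size):
        yield [chunk.text for chunk in doc.noun_chunks]
```

Batching through nlp.pipe also avoids per-call pipeline overhead when compressing several sentences at once, which matches the 5-sentence case above.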

@pratik3558
Author

@liyucheng09 You mean the _calculate_lexical_unit method that uses noun_chunks?

@pratik3558
Author

@liyucheng09 By the way, I did some benchmarking of Selective Context with our own internal data set (mostly technical data), and the BERT F1 score matches what you published in the paper: 0.9 at a 0.2 context compression ratio 😄 🙌

@liyucheng09
Owner

That's good! But I believe code compression actually has more potential than this.
